Registry
Every config is benchmarked on real hardware with PawBench. The Story column explains why each config exists and when to use it.
| Model | Config | Status | Hardware | vLLM | Quantization | Story | Throughput | Latency (TTFT) | Link |
|---|---|---|---|---|---|---|---|---|---|
| qwen3-coder | fp8-baseline | Best, Verified | nvidia-gb10 | vllm>=0.8.0 | fp8 | | 26.7 tok/s | 10350 ms | GitHub |
| qwen3.5-27b-awq | turboquant-3.5-triton-native | Verified | nvidia-gb10 | vllm>=0.18.0rc1 | Unknown | | 27.9 tok/s | 10267 ms | GitHub |
| qwen3-coder | fp8-eagle3-spec3 | Verified | nvidia-gb10 | vllm>=0.8.0 | fp8 | | 35.1 tok/s | 10736 ms | GitHub |
| devstral-small-24b | baseline | Verified | nvidia-gb10 | vllm>=0.8.0 | Unknown | | 53.6 tok/s | 74 ms | GitHub |
| deepseek-coder-v2-lite | fp8-baseline | Verified | nvidia-gb10 | vllm>=0.8.0 | Unknown | | 58.1 tok/s | 3850 ms | GitHub |
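
As a rough illustration of how the registry entries above might be used to choose a config, the following Python sketch picks the highest-throughput entry that fits a time-to-first-token budget and computes the throughput ratio between two configs of the same model. The `Entry` class and the `pick_fastest` and `speedup` helpers are hypothetical names introduced here for illustration; they are not part of PawBench or the registry. The numbers are copied from the entries above, and the sketch compares speed only, not output quality across different models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One registry row (hypothetical structure, for illustration only)."""
    model: str
    config: str
    tok_per_s: float  # throughput as listed in the registry
    ttft_ms: int      # time to first token as listed in the registry


# Values copied from the registry entries above.
REGISTRY = [
    Entry("qwen3-coder", "fp8-baseline", 26.7, 10350),
    Entry("qwen3.5-27b-awq", "turboquant-3.5-triton-native", 27.9, 10267),
    Entry("qwen3-coder", "fp8-eagle3-spec3", 35.1, 10736),
    Entry("devstral-small-24b", "baseline", 53.6, 74),
    Entry("deepseek-coder-v2-lite", "fp8-baseline", 58.1, 3850),
]


def pick_fastest(entries, max_ttft_ms: Optional[int] = None) -> Optional[Entry]:
    """Return the highest-throughput entry whose TTFT fits the budget, or None."""
    eligible = [
        e for e in entries
        if max_ttft_ms is None or e.ttft_ms <= max_ttft_ms
    ]
    return max(eligible, key=lambda e: e.tok_per_s, default=None)


def speedup(entries, model: str, baseline_config: str, candidate_config: str) -> float:
    """Throughput ratio of candidate_config over baseline_config for one model."""
    by_config = {e.config: e for e in entries if e.model == model}
    return by_config[candidate_config].tok_per_s / by_config[baseline_config].tok_per_s


if __name__ == "__main__":
    print(pick_fastest(REGISTRY))                    # no latency budget
    print(pick_fastest(REGISTRY, max_ttft_ms=100))   # interactive-style budget
    ratio = speedup(REGISTRY, "qwen3-coder", "fp8-baseline", "fp8-eagle3-spec3")
    print(f"fp8-eagle3-spec3 vs fp8-baseline: {ratio:.2f}x throughput")
```

With the listed numbers, the unconstrained pick is deepseek-coder-v2-lite, a 100 ms TTFT budget leaves only devstral-small-24b, and the fp8-eagle3-spec3 config of qwen3-coder comes out at about 1.31x the throughput of its fp8-baseline config, at the cost of a slightly higher TTFT (10736 ms versus 10350 ms).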