Compare Models
Select 2-4 models to compare their serving benchmarks side by side.
| Metric | qwen3-coder (fp8-baseline, verified) | deepseek-coder-v2-lite (fp8-baseline, verified) | devstral-small-24b (baseline, verified) |
|---|---|---|---|
| tok/s (single) | 26.7 | 58.1 | 53.6 |
| tok/s (parallel) | 186.4 | 110.9 | 110.2 |
| TTFT | 10,350 ms | 3,850 ms | 74 ms |
| Quality | 0.706% | 0.286% | 0.293% |
| CACP compliance | 0.1% | 1% | 1% |
| Tool accuracy | 0.889% | 0% | 0% |
| Useful ratio | 0.998% | 0.977% | 0.977% |
| Context | 80B (MoE, 12/48 full attention layers) | 16B (MoE) | 24B |
| Hardware | nvidia-gb10 | nvidia-gb10 | nvidia-gb10 |
| Status | verified | verified | verified |
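
One way to read the two throughput rows together is as a parallel scaling factor: parallel tok/s divided by single-stream tok/s. The sketch below computes that ratio from the figures in the table. The `benchmarks` dictionary and the `parallel_speedup` helper are hypothetical names introduced for illustration, not part of the dashboard, and the ratio is a derived convenience figure rather than a reported metric.

```python
# Throughput figures copied from the comparison table (tokens per second).
# The dictionary layout is an assumption made for this sketch.
benchmarks = {
    "qwen3-coder": {"single": 26.7, "parallel": 186.4},
    "deepseek-coder-v2-lite": {"single": 58.1, "parallel": 110.9},
    "devstral-small-24b": {"single": 53.6, "parallel": 110.2},
}


def parallel_speedup(row):
    """Return parallel throughput divided by single-stream throughput."""
    return row["parallel"] / row["single"]


# Derived ratio per model, rounded to two decimals for display.
speedups = {
    name: round(parallel_speedup(row), 2) for name, row in benchmarks.items()
}

# Print models from the largest scaling factor to the smallest.
for name, ratio in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio}x")
```

On these numbers, qwen3-coder has the lowest single-stream throughput but gains the most from parallel requests, while the other two models roughly double.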