Compare Models

Select 2-4 models to compare their serving benchmarks side-by-side.

qwen3-coder
fp8-baseline (Verified)
tok/s (single): 26.7
tok/s (parallel): 186.4
TTFT: 10,350 ms
Quality: 0.706%
CACP compliance: 0.1%
Tool accuracy: 0.889%
Useful ratio: 0.998%
Context: 80B (MoE, 12/48 full attention layers)
Hardware: nvidia-gb10
Status: verified

deepseek-coder-v2-lite
fp8-baseline (Verified)
tok/s (single): 58.1
tok/s (parallel): 110.9
TTFT: 3,850 ms
Quality: 0.286%
CACP compliance: 1%
Tool accuracy: 0%
Useful ratio: 0.977%
Context: 16B (MoE)
Hardware: nvidia-gb10
Status: verified

devstral-small-24b
baseline (Verified)
tok/s (single): 53.6
tok/s (parallel): 110.2
TTFT: 74 ms
Quality: 0.293%
CACP compliance: 1%
Tool accuracy: 0%
Useful ratio: 0.977%
Context: 24B
Hardware: nvidia-gb10
Status: verified
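
One way to read the single and parallel throughput figures together is to divide parallel tok/s by single tok/s for each model. The sketch below does that with the numbers listed above. The "parallel speedup" ratio is a derived figure that the page itself does not report, and the names `Throughput`, `BENCHMARKS`, and `rank_by_speedup` are hypothetical. The page does not state how many concurrent requests the parallel run used, so the ratio says nothing about per-request efficiency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Throughput:
    """Throughput figures copied from the comparison cards (tokens per second)."""
    single_tok_s: float
    parallel_tok_s: float

    def parallel_speedup(self) -> float:
        # Derived metric (an assumption of this sketch, not shown on the page):
        # aggregate parallel throughput relative to single-stream throughput.
        return self.parallel_tok_s / self.single_tok_s


# Values taken verbatim from the cards above.
BENCHMARKS = {
    "qwen3-coder": Throughput(single_tok_s=26.7, parallel_tok_s=186.4),
    "deepseek-coder-v2-lite": Throughput(single_tok_s=58.1, parallel_tok_s=110.9),
    "devstral-small-24b": Throughput(single_tok_s=53.6, parallel_tok_s=110.2),
}


def rank_by_speedup(benchmarks: dict[str, Throughput]) -> list[tuple[str, float]]:
    """Return (model, speedup) pairs, highest speedup first."""
    pairs = [(name, t.parallel_speedup()) for name, t in benchmarks.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for name, speedup in rank_by_speedup(BENCHMARKS):
    print(f"{name}: {speedup:.2f}x")
```

By this ratio qwen3-coder gains the most from parallel serving, even though it has the lowest single-stream throughput of the three.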