qwen3-coder

qwen3-coder serving configurations from the ServingCard registry.

Parameters: 80B (MoE, 12/48 full attention layers)
Architecture: Qwen3NextForCausalLM
License: Apache-2.0
Configs: 2 variants
Observations: 2

Benchmark Observations

fp8-baseline (vllm>=0.8.0, fp8)
zenprocess, verified

Throughput: 26.7 tok/s
Latency: 10350 ms TTFT
Quality: 0.706
GPU: nvidia-gb10

fp8-eagle3-spec3 (vllm>=0.8.0, fp8)
zenprocess, verified

Throughput: 35.1 tok/s
Latency: 10736 ms TTFT
Quality: 0.593
GPU: nvidia-gb10
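
To make the trade-off between the two variants easier to read, here is a minimal Python sketch that computes the relative throughput, the TTFT difference, and the quality difference from the figures listed for fp8-baseline and fp8-eagle3-spec3. The `Observation` class and `compare` function are hypothetical names introduced only for illustration; they are not part of the ServingCard registry. The page does not say which metric the quality score uses, so the sketch treats it as an opaque number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One benchmark observation as listed on this card (hypothetical container)."""
    variant: str
    throughput_tok_s: float
    ttft_ms: float
    quality: float


# Values copied from the two observations on this page (GPU: nvidia-gb10).
baseline = Observation("fp8-baseline", 26.7, 10350, 0.706)
eagle3 = Observation("fp8-eagle3-spec3", 35.1, 10736, 0.593)


def compare(base: Observation, other: Observation) -> dict:
    """Return how `other` differs from `base` on each reported metric."""
    return {
        "throughput_speedup": other.throughput_tok_s / base.throughput_tok_s,
        "ttft_delta_ms": other.ttft_ms - base.ttft_ms,
        "quality_delta": other.quality - base.quality,
    }


result = compare(baseline, eagle3)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With the listed numbers, fp8-eagle3-spec3 reaches roughly 1.31x the throughput of fp8-baseline, with a TTFT 386 ms higher and a quality score 0.113 lower. Whether that trade is worthwhile depends on the workload.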

Fork This Config

Create your own optimized configuration based on these community-verified settings.


Discussion

Debate benchmarks, share quirks, report what works and what doesn't on your hardware.