qwen3-coder

qwen3-coder serving configurations from the ServingCard registry.

Parameters: 80B (MoE, 12/48 full attention layers)
Architecture: Qwen3NextForCausalLM
License: Apache-2.0
Configs: 2 variants
Observations: 2

Benchmark Observations

zenprocess (verified)
Throughput: 26.7 tok/s
Latency: 10350 ms TTFT
Quality: 0.706
GPU: nvidia-gb10
Tags: vllm>=0.8.0, fp8, fp8-baseline

zenprocess (verified)
Throughput: 35.1 tok/s
Latency: 10736 ms TTFT
Quality: 0.593
GPU: nvidia-gb10
Tags: vllm>=0.8.0, fp8, fp8-eagle3-spec3
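
To make the trade-off between the two observations easier to read, the relative change in each metric can be computed directly from the listed numbers. The following is a minimal sketch: the `Observation` class and the `relative_change` and `compare` helpers are hypothetical names introduced for illustration, not part of the ServingCard registry. The page does not say how the quality score is measured, so the sketch treats it as an opaque number where higher is assumed to be better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One benchmark observation as listed on this page (hypothetical type)."""
    config: str
    throughput_tok_s: float
    ttft_ms: float
    quality: float  # metric definition is not given in the source


# Values copied from the two observations on this page.
baseline = Observation("fp8-baseline", 26.7, 10350, 0.706)
spec = Observation("fp8-eagle3-spec3", 35.1, 10736, 0.593)


def relative_change(new: float, old: float) -> float:
    """Fractional change of `new` relative to `old`."""
    return (new - old) / old


def compare(a: Observation, b: Observation) -> dict[str, float]:
    """Relative change of each metric when moving from config a to config b."""
    return {
        "throughput": relative_change(b.throughput_tok_s, a.throughput_tok_s),
        "ttft": relative_change(b.ttft_ms, a.ttft_ms),
        "quality": relative_change(b.quality, a.quality),
    }


deltas = compare(baseline, spec)
for metric, change in deltas.items():
    print(f"{metric}: {change:+.1%}")
```

On these two data points, fp8-eagle3-spec3 shows roughly 31% higher throughput than fp8-baseline, about 4% higher TTFT, and about 16% lower quality. With a single observation per configuration, these figures are indicative only.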
