#optimizations

CACP — 10x Token Savings for Agent Communication

5 min read · March 2026

When an LLM coding agent finishes a task, it typically responds with 2,000 tokens of prose: "I've completed the implementation. I created a new file at src/middleware/jwt.go which implements the JWT authentication middleware. The middleware validates Bearer tokens from the Authorization header..."

The orchestrator doesn't need prose. It needs: did it work, what files changed, did tests pass. That's ~200 tokens, not 2,000.

The CACP Response Format

STATUS:ok
FILES_CREATED:src/middleware/jwt.go,src/middleware/jwt_test.go
FILES_MODIFIED:go.mod
TESTS:pass:12
BUILD:pass
LEARNED:JWT tokens need 24h expiry for mobile clients
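For illustration, a response in this format could be consumed on the orchestrator side with a small line-based parser. This is a minimal sketch, not the reference implementation. The field names on the dataclass and the reading of TESTS:pass:12 as "result, then count" are assumptions. The class name CACPResponse is borrowed from the schema name mentioned later in this article, but this version is a plain dataclass.

```python
from dataclasses import dataclass, field


@dataclass
class CACPResponse:
    # Field names are assumptions chosen to mirror the keys in the format.
    status: str = ""
    files_created: list[str] = field(default_factory=list)
    files_modified: list[str] = field(default_factory=list)
    tests: str = ""
    tests_count: int = 0
    build: str = ""
    learned: str = ""


def _split_files(value: str) -> list[str]:
    # "a.go,b.go" -> ["a.go", "b.go"]; an empty value yields an empty list.
    return [part.strip() for part in value.split(",") if part.strip()]


def parse_cacp(text: str) -> CACPResponse:
    """Parse KEY:value lines; unknown keys and malformed lines are skipped."""
    resp = CACPResponse()
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "STATUS":
            resp.status = value
        elif key == "FILES_CREATED":
            resp.files_created = _split_files(value)
        elif key == "FILES_MODIFIED":
            resp.files_modified = _split_files(value)
        elif key == "TESTS":
            # Assumed layout: result, then an optional count after a colon.
            result, _, count = value.partition(":")
            resp.tests = result
            resp.tests_count = int(count) if count.isdigit() else 0
        elif key == "BUILD":
            resp.build = value
        elif key == "LEARNED":
            resp.learned = value
    return resp
```

Parsing the example above would give status "ok", two created files, one modified file, and a tests_count of 12.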

Why It Matters for Serving

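Token cost grows O(n²) with conversation turns, because every response the agent gives gets fed back as context for every later turn. At 2,000 tokens per response over 30 turns, the context ends up holding 60,000 tokens of accumulated prose. With CACP at 200 tokens per response, it holds 6,000: a 10x reduction in context accumulation. The same 10x applies to the response tokens that get re-read across all turns, which is where the quadratic cost comes from.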
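The arithmetic can be checked with a short sketch. It counts only the agent's own responses and ignores the system prompt, user messages, and tool outputs, which is a simplifying assumption. The function names are illustrative.

```python
def context_tokens(tokens_per_response: int, turns: int) -> int:
    """Response tokens sitting in the context after `turns` turns (linear)."""
    return tokens_per_response * turns


def cumulative_input_tokens(tokens_per_response: int, turns: int) -> int:
    """Response tokens re-read as input, summed over all turns (quadratic).

    Turn k (1-based) re-reads the k - 1 responses that came before it.
    """
    return sum(tokens_per_response * (k - 1) for k in range(1, turns + 1))


print(context_tokens(2000, 30))           # 60000
print(context_tokens(200, 30))            # 6000
print(cumulative_input_tokens(2000, 30))  # 870000
print(cumulative_input_tokens(200, 30))   # 87000
```

Under these assumptions, the final context size grows linearly while the total tokens processed over the session grow quadratically. Both shrink by the same factor of 10.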

On Hermes/Qwen3-coder with 65K context: Without CACP, you exhaust the context window in ~15 tool calls. With CACP, you get ~60 tool calls before context pressure. That's the difference between a working agent and one that forgets what it's doing.

How We Enforce It

Three layers, each model-appropriate:

Claude: --json-schema CACPResponse, a Pydantic model enforced by the CLI

Hermes/Qwen: two-phase closure, a final turn with tools=None → 100% CACP compliance

Fallback: a regex parser extracts STATUS/FILES/TESTS from free-form text when schema fails
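The fallback layer might look something like the sketch below, which scans free-form text for the fields instead of requiring them at fixed positions. The patterns, the function name extract_cacp_fields, and the returned dictionary shape are all assumptions for illustration. The article does not show the real parser.

```python
import re

# [ \t]* rather than \s* so an empty value never swallows the next line.
STATUS_RE = re.compile(r"\bSTATUS[ \t]*[:=][ \t]*(\w+)", re.IGNORECASE)
FILES_RE = re.compile(
    r"\bFILES_(CREATED|MODIFIED)[ \t]*[:=][ \t]*([^\n]*)", re.IGNORECASE
)
TESTS_RE = re.compile(r"\bTESTS[ \t]*[:=][ \t]*(\w+)(?::(\d+))?", re.IGNORECASE)


def extract_cacp_fields(text: str) -> dict:
    """Best-effort extraction; fields that are not found stay None or empty."""
    fields = {
        "status": None,
        "files_created": [],
        "files_modified": [],
        "tests": None,
        "tests_count": None,
    }
    match = STATUS_RE.search(text)
    if match:
        fields["status"] = match.group(1).lower()
    for kind, value in FILES_RE.findall(text):
        key = "files_" + kind.lower()
        fields[key].extend(p.strip() for p in value.split(",") if p.strip())
    match = TESTS_RE.search(text)
    if match:
        fields["tests"] = match.group(1).lower()
        if match.group(2):
            fields["tests_count"] = int(match.group(2))
    return fields
```

Unlike a strict parser, this tolerates prose before and between the fields, and spaces after the colon. The trade-off is that it can pick up a field-like string the model mentioned in passing, so it is best treated as a last resort.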

The System Prompt That Won

We ran 378 autoresearch iterations to find the optimal system prompt for CACP compliance on Qwen3-coder. The winner is 6 lines:

You are a coding agent. Use tools to build what is asked.

MANDATORY OUTPUT FORMAT — violation causes automatic rejection:
When you are done using tools, your response MUST be:
STATUS:ok
FILES_CREATED:file1,file2
FILES_MODIFIED:
TESTS:pass:0
BUILD:pass
LEARNED:one sentence

Rules:
- First line MUST start with STATUS:
- Do NOT write any text before STATUS:

The key insight: "First line MUST start with STATUS:" — everything else is noise the model ignores. 100% compliance, 3/3 runs.
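The rule lends itself to a mechanical check. The sketch below is one way compliance could be scored across runs, not necessarily how the numbers in this article were measured. Tolerating leading whitespace before STATUS: is an assumption, and both function names are illustrative.

```python
def is_cacp_compliant(response: str) -> bool:
    """True if nothing but whitespace precedes the STATUS: line."""
    return response.lstrip().startswith("STATUS:")


def compliance_rate(responses: list[str]) -> float:
    """Fraction of responses that pass the first-line check."""
    if not responses:
        return 0.0
    return sum(is_cacp_compliant(r) for r in responses) / len(responses)


runs = [
    "STATUS:ok\nFILES_CREATED:main.go\nTESTS:pass:0",
    "Done! Here is the summary.\nSTATUS:ok",
    "\nSTATUS:fail\nBUILD:fail",
]
print(compliance_rate(runs))  # 2 of the 3 runs pass the check
```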

Benchmark Results

CACP compliance varies dramatically across models. On our registry, you can see compliance rates for each model/hardware combo.

Model | CACP Compliance | Tool Accuracy
Qwen3-coder (Eagle3) | 30% | 70%
Qwen3-coder (baseline) | 10% | 89%
DeepSeek-Coder-V2-Lite | 100% | 0%
Devstral-Small-24B | 100% | 0%

Interesting: smaller models achieve 100% CACP compliance but 0% tool accuracy. They follow the format perfectly but can't actually use the tools correctly. CACP compliance alone doesn't mean the agent works.

CACP is open source, and the spec is available on GitHub.