How does prompt caching work across models?
how does prompt caching work? how do they handle all models?
## How Prompt Caching Works in OpenCode
OpenCode implements prompt caching in a provider-agnostic way while respecting each provider's specific caching mechanisms. Here's the breakdown:
### 1. Two Types of Caching Mechanisms
#### A. Session-Based Cache Keys (OpenAI/Responses API Style)
For providers that support session-based caching (like OpenAI), OpenCode sends a `promptCacheKey` parameter:
From packages/opencode/src/provider/transform.ts:
```ts
if (input.model.providerID === "openai" || input.providerOptions?.setCacheKey) {
  result["promptCacheKey"] = input.sessionID
}
```
This is used for:
- OpenAI (automatically enabled)
- Venice (automatically enabled)
- Any provider where you set `setCacheKey: true` in config
The cache key is set to the session ID, so all messages within the same session can benefit from prompt caching.
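To make the flow above concrete, here is a minimal self-contained sketch of session-scoped cache keys. The function and input shape (`buildRequestOptions`, `RequestInput`) are illustrative assumptions for this example, not OpenCode's actual API:

```ts
// Hypothetical sketch: attach a session-scoped cache key to outgoing
// request options. Names here are illustrative, not OpenCode internals.
interface RequestInput {
  sessionID: string
  providerID: string
  setCacheKey?: boolean // opt-in flag, mirroring the config option above
}

function buildRequestOptions(input: RequestInput): Record<string, unknown> {
  const result: Record<string, unknown> = {}
  // Every request in the same session reuses the same key, so the
  // provider can recognize and cache the repeated prompt prefix.
  if (input.providerID === "openai" || input.setCacheKey) {
    result["promptCacheKey"] = input.sessionID
  }
  return result
}
```

Because the key is simply the session ID, no per-message bookkeeping is needed; the provider handles prefix matching on its side.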
#### B. Anthropic-Style Cache Control (Message-Level)
For Anthropic, Bedrock, OpenRouter, and similar providers, OpenCode uses message-level cache control markers:
From packages/opencode/src/provider/transform.ts:
```ts
function applyCaching(msgs: ModelMessage[], providerID: string): ModelMessage[] {
  const system = msgs.filter((msg) => msg.role === "system").slice(0, 2)
  const final = msgs.filter((msg) => msg.role !== "system").slice(-2)
  const providerOptions = {
    anthropic: { cacheControl: { type: "ephemeral" } },
    openrouter: { cacheControl: { type: "ephemeral" } },
    bedrock: { cachePoint: { type: "default" } },
    openaiCompatible: { cache_control: { type: "ephemeral" } },
    copilot: { copilot_cache_control: { type: "ephemeral" } },
  }
  for (const msg of unique([...system, ...final])) {
    // Apply cache control markers
    msg.providerOptions = mergeDeep(msg.providerOptions ?? {}, providerOptions)
  }
  return msgs
}
```
Cache markers are applied to:
- First 2 system messages (usually the main system prompt)
- Last 2 non-system messages (the most recent context)
This is triggered for any model whose ID or npm package name includes "anthropic" or "claude".
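The detection rule just described can be sketched as a simple substring check. The function name `usesAnthropicCaching` is hypothetical; only the "anthropic"/"claude" matching behavior comes from the source:

```ts
// Illustrative sketch of the trigger rule: a model is treated as
// "Anthropic-style" if its ID or npm package name mentions anthropic
// or claude. Function name and signature are assumptions.
function usesAnthropicCaching(modelID: string, npmPackage?: string): boolean {
  const haystack = `${modelID} ${npmPackage ?? ""}`.toLowerCase()
  return haystack.includes("anthropic") || haystack.includes("claude")
}
```

A substring check like this covers both direct Anthropic models and Anthropic models served through aggregators whose IDs embed the model name.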
### 2. How Models Handle Cache Information
When the LLM responds, cache usage is returned in the provider metadata and tracked:
From packages/opencode/src/session/index.ts:
```ts
const cacheReadInputTokens = input.usage.cachedInputTokens ?? 0
const cacheWriteInputTokens = (
  input.metadata?.["anthropic"]?.["cacheCreationInputTokens"] ??
  input.metadata?.["bedrock"]?.["usage"]?.["cacheWriteInputTokens"] ??
  input.metadata?.["venice"]?.["usage"]?.["cacheCreationInputTokens"] ??
  0
) as number

// Cost calculation includes cache pricing
const tokens = {
  input: safe(adjustedInputTokens),
  output: safe(input.usage.outputTokens ?? 0),
  reasoning: safe(input.usage?.reasoningTokens ?? 0),
  cache: {
    write: safe(cacheWriteInputTokens),
    read: safe(cacheReadInputTokens),
  },
}
```
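The fallback chain above can be isolated into a small standalone function, which makes the first-defined-value-wins behavior easy to see. The metadata shape and function name here are assumptions for illustration:

```ts
// Illustrative sketch: providers report cache-write tokens under
// different metadata keys, so the first defined value wins via ??.
// The Metadata type and cacheWriteTokens name are assumptions.
type Metadata = Record<string, any>

function cacheWriteTokens(metadata: Metadata | undefined): number {
  return (
    metadata?.["anthropic"]?.["cacheCreationInputTokens"] ??
    metadata?.["bedrock"]?.["usage"]?.["cacheWriteInputTokens"] ??
    metadata?.["venice"]?.["usage"]?.["cacheCreationInputTokens"] ??
    0
  )
}
```

Note that `??` only falls through on `null`/`undefined`, so a legitimate `0` reported by one provider is still returned rather than shadowed by a later entry.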
### 3. Provider-Specific Handling
| Provider | Cache Mechanism | How It Works |
|---|---|---|
| OpenAI | `promptCacheKey` | Session ID sent automatically |
| Anthropic | `cacheControl: ephemeral` | Applied to system + last 2 messages |
| Bedrock | `cachePoint: default` | Applied to system + last 2 messages |
| OpenRouter | `cacheControl: ephemeral` | Applied to system + last 2 messages |
| Venice | `promptCacheKey` | Session ID sent automatically |
| Any provider | `setCacheKey: true` option | Opt-in via config |
### 4. Configuration
You can enable prompt caching for providers that support it but don't have it auto-enabled:
From packages/web/src/content/docs/config.mdx:
```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "anthropic": {
      "options": {
        "setCacheKey": true
      }
    }
  }
}
```
### 5. Cost Tracking
OpenCode tracks cache read and cache write costs separately from regular input tokens. The models.dev API provides per-model pricing:
From packages/opencode/src/provider/models.ts:
```ts
cost: z.object({
  input: z.number(),
  output: z.number(),
  cache_read: z.number().optional(),
  cache_write: z.number().optional(),
  context_over_200k: z.object({...}).optional(),
})
```
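A hypothetical cost calculation can tie this pricing schema to the token counts from section 2. The assumption that rates are per million tokens, the fallback to the `input` rate when cache pricing is absent, and all names below are illustrative, not OpenCode's actual accounting:

```ts
// Hypothetical sketch: combine token counts with per-model pricing.
// Rates are assumed to be USD per million tokens; when a model has no
// separate cache pricing, the regular input rate is used as a fallback.
interface Cost {
  input: number
  output: number
  cache_read?: number
  cache_write?: number
}

interface Tokens {
  input: number
  output: number
  cache: { read: number; write: number }
}

function computeCost(tokens: Tokens, cost: Cost): number {
  const perToken = (rate: number) => rate / 1_000_000
  return (
    tokens.input * perToken(cost.input) +
    tokens.output * perToken(cost.output) +
    tokens.cache.read * perToken(cost.cache_read ?? cost.input) +
    tokens.cache.write * perToken(cost.cache_write ?? cost.input)
  )
}
```

Tracking cache reads and writes at distinct rates matters because cache reads are typically far cheaper than fresh input tokens, while cache writes can cost more.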
### Summary
OpenCode uses a unified approach that:
- Auto-detects the provider type and applies the appropriate caching strategy
- Marks specific messages for caching (system prompts + recent context)
- Uses session IDs as cache keys for providers that support key-based caching
- Tracks cache usage in token counts and cost calculations
- Allows opt-in via config for providers where it's not auto-enabled