How does prompt caching work across models?
how does prompt caching work? how do they handle all models?
## How Prompt Caching Works in OpenCode
OpenCode implements prompt caching in a provider-agnostic way while respecting each provider's specific caching mechanisms. Here's the breakdown:
### 1. Two Types of Caching Mechanisms
#### A. Session-Based Cache Keys (OpenAI/Responses API Style)
For providers that support session-based caching (like OpenAI), OpenCode sends a `promptCacheKey` parameter:
From packages/opencode/src/provider/transform.ts:
```ts
if (input.model.providerID === "openai" || input.providerOptions?.setCacheKey) {
  result["promptCacheKey"] = input.sessionID
}
```
This is used for:
- OpenAI (automatically enabled)
- Venice (automatically enabled)
- Any provider where you set `setCacheKey: true` in config
The cache key is set to the session ID, so all messages within the same session can benefit from prompt caching.
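To make the flow above concrete, here is a minimal self-contained sketch of session-scoped cache keys. The function and input shape (`buildRequestOptions`, `RequestInput`) are illustrative assumptions for this example, not OpenCode's actual API:

```ts
// Hypothetical sketch: attach a session-scoped cache key to outgoing
// request options. Names here are illustrative, not OpenCode internals.
interface RequestInput {
  sessionID: string
  providerID: string
  setCacheKey?: boolean // opt-in flag, mirroring the config option above
}

function buildRequestOptions(input: RequestInput): Record<string, unknown> {
  const result: Record<string, unknown> = {}
  // Every request in the same session reuses the same key, so the
  // provider can recognize and cache the repeated prompt prefix.
  if (input.providerID === "openai" || input.setCacheKey) {
    result["promptCacheKey"] = input.sessionID
  }
  return result
}
```

Because the key is simply the session ID, no per-message bookkeeping is needed; the provider handles prefix matching on its side.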
#### B. Anthropic-Style Cache Control (Message-Level)
For Anthropic, Bedrock, OpenRouter, and similar providers, OpenCode uses message-level cache control markers:
From packages/opencode/src/provider/transform.ts:
```ts
function applyCaching(msgs: ModelMessage[], providerID: string): ModelMessage[] {
  const system = msgs.filter((msg) => msg.role === "system").slice(0, 2)
  const final = msgs.filter((msg) => msg.role !== "system").slice(-2)
  const providerOptions = {
    anthropic: { cacheControl: { type: "ephemeral" } },
    openrouter: { cacheControl: { type: "ephemeral" } },
    bedrock: { cachePoint: { type: "default" } },
    openaiCompatible: { cache_control: { type: "ephemeral" } },
    copilot: { copilot_cache_control: { type: "ephemeral" } },
  }
  for (const msg of unique([...system, ...final])) {
    // Apply cache control markers
    msg.providerOptions = mergeDeep(msg.providerOptions ?? {}, providerOptions)
  }
  return msgs
}
```
Cache markers are applied to:
- First 2 system messages (usually the main system prompt)
- Last 2 non-system messages (the most recent context)
This is triggered for any model whose ID or npm package name includes "anthropic" or "claude".
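The detection rule just described can be sketched as a simple substring check. The function name `usesAnthropicCaching` is hypothetical; only the "anthropic"/"claude" matching behavior comes from the source:

```ts
// Illustrative sketch of the trigger rule: a model is treated as
// "Anthropic-style" if its ID or npm package name mentions anthropic
// or claude. Function name and signature are assumptions.
function usesAnthropicCaching(modelID: string, npmPackage?: string): boolean {
  const haystack = `${modelID} ${npmPackage ?? ""}`.toLowerCase()
  return haystack.includes("anthropic") || haystack.includes("claude")
}
```

A substring check like this covers both direct Anthropic models and Anthropic models served through aggregators whose IDs embed the model name.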
### 2. How Models Handle Cache Information
When the LLM responds, cache usage is returned in the provider metadata and tracked:
From packages/opencode/src/session/index.ts:
```ts
const cacheReadInputTokens = input.usage.cachedInputTokens ?? 0
const cacheWriteInputTokens = (
  input.metadata?.["anthropic"]?.["cacheCreationInputTokens"] ??
  input.metadata?.["bedrock"]?.["usage"]?.["cacheWriteInputTokens"] ??
  input.metadata?.["venice"]?.["usage"]?.["cacheCreationInputTokens"] ??
  0
) as number

// Cost calculation includes cache pricing
const tokens = {
  input: safe(adjustedInputTokens),
  output: safe(input.usage.outputTokens ?? 0),
  reasoning: safe(input.usage?.reasoningTokens ?? 0),
  cache: {
    write: safe(cacheWriteInputTokens),
    read: safe(cacheReadInputTokens),
  },
}
```
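The fallback chain above can be isolated into a small standalone function, which makes the first-defined-value-wins behavior easy to see. The metadata shape and function name here are assumptions for illustration:

```ts
// Illustrative sketch: providers report cache-write tokens under
// different metadata keys, so the first defined value wins via ??.
// The Metadata type and cacheWriteTokens name are assumptions.
type Metadata = Record<string, any>

function cacheWriteTokens(metadata: Metadata | undefined): number {
  return (
    metadata?.["anthropic"]?.["cacheCreationInputTokens"] ??
    metadata?.["bedrock"]?.["usage"]?.["cacheWriteInputTokens"] ??
    metadata?.["venice"]?.["usage"]?.["cacheCreationInputTokens"] ??
    0
  )
}
```

Note that `??` only falls through on `null`/`undefined`, so a legitimate `0` reported by one provider is still returned rather than shadowed by a later entry.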
### 3. Provider-Specific Handling
| Provider | Cache Mechanism | How It Works |
|---|---|---|
| OpenAI | `promptCacheKey` | Session ID sent automatically |
| Anthropic | `cacheControl: ephemeral` | Applied to system + last 2 messages |
| Bedrock | `cachePoint: default` | Applied to system + last 2 messages |
| OpenRouter | `cacheControl: ephemeral` | Applied to system + last 2 messages |
| Venice | `promptCacheKey` | Session ID sent automatically |
| Any provider | `setCacheKey: true` option | Opt-in via config |
### 4. Configuration
You can enable prompt caching for providers that support it but don't have it auto-enabled:
From packages/web/src/content/docs/config.mdx:
```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "anthropic": {
      "options": {
        "setCacheKey": true
      }
    }
  }
}
```
### 5. Cost Tracking
OpenCode tracks cache read and cache write costs separately from regular input tokens. The models.dev API provides per-model pricing:
From packages/opencode/src/provider/models.ts:
```ts
cost: z.object({
  input: z.number(),
  output: z.number(),
  cache_read: z.number().optional(),
  cache_write: z.number().optional(),
  context_over_200k: z.object({...}).optional(),
})
```
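A hypothetical cost calculation can tie this pricing schema to the token counts from section 2. The assumption that rates are per million tokens, the fallback to the `input` rate when cache pricing is absent, and all names below are illustrative, not OpenCode's actual accounting:

```ts
// Hypothetical sketch: combine token counts with per-model pricing.
// Rates are assumed to be USD per million tokens; when a model has no
// separate cache pricing, the regular input rate is used as a fallback.
interface Cost {
  input: number
  output: number
  cache_read?: number
  cache_write?: number
}

interface Tokens {
  input: number
  output: number
  cache: { read: number; write: number }
}

function computeCost(tokens: Tokens, cost: Cost): number {
  const perToken = (rate: number) => rate / 1_000_000
  return (
    tokens.input * perToken(cost.input) +
    tokens.output * perToken(cost.output) +
    tokens.cache.read * perToken(cost.cache_read ?? cost.input) +
    tokens.cache.write * perToken(cost.cache_write ?? cost.input)
  )
}
```

Tracking cache reads and writes at distinct rates matters because cache reads are typically far cheaper than fresh input tokens, while cache writes can cost more.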
### Summary
OpenCode uses a unified approach that:
- Auto-detects the provider type and applies the appropriate caching strategy
- Marks specific messages for caching (system prompts + recent context)
- Uses session IDs as cache keys for providers that support key-based caching
- Tracks cache usage in token counts and cost calculations
- Allows opt-in via config for providers where it's not auto-enabled