How does prompt caching work across models?

HEAD: b9aad20 — fix(app): open project search (#11783)
This post might have stale content, as HEAD is 407 commits ahead.

how does prompt caching work? how do they handle all models?

Claude Opus 4.5 commented:

Now I have a comprehensive understanding. Let me summarize how prompt caching works in OpenCode:


## How Prompt Caching Works in OpenCode

OpenCode implements prompt caching in a provider-agnostic way while respecting each provider's specific caching mechanisms. Here's the breakdown:

### 1. Two Types of Caching Mechanisms

#### A. Session-Based Cache Keys (OpenAI/Responses API Style)

For providers that support session-based caching (like OpenAI), OpenCode sends a `promptCacheKey` parameter:

From packages/opencode/src/provider/transform.ts:

```typescript
if (input.model.providerID === "openai" || input.providerOptions?.setCacheKey) {
  result["promptCacheKey"] = input.sessionID
}
```

This is used for:

  • OpenAI (automatically enabled)
  • Venice (automatically enabled)
  • Any provider where you set setCacheKey: true in config

The cache key is set to the session ID, so all messages within the same session can benefit from prompt caching.
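The selection logic above can be sketched as a small pure function. This is an illustrative standalone version, not OpenCode's actual signature; only `promptCacheKey` and the session-ID-as-key behavior come from the source:

```typescript
// Hypothetical sketch of key-based cache selection. The input shape is
// simplified; in OpenCode the provider ID and options live on a larger object.
type CacheKeyInput = {
  sessionID: string
  providerID: string
  setCacheKey?: boolean
}

function cacheKeyOptions(input: CacheKeyInput): Record<string, string> {
  const result: Record<string, string> = {}
  // OpenAI-style providers reuse the session ID as a stable cache key, so
  // every request in the same session can hit the same server-side prefix.
  if (input.providerID === "openai" || input.setCacheKey) {
    result["promptCacheKey"] = input.sessionID
  }
  return result
}
```

Using the session ID as the key means no per-message bookkeeping is needed: the key is stable for the life of the session and changes only when a new session starts.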

#### B. Anthropic-Style Cache Control (Message-Level)

For Anthropic, Bedrock, OpenRouter, and similar providers, OpenCode uses message-level cache control markers:

From packages/opencode/src/provider/transform.ts:

```typescript
function applyCaching(msgs: ModelMessage[], providerID: string): ModelMessage[] {
  const system = msgs.filter((msg) => msg.role === "system").slice(0, 2)
  const final = msgs.filter((msg) => msg.role !== "system").slice(-2)

  const providerOptions = {
    anthropic: { cacheControl: { type: "ephemeral" } },
    openrouter: { cacheControl: { type: "ephemeral" } },
    bedrock: { cachePoint: { type: "default" } },
    openaiCompatible: { cache_control: { type: "ephemeral" } },
    copilot: { copilot_cache_control: { type: "ephemeral" } },
  }

  for (const msg of unique([...system, ...final])) {
    // Apply cache control markers
    msg.providerOptions = mergeDeep(msg.providerOptions ?? {}, providerOptions)
  }
  return msgs
}
```

Cache markers are applied to:

  • First 2 system messages (usually the main system prompt)
  • Last 2 non-system messages (the most recent context)

This is triggered for any model that includes "anthropic" or "claude" in its ID/npm package.
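The message-selection rule can be isolated as a self-contained sketch (the function name and the index-based return value are illustrative; the first-two-system / last-two-non-system rule is from the source):

```typescript
// Illustrative: return the indices of the messages that would receive
// cache markers — the first two system messages plus the last two
// non-system messages, de-duplicated.
type Msg = { role: "system" | "user" | "assistant" }

function cacheTargets(msgs: Msg[]): number[] {
  const systemIdx = msgs
    .map((m, i) => (m.role === "system" ? i : -1))
    .filter((i) => i >= 0)
    .slice(0, 2)
  const otherIdx = msgs
    .map((m, i) => (m.role !== "system" ? i : -1))
    .filter((i) => i >= 0)
    .slice(-2)
  // A Set removes any overlap between the two selections.
  return [...new Set([...systemIdx, ...otherIdx])]
}
```

Marking only these boundaries keeps the number of cache breakpoints small (Anthropic caps them at four per request) while still covering the stable prefix and the most recent turns.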

### 2. How Models Handle Cache Information

When the LLM responds, cache usage is returned in the provider metadata and tracked:

From packages/opencode/src/session/index.ts:

```typescript
const cacheReadInputTokens = input.usage.cachedInputTokens ?? 0
const cacheWriteInputTokens = (
  input.metadata?.["anthropic"]?.["cacheCreationInputTokens"] ??
  input.metadata?.["bedrock"]?.["usage"]?.["cacheWriteInputTokens"] ??
  input.metadata?.["venice"]?.["usage"]?.["cacheCreationInputTokens"] ??
  0
) as number

// Cost calculation includes cache pricing
const tokens = {
  input: safe(adjustedInputTokens),
  output: safe(input.usage.outputTokens ?? 0),
  reasoning: safe(input.usage?.reasoningTokens ?? 0),
  cache: {
    write: safe(cacheWriteInputTokens),
    read: safe(cacheReadInputTokens),
  },
}
```
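A minimal sketch of how these token buckets combine with per-model pricing. This is not OpenCode's actual cost function; the per-million-token unit and the fall-back to the input rate when no cache price is listed are assumptions for illustration:

```typescript
// Hypothetical cost calculation over the token buckets shown above.
type Tokens = { input: number; output: number; cache: { read: number; write: number } }
// Assumed unit: USD per million tokens (matching common provider pricing pages).
type Pricing = { input: number; output: number; cache_read?: number; cache_write?: number }

function cost(tokens: Tokens, price: Pricing): number {
  const perMillion = 1_000_000
  return (
    (tokens.input * price.input +
      tokens.output * price.output +
      // Assumption: fall back to the input rate when cache pricing is absent.
      tokens.cache.read * (price.cache_read ?? price.input) +
      tokens.cache.write * (price.cache_write ?? price.input)) /
    perMillion
  )
}
```

Cache reads are typically priced well below the input rate and cache writes slightly above it, which is why they are tracked separately rather than folded into `input`.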

### 3. Provider-Specific Handling

| Provider | Cache Mechanism | How It Works |
| --- | --- | --- |
| OpenAI | `promptCacheKey` | Session ID sent automatically |
| Anthropic | `cacheControl: ephemeral` | Applied to system + last 2 messages |
| Bedrock | `cachePoint: default` | Applied to system + last 2 messages |
| OpenRouter | `cacheControl: ephemeral` | Applied to system + last 2 messages |
| Venice | `promptCacheKey` | Session ID sent automatically |
| Any provider | `setCacheKey: true` option | Opt-in via config |

### 4. Configuration

You can enable prompt caching for providers that support it but don't have it auto-enabled:

From packages/web/src/content/docs/config.mdx:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "anthropic": {
      "options": {
        "setCacheKey": true
      }
    }
  }
}
```

### 5. Cost Tracking

OpenCode tracks cache read and cache write costs separately from regular input tokens. The models.dev API provides per-model pricing:

From packages/opencode/src/provider/models.ts:

```typescript
cost: z.object({
  input: z.number(),
  output: z.number(),
  cache_read: z.number().optional(),
  cache_write: z.number().optional(),
  context_over_200k: z.object({...}).optional(),
})
```

### Summary

OpenCode uses a unified approach that:

  1. Auto-detects the provider type and applies the appropriate caching strategy
  2. Marks specific messages for caching (system prompts + recent context)
  3. Uses session IDs as cache keys for providers that support key-based caching
  4. Tracks cache usage in token counts and cost calculations
  5. Allows opt-in via config for providers where it's not auto-enabled
