Implement prompt caching for Anthropic and OpenAI
Implement prompt caching. Investigate how anomalyco/opencode and vercel/ai handle it; priorities are Anthropic and OpenAI models.
PR created: https://github.com/basehub-ai/forums/pull/140
## Summary
I've implemented prompt caching for Anthropic and OpenAI models after investigating both anomalyco/opencode and vercel/ai SDK patterns. Here's what was done:
### Research Findings
vercel/ai SDK provides:
- A dynamic prompt caching cookbook showing an `addCacheControlToMessages()` utility that adds `providerOptions.anthropic.cacheControl` at the message level
- The `@ai-sdk/anthropic` package has a `CacheControlValidator` that enforces a 4 cache-breakpoint limit
- The SDK auto-translates message-level `providerOptions` to block-level `cache_control` in the API request
- For OpenAI: `promptCacheKey` and `promptCacheRetention` provider options
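To make the message-level pattern concrete, here is a minimal sketch of what a cookbook-style utility looks like. The message type is simplified for illustration; the real SDK `ModelMessage` is richer, and the function body here is an assumption about the pattern, not the SDK's actual implementation:

```typescript
// Simplified stand-in for the AI SDK's message shape (illustration only).
type Msg = {
  role: "system" | "user" | "assistant";
  content: string;
  providerOptions?: Record<string, unknown>;
};

// Sketch of the cookbook pattern: tag the last message with an Anthropic
// cache breakpoint, so everything up to and including it becomes a
// cacheable prefix. The SDK translates this providerOptions entry into a
// block-level cache_control marker in the API request.
function addCacheControlToMessages(messages: Msg[]): Msg[] {
  return messages.map((msg, i) =>
    i === messages.length - 1
      ? {
          ...msg,
          providerOptions: {
            ...msg.providerOptions,
            anthropic: { cacheControl: { type: "ephemeral" } },
          },
        }
      : msg
  );
}
```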
anomalyco/opencode (`ProviderTransform.applyCaching`) uses:
- Cache first 2 system messages + last 2 non-system messages = 4 breakpoints max
- Multi-provider support: `anthropic.cacheControl`, `bedrock.cachePoint`, `openrouter.cacheControl`, `copilot.copilot_cache_control`
- For OpenAI: sets `promptCacheKey` per session ID for better routing
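The opencode breakpoint strategy can be sketched as follows. This is a simplified reconstruction of the idea, not opencode's actual code; the `Msg` type and function body are assumptions for illustration:

```typescript
// Simplified message shape (illustration only).
type Msg = { role: string; content: string; providerOptions?: Record<string, unknown> };

// Sketch of the opencode-style strategy: mark the first 2 system messages
// and the last 2 non-system messages with ephemeral cache breakpoints,
// staying within Anthropic's 4-breakpoint limit.
function applyCaching(messages: Msg[]): Msg[] {
  const systemIdx = messages
    .map((m, i) => (m.role === "system" ? i : -1))
    .filter((i) => i !== -1)
    .slice(0, 2);
  const nonSystemIdx = messages
    .map((m, i) => (m.role !== "system" ? i : -1))
    .filter((i) => i !== -1)
    .slice(-2);
  const marked = new Set([...systemIdx, ...nonSystemIdx]);
  return messages.map((m, i) =>
    marked.has(i)
      ? {
          ...m,
          providerOptions: {
            ...m.providerOptions,
            anthropic: { cacheControl: { type: "ephemeral" } },
          },
        }
      : m
  );
}
```

Caching the leading system messages keeps the stable prompt prefix cheap, while the trailing breakpoints let the growing conversation tail re-use the cache across turns.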
### Implementation
| File | Change |
|---|---|
| `agent/prompt-cache.ts` | New utility with provider detection and caching strategies |
| `agent/response-agent.ts` | Applied caching in `streamTextStep` before `streamText()` |
| `agent/__tests__/prompt-cache.test.ts` | 18 tests covering all edge cases |
**Anthropic:** Adds `cacheControl: { type: 'ephemeral' }` on the first 2 system messages + last 2 conversation messages (matching opencode's strategy, respecting the 4-breakpoint limit). Cached tokens cost 90% less.

**OpenAI:** Sets `promptCacheKey: 'forums-{postId}'` to improve server routing for automatic prefix caching. No additional cost — OpenAI caching is free and automatic for prompts ≥ 1024 tokens.
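For the OpenAI side, the key derivation is trivial but worth pinning down, since cache-hit rates depend on the key being stable per post. The helper name below is hypothetical (the PR's convention, not an SDK API):

```typescript
// Hypothetical helper mirroring the PR's 'forums-{postId}' convention.
// A stable per-post key lets OpenAI route repeat requests for the same
// post to the same cache, improving automatic prefix-cache hit rates.
function promptCacheKeyFor(postId: string): string {
  return `forums-${postId}`;
}
```

The key is then passed through provider options, e.g. `providerOptions: { openai: { promptCacheKey: promptCacheKeyFor(postId) } }`.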
Perfect. Make sure, in the anthropic case, to always pass providerOptions object - else the cache will be purged i think.
Pushed. The PR at https://github.com/basehub-ai/forums/pull/140 is now updated.
Here's what this fix does:
The problem: The `system` parameter was passed as a plain string to `streamText()`. For Anthropic, that means the system prompt gets sent without any `cache_control` marker. Without `cache_control` on the system message, Anthropic won't maintain the cache prefix — effectively purging the cache on every request.
The fix: New `wrapSystemPrompt()` function that, for Anthropic models, returns a `SystemModelMessage` object instead of a plain string:
```ts
// Before (cache purge risk):
streamText({ system: "You are...", ... })

// After (cache maintained):
streamText({
  system: {
    role: "system",
    content: "You are...",
    providerOptions: {
      anthropic: { cacheControl: { type: "ephemeral" } }
    }
  },
  ...
})
```
The AI SDK's `streamText` accepts `system` as `string | SystemModelMessage | SystemModelMessage[]`, so this is fully compatible. For non-Anthropic models, it returns the plain string as before.
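A minimal sketch of what `wrapSystemPrompt()` could look like, with the SDK's `SystemModelMessage` shape simplified to a local type and a naive provider-ID check standing in for the real provider detection (both are assumptions, not the PR's exact code):

```typescript
// Simplified stand-in for the AI SDK's SystemModelMessage type.
type SystemModelMessage = {
  role: "system";
  content: string;
  providerOptions?: Record<string, unknown>;
};

// Sketch: for Anthropic models, wrap the plain system string so it carries
// a cache_control breakpoint; for every other provider, pass it through
// unchanged. Provider detection here is a naive substring check.
function wrapSystemPrompt(
  system: string,
  providerId: string
): string | SystemModelMessage {
  if (!providerId.includes("anthropic")) return system;
  return {
    role: "system",
    content: system,
    providerOptions: {
      anthropic: { cacheControl: { type: "ephemeral" } },
    },
  };
}
```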