How to set up prompt caching for multiple AI models?
How can I set up prompt caching? I want it to work for:
- Anthropic models
- OpenAI models
- Google models
- xAI models

It's for this project: https://forums.basehub.com/basehub-ai/forums/8
I'll help you set up prompt caching for these providers in the Vercel AI SDK. I checked the forum post you referenced and went through the provider packages, example files, and documentation in the codebase; here is what that turned up for each provider.
## Summary: Prompt Caching Setup for Anthropic, OpenAI, Google, and xAI Models
Based on the source code in the Vercel AI SDK, here's how to set up prompt caching for each provider:
### 1. Anthropic Models (Claude 3.5+)
Use the `cacheControl` option in `providerOptions`:
```ts
import { anthropic } from '@ai-sdk/anthropic';
import { generateText } from 'ai';

await generateText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  messages: [
    {
      role: 'system',
      content: 'Your large prompt content here...',
      providerOptions: {
        anthropic: {
          cacheControl: {
            type: 'ephemeral', // required
            ttl: '5m', // or '1h' (optional)
          },
        },
      },
    },
    {
      role: 'user',
      content: 'What is this about?',
    },
  ],
});
```
Key points:
- Type: `ephemeral` (currently the only option)
- TTL: `'5m'` or `'1h'` (optional, defaults to ~5 minutes)
- You can also set cache control on individual content parts within a message (see the sketch after this list)
- Minimum content size: ~1024 tokens for caching to be effective
- Cache control can be set on system messages, user messages, assistant messages, and tool definitions
- It cannot be set on thinking blocks
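For example, to cache only part of a message, you can attach the option to an individual content part instead of the whole message. This is a minimal sketch of that pattern; the part-level `providerOptions` placement follows the message-level example above and should be verified against your SDK version:

```ts
import { anthropic } from '@ai-sdk/anthropic';
import { generateText } from 'ai';

await generateText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: 'Long reference document that should be cached...',
          // Cache breakpoint: everything up to and including this part is cached
          providerOptions: {
            anthropic: { cacheControl: { type: 'ephemeral' } },
          },
        },
        { type: 'text', text: 'What is this about?' },
      ],
    },
  ],
});
```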
### 2. OpenAI Models (GPT-4o, GPT-4.5, etc.)
Use `promptCacheKey` and `promptCacheRetention` in `providerOptions`:
```ts
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

await generateText({
  model: openai('gpt-4o'),
  messages: [
    {
      role: 'system',
      content: 'Your large prompt content here...',
    },
    {
      role: 'user',
      content: 'What is this about?',
    },
  ],
  providerOptions: {
    openai: {
      promptCacheKey: 'my-cache-key-123', // optional: manual cache key
      promptCacheRetention: '24h', // 'in_memory' (default) or '24h'
    },
  },
});
```
Key points:
- `promptCacheKey`: optional string to manually control cache behavior
- `promptCacheRetention`: `'in_memory'` (default, ~5 minutes) or `'24h'` (extended retention, requires specific models)
- The cache is created automatically on the first call (see the sketch below)
- Subsequent calls with the same prompt prefix use the cached version
- OpenAI uses automatic caching without explicit markers like Anthropic's
### 3. Google Models (Gemini 1.5, 2.0, etc.)
Use `cachedContent` in `providerOptions` to reference pre-cached content:
```ts
import { google } from '@ai-sdk/google';
import { generateText } from 'ai';

// First, create cached content using Google's API directly,
// then reference it by ID:
await generateText({
  model: google('gemini-1.5-pro'),
  messages: [
    {
      role: 'user',
      content: 'What is this about?',
    },
  ],
  providerOptions: {
    google: {
      cachedContent: 'cachedContents/12345abcde', // pre-created cache ID
    },
  },
});
```
Key points:
- Google's caching works differently: you must create the cached content via Google's API first (see the sketch below)
- You then reference it with the `cachedContent` parameter (format: `cachedContents/{id}`)
- This is more explicit than Anthropic/OpenAI but gives you more control
- Token usage tracking includes `cachedContentTokenCount` in the response
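To create the cache entry itself, one option is the `GoogleAICacheManager` from the `@google/generative-ai/server` package, passing the returned cache name to `cachedContent`. This is a sketch based on that package's API; parameter names such as `ttlSeconds` and the fully qualified model ID are assumptions to verify against the current docs:

```ts
import { GoogleAICacheManager } from '@google/generative-ai/server';
import { google } from '@ai-sdk/google';
import { generateText } from 'ai';

// Assumption: GOOGLE_GENERATIVE_AI_API_KEY is set in the environment.
const cacheManager = new GoogleAICacheManager(
  process.env.GOOGLE_GENERATIVE_AI_API_KEY!,
);

// Create the cached content once, using the fully qualified model name.
const model = 'models/gemini-1.5-pro-001';
const { name: cachedContent } = await cacheManager.create({
  model,
  contents: [
    {
      role: 'user',
      parts: [{ text: 'Your large prompt content here...' }],
    },
  ],
  ttlSeconds: 60 * 5, // keep the cache entry for 5 minutes
});

// Then reference the returned cache ID from the AI SDK call.
await generateText({
  model: google(model),
  messages: [{ role: 'user', content: 'What is this about?' }],
  providerOptions: { google: { cachedContent } },
});
```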
### 4. xAI Models (Grok models)
xAI uses the same approach as OpenAI (via the OpenAI-compatible interface):
```ts
import { createOpenAICompatible } from '@ai-sdk/openai-compatible';
import { generateText } from 'ai';

const xai = createOpenAICompatible({
  name: 'xai',
  baseURL: 'https://api.x.ai/v1',
  apiKey: process.env.XAI_API_KEY,
});

await generateText({
  model: xai.chatModel('grok-4'),
  messages: [
    {
      role: 'system',
      content: 'Your large prompt content here...',
    },
    {
      role: 'user',
      content: 'What is this about?',
    },
  ],
  providerOptions: {
    openaiCompatible: {
      // xAI supports cache_control via OpenAI compatibility
      cache_control: {
        type: 'ephemeral',
      },
    },
  },
});
```
Key points:
- xAI follows the OpenAI-compatible interface
- It uses `cache_control` similar to Anthropic, but through the OpenAI compatibility layer
- It supports the same `prompt_cache_hit_tokens` tracking as OpenAI
### Monitoring Cache Usage
All providers return cache token information in the response:
```ts
const result = await generateText({ ... });

// Access cache metrics:
console.log(result.usage);
// {
//   inputTokens: {
//     total: 1000,
//     noCache: 500,    // tokens not served from cache
//     cacheRead: 400,  // tokens read from cache
//     cacheWrite: 100, // tokens written to cache
//   },
//   outputTokens: { ... },
// }
```
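Some providers also expose their own cache counters through `providerMetadata`. As an Anthropic-flavored sketch, the field names below (`cacheCreationInputTokens`, `cacheReadInputTokens`) are assumptions that vary by SDK version, so check the provider's README for the exact names:

```ts
import { anthropic } from '@ai-sdk/anthropic';
import { generateText } from 'ai';

const result = await generateText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  messages: [{ role: 'user', content: 'What is this about?' }],
});

// Anthropic-specific cache metrics (undefined when nothing was cached).
console.log(result.providerMetadata?.anthropic?.cacheCreationInputTokens);
console.log(result.providerMetadata?.anthropic?.cacheReadInputTokens);
```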
### Example with All Providers
Here's a template supporting multiple providers:
```ts
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
import { google } from '@ai-sdk/google';

const largeContext = '... 1000+ tokens of content ...';

async function queryWithCache(provider: 'anthropic' | 'openai' | 'google') {
  const config = {
    anthropic: {
      model: anthropic('claude-3-5-sonnet-20241022'),
      // Anthropic: explicit cache markers go on the message itself
      messageProviderOptions: {
        anthropic: { cacheControl: { type: 'ephemeral' } },
      },
      callProviderOptions: undefined,
    },
    openai: {
      model: openai('gpt-4o'),
      messageProviderOptions: undefined,
      // OpenAI: automatic caching with an optional retention policy
      callProviderOptions: { openai: { promptCacheRetention: '24h' } },
    },
    google: {
      model: google('gemini-1.5-pro'),
      messageProviderOptions: undefined,
      // Google: reference a pre-created cache entry
      callProviderOptions: { google: { cachedContent: 'cachedContents/...' } },
    },
  }[provider];

  return generateText({
    model: config.model,
    providerOptions: config.callProviderOptions,
    messages: [
      {
        role: 'system',
        content: largeContext,
        providerOptions: config.messageProviderOptions,
      },
      { role: 'user', content: 'Question about the context?' },
    ],
  });
}
```
The key difference is that Anthropic and xAI use explicit cache markers on content, while OpenAI uses automatic caching with optional retention policies, and Google requires pre-created cache entries.