How to set up prompt caching for multiple AI models?
How can I set up prompt caching? I want it to work for:
- anthropic models
- openai models
- google models
- xai models
It's for this project: https://forums.basehub.com/basehub-ai/forums/8
## Summary: Prompt Caching Setup for Anthropic, OpenAI, Google, and xAI Models
Based on the source code in the Vercel AI SDK, here's how to set up prompt caching for each provider:
### 1. Anthropic Models (Claude 3.5+)
Use the `cacheControl` option in `providerOptions`:
```ts
import { anthropic } from '@ai-sdk/anthropic';
import { generateText } from 'ai';

await generateText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  messages: [
    {
      role: 'system',
      content: 'Your large prompt content here...',
      providerOptions: {
        anthropic: {
          cacheControl: {
            type: 'ephemeral', // required
            ttl: '5m', // or '1h' (optional)
          },
        },
      },
    },
    {
      role: 'user',
      content: 'What is this about?',
    },
  ],
});
```
Key points:
- Type: `ephemeral` (currently the only option)
- TTL: `'5m'` or `'1h'` (optional; defaults to ~5 minutes)
- You can also set cache control on individual content parts within a message (see the sketch after this list)
- Minimum content size: ~1024 tokens for caching to be effective
- Cache control can be set on system messages, user messages, assistant messages, and tool definitions
- It cannot be set on thinking blocks
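For the per-part case, here's a minimal sketch, assuming the AI SDK's support for `providerOptions` on individual content parts (the prompt text is a placeholder). The cache marker goes on one part rather than the whole message, so later parts can vary per request:

```ts
import { anthropic } from '@ai-sdk/anthropic';
import { generateText } from 'ai';

await generateText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: '... a long document you want cached ...',
          // Cache only this part; the question below can change per call.
          providerOptions: {
            anthropic: { cacheControl: { type: 'ephemeral' } },
          },
        },
        { type: 'text', text: 'Summarize the document above.' },
      ],
    },
  ],
});
```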
### 2. OpenAI Models (GPT-4o, GPT-4.5, etc.)
Use `promptCacheKey` and `promptCacheRetention` in `providerOptions`:
```ts
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

await generateText({
  model: openai('gpt-4o'),
  messages: [
    {
      role: 'system',
      content: 'Your large prompt content here...',
    },
    {
      role: 'user',
      content: 'What is this about?',
    },
  ],
  providerOptions: {
    openai: {
      promptCacheKey: 'my-cache-key-123', // optional: manual cache key
      promptCacheRetention: '24h', // 'in_memory' (default) or '24h'
    },
  },
});
```
Key points:
- `promptCacheKey`: optional string to manually control cache behavior
- `promptCacheRetention`: `'in_memory'` (default, ~5 minutes) or `'24h'` (extended, requires specific models)
- The cache is created automatically on the first call
- Subsequent calls with the same prompt prefix use the cached version
- OpenAI caches automatically, without explicit markers like Anthropic's (see the prompt-ordering sketch below)
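Because OpenAI matches on the prompt prefix, the main setup step on your side is ordering the prompt so the static part comes first. A minimal sketch (the constant and function names are placeholders, not SDK APIs):

```ts
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

// Static, reusable context goes first so the prefix is identical across calls.
const STATIC_INSTRUCTIONS = '... 1024+ tokens of instructions and docs ...';

async function ask(question: string) {
  return generateText({
    model: openai('gpt-4o'),
    messages: [
      // Identical prefix on every call -> eligible for automatic caching.
      { role: 'system', content: STATIC_INSTRUCTIONS },
      // Variable content goes last so it doesn't break the cached prefix.
      { role: 'user', content: question },
    ],
    providerOptions: {
      openai: { promptCacheKey: 'docs-assistant-v1' }, // optional routing hint
    },
  });
}
```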
### 3. Google Models (Gemini 1.5, 2.0, etc.)
Use `cachedContent` in `providerOptions` to reference pre-cached content:
```ts
import { google } from '@ai-sdk/google';
import { generateText } from 'ai';

// First, create cached content using Google's API directly,
// then reference it by ID:
await generateText({
  model: google('gemini-1.5-pro'),
  messages: [
    {
      role: 'user',
      content: 'What is this about?',
    },
  ],
  providerOptions: {
    google: {
      cachedContent: 'cachedContents/12345abcde', // pre-created cache ID
    },
  },
});
```
Key points:
- Google's caching works differently: you must create the cached content via Google's API first (one way to do this is sketched after this list)
- You then reference it with the `cachedContent` parameter (format: `cachedContents/{id}`)
- This is more explicit than Anthropic/OpenAI but provides more control
- Token usage tracking includes `cachedContentTokenCount` in the response
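One way to create the cache from Node is the `GoogleAICacheManager` helper in the `@google/generative-ai` package; the following is a sketch under that assumption, with an illustrative model name and TTL:

```ts
import { GoogleAICacheManager } from '@google/generative-ai/server';
import { google } from '@ai-sdk/google';
import { generateText } from 'ai';

// Create the cache once; it returns an ID in `name`.
const cacheManager = new GoogleAICacheManager(
  process.env.GOOGLE_GENERATIVE_AI_API_KEY!,
);
const { name: cachedContent } = await cacheManager.create({
  model: 'models/gemini-1.5-pro-001', // illustrative; match your model version
  contents: [
    { role: 'user', parts: [{ text: '... 1000+ tokens of content ...' }] },
  ],
  ttlSeconds: 60 * 5, // keep the cache for 5 minutes
});

// Reuse the cache across calls via providerOptions.
await generateText({
  model: google('gemini-1.5-pro'),
  messages: [{ role: 'user', content: 'What is this about?' }],
  providerOptions: { google: { cachedContent } },
});
```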
### 4. xAI Models (Grok models)
xAI uses the same approach as OpenAI (via an OpenAI-compatible interface):
```ts
import { createOpenAICompatible } from '@ai-sdk/openai-compatible';
import { generateText } from 'ai';

const xai = createOpenAICompatible({
  baseURL: 'https://api.x.ai/v1',
  apiKey: process.env.XAI_API_KEY,
  name: 'xai',
});

await generateText({
  model: xai.chatModel('grok-4'),
  messages: [
    {
      role: 'system',
      content: 'Your large prompt content here...',
    },
    {
      role: 'user',
      content: 'What is this about?',
    },
  ],
  providerOptions: {
    openaiCompatible: {
      // xAI supports cache_control via OpenAI compatibility
      cache_control: {
        type: 'ephemeral',
      },
    },
  },
});
```
Key points:
- xAI follows the OpenAI-compatible interface (an alternative using the dedicated `@ai-sdk/xai` package is sketched below)
- It uses `cache_control` similar to Anthropic, but through the OpenAI compatibility layer
- It supports the same `prompt_cache_hit_tokens` tracking as OpenAI
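If you'd rather not configure the OpenAI-compatible provider by hand, a minimal sketch using the dedicated `@ai-sdk/xai` package (same call shape; this assumes caching of repeated prompt prefixes is handled on xAI's side, as with OpenAI):

```ts
import { xai } from '@ai-sdk/xai';
import { generateText } from 'ai';

// No explicit cache markers needed here; repeated identical prefixes
// are candidates for provider-side caching.
const result = await generateText({
  model: xai('grok-4'),
  messages: [
    { role: 'system', content: 'Your large prompt content here...' },
    { role: 'user', content: 'What is this about?' },
  ],
});
```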
### Monitoring Cache Usage
All providers return cache token information in the response:
```ts
const result = await generateText({ /* ... */ });

// Access cache metrics:
console.log(result.usage);
// {
//   inputTokens: {
//     total: 1000,
//     noCache: 500,    // tokens not from cache
//     cacheRead: 400,  // tokens read from cache
//     cacheWrite: 100  // tokens written to cache
//   },
//   outputTokens: { ... }
// }
```
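As a usage example, here is a small helper that turns those numbers into a cache hit rate. It assumes the structured `usage.inputTokens` shape shown above; adjust the field access to whatever your SDK version actually returns:

```ts
// Assumes the structured usage shape shown above; adapt to your SDK version.
interface CachedInputTokens {
  total: number;
  noCache: number;
  cacheRead: number;
  cacheWrite: number;
}

function cacheHitRate(inputTokens: CachedInputTokens): number {
  // Fraction of input tokens on this call that were served from cache.
  return inputTokens.total > 0 ? inputTokens.cacheRead / inputTokens.total : 0;
}

// e.g. cacheHitRate(result.usage.inputTokens) -> 0.4 for the example above
```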
### Example with All Providers
Here's a template supporting multiple providers:
```ts
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
import { google } from '@ai-sdk/google';

const largeContext = '... 1000+ tokens of content ...';

// Typing the parameter as a union keeps the config lookup type-safe.
async function queryWithCache(provider: 'anthropic' | 'openai' | 'google') {
  const modelConfig = {
    anthropic: {
      model: anthropic('claude-3-5-sonnet-20241022'),
      providerOptions: {
        anthropic: { cacheControl: { type: 'ephemeral' } },
      },
    },
    openai: {
      model: openai('gpt-4o'),
      providerOptions: {
        openai: { promptCacheRetention: '24h' },
      },
    },
    google: {
      model: google('gemini-1.5-pro'),
      providerOptions: {
        google: { cachedContent: 'cachedContents/...' }, // pre-created
      },
    },
  }[provider];

  return generateText({
    ...modelConfig,
    messages: [
      { role: 'system', content: largeContext },
      { role: 'user', content: 'Question about the context?' },
    ],
  });
}
```
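Usage is then a single call per provider (the Google branch only works once you substitute a real pre-created cache ID):

```ts
const result = await queryWithCache('anthropic');
console.log(result.text);
console.log(result.usage); // check cache metrics as shown above
```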
The key difference is that Anthropic and xAI use explicit cache markers on content, while OpenAI uses automatic caching with optional retention policies, and Google requires pre-created cache entries.