How are tokens counted and tracked?
I'll search for information about how tokens are counted and tracked in the vercel/ai repository.
The search results show extensive usage of token tracking throughout the codebase. Let me look at some core data structures and conversion functions to give you a comprehensive answer.
Let me look at the core API types:
Let me try a different approach:
Now let me look at the higher-level streaming and generation functions:
Let me search for where usage is accumulated:
Perfect! Now let me look at how tokens are tracked in stream-text:
Let me check the stream-text-result file:
Let me try with generate-text:
Great! Now let me look at step-result to understand how individual steps track usage:
Perfect! Now let me look at how embedding token tracking works:
Let me look at the embedding model result:
Excellent! Now I have enough information to provide a comprehensive answer. Let me create a summary:
## Token Counting and Tracking in Vercel AI
Based on the source code, here's how tokens are counted and tracked throughout the system:
### 1. Core Token Data Structures
The library defines a normalized token usage type in the provider package:
`LanguageModelV3Usage` (source: `packages/provider/src/language-model/v3/language-model-v3-usage.ts`):

```ts
export type LanguageModelV3Usage = {
  inputTokens: {
    total: number | undefined;      // Total prompt tokens
    noCache: number | undefined;    // Non-cached tokens
    cacheRead: number | undefined;  // Cached tokens read
    cacheWrite: number | undefined; // Cached tokens written
  };
  outputTokens: {
    total: number | undefined;     // Total completion tokens
    text: number | undefined;      // Text tokens
    reasoning: number | undefined; // Reasoning tokens
  };
  raw?: JSONObject; // Raw provider data
};
```
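As an illustration, a populated value might look like the following (numbers are made up; which fields are filled in depends on what the provider reports, and the import path is an assumption based on the `packages/provider` location):

```ts
import type { LanguageModelV3Usage } from '@ai-sdk/provider'; // package name assumed

// Illustrative value (hypothetical numbers).
const exampleUsage: LanguageModelV3Usage = {
  inputTokens: {
    total: 1200,           // full prompt size
    noCache: 200,          // tokens not served from a prompt cache
    cacheRead: 1000,       // tokens read from the cache
    cacheWrite: undefined, // not reported by this provider
  },
  outputTokens: {
    total: 350,
    text: 300,     // visible completion text
    reasoning: 50, // hidden reasoning tokens
  },
  raw: { prompt_tokens: 1200, completion_tokens: 350 }, // original provider payload
};
```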
### 2. Provider-Specific Conversion
Each provider (OpenAI, Anthropic, Mistral, etc.) returns usage in its own format, which is then normalized to the standard format:
OpenAI Example (`packages/openai/src/chat/convert-openai-chat-usage.ts`):

```ts
export function convertOpenAIChatUsage(usage: OpenAIChatUsage): LanguageModelV3Usage {
  const promptTokens = usage.prompt_tokens ?? 0;
  const completionTokens = usage.completion_tokens ?? 0;
  const cachedTokens = usage.prompt_tokens_details?.cached_tokens ?? 0;
  const reasoningTokens = usage.completion_tokens_details?.reasoning_tokens ?? 0;

  return {
    inputTokens: {
      total: promptTokens,
      noCache: promptTokens - cachedTokens,
      cacheRead: cachedTokens,
      cacheWrite: undefined,
    },
    outputTokens: {
      total: completionTokens,
      text: completionTokens - reasoningTokens,
      reasoning: reasoningTokens,
    },
    raw: usage,
  };
}
```
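For reference, the raw OpenAI usage block that feeds this converter looks roughly like this (field names from OpenAI's chat completions API; values made up, and the `*_details` objects may be absent on older models, hence the `?? 0` fallbacks above):

```ts
// Illustrative raw OpenAI usage payload.
const openAIUsage = {
  prompt_tokens: 1200,
  completion_tokens: 350,
  total_tokens: 1550,
  prompt_tokens_details: { cached_tokens: 1000 },
  completion_tokens_details: { reasoning_tokens: 50 },
};

// convertOpenAIChatUsage(openAIUsage) would then yield:
// inputTokens:  { total: 1200, noCache: 200, cacheRead: 1000, cacheWrite: undefined }
// outputTokens: { total: 350, text: 300, reasoning: 50 }
```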
Anthropic Example (`packages/anthropic/src/convert-anthropic-messages-usage.ts`):

```ts
export function convertAnthropicMessagesUsage(usage: AnthropicMessagesUsage): LanguageModelV3Usage {
  const inputTokens = usage.input_tokens;
  const cacheCreationTokens = usage.cache_creation_input_tokens ?? 0;
  const cacheReadTokens = usage.cache_read_input_tokens ?? 0;

  return {
    inputTokens: {
      total: inputTokens + cacheCreationTokens + cacheReadTokens,
      noCache: inputTokens,
      cacheRead: cacheReadTokens,
      cacheWrite: cacheCreationTokens,
    },
    outputTokens: { /* ... */ },
  };
}
```
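Note the contrast with OpenAI: Anthropic reports `input_tokens` exclusive of cache reads and writes, which is why the converter above sums all three fields to get the prompt total. A worked example with made-up values:

```ts
// Illustrative raw Anthropic usage payload.
const anthropicUsage = {
  input_tokens: 200,              // non-cached prompt tokens only
  output_tokens: 350,
  cache_creation_input_tokens: 0, // tokens written to the prompt cache
  cache_read_input_tokens: 1000,  // tokens served from the prompt cache
};

// convertAnthropicMessagesUsage(anthropicUsage) would then yield:
// inputTokens: { total: 1200, noCache: 200, cacheRead: 1000, cacheWrite: 0 }
```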
### 3. High-Level Usage Type

The AI SDK converts this to a user-friendly `LanguageModelUsage` type (`packages/ai/src/types/usage.ts`):

```ts
export type LanguageModelUsage = {
  inputTokens: number | undefined;
  inputTokenDetails: {
    noCacheTokens: number | undefined;
    cacheReadTokens: number | undefined;
    cacheWriteTokens: number | undefined;
  };
  outputTokens: number | undefined;
  outputTokenDetails: {
    textTokens: number | undefined;
    reasoningTokens: number | undefined;
  };
  totalTokens: number | undefined; // Calculated sum
  raw?: JSONObject;
};
```
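The mapping from the nested provider shape to this flat shape is mechanical; here is a minimal sketch of what it plausibly looks like, derived from the two type definitions above (the function name is hypothetical, and the real conversion lives in the `ai` package):

```ts
// Hypothetical sketch of the flattening step, based on the two types above.
function toLanguageModelUsage(usage: LanguageModelV3Usage): LanguageModelUsage {
  const { inputTokens, outputTokens, raw } = usage;
  return {
    inputTokens: inputTokens.total,
    inputTokenDetails: {
      noCacheTokens: inputTokens.noCache,
      cacheReadTokens: inputTokens.cacheRead,
      cacheWriteTokens: inputTokens.cacheWrite,
    },
    outputTokens: outputTokens.total,
    outputTokenDetails: {
      textTokens: outputTokens.text,
      reasoningTokens: outputTokens.reasoning,
    },
    // totalTokens is a calculated sum; stays undefined if either side is unknown.
    totalTokens:
      inputTokens.total != null && outputTokens.total != null
        ? inputTokens.total + outputTokens.total
        : undefined,
    raw,
  };
}
```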
### 4. Accumulating Usage Across Steps

For multi-step generation (with tool calls), tokens are accumulated using the `addLanguageModelUsage()` function:

```ts
export function addLanguageModelUsage(
  usage1: LanguageModelUsage,
  usage2: LanguageModelUsage,
): LanguageModelUsage {
  return {
    inputTokens: addTokenCounts(usage1.inputTokens, usage2.inputTokens),
    outputTokens: addTokenCounts(usage1.outputTokens, usage2.outputTokens),
    inputTokenDetails: {
      noCacheTokens: addTokenCounts(/* ... */),
      cacheReadTokens: addTokenCounts(/* ... */),
      cacheWriteTokens: addTokenCounts(/* ... */),
    },
    outputTokenDetails: {
      textTokens: addTokenCounts(/* ... */),
      reasoningTokens: addTokenCounts(/* ... */),
    },
    totalTokens: addTokenCounts(usage1.totalTokens, usage2.totalTokens),
  };
}
```
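The `addTokenCounts` helper is not shown above; a plausible sketch is undefined-aware addition, so counts a provider never reported stay `undefined` rather than silently becoming `0`:

```ts
// Sketch of an undefined-aware adder (the actual helper may differ).
function addTokenCounts(
  a: number | undefined,
  b: number | undefined,
): number | undefined {
  return a == null && b == null ? undefined : (a ?? 0) + (b ?? 0);
}
```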
### 5. Result Objects

Usage is exposed in result objects at multiple levels:

- `StepResult`: per-step token usage for a single model call
- `GenerateTextResult`: `usage` holds the last step's tokens, while `totalUsage` holds cumulative tokens across all steps (see the example below)
- `EmbeddingModelV3Result`: for embeddings, just `{ tokens: number }`
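From the caller's side this surfaces on the result of the high-level functions. A minimal sketch (the model choice and prompt are placeholders; `usage` and `totalUsage` only diverge when tool calls add extra steps):

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateText({
  model: openai('gpt-4o'),
  prompt: 'Summarize the latest sales report.',
  // tools: { ... } // tool calls add extra steps, each with its own usage
});

console.log(result.usage);      // tokens of the final step
console.log(result.totalUsage); // cumulative tokens across all steps

for (const step of result.steps) {
  console.log(step.usage);      // per-step usage (StepResult)
}
```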
### 6. Tracking Different Token Types
The system tracks:
- Input tokens: Prompt/cached/non-cached breakdown
- Output tokens: Text vs. reasoning token separation
- Cache metrics: Tokens read from cache vs. newly cached
- Raw provider data: Original response preserved for transparency
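
Because these breakdowns are first-class fields, callers can derive metrics directly from a `LanguageModelUsage` value; for example, a small sketch computing cache-hit and reasoning ratios (guarding against fields the provider did not report):

```ts
// Sketch: derive ratios from a LanguageModelUsage value.
function usageMetrics(usage: LanguageModelUsage) {
  const input = usage.inputTokens ?? 0;
  const output = usage.outputTokens ?? 0;
  const cacheRead = usage.inputTokenDetails.cacheReadTokens ?? 0;
  const reasoning = usage.outputTokenDetails.reasoningTokens ?? 0;
  return {
    cacheHitRatio: input > 0 ? cacheRead / input : 0,    // share of prompt served from cache
    reasoningShare: output > 0 ? reasoning / output : 0, // share of output spent on reasoning
  };
}
```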
This layered approach allows the SDK to normalize diverse provider APIs while preserving the full provider-specific token breakdown.