How are tokens counted and tracked?



Claude Haiku 4.5 commented:

I'll search for information about how tokens are counted and tracked in the vercel/ai repository.

## Token Counting and Tracking in Vercel AI

Based on the source code, here's how tokens are counted and tracked throughout the system:

### 1. Core Token Data Structures

The library defines a normalized token usage type in the provider package:

LanguageModelV3Usage (source: packages/provider/src/language-model/v3/language-model-v3-usage.ts):

```ts
export type LanguageModelV3Usage = {
  inputTokens: {
    total: number | undefined;        // Total prompt tokens
    noCache: number | undefined;      // Non-cached tokens
    cacheRead: number | undefined;    // Cached tokens read
    cacheWrite: number | undefined;   // Cached tokens written
  };
  outputTokens: {
    total: number | undefined;        // Total completion tokens
    text: number | undefined;         // Text tokens
    reasoning: number | undefined;    // Reasoning tokens
  };
  raw?: JSONObject;                   // Raw provider data
};
```
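
For illustration, a call that reuses a large cached prompt prefix and produces some reasoning output might be normalized into a value like the one below. The numbers are made up and the import path assumes the type is exported from @ai-sdk/provider; the shape simply follows the type above.

```ts
import type { LanguageModelV3Usage } from '@ai-sdk/provider';

// Hypothetical usage record: 1,000 prompt tokens served from cache,
// 200 fresh prompt tokens, 250 text tokens plus 50 reasoning tokens out.
const exampleUsage: LanguageModelV3Usage = {
  inputTokens: {
    total: 1200,     // 200 non-cached + 1000 read from cache
    noCache: 200,
    cacheRead: 1000,
    cacheWrite: 0,
  },
  outputTokens: {
    total: 300,      // 250 text + 50 reasoning
    text: 250,
    reasoning: 50,
  },
  raw: { prompt_tokens: 1200, completion_tokens: 300 },
};
```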

### 2. Provider-Specific Conversion

Each provider (OpenAI, Anthropic, Mistral, etc.) returns usage in its own format, which is then normalized to the standard format:

OpenAI Example (packages/openai/src/chat/convert-openai-chat-usage.ts):

```ts
export function convertOpenAIChatUsage(usage: OpenAIChatUsage): LanguageModelV3Usage {
  const promptTokens = usage.prompt_tokens ?? 0;
  const completionTokens = usage.completion_tokens ?? 0;
  const cachedTokens = usage.prompt_tokens_details?.cached_tokens ?? 0;
  const reasoningTokens = usage.completion_tokens_details?.reasoning_tokens ?? 0;

  return {
    inputTokens: {
      total: promptTokens,
      noCache: promptTokens - cachedTokens,
      cacheRead: cachedTokens,
      cacheWrite: undefined,
    },
    outputTokens: {
      total: completionTokens,
      text: completionTokens - reasoningTokens,
      reasoning: reasoningTokens,
    },
    raw: usage,
  };
}
```

Anthropic Example (packages/anthropic/src/convert-anthropic-messages-usage.ts):

```ts
export function convertAnthropicMessagesUsage(usage: AnthropicMessagesUsage): LanguageModelV3Usage {
  const inputTokens = usage.input_tokens;
  const cacheCreationTokens = usage.cache_creation_input_tokens ?? 0;
  const cacheReadTokens = usage.cache_read_input_tokens ?? 0;

  return {
    inputTokens: {
      total: inputTokens + cacheCreationTokens + cacheReadTokens,
      noCache: inputTokens,
      cacheRead: cacheReadTokens,
      cacheWrite: cacheCreationTokens,
    },
    outputTokens: { /* ... */ },
  };
}
```
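
Note the difference in accounting visible in the two snippets above: OpenAI's prompt_tokens already includes cached tokens (cached_tokens is a subset of it), while Anthropic's input_tokens excludes cache reads and writes, so the converter adds them back to compute the total. A quick sanity check with hypothetical numbers:

```ts
// Same logical request, counted the way each provider reports it
// (hypothetical numbers, mirroring the converters above).

// OpenAI: prompt_tokens includes the cached portion.
const openAIRaw = { prompt_tokens: 1200, cached_tokens: 1000 };
const openAIInput = {
  total: openAIRaw.prompt_tokens,                             // 1200
  noCache: openAIRaw.prompt_tokens - openAIRaw.cached_tokens, // 200
  cacheRead: openAIRaw.cached_tokens,                         // 1000
};

// Anthropic: input_tokens excludes cache reads/writes, so they are added back.
const anthropicRaw = {
  input_tokens: 200,
  cache_read_input_tokens: 1000,
  cache_creation_input_tokens: 0,
};
const anthropicInput = {
  total:
    anthropicRaw.input_tokens +
    anthropicRaw.cache_creation_input_tokens +
    anthropicRaw.cache_read_input_tokens,                     // 1200
  noCache: anthropicRaw.input_tokens,                         // 200
  cacheRead: anthropicRaw.cache_read_input_tokens,            // 1000
};
```

Both paths land on the same normalized input totals for an equivalent request.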

### 3. High-Level Usage Type

The AI SDK converts this to a user-friendly LanguageModelUsage type (packages/ai/src/types/usage.ts):

```ts
export type LanguageModelUsage = {
  inputTokens: number | undefined;
  inputTokenDetails: {
    noCacheTokens: number | undefined;
    cacheReadTokens: number | undefined;
    cacheWriteTokens: number | undefined;
  };
  outputTokens: number | undefined;
  outputTokenDetails: {
    textTokens: number | undefined;
    reasoningTokens: number | undefined;
  };
  totalTokens: number | undefined;  // Calculated sum
  raw?: JSONObject;
};
```
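
At the application level, this is the usage object returned by calls such as generateText. A minimal sketch of reading it, assuming the field names above (the model id and prompt are placeholders):

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateText({
  model: openai('gpt-4o'),   // placeholder model
  prompt: 'Explain how the AI SDK tracks token usage.',
});

// Flat totals plus the nested breakdowns described above.
console.log(result.usage.inputTokens);
console.log(result.usage.outputTokens);
console.log(result.usage.totalTokens);
console.log(result.usage.inputTokenDetails.cacheReadTokens);
console.log(result.usage.outputTokenDetails.reasoningTokens);
```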

### 4. Accumulating Usage Across Steps

For multi-step generation (with tool calls), tokens are accumulated using the addLanguageModelUsage() function:

```ts
export function addLanguageModelUsage(
  usage1: LanguageModelUsage,
  usage2: LanguageModelUsage,
): LanguageModelUsage {
  return {
    inputTokens: addTokenCounts(usage1.inputTokens, usage2.inputTokens),
    outputTokens: addTokenCounts(usage1.outputTokens, usage2.outputTokens),
    inputTokenDetails: {
      noCacheTokens: addTokenCounts(/* ... */),
      cacheReadTokens: addTokenCounts(/* ... */),
      cacheWriteTokens: addTokenCounts(/* ... */),
    },
    outputTokenDetails: {
      textTokens: addTokenCounts(/* ... */),
      reasoningTokens: addTokenCounts(/* ... */),
    },
    totalTokens: addTokenCounts(usage1.totalTokens, usage2.totalTokens),
  };
}
```
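
addTokenCounts itself is not shown here; presumably it is an undefined-aware addition along these lines (a sketch, not the repository's actual implementation):

```ts
// Sketch: add two possibly-undefined counts. If neither side reported a
// value the result stays undefined; otherwise missing values count as 0.
function addTokenCounts(
  a: number | undefined,
  b: number | undefined,
): number | undefined {
  return a == null && b == null ? undefined : (a ?? 0) + (b ?? 0);
}
```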

### 5. Result Objects

Usage is exposed in result objects at multiple levels:

- StepResult: Per-step token usage (single model call)
- GenerateTextResult:
  - usage: Last step tokens
  - totalUsage: Cumulative tokens across all steps (see the sketch below)
- EmbeddingModelV3Result: For embeddings, just { tokens: number }
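
For example, in a multi-step tool-calling run, usage and totalUsage diverge, while embedding calls expose a single token count. A hedged sketch, assuming the result fields behave as described above (model ids, prompt, and input value are placeholders):

```ts
import { embed, generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// With multi-step runs (e.g. tool calls), the two fields differ:
// usage covers only the final step, totalUsage covers every step.
const run = await generateText({
  model: openai('gpt-4o'),   // placeholder model
  prompt: 'Answer using your tools.',
  // tools / stopping configuration omitted for brevity
});
console.log(run.usage.totalTokens);              // last step only
console.log(run.totalUsage.totalTokens);         // summed across all steps
console.log(run.steps.map(step => step.usage));  // per-step StepResult usage

// Embeddings report a single token count.
const { usage } = await embed({
  model: openai.embedding('text-embedding-3-small'), // placeholder model
  value: 'hello world',
});
console.log(usage.tokens);
```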

### 6. Tracking Different Token Types

The system tracks:

- Input tokens: Prompt/cached/non-cached breakdown
- Output tokens: Text vs. reasoning token separation
- Cache metrics: Tokens read from cache vs. newly cached
- Raw provider data: Original response preserved for transparency

This layered approach allows the SDK to normalize diverse provider APIs while maintaining full fidelity with provider-specific token breakdowns.

