Does pi implement prompt caching in the TUI?

HEAD 2edd6b4: Add [Unreleased] section for next cycle
This post might have stale content, as HEAD is 40 commits ahead.

does pi implement prompt caching? i'm using it via the tui

GPT 5.5
commented

Yes — Pi does implement provider-side prompt caching, and the TUI uses it.

A few source-backed details:

  • The TUI/agent wires a stable session id into provider requests: sdk.ts passes sessionManager.getSessionId() into the Agent, and agent.ts forwards that as sessionId in each loop config.
  • Default cache retention is "short"; setting PI_CACHE_RETENTION=long enables longer retention where supported. This is documented in packages/coding-agent/README.md as:
    • Anthropic: 1h
    • OpenAI: 24h
  • For Anthropic-style APIs, Pi adds cache_control: { type: "ephemeral" } to the system prompt, last user message, and last tool definition; with long retention it adds ttl: "1h" where supported (packages/ai/src/providers/anthropic.ts).
  • For OpenAI Responses, Pi sends prompt_cache_key based on the session id, and prompt_cache_retention: "24h" for long retention (packages/ai/src/providers/openai-responses.ts).
  • For OpenAI Chat Completions, Pi sends prompt_cache_key / prompt_cache_retention for compatible endpoints, and can also apply Anthropic-style cache_control for compatible OpenAI-like providers (packages/ai/src/providers/openai-completions.ts).
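As a rough illustration of the retention mapping described above, here is a minimal sketch. The function names are hypothetical and are not Pi's internals. It also assumes that any value other than "long" falls back to "short", and that the session id is used directly as the cache key; the actual provider code may derive these differently.

```typescript
// Hypothetical sketch of the retention mapping; not Pi's actual code.
type CacheRetention = "short" | "long";

// Assumption: anything other than "long" is treated as "short".
function resolveRetention(env: Record<string, string | undefined>): CacheRetention {
  return env["PI_CACHE_RETENTION"] === "long" ? "long" : "short";
}

// Anthropic-style marker: always ephemeral, with ttl "1h" only for long retention.
function anthropicCacheControl(retention: CacheRetention): { type: "ephemeral"; ttl?: "1h" } {
  return retention === "long" ? { type: "ephemeral", ttl: "1h" } : { type: "ephemeral" };
}

// OpenAI Responses-style fields. Assumption: the session id is the cache key as-is.
function openAiCacheParams(
  sessionId: string,
  retention: CacheRetention,
): { prompt_cache_key: string; prompt_cache_retention?: "24h" } {
  return retention === "long"
    ? { prompt_cache_key: sessionId, prompt_cache_retention: "24h" }
    : { prompt_cache_key: sessionId };
}
```

The point is only that one retention setting fans out into different provider-specific fields; the field names themselves (cache_control, ttl, prompt_cache_key, prompt_cache_retention) are the ones named above.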

For TUI usage, you generally don’t need to do anything for normal short prompt caching. To request extended retention, launch it like:

PI_CACHE_RETENTION=long pi

or:

export PI_CACHE_RETENTION=long
pi

To verify it’s working in the TUI, watch the footer stats:

  • R... = cache-read tokens
  • W... = cache-write tokens
  • CH...% = latest cache hit rate

Those are calculated from assistant usage in packages/coding-agent/src/modes/interactive/components/footer.ts. The /session info view also shows Cache Read / Cache Write when present.
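To make the CH% figure concrete, here is a minimal sketch of one plausible way to derive a hit rate from a usage record. The field names and the formula are assumptions for illustration; footer.ts may compute it differently.

```typescript
// Hypothetical usage shape; the field names are assumptions, not Pi's types.
interface Usage {
  input: number;      // uncached input tokens
  cacheRead: number;  // tokens served from the provider cache
  cacheWrite: number; // tokens written to the provider cache
}

// Assumed formula: cache-read tokens as a share of all prompt-side tokens.
function cacheHitRate(u: Usage): number {
  const total = u.input + u.cacheRead + u.cacheWrite;
  return total === 0 ? 0 : (100 * u.cacheRead) / total;
}
```

Under this definition, a turn with 900 cache-read tokens and 100 uncached input tokens gives a 90% hit rate, while a turn right after a prefix change would show mostly cache-write tokens and a rate near zero.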

Caveat: this is provider-side caching, not Pi storing prompts locally. Whether you get hits depends on the provider/model, cache retention support, stable session/prompt prefixes, and whether the provider reports cache usage.

Martin Sione
commented

Is it append-only? What happens with dynamic tool registration?


GPT 5.5
commented

Mostly, but not strictly.

For a normal TUI conversation, Pi sends a stable sessionId and the visible conversation generally grows by appending new user / assistant / tool-result messages. That is the happy path for prompt caching.

But Pi is not guaranteed append-only at the full provider-payload level. Things that can change the prefix include:

  • switching model / thinking level
  • compaction
  • session tree navigation / branching
  • extension hooks that modify context or system prompt
  • dynamic tool registration / active-tool changes
  • resource reloads that change skills, context files, or system prompt additions

### Dynamic tool registration

Dynamic tools definitely affect the cached prompt.

In AgentSession, extension registerTool() calls runtime.refreshTools(), and that ultimately runs _refreshToolRegistry() in agent-session.ts. That rebuilds:

  1. the tool registry sent to the model,
  2. the active tool list,
  3. the system prompt, via setActiveToolsByName() → _rebuildSystemPrompt().

Relevant source path:

  • packages/coding-agent/src/core/extensions/loader.ts
    • registerTool(...) stores the tool and calls runtime.refreshTools()
  • packages/coding-agent/src/core/agent-session.ts
    • _refreshToolRegistry(...)
    • setActiveToolsByName(...)
    • _rebuildSystemPrompt(...)

So if an extension registers/enables/disables tools after the session has started, the next provider request may have a different tools payload and possibly a different system prompt.

### What does that do to caching?

It usually causes a cache miss for the changed prefix, followed by a cache write for the new one.

For Anthropic-compatible providers, Pi applies cache markers to:

  • system prompt
  • last tool definition
  • last user message

Source: packages/ai/src/providers/anthropic.ts, where it adds cache_control to system blocks, the last user message, and the last tool.
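The marker placement can be sketched roughly as follows. This is an illustration of the idea, not the code in anthropic.ts: the types and function names are hypothetical, and it assumes the marker goes on the last system block and on the last content block of the last user message.

```typescript
// Hypothetical types and helpers; only the cache_control shape comes from the text above.
type CacheControl = { type: "ephemeral"; ttl?: "1h" };
interface TextBlock { type: "text"; text: string; cache_control?: CacheControl; }
interface ToolDef { name: string; description: string; input_schema: Record<string, unknown>; cache_control?: CacheControl; }
interface Message { role: "user" | "assistant"; content: TextBlock[]; }
interface RequestParts { system: TextBlock[]; tools: ToolDef[]; messages: Message[]; }

// Return a copy of the list with cache_control set on its last item only.
function markLast<T extends object>(items: readonly T[], cc: CacheControl): T[] {
  return items.map((item, i) => (i === items.length - 1 ? { ...item, cache_control: cc } : item));
}

// Mark the system prompt, the last tool, and the last user message.
function markCacheBreakpoints(req: RequestParts, cc: CacheControl): RequestParts {
  // Index of the last user message, or -1 if there is none.
  const lastUser = req.messages.reduce((acc, m, i) => (m.role === "user" ? i : acc), -1);
  return {
    system: markLast(req.system, cc),
    tools: markLast(req.tools, cc),
    messages: req.messages.map((m, i) =>
      i === lastUser ? { ...m, content: markLast(m.content, cc) } : m,
    ),
  };
}
```

Because the markers always sit on the last tool and the last user message, they move forward as the conversation grows, which is why an unchanged earlier prefix can still be read from cache.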

For OpenAI-compatible providers that accept Anthropic-style cache control, Pi similarly applies cache control to:

  • instruction/system message
  • last tool
  • last user/assistant text content

Source: packages/ai/src/providers/openai-completions.ts, applyAnthropicCacheControl(...).

So if your tool list changes, the cached “tools” segment changes. If the selected tool set also changes Pi’s system prompt snippets/guidelines, then the system prompt changes too. Provider caches match on an exact prefix, so an entry is not reused once any content it covers has changed.
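A toy model may help show why a tool change is more expensive than an appended message. The sketch below assumes the provider matches an exact prefix in the order tools, system, messages (the order Anthropic documents for its cache); the helper names and the string-segment representation are purely illustrative.

```typescript
// Flatten a request into ordered segments: tools first, then system, then messages.
function segments(tools: string[], system: string, messages: string[]): string[] {
  return [...tools.map((t) => "tool:" + t), "system:" + system, ...messages.map((m) => "msg:" + m)];
}

// Count how many leading segments of the previous request the next one still shares.
function reusableSegments(prev: string[], next: string[]): number {
  let n = 0;
  while (n < prev.length && n < next.length && prev[n] === next[n]) n++;
  return n;
}

const before = segments(["read", "bash"], "sys", ["hi", "hello"]);
const appended = segments(["read", "bash"], "sys", ["hi", "hello", "next"]);
const toolAdded = segments(["read", "bash", "search"], "sys", ["hi", "hello", "next"]);

const reusedAfterAppend = reusableSegments(before, appended);   // 5: the whole previous request
const reusedAfterToolAdd = reusableSegments(before, toolAdded); // 2: only the two unchanged tools
```

In this model, appending a message keeps the entire previous request reusable, while adding a tool breaks the prefix before the system prompt and the whole conversation has to be written to the cache again.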

### Is this a correctness problem?

No. It should only affect cache hit rate / cost / latency.

Pi is not relying on the cache for correctness. It sends the current full context and tool definitions on every request. If the provider can reuse a prefix, great; if not, it recomputes.

### Practical guidance

If you care about cache efficiency:

  • register extension tools once at startup if possible;
  • avoid changing active tools every turn;
  • avoid per-turn system prompt mutations;
  • prefer stable tool ordering/definitions;
  • expect W... cache-write spikes and lower CH...% after registering/removing tools.
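For the stable-ordering point, one simple approach on the extension side is to sort tool definitions by name before handing them over, so that registration order does not leak into the payload. This is a generic sketch with a hypothetical helper name; it says nothing about how Pi itself orders tools.

```typescript
// Return a new array sorted by tool name using plain code-unit comparison,
// so the result does not depend on registration order or locale settings.
function stableToolOrder<T extends { name: string }>(tools: readonly T[]): T[] {
  return [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}
```

Two sessions that register the same tools in a different order then serialize to the same tools segment, which keeps the cached prefix identical.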

In the TUI footer, if dynamic tool registration is hurting caching, you’ll typically see more W and less R / lower CH% after the toolset changes.



Does pi implement prompt caching in the TUI? — earendil-works/pi