Does pi implement prompt caching in the TUI?

HEAD: 2edd6b4 Add [Unreleased] section for next cycle

This post might have stale content, as HEAD is 650 commits ahead.

Julian Benegas posted

Does pi implement prompt caching? I'm using it via the TUI.

GPT 5.5 commented

Yes. Pi implements provider-side prompt caching, and the TUI uses it.

A few source-backed details:

- The TUI/agent wires a stable session id into provider requests: sdk.ts passes sessionManager.getSessionId() into the Agent, and agent.ts forwards that as sessionId in each loop config.
- Default cache retention is "short"; setting PI_CACHE_RETENTION=long enables longer retention where supported. This is documented in packages/coding-agent/README.md as:
  - Anthropic: 1h
  - OpenAI: 24h
- For Anthropic-style APIs, Pi adds cache_control: { type: "ephemeral" } to the system prompt, last user message, and last tool definition; with long retention it adds ttl: "1h" where supported (packages/ai/src/providers/anthropic.ts).
- For OpenAI Responses, Pi sends prompt_cache_key based on the session id, and prompt_cache_retention: "24h" for long retention (packages/ai/src/providers/openai-responses.ts).
- For OpenAI Chat Completions, Pi sends prompt_cache_key / prompt_cache_retention for compatible endpoints, and can also apply Anthropic-style cache_control for compatible OpenAI-like providers (packages/ai/src/providers/openai-completions.ts).
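To make the OpenAI Responses case concrete, here is a minimal sketch of how a session id and the retention setting could map onto the two request fields named above. This is not Pi's actual code: resolveRetention and buildResponsesCacheParams are hypothetical helpers, and using the session id directly as the cache key is an assumption (the source only says the key is "based on" it).

```typescript
// Hypothetical helpers for illustration; not copied from Pi's source.
type CacheRetention = "short" | "long";

interface ResponsesCacheParams {
  prompt_cache_key: string;
  prompt_cache_retention?: "24h";
}

// Read the setting from an env-like map; anything other than "long" means "short".
function resolveRetention(env: Record<string, string | undefined>): CacheRetention {
  return env["PI_CACHE_RETENTION"] === "long" ? "long" : "short";
}

// Assumption: the session id is used as-is for the cache key.
function buildResponsesCacheParams(
  sessionId: string,
  retention: CacheRetention,
): ResponsesCacheParams {
  const params: ResponsesCacheParams = { prompt_cache_key: sessionId };
  if (retention === "long") {
    // Only request extended retention when explicitly asked for.
    params.prompt_cache_retention = "24h";
  }
  return params;
}
```

With the default setting the retention field is simply omitted, so the provider's own default applies.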

For TUI usage, you generally don’t need to do anything for normal short prompt caching. To request extended retention, launch it like:

PI_CACHE_RETENTION=long pi

or:

export PI_CACHE_RETENTION=long
pi

To verify it’s working in the TUI, watch the footer stats:

R... = cache-read tokens
W... = cache-write tokens
CH...% = latest cache hit rate

Those are calculated from assistant usage in packages/coding-agent/src/modes/interactive/components/footer.ts. The /session info view also shows Cache Read / Cache Write when present.
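For intuition about what the CH% number means, a hit rate can be derived from one turn's usage roughly as below. The formula and the TurnUsage shape are assumptions for illustration; the exact calculation in footer.ts may differ.

```typescript
// Hypothetical usage shape; field names are illustrative, not Pi's types.
interface TurnUsage {
  input: number;      // uncached input tokens
  cacheRead: number;  // tokens served from the provider cache
  cacheWrite: number; // tokens newly written to the provider cache
}

// Assumed definition: share of all prompt tokens that came from the cache.
function cacheHitRate(u: TurnUsage): number {
  const total = u.input + u.cacheRead + u.cacheWrite;
  return total === 0 ? 0 : (100 * u.cacheRead) / total;
}
```

Under this definition, a turn with 800 cache-read tokens out of 1,000 prompt tokens is an 80% hit rate.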

Caveat: this is provider-side caching, not Pi storing prompts locally. Whether you get hits depends on the provider/model, cache retention support, stable session/prompt prefixes, and whether the provider reports cache usage.

Martin Sione commented

Is it append-only? What happens with dynamic tool registration?


GPT 5.5 commented

Mostly, but not strictly.

For a normal TUI conversation, Pi sends a stable sessionId and the visible conversation generally grows by appending new user / assistant / tool-result messages. That is the happy path for prompt caching.

But Pi is not guaranteed append-only at the full provider-payload level. Things that can change the prefix include:

- switching model / thinking level
- compaction
- session tree navigation / branching
- extension hooks that modify context or system prompt
- dynamic tool registration / active-tool changes
- resource reloads that change skills, context files, or system prompt additions

### Dynamic tool registration

Dynamic tools definitely affect the cached prompt.

In AgentSession, extension registerTool() calls runtime.refreshTools(), and that ultimately runs _refreshToolRegistry() in agent-session.ts. That rebuilds:

- the tool registry sent to the model,
- the active tool list,
- the system prompt via setActiveToolsByName() → _rebuildSystemPrompt().

Relevant source path:

packages/coding-agent/src/core/extensions/loader.ts
- registerTool(...) stores the tool and calls runtime.refreshTools()
packages/coding-agent/src/core/agent-session.ts
- _refreshToolRegistry(...)
- setActiveToolsByName(...)
- _rebuildSystemPrompt(...)

So if an extension registers/enables/disables tools after the session has started, the next provider request may have a different tools payload and possibly a different system prompt.

### What does that do to caching?

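It usually causes a cache miss, followed by a cache rewrite, for everything from the first changed segment onward. Content before the change can still be served from the cache.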

For Anthropic-compatible providers, Pi applies cache markers to:

- system prompt
- last tool definition
- last user message

Source: packages/ai/src/providers/anthropic.ts, where it adds cache_control to system blocks, the last user message, and the last tool.
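As a rough picture of that marker placement, here is a sketch with simplified request types. It is not the code in anthropic.ts: the types, withCacheMarkers, and markLast are hypothetical, real tool definitions also carry an input schema, and marking only the last system block is an assumption.

```typescript
type CacheControl = { type: "ephemeral"; ttl?: "1h" };
interface TextBlock { type: "text"; text: string; cache_control?: CacheControl }
interface ToolDef { name: string; description: string; cache_control?: CacheControl }
interface ChatMessage { role: "user" | "assistant"; content: TextBlock[] }
interface ProviderRequest { system: TextBlock[]; tools: ToolDef[]; messages: ChatMessage[] }

// Return a copy of the list with the marker set on its last element only.
function markLast<T extends { cache_control?: CacheControl }>(items: T[], marker: CacheControl): T[] {
  return items.map((item, i) => (i === items.length - 1 ? { ...item, cache_control: marker } : item));
}

// Place markers on the system prompt, the last tool, and the last user message.
function withCacheMarkers(req: ProviderRequest, longRetention: boolean): ProviderRequest {
  const marker: CacheControl = longRetention
    ? { type: "ephemeral", ttl: "1h" }
    : { type: "ephemeral" };
  const lastUser = req.messages.map((m) => m.role).lastIndexOf("user");
  return {
    system: markLast(req.system, marker),
    tools: markLast(req.tools, marker),
    messages: req.messages.map((m, i) =>
      i === lastUser ? { ...m, content: markLast(m.content, marker) } : m,
    ),
  };
}
```

The input request is left untouched, so the unmarked version can be reused on the next turn, when "last user message" points somewhere else.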

For OpenAI-compatible Anthropic-style cache control, Pi similarly applies cache control to:

- instruction/system message
- last tool
- last user/assistant text content

Source: packages/ai/src/providers/openai-completions.ts, applyAnthropicCacheControl(...).

So if your tool list changes, the cached “tools” segment changes. If the selected tool set also changes Pi’s system prompt snippets/guidelines, then the system prompt changes too. Providers won’t reuse a cache entry for content that no longer matches.
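A toy model shows why a tool change costs more than an appended message. It assumes the provider matches the payload as an ordered run of segments (tools, then system, then messages, which is the order Anthropic documents for its cache prefixes); serializePayload and sharedPrefixLength are hypothetical helpers, and real providers match on tokens rather than on strings like these.

```typescript
// Flatten a request into ordered segments: tools, then system, then messages.
function serializePayload(tools: string[], system: string, messages: string[]): string[] {
  return [
    ...tools.map((t) => `tool:${t}`),
    `system:${system}`,
    ...messages.map((m) => `msg:${m}`),
  ];
}

// Number of leading segments two payloads have in common (the reusable prefix).
function sharedPrefixLength(a: string[], b: string[]): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

const before = serializePayload(["read", "bash"], "You are pi", ["hello", "hi"]);
const appended = serializePayload(["read", "bash"], "You are pi", ["hello", "hi", "next"]);
const newTool = serializePayload(["read", "bash", "deploy"], "You are pi", ["hello", "hi"]);

console.log(sharedPrefixLength(before, appended)); // 5: the whole previous payload is reusable
console.log(sharedPrefixLength(before, newTool));  // 2: everything after the second tool is a miss
```

In this model, appending a message keeps all five earlier segments reusable, while registering one extra tool leaves only the first two tool segments matching; the system prompt and the whole conversation have to be re-cached.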

### Is this a correctness problem?

No. It should only affect cache hit rate / cost / latency.

Pi does not rely on the cache for correctness. It sends the full current context and tool definitions on every request. If the provider can reuse a prefix, great; if not, it recomputes.

### Practical guidance

If you care about cache efficiency:

- register extension tools once at startup if possible;
- avoid changing active tools every turn;
- avoid per-turn system prompt mutations;
- prefer stable tool ordering/definitions;
- expect W... cache-write spikes and lower CH...% after registering/removing tools.

In the TUI footer, if dynamic tool registration is hurting caching, you’ll typically see more W and less R / lower CH% after the toolset changes.
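On the "stable tool ordering" point, one way an extension author could keep the tools payload deterministic is to sort definitions by name before handing them over. This is a sketch of that idea only; stableToolOrder and the ToolDef shape are hypothetical, and nothing in the source says Pi sorts tools itself.

```typescript
// Hypothetical tool shape for illustration.
interface ToolDef { name: string; description: string }

// Return a sorted copy so the serialized tool list does not depend on
// the order in which tools happened to be registered.
function stableToolOrder(tools: ToolDef[]): ToolDef[] {
  return [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}
```

Two registration orders of the same tools then serialize to the same bytes, so the tools segment of the prompt stays cacheable across restarts.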
