How is the agent (ghost) implemented in the stack?


How's the agent (Ghost) implemented, in terms of the stack? And specifically, how does it handle the "navigate" behavior?

Claude Opus 4.6
commented

Here's a full breakdown of how the Ghost agent is implemented and how navigation works:


## Ghost Architecture Overview

Ghost is an AI agent embedded as a side panel in this GitHub client app (a Next.js app). The stack:

| Layer | Tech |
| --- | --- |
| LLM Gateway | OpenRouter via `@openrouter/ai-sdk-provider` — defaults to `moonshotai/kimi-k2.5` (or `google/gemini-2.5-pro-preview` for merge conflicts). Users can bring their own API key. |
| AI Framework | Vercel AI SDK v6 (`ai@^6.0.97`) — `streamText()` on the server, `useChat()` on the client. |
| Server Route | A single Next.js API route at `/api/ai/ghost/route.ts` (~3,400 lines), which is the brain. |
| Client UI | `ai-chat.tsx` (the chat component), `global-chat-provider.tsx` (React context for global state), `global-chat-panel.tsx` (the sliding panel), `floating-ghost-button.tsx` (the trigger button). |
| Persistence | Chat history persisted to a DB via `chat-store.ts` with resumable streams. Tab state persisted via `/api/ai/ghost-tabs`. |
| Memory | Long-term memory via Supermemory (`saveMemory`/`recallMemory` tools). Past memories are auto-recalled into the system prompt. |
| Semantic Search | Embeddings via Mixedbread for searching previously viewed PRs/issues. |
| Sandbox | E2B cloud sandboxes for running code, tests, builds, and previews. |
| GitHub API | Via Octokit, authenticated through the user's session. |

## How the Agent Works

The route handler (POST in route.ts) does:

  1. Authenticates the user and gets an Octokit client.
  2. Detects context mode — PR, Issue, or General — based on explicit context from the client or by auto-parsing pageContext.pathname.
  3. Assembles tools by merging several tool groups: generalTools, codeEditTools, sandboxTools, searchTools, memoryTools, plus context-specific tools (prTools, issueTools, mergeConflictTools).
  4. Builds a system prompt that varies by mode (PR, Issue, General), including the current page context, inline code selections, and recalled memories.
  5. Wraps all tools with withSafeTools() — a try/catch wrapper around every tool's execute so a single tool failure doesn't crash the stream.
  6. Calls streamText() with the resolved model, system prompt, messages, tools, up to 50 agentic steps (stopWhen: stepCountIs(50)), and streams the response back.

## How "Navigate" Works — The `_clientAction` Pattern

Navigation is implemented as a server-returns-instruction, client-executes pattern. It's a two-phase process:

### Phase 1: Server-side (tool definitions)

There are 9 navigation tools defined in getGeneralTools(), all following the same pattern — they do no server-side work, just return a marker object:

```ts
// apps/web/src/app/api/ai/ghost/route.ts, line 731
navigateTo: tool({
    description: "Navigate the user to a top-level page within the app...",
    inputSchema: z.object({
        page: z.enum(["dashboard", "repos", "prs", "issues",
                      "notifications", "settings", "search", "trending", "orgs"]),
        description: z.string(),
    }),
    execute: async (input) => ({
        _clientAction: "navigate" as const,   // ← the marker
        ...input,
    }),
}),
```

The other navigation tools (openRepo, openRepoTab, openWorkflowRun, openCommit, openIssue, openPullRequest, openUser, openUrl) work identically — each returns { _clientAction: "<actionName>", ...params }.

The key insight: these tools have no side effects on the server. Their execute functions simply echo back the input with a _clientAction tag. The LLM decides which tool to call and with what parameters; the actual routing happens on the client.
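Because every navigation tool's `execute` has the same shape, the pass-through could be factored into one tiny helper. This is a hypothetical sketch (`clientActionResult` is not a name from the codebase), just to make the "structured intent" idea concrete:

```typescript
// Hypothetical helper: every navigation tool's execute reduces to
// tagging the validated input with its _clientAction marker.
function clientActionResult<A extends string, P extends object>(
  action: A,
  params: P,
): { _clientAction: A } & P {
  return { _clientAction: action, ...params };
}

// e.g. the body of openRepo's execute could then be:
// execute: async (input) => clientActionResult("openRepo", input)
```

Nothing here touches GitHub or the router; the tool result is pure data that rides the stream back to the client.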

### Phase 2: Client-side (effect in ai-chat.tsx)

In ai-chat.tsx (line ~974), a useEffect watches the messages array. When a new assistant message contains a tool output with _clientAction, it dispatches the navigation:

```ts
// apps/web/src/components/shared/ai-chat.tsx, line 1005
if (output._clientAction) {
    executedActionsRef.current.add(actionKey);  // prevent re-execution
    const action = output._clientAction as string;

    setTimeout(() => {  // 600ms delay for UX smoothness
        if (action === "refreshPage") {
            router.refresh();
        } else if (action === "navigate") {
            const pageMap = {
                dashboard: "/dashboard", repos: "/repos", prs: "/prs",
                issues: "/issues", notifications: "/notifications",
                settings: "/settings", search: "/search",
                trending: "/trending", orgs: "/orgs",
            };
            router.push(pageMap[output.page] ?? "/dashboard");
        } else if (action === "openRepo") {
            router.push(`/${output.owner}/${output.repo}`);
        } else if (action === "openRepoTab") {
            router.push(`/${output.owner}/${output.repo}/${output.tab}`);
        } else if (action === "openIssue") {
            router.push(`/${output.owner}/${output.repo}/issues/${output.issueNumber}`);
        } else if (action === "openPullRequest") {
            router.push(`/${output.owner}/${output.repo}/pulls/${output.pullNumber}`);
        } else if (action === "openUser") {
            router.push(`/users/${output.username}`);
        } else if (action === "openUrl") {
            window.open(output.url, "_blank");
        }
        // ... etc for openCommit, openWorkflowRun
    }, 600);
}
```

It uses Next.js's useRouter().push() for in-app navigation and window.open() for external URLs. An executedActionsRef (a Set) keyed by ${messageId}-${toolCallId} prevents the same navigation from firing twice as the messages array re-renders.
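That dedup guard can be isolated as a small sketch (the function name and standalone `Set` are assumptions; in the real component the `Set` lives in a `useRef`):

```typescript
// Sketch of the dedup guard: a Set keyed by `${messageId}-${toolCallId}`
// ensures each _clientAction fires exactly once, even as the effect
// re-runs on every messages-array re-render.
const executedActions = new Set<string>();

function shouldExecute(messageId: string, toolCallId: string): boolean {
  const key = `${messageId}-${toolCallId}`;
  if (executedActions.has(key)) return false; // already fired for this tool call
  executedActions.add(key);
  return true;
}
```

Without it, a single `navigateTo` result would push the route again on every re-render of the chat.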

### Why this design?

The system prompt explicitly instructs: "ALWAYS navigate within the app — use navigation tools, not github.com links." This ensures the LLM uses in-app routing instead of generating raw URLs. The _clientAction convention is a clean way to bridge the server-side tool execution (where the LLM decides what to do) with client-side effects (where router.push actually works), without needing a separate RPC layer.
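One way to see why the convention scales: the client dispatch is just a lookup from action name to path builder. A declarative sketch (not the actual code, which uses the if/else chain shown above; the `navigate` case here omits the real `pageMap` fallback to `/dashboard` for brevity):

```typescript
// Map each _clientAction to a path builder instead of an if/else chain.
// Adding a new navigation tool on the server then only needs one new entry.
type ActionOutput = Record<string, string>;

const actionRoutes: Record<string, (o: ActionOutput) => string> = {
  navigate: (o) => `/${o.page}`,
  openRepo: (o) => `/${o.owner}/${o.repo}`,
  openRepoTab: (o) => `/${o.owner}/${o.repo}/${o.tab}`,
  openIssue: (o) => `/${o.owner}/${o.repo}/issues/${o.issueNumber}`,
  openPullRequest: (o) => `/${o.owner}/${o.repo}/pulls/${o.pullNumber}`,
  openUser: (o) => `/users/${o.username}`,
};

// Returns the in-app path for a known action, or null for actions
// that need special handling (openUrl, refreshPage, ...).
function resolvePath(action: string, output: ActionOutput): string | null {
  const build = actionRoutes[action];
  return build ? build(output) : null;
}
```

The caller would then do `const path = resolvePath(action, output); if (path) router.push(path);`.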

Julian Benegas
commented

clever

Claude Opus 4.6
commented

Yeah, it's a nice pattern. The server-side tools are essentially just structured intents — the LLM picks the right one and fills in the params, but the execute is a pure pass-through. All the real side effects live on the client where router.push actually exists. Keeps things clean and avoids any awkward server→client callback plumbing.



How is the agent (ghost) implemented in the stack? — better-auth/better-hub