What makes vinext-agents-example so fast?
how's this so fast?
Now I have a comprehensive picture. Here's what makes this app feel so fast: it's a combination of architectural speed wins stacking on top of each other.

## Why it's so fast: a layer-by-layer breakdown

### 1. Vite instead of Next.js's compiler (dev speed)

The core of the speed story starts at vite.config.ts. Instead of running Next.js's default build toolchain (Turbopack/webpack), this project uses vinext, a reimplementation of the Next.js API surface on top of Vite. Vite's native ESM dev server is significantly faster for cold starts and HMR:
```ts
// vite.config.ts (excerpt)
plugins: [
  vinext(),
  cloudflare({ viteEnvironment: { name: "rsc", childEnvironments: ["ssr"] } }),
],
```
According to vinext's own benchmarks, dev server cold starts are ~60% faster than Next.js 16 with Turbopack, and production builds are ~50% faster. And that's before Vite 8/Rolldown, which is faster still.
### 2. Everything runs in one Worker: zero network hops

Look at worker/index.ts: the entire application (pages, API routes, agent WebSocket handling, image optimization) runs in a single Cloudflare Worker:
```ts
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Image optimization? Same worker.
    // Agent WebSocket? Same worker.
    const agentResponse = await routeAgentRequest(request, env);
    if (agentResponse) return agentResponse;
    // Next.js pages? Same worker.
    return handler.fetch(request);
  },
};
```
There are no separate API servers, no microservices, no cold-starting Lambda functions. The agent, the AI model invocation, and the page rendering are all in-process on the same edge node closest to the user.
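That in-process dispatch order can be sketched with a plain routing function. This is a hypothetical illustration of the pattern (the prefix `/agents/` and the return labels are made up for the example, not taken from the repo):

```typescript
// Hypothetical sketch of single-Worker dispatch: agent traffic is checked
// first, everything unmatched falls through to page rendering, and both
// handlers live in the same process, so there are no cross-service hops.
type Handler = (path: string) => string;

function makeDispatcher(agentPrefix: string, pageHandler: Handler): Handler {
  return (path: string): string => {
    // Agent WebSocket/API traffic? Handled right here.
    if (path.startsWith(agentPrefix)) return "agent";
    // Everything else falls through to page rendering, still in-process.
    return pageHandler(path);
  };
}

const dispatch = makeDispatcher("/agents/", () => "page");
console.log(dispatch("/agents/ChatAgent/room-1")); // → "agent"
console.log(dispatch("/")); // → "page"
```

The key property is that the fall-through is a function call, not a network request, which is exactly what the `routeAgentRequest` / `handler.fetch` sequence above achieves.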
### 3. Cloudflare Workers = zero cold starts at the edge
Unlike serverless functions on other platforms that spin up containers, Cloudflare Workers use V8 isolates that start in under 5ms. Combined with global edge deployment (300+ locations), there's practically no cold start latency. The wrangler.jsonc config shows this is a pure Workers deployment:
```jsonc
// wrangler.jsonc (excerpt)
"compatibility_date": "2026-02-12",
"compatibility_flags": ["nodejs_compat"],
```
### 4. WebSocket connection for chat: no HTTP polling
The chat UI in app/chat/Chat.tsx uses useAgent and useAgentChat which establish a persistent WebSocket to the Durable Object:
```ts
const agent = useAgent({
  agent: "ChatAgent",
  name: roomId,
  onOpen: useCallback(() => setConnected(true), []),
  // ...
});
```
Messages stream in real time over the WebSocket instead of through HTTP request/response cycles: no polling, no SSE reconnection overhead.
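A back-of-envelope comparison makes the overhead difference concrete. The numbers here are illustrative, not measurements:

```typescript
// Back-of-envelope sketch (illustrative numbers): HTTP polling issues one
// request per interval for the whole session, while a persistent WebSocket
// pays one upgrade handshake and then just exchanges frames.
function pollRequests(sessionSeconds: number, pollIntervalSeconds: number): number {
  return Math.ceil(sessionSeconds / pollIntervalSeconds);
}

// A 5-minute chat session polled every 2 seconds:
const polled = pollRequests(300, 2); // 150 HTTP round trips
const webSocketHandshakes = 1;       // one upgrade, then frames flow both ways
console.log({ polled, webSocketHandshakes });
```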
### 5. Durable Object with in-memory + SQLite state

The ChatAgent extends AIChatAgent, which is a Durable Object. Compute and storage are co-located: chat history is persisted in Durable Object SQLite (configured in wrangler.jsonc), so reads and writes to conversation state are local, not network calls to a remote database:
```jsonc
"migrations": [{ "new_sqlite_classes": ["ChatAgent"], "tag": "v1" }]
```
### 6. Workers AI: on-network model inference
The AI model is called via the AI binding, which means the inference runs within Cloudflare's network, not across the internet to a third-party API:
```ts
const workersai = createWorkersAI({ binding: this.env.AI });
const result = streamText({
  model: workersai("@cf/zai-org/glm-4.7-flash"),
  // ...
});
```
The model used is GLM-4.7-flash, a fast, lightweight model specifically chosen for speed. And streamText means tokens arrive as they're generated, not after the whole response is complete.
### 7. Streaming all the way down
The response is returned as a streaming UI message:
```ts
return result.toUIMessageStreamResponse();
```
Combined with the WebSocket transport and the Streamdown markdown renderer on the client, there's streaming from model → Durable Object → WebSocket → browser rendering. No buffering at any layer.
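The unbuffered property can be sketched with plain generators. This is an illustration of the relay pattern, not code from the repo; the token values are made up:

```typescript
// Sketch of unbuffered relaying: each chunk is forwarded the moment the
// upstream produces it, never accumulated into a full response first.
function* modelTokens(): Generator<string> {
  yield "Hel";
  yield "lo";
  yield ", world";
}

function* relay(upstream: Iterable<string>): Generator<string> {
  for (const chunk of upstream) {
    yield chunk; // forwarded immediately; nothing waits for the full output
  }
}

const stream = relay(modelTokens());
const first = stream.next().value; // "Hel" is available before the rest is produced
console.log(first);
console.log([...stream].join("")); // "lo, world"
```

Every layer in the chain above behaves like `relay`: the first token reaches the browser while the model is still generating the rest.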
### 8. Client-side: dynamic import, no SSR for the chat
The homepage in app/page.tsx uses dynamic(..., { ssr: false }) so the chat component never blocks the initial page render:
```ts
const Chat = dynamic(() => import("./chat/Chat"), { ssr: false });
```
The shell loads instantly, and the chat component hydrates client-side.
### TL;DR
It's fast because almost every layer is optimized to minimize latency:
| Layer | Technique | Impact |
|---|---|---|
| Build | Vite (not webpack/Turbopack) | ~50-60% faster dev/build |
| Deployment | Cloudflare Workers (V8 isolates) | Zero cold starts, edge-close to user |
| Architecture | Single Worker for everything | No inter-service network hops |
| Transport | WebSocket via Durable Objects | No HTTP polling overhead |
| State | Durable Object SQLite | Co-located storage, no DB roundtrips |
| AI | Workers AI (AI binding) + flash model | On-network inference, fast model |
| Rendering | Streaming (model → WS → browser) | First token appears immediately |
| Client | Dynamic import, no SSR for chat | Instant shell load |
It's the full stack working together: Vite's speed for dev, Cloudflare's edge for deployment, WebSockets for transport, Durable Objects for state, and Workers AI for on-network inference, all streaming end-to-end.
websockets are the key here, huh. is there a way to achieve this kind of speed using something like https://useworkflow.dev/ for streaming?
Now I have a clear picture of both architectures. Here's the comparison:
## Workflow DevKit streaming vs. WebSockets here: can you get the same speed?
Short answer: yes, you can get very close in perceived speed for AI streaming. But the two systems are solving different problems and make different tradeoffs.
### How this repo streams (WebSocket via Durable Objects)
Looking at app/chat/Chat.tsx:
```ts
// Persistent WebSocket to a Durable Object
const agent = useAgent({ agent: "ChatAgent", name: roomId });

// Chat messages flow over that same WebSocket
const { messages, sendMessage } = useAgentChat({ agent });
```
And in worker/chat-agent.ts:
```ts
const result = streamText({
  model: workersai("@cf/zai-org/glm-4.7-flash"),
  // ...
});
return result.toUIMessageStreamResponse();
```
The flow is: Browser → WebSocket → Durable Object → Workers AI, all within Cloudflare's network. The WebSocket is always open, so there's zero connection overhead per message.
### How Workflow DevKit streams (HTTP streaming with resumability)
Workflow DevKit uses standard HTTP streaming (via ReadableStream responses), not WebSockets. From their docs:
```ts
// API route returns a streaming HTTP response
const run = await start(chatWorkflow, [modelMessages]);
return createUIMessageStreamResponse({
  stream: run.readable, // ReadableStream, not a WebSocket
});
```
On the client side, it uses AI SDK's useChat with a WorkflowChatTransport:
```ts
const { messages } = useChat({
  transport: new WorkflowChatTransport({ api: "/api/chat" }),
});
```
Each user message triggers a new HTTP POST; the server starts a workflow run and returns a streaming response. It's request/response, not a persistent connection.
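The underlying mechanism is standard web platform streaming. Here is a minimal sketch using only standard APIs (this illustrates the general HTTP streaming pattern, not Workflow DevKit's actual internals; the function name and token values are made up):

```typescript
// Minimal HTTP streaming sketch: the response body is a ReadableStream, so
// the client can consume tokens as they are enqueued instead of waiting for
// the whole body. Requires web-standard globals (Node 18+, Deno, browsers).
function streamingResponse(tokens: string[]): Response {
  const encoder = new TextEncoder();
  const body = new ReadableStream<Uint8Array>({
    start(controller) {
      for (const token of tokens) {
        controller.enqueue(encoder.encode(token)); // each token is its own chunk
      }
      controller.close();
    },
  });
  return new Response(body, { headers: { "content-type": "text/plain" } });
}

const res = streamingResponse(["Hel", "lo"]);
console.log(res.headers.get("content-type")); // "text/plain"
```

Workflow DevKit's `run.readable` plugs into exactly this slot: a `ReadableStream` handed to a `Response`, carried over a single ordinary HTTP connection.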
### The speed comparison
| Factor | This repo (WebSocket + DO) | Workflow DevKit (HTTP streaming) |
|---|---|---|
| Connection setup per message | None (WebSocket is already open) | New HTTP request each time (~1 RTT) |
| Time to first token | Minimal: just the LLM latency | HTTP request + workflow start + LLM latency |
| Token streaming speed | Same (both stream token-by-token) | Same (both stream token-by-token) |
| Server push (notifications) | Yes: the DO can `broadcast()` anytime | No: the server can't push; the client must poll or reconnect |
| Resumability | Manual (reconnect WebSocket) | Built-in (WorkflowChatTransport auto-resumes with startIndex) |
| Durability | DO SQLite persists chat history | Workflow event log persists all steps + streams |
| Where it runs | Cloudflare edge only | Vercel, self-hosted (Postgres world), local |
### Where you'd actually notice a difference
- **Time-to-first-token:** The WebSocket approach here saves ~1 round trip per message because the connection is already established. With Workflow DevKit, each message opens a new HTTP stream. On a fast connection that's maybe 50-100ms of difference: noticeable but not dramatic.
- **Server-initiated events:** This repo uses `this.broadcast()` in the Durable Object for scheduled task notifications (chat-agent.ts:149). Workflow DevKit can't do this; it's pull-based. You'd need a separate mechanism (polling, SSE, or a WebSocket layer on top).
- **Once streaming starts, speed is identical:** Both approaches stream tokens as they arrive from the LLM. The streaming itself isn't what makes this fast; it's the infrastructure around it (edge deployment, on-network AI, persistent connection).
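The time-to-first-token point can be put into a rough formula. The numbers below are illustrative assumptions, not measurements from either system:

```typescript
// Rough model of time-to-first-token (illustrative numbers, not benchmarks):
// a persistent WebSocket skips the per-message connection round trip that a
// fresh HTTP request pays before the server can even start the LLM call.
function timeToFirstToken(opts: {
  rttMs: number;           // network round-trip time
  setupRtts: number;       // round trips spent establishing the request
  llmFirstTokenMs: number; // model latency until the first token
}): number {
  return opts.setupRtts * opts.rttMs + opts.llmFirstTokenMs;
}

const rttMs = 75;
const llmFirstTokenMs = 400;
const overWebSocket = timeToFirstToken({ rttMs, setupRtts: 0, llmFirstTokenMs }); // 400
const overHttp = timeToFirstToken({ rttMs, setupRtts: 1, llmFirstTokenMs });      // 475
console.log(overHttp - overWebSocket); // 75, roughly one RTT per message
```

With LLM first-token latency typically dominating, the saved RTT is a real but modest fraction of the total, which matches the "noticeable but not dramatic" verdict above.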
### Where Workflow DevKit wins instead
- **Resumable streams out of the box:** If the user refreshes mid-stream, Workflow DevKit picks up exactly where it left off via `startIndex`. This repo would lose the in-flight stream.
- **Durability of the workflow itself:** Each tool call is a retriable "step." If `searchFlights` fails, it auto-retries. This repo's tools either succeed or fail in place.
- **Portability:** Workflow DevKit works on Vercel, self-hosted Postgres, and local dev. This repo is locked to Cloudflare's Durable Objects.
### Bottom line
The perceived speed of AI token streaming would be nearly identical with Workflow DevKit. What makes this repo feel so snappy isn't really the choice of WebSocket vs. HTTP streaming for the AI part. It's the full stack working together: zero-cold-start Workers + on-network AI inference + a persistent WebSocket eliminating per-message overhead + co-located Durable Object state.
If you used Workflow DevKit on Vercel with a fast LLM provider, the streaming experience itself would feel comparable. You'd lose the persistent WebSocket (so slightly higher latency per message, and no server push), but you'd gain automatic resumability and step-level durability, which matter more for production reliability than raw speed.