
Fastpaca

Context infra for LLM apps. Fastpaca keeps the full message history and maintains your LLM context window in one backend service.


Why it matters

  • Users need to see every message.
  • LLMs can only see a limited context window.

Fastpaca bridges that gap with an append-only history, context compaction, and streaming — all inside one backend service. You stay focused on prompts, tools, UI, and business logic.

(Curious how it’s built? See the architecture.)


How Fastpaca Works

  1. Choose a budget & context policy – every context sets its token budget and compaction policy up front.
    const ctx = await fastpaca.context('chat_42', {
      budget: 1_000_000,
      trigger: 0.7,
      policy: { strategy: 'last_n', config: { limit: 400 } }
    });
  2. Append from your backend – any message from your LLMs or your users.
    await ctx.append({
      role: 'user',
      parts: [{ type: 'text', text: 'What changed in the latest release?' }]
    });
  3. Call your LLM – fetch the compacted context and hand it to your LLM.
    const stream = ctx.stream((messages) => streamText({
      model: openai('gpt-4o-mini'),
      messages
    }));

    return stream.toResponse();
  4. (optional) Compact on your terms – when the policy is set to manual; a sketch of the summarise helper follows this list.
    const { needsCompaction, messages } = await ctx.context();
    if (needsCompaction) {
      const { summary, remainingMessages } = await summarise(messages);
      await ctx.compact([
        { role: 'system', parts: [{ type: 'text', text: summary }] },
        ...remainingMessages
      ]);
    }
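
One way to implement that summarise helper, sketched with ai-sdk's generateText (the 50-message cutoff and the prompt wording are illustrative assumptions, not Fastpaca defaults):

    import { generateText } from 'ai';
    import { openai } from '@ai-sdk/openai';

    type Message = { role: string; parts: { type: string; text?: string }[] };

    // Keep the most recent messages verbatim; summarise everything older.
    // The 50-message cutoff is an assumption, not a Fastpaca default.
    async function summarise(messages: Message[]) {
      const keep = 50;
      const older = messages.slice(0, -keep);
      const remainingMessages = messages.slice(-keep);

      // Flatten the older messages into a plain-text transcript for the prompt.
      const transcript = older
        .map((m) => `${m.role}: ${m.parts.map((p) => p.text ?? '').join(' ')}`)
        .join('\n');

      const { text: summary } = await generateText({
        model: openai('gpt-4o-mini'),
        prompt: `Summarise this conversation, keeping decisions and open questions:\n\n${transcript}`
      });

      return { summary, remainingMessages };
    }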

Need the mental model? Go to Context Management. Want to hack now? Hit Quick Start.


Why Teams Pick Fastpaca

  • Stack agnostic – Bring your own framework. Works natively with ai-sdk. Use LangChain, raw OpenAI/Anthropic calls, whatever you fancy (see the sketch after this list).
  • Horizontally scalable – Distributed consensus, idempotent appends, automatic failover. Scale nodes horizontally without risk.
  • Token-smart – Enforce token budgets with built-in compaction policies. Stay within limits automatically.
  • Self-hosted – Single container by default.
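
To make the stack-agnostic point concrete, here is a minimal sketch that feeds Fastpaca's compacted context to the raw OpenAI SDK instead of ai-sdk (it assumes the ctx from the walkthrough above; the role cast and part flattening are assumptions about your message shape):

    import OpenAI from 'openai';

    const client = new OpenAI();

    // Fetch the compacted window from Fastpaca and send it through the raw SDK.
    const { messages } = await ctx.context();

    const completion = await client.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: messages.map((m) => ({
        role: m.role as 'system' | 'user' | 'assistant',
        content: m.parts.map((p) => p.text ?? '').join(' ')
      }))
    });

    // Append the assistant reply so Fastpaca's history stays complete.
    await ctx.append({
      role: 'assistant',
      parts: [{ type: 'text', text: completion.choices[0].message.content ?? '' }]
    });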

Context state that doesn’t fall over.


What Fastpaca Is Not

  • Not a vector DB – bring your own to complement your LLM.
  • Not generic chat infrastructure – built specifically for LLMs.
  • Not an agent framework – use it alongside whichever one you prefer.

Where to Go Next

Use ai-sdk for inference. Use Fastpaca for context state. Bring your own LLM, framework, and frontend.