Fastpaca
Context infrastructure for LLM apps. Fastpaca keeps the full message history and maintains your LLM context window in one backend service.
- Quick Start – ship a context endpoint in minutes
- Getting Started – understand how the pieces fit
- API Reference – REST & WebSocket surfaces
- View Source – browse the source code
Why It Matters
- Users need to see every message.
- LLMs can only see a limited context window.
Fastpaca bridges that gap with an append-only history, context compaction, and streaming — all inside one backend service. You stay focused on prompts, tools, UI, and business logic.
(Curious how it’s built? See the architecture.)
How Fastpaca Works
- Choose a budget & context policy – Every context sets its token budget and compaction policy up front.

  ```ts
  const ctx = await fastpaca.context('chat_42', {
    budget: 1_000_000,
    trigger: 0.7,
    policy: { strategy: 'last_n', config: { limit: 400 } }
  });
  ```

- Append from your backend – Any message from your LLMs or your users.

  ```ts
  await ctx.append({
    role: 'user',
    parts: [{ type: 'text', text: 'What changed in the latest release?' }]
  });
  ```

- Call your LLM – Fetch the compacted context and hand it to your model.

  ```ts
  const stream = ctx.stream((messages) => streamText({
    model: openai('gpt-4o-mini'),
    messages
  }));
  return stream.toResponse();
  ```

- (Optional) Compact on your terms – When the policy is set to `manual`.

  ```ts
  const { needsCompaction, messages } = await ctx.context();
  if (needsCompaction) {
    const { summary, remainingMessages } = await summarise(messages);
    await ctx.compact([
      { role: 'system', parts: [{ type: 'text', text: summary }] },
      ...remainingMessages
    ]);
  }
  ```
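The `summarise` helper above is yours to implement. Here is a minimal sketch using ai-sdk's `generateText`; the keep-last-20 split point and the prompt are illustrative assumptions, not Fastpaca APIs:

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Hypothetical helper: summarise everything except the most recent turns.
// The split point (20) and the prompt wording are illustrative choices.
async function summarise(messages: any[]) {
  const cutoff = Math.max(0, messages.length - 20);
  const older = messages.slice(0, cutoff);
  const remainingMessages = messages.slice(cutoff);

  // Flatten parts-based messages into a plain transcript for the summariser.
  const transcript = older
    .map((m) => `${m.role}: ${m.parts.map((p: any) => p.text ?? '').join('')}`)
    .join('\n');

  const { text: summary } = await generateText({
    model: openai('gpt-4o-mini'),
    prompt: `Summarise this conversation so far, keeping key facts and decisions:\n\n${transcript}`
  });

  return { summary, remainingMessages };
}
```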
Need the mental model? Go to Context Management. Want to hack now? Hit Quick Start.
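For a birds-eye view, steps 1–3 collapse into a single chat endpoint. A minimal sketch, assuming a Next.js-style route handler; the route shape, request body, and `fastpaca` import path are assumptions:

```ts
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { fastpaca } from 'fastpaca'; // import path is an assumption

// Hypothetical Next.js-style route handler; adapt to your framework.
export async function POST(req: Request) {
  const { chatId, text } = await req.json();

  // One context per conversation, with its budget and policy set up front.
  const ctx = await fastpaca.context(chatId, {
    budget: 1_000_000,
    trigger: 0.7,
    policy: { strategy: 'last_n', config: { limit: 400 } }
  });

  // Store the user turn, then stream a reply over the compacted context.
  await ctx.append({ role: 'user', parts: [{ type: 'text', text }] });

  const stream = ctx.stream((messages) => streamText({
    model: openai('gpt-4o-mini'),
    messages
  }));

  return stream.toResponse();
}
```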
Why Teams Pick Fastpaca
- Stack agnostic – Bring your own framework. Works natively with ai-sdk. Use LangChain, raw OpenAI/Anthropic calls, whatever you fancy (see the sketch at the end of this section).
- Horizontally scalable – Distributed consensus, idempotent appends, automatic failover. Scale nodes horizontally without risk.
- Token-smart – Enforce token budgets with built-in compaction policies. Stay within limits automatically.
- Self-hosted – Single container by default.
Context state that doesn’t fall over.
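Nothing ties the compacted context to ai-sdk. As an example, here is a sketch of feeding it to the raw OpenAI client instead; the parts-to-content mapping and the `fastpaca` import path are assumptions:

```ts
import OpenAI from 'openai';
import { fastpaca } from 'fastpaca'; // import path is an assumption

const client = new OpenAI();

const ctx = await fastpaca.context('chat_42', {
  budget: 1_000_000,
  trigger: 0.7,
  policy: { strategy: 'last_n', config: { limit: 400 } }
});

// Fetch the compacted window and map parts-based messages to plain strings
// (this mapping is an assumption based on the message shape shown above).
const { messages } = await ctx.context();

const completion = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: messages.map((m: any) => ({
    role: m.role,
    content: m.parts.map((p: any) => p.text ?? '').join('')
  }))
});

console.log(completion.choices[0].message.content);
```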
What Fastpaca Is Not
- Not a vector DB – bring your own to complement your LLM.
- Not generic chat infrastructure – built specifically for LLMs.
- Not an agent framework – use it alongside whichever one you prefer.
Where to Go Next
- Ship the basics: Quick Start
- Understand policies: Context Management
- Call the API from code: TypeScript SDK & Examples
- Learn the internals: Architecture • API Reference
Use ai-sdk for inference. Use Fastpaca for context state. Bring your own LLM, framework, and frontend.