
Fastpaca

Context infra for LLM apps. Fastpaca keeps the full message history and maintains your LLM context window in one backend service.


Why it matters

  • Users need to see every message.
  • LLMs can only see a limited context window.

Fastpaca bridges that gap with an append-only history, context compaction, and streaming — all inside one backend service. You stay focused on prompts, tools, UI, and business logic.

(Curious how it’s built? See the architecture.)


How Fastpaca Works

  1. Choose a budget & context policy – every context sets its token budget and compaction policy up front.
    const ctx = await fastpaca.context('chat_42', {
      budget: 1_000_000,
      trigger: 0.7,
      policy: { strategy: 'last_n', config: { limit: 400 } }
    });
  2. Append from your backend – any message from your LLMs or your users.
    await ctx.append({
      role: 'user',
      parts: [{ type: 'text', text: 'What changed in the latest release?' }]
    });
  3. Call your LLM – fetch the compacted context and hand it to your LLM.
    const stream = ctx.stream((messages) => streamText({
      model: openai('gpt-4o-mini'),
      messages
    }));

    return stream.toResponse();
  4. (optional) Compact on your terms – when the policy is set to manual; a sketch of the summarise helper follows this list.
    const { needsCompaction, messages } = await ctx.context();
    if (needsCompaction) {
      const { summary, remainingMessages } = await summarise(messages);
      await ctx.compact([
        { role: 'system', parts: [{ type: 'text', text: summary }] },
        ...remainingMessages
      ]);
    }
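
One way to implement that summarise helper, sketched with ai-sdk's generateText (the 50-message cutoff and the prompt wording are illustrative assumptions, not Fastpaca defaults):

    import { generateText } from 'ai';
    import { openai } from '@ai-sdk/openai';

    type Message = { role: string; parts: { type: string; text?: string }[] };

    // Keep the most recent messages verbatim; summarise everything older.
    // The 50-message cutoff is an assumption, not a Fastpaca default.
    async function summarise(messages: Message[]) {
      const keep = 50;
      const older = messages.slice(0, -keep);
      const remainingMessages = messages.slice(-keep);

      // Flatten the older messages into a plain-text transcript for the prompt.
      const transcript = older
        .map((m) => `${m.role}: ${m.parts.map((p) => p.text ?? '').join(' ')}`)
        .join('\n');

      const { text: summary } = await generateText({
        model: openai('gpt-4o-mini'),
        prompt: `Summarise this conversation, keeping decisions and open questions:\n\n${transcript}`
      });

      return { summary, remainingMessages };
    }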

Need the mental model? Go to Context Management. Want to hack now? Hit Quick Start.


Why Teams Pick Fastpaca

  • Stack agnostic – Bring your own framework. Works natively with ai-sdk. Use LangChain, raw OpenAI/Anthropic calls, whatever you fancy (see the sketch after this list).
  • Horizontally scalable – Distributed consensus, idempotent appends, automatic failover. Scale nodes horizontally without risk.
  • Token-smart – Enforce token budgets with built-in compaction policies. Stay within limits automatically.
  • Self-hosted – Single container by default.
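
To make the stack-agnostic point concrete, here is a minimal sketch that feeds Fastpaca's compacted context to the raw OpenAI SDK instead of ai-sdk (it assumes the ctx from the walkthrough above; the role cast and part flattening are assumptions about your message shape):

    import OpenAI from 'openai';

    const client = new OpenAI();

    // Fetch the compacted window from Fastpaca and send it through the raw SDK.
    const { messages } = await ctx.context();

    const completion = await client.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: messages.map((m) => ({
        role: m.role as 'system' | 'user' | 'assistant',
        content: m.parts.map((p) => p.text ?? '').join(' ')
      }))
    });

    // Append the assistant reply so Fastpaca's history stays complete.
    await ctx.append({
      role: 'assistant',
      parts: [{ type: 'text', text: completion.choices[0].message.content ?? '' }]
    });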

Context state that doesn’t fall over.


What Fastpaca Is Not

  • Not a vector DB – bring your own to complement your LLM.
  • Not generic chat infrastructure – built specifically for LLMs.
  • Not an agent framework – use it alongside whichever one you prefer.

Where to Go Next

Use ai-sdk for inference. Use Fastpaca for context state. Bring your own LLM, framework, and frontend.