# TypeScript SDK
These helpers mirror the REST API shown in the Quick Start / Getting Started guides. They accept plain messages with `role` and `parts` (each part has a `type` string). This shape is fully compatible with ai-sdk v5 `UIMessage`, but the SDK does not depend on ai-sdk.
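As a rough sketch (the `Message` type name is illustrative, inferred from the examples below), the accepted shape looks like this:

```ts
// Illustrative only: the SDK accepts plain objects of roughly this shape.
type Message = {
  role: 'system' | 'user' | 'assistant';
  parts: Array<{ type: string; [key: string]: unknown }>; // e.g. text, tool_call
  metadata?: Record<string, unknown>; // optional, stored as-is
};
```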
```bash
npm install @fastpaca/fastpaca
```
## 1. Create (or load) a context
```ts
import { createClient } from '@fastpaca/fastpaca';

const fastpaca = createClient({
  baseUrl: process.env.FASTPACA_URL ?? 'http://localhost:4000/v1',
  apiKey: process.env.FASTPACA_API_KEY // optional
});

// Idempotent create/update when options are provided
const ctx = await fastpaca.context('123456', {
  budget: 1_000_000, // input token budget for this context
  trigger: 0.7,      // optional trigger ratio (defaults to 0.7)
  policy: { strategy: 'last_n', config: { limit: 400 } }
});
```
`context(id)` never generates IDs for you; you choose the ID, so you can pick up the same context later.
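For example (assuming that calling `context` with an ID and no options loads the existing context unchanged):

```ts
// Later, in another process or request: reconnect to the same context by ID.
const sameCtx = await fastpaca.context('123456');
```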
## 2. Append messages
```ts
await ctx.append({
  role: 'assistant',
  parts: [
    { type: 'text', text: 'I can help with that.' },
    { type: 'tool_call', name: 'lookup_manual', payload: { article: 'installing' } }
  ],
  metadata: { reasoning: 'User asked for deployment steps.' }
});

// Optionally pass a known token count for accuracy
await ctx.append({
  role: 'assistant',
  parts: [{ type: 'text', text: 'OK!' }]
}, { tokenCount: 12 });
```
Messages are stored exactly as you send them, and each one receives a deterministic `seq` for ordering.
## 3. Build the LLM context and call your model
```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const { used_tokens, messages, needs_compaction } = await ctx.context();

const { text } = await generateText({
  model: openai('gpt-4o-mini'),
  messages
});

await ctx.append({
  role: 'assistant',
  parts: [{ type: 'text', text }]
});
```
`needs_compaction` is a hint; ignore it unless you've opted to handle compaction yourself (see step 6).
## 4. Stream responses
```ts
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Fetch context messages
const { messages } = await ctx.context();

// Stream the response with ai-sdk and append in onFinish
return streamText({
  model: openai('gpt-4o-mini'),
  messages,
}).toUIMessageStreamResponse({
  onFinish: async ({ responseMessage }) => {
    await ctx.append(responseMessage);
  },
});
```
The `onFinish` callback receives `{ responseMessage }` in the ai-sdk v5 message shape. Pass it directly to `ctx.append()` to persist it to the context.
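Putting the pieces together, a hypothetical Next.js-style route handler might look like the sketch below; the request shape and the fixed context ID are illustrative, not prescribed by the SDK:

```ts
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { createClient } from '@fastpaca/fastpaca';

const fastpaca = createClient({
  baseUrl: process.env.FASTPACA_URL ?? 'http://localhost:4000/v1'
});

export async function POST(req: Request) {
  // Assumption: the client posts a single ai-sdk v5 UIMessage as `message`.
  const { message } = await req.json();

  const ctx = await fastpaca.context('123456'); // illustrative fixed ID
  await ctx.append(message); // persist the user's message first

  const { messages } = await ctx.context();
  return streamText({
    model: openai('gpt-4o-mini'),
    messages,
  }).toUIMessageStreamResponse({
    onFinish: async ({ responseMessage }) => {
      await ctx.append(responseMessage); // persist the assistant's reply
    },
  });
}
```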
## 5. Fetch messages for your UI
```ts
const latest = await ctx.getTail({ offset: 0, limit: 50 });    // last ~50 messages
const previous = await ctx.getTail({ offset: 50, limit: 50 }); // next page back in time
```
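A minimal paging sketch, assuming `getTail` resolves to an array of messages (`render` is a hypothetical UI hook):

```ts
const pageSize = 50;
let offset = 0;
while (true) {
  const page = await ctx.getTail({ offset, limit: pageSize });
  render(page); // hypothetical: hand this page to your UI
  if (page.length < pageSize) break; // short page => start of history reached
  offset += pageSize;
}
```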
## 6. Optional: manage compaction yourself
```ts
const { needs_compaction, messages } = await ctx.context();

if (needs_compaction) {
  const { summary, remainingMessages } = await summarise(messages);
  await ctx.compact([
    { role: 'system', parts: [{ type: 'text', text: summary }] },
    ...remainingMessages
  ]);
}
```
`compact()` rewrites only what the LLM will see; users still get the full message log.
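`summarise` above is yours to implement. One hedged sketch, assuming text-only parts and an arbitrary cutoff of 20 recent messages kept verbatim:

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

async function summarise(messages: any[], keep = 20) {
  // Keep the newest `keep` messages verbatim; summarise everything older.
  const older = messages.slice(0, -keep);
  const remainingMessages = messages.slice(-keep);

  const transcript = older
    .map((m) => {
      const text = m.parts
        .filter((p: any) => p.type === 'text')
        .map((p: any) => p.text)
        .join(' ');
      return `${m.role}: ${text}`;
    })
    .join('\n');

  const { text: summary } = await generateText({
    model: openai('gpt-4o-mini'),
    prompt: `Summarise this conversation, preserving key facts and decisions:\n\n${transcript}`
  });

  return { summary, remainingMessages };
}
```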
## Error handling
- Append conflicts: Fastpaca returns `409 Conflict` when you pass `ifVersion` and the context version has changed (optimistic concurrency control). On a 409, read the current context version and retry with the updated version.
- Network retries: on timeouts or 5xx errors, retry the same request (the version is unchanged). On `409 Conflict`, read the context to get the updated version, then retry.
- Streaming propagates LLM errors directly; Fastpaca only appends once the stream succeeds.
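A sketch of that retry loop; the exact field names (`version` on the context response, `ifVersion` on append, `status` on the thrown error) are assumptions here, so check the REST API reference:

```ts
async function appendWithRetry(ctx: any, message: any, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Assumption: ctx.context() returns the current context `version`.
    const { version } = await ctx.context();
    try {
      // Assumption: append takes { ifVersion } for optimistic concurrency.
      return await ctx.append(message, { ifVersion: version });
    } catch (err: any) {
      // 409 => our version is stale: loop, re-read, retry. Anything else rethrows.
      if (err?.status !== 409 || attempt === maxAttempts) throw err;
    }
  }
}
```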
Notes:

- The server computes message token counts by default; pass `tokenCount` when you have an accurate value (e.g., from your model provider).
- Use `ctx.context({ budgetTokens: ... })` to temporarily override the input budget for a single call.
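For example, a one-off override (the figure is illustrative):

```ts
// Build this prompt against a smaller budget; the stored budget is unchanged.
const { messages } = await ctx.context({ budgetTokens: 200_000 });
```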
Token usage with ai-sdk v5: `streamText` and `generateText` expose `usage`/`totalUsage`. If your provider returns completion token counts, pass that as `{ tokenCount }` when appending the assistant's response for maximum accuracy.
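A sketch of that, assuming ai-sdk v5 usage fields (`usage.outputTokens`; older versions exposed `completionTokens`):

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const { messages } = await ctx.context();
const { text, usage } = await generateText({
  model: openai('gpt-4o-mini'),
  messages
});

await ctx.append(
  { role: 'assistant', parts: [{ type: 'text', text }] },
  { tokenCount: usage.outputTokens } // provider-reported completion tokens
);
```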
See the REST API reference for exact payloads.