
REST API

All endpoints except the health checks live under /v1. Requests that carry a body must include Content-Type: application/json. Responses are JSON unless stated otherwise.

Contexts

PUT /v1/contexts/:id

Create or update a context. Idempotent (PUT).

Request body
{
"token_budget": 1000000,
"trigger_ratio": 0.7,
"policy": {
"strategy": "last_n",
"config": { "limit": 400 }
},
"metadata": {
"project": "support"
}
}
Response
{
"id": "support-123",
"token_budget": 1000000,
"policy": { ... },
"version": 0,
"created_at": "2025-01-24T12:00:00Z",
"updated_at": "2025-01-24T12:00:00Z"
}

trigger_ratio defaults to 0.7 if omitted.
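
As a rough illustration of the request above, the following sketch builds and sends the PUT call with fetch. The base URL and the function names (BASE_URL, buildPutContext, putContext) are assumptions made for this example and are not part of the API.

```javascript
// ASSUMPTION: base URL of your deployment; replace with the real address.
const BASE_URL = "http://localhost:4000";

// Build the fetch arguments for PUT /v1/contexts/:id.
function buildPutContext(id, config) {
  return {
    url: `${BASE_URL}/v1/contexts/${encodeURIComponent(id)}`,
    init: {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(config),
    },
  };
}

// Send the request and return the parsed JSON response.
async function putContext(id, config) {
  const { url, init } = buildPutContext(id, config);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`PUT context failed: ${res.status}`);
  return res.json();
}
```

Calling putContext("support-123", { token_budget: 1000000 }) would omit trigger_ratio and rely on the documented default of 0.7.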

GET /v1/contexts/:id

Returns the current context configuration and metadata.

DELETE /v1/contexts/:id

Tombstones the context. Existing messages remain available for replay, but new messages are rejected with 409 Conflict.

Messages

POST /v1/contexts/:id/messages

Append a message.

Request body
{
"message": {
"role": "assistant",
"parts": [
{ "type": "text", "text": "Checking now…" },
{ "type": "tool_call", "name": "lookup", "payload": { "sku": "A-19" } }
],
"token_count": 128,
"metadata": { "reasoning": "User asked for availability." }
},
"idempotency_key": "msg-045",
"if_version": 41
}
Response
{ "seq": 42, "version": 42, "token_estimate": 128 }
  • token_count (optional): when provided, the server uses it verbatim. If omitted, the server computes an approximate value.
  • idempotency_key is optional but recommended for retries.
  • if_version enforces optimistic concurrency. The server returns 409 Conflict if the current version does not match.
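
The three fields above can be combined as in the following sketch. It is one possible client-side wrapper, not the official client: the function names, the baseUrl argument, and the injectable fetchImpl parameter are assumptions made for illustration.

```javascript
// Build the request body; optional fields are omitted when not supplied.
function buildAppendBody(message, { idempotencyKey, ifVersion } = {}) {
  const body = { message };
  if (idempotencyKey !== undefined) body.idempotency_key = idempotencyKey;
  if (ifVersion !== undefined) body.if_version = ifVersion;
  return body;
}

// POST the message. Returns { conflict: true } on 409 so the caller can
// re-read the context and decide whether to retry with a fresh version.
async function appendMessage(baseUrl, id, message, opts = {}, fetchImpl = fetch) {
  const res = await fetchImpl(
    `${baseUrl}/v1/contexts/${encodeURIComponent(id)}/messages`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(buildAppendBody(message, opts)),
    }
  );
  if (res.status === 409) return { conflict: true };
  if (!res.ok) throw new Error(`append failed: ${res.status}`);
  return { conflict: false, ...(await res.json()) };
}
```

Reusing the same idempotency_key when retrying after a network failure is the reason the key is recommended; how the server deduplicates is not described here.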

GET /v1/contexts/:id/tail

Retrieve messages with tail-based pagination. Pages walk backward from the most recent message toward the oldest, which suits infinite scroll, mobile clients, and future tiered storage.

Query parameters:

  • limit (integer, default: 100) – Maximum messages to return
  • offset (integer, default: 0) – Number of messages to skip from tail (0 = most recent)
# Get last 50 messages
GET /v1/contexts/demo/tail?limit=50

# Get next page (messages 51-100 from tail)
GET /v1/contexts/demo/tail?offset=50&limit=50

# Get third page (messages 101-150 from tail)
GET /v1/contexts/demo/tail?offset=100&limit=50

Response:

{
"messages": [
{
"seq": 951,
"role": "user",
"parts": [{ "type": "text", "text": "…" }],
"token_count": 42,
"metadata": {},
"inserted_at": "2025-01-24T12:00:00Z"
},
{
"seq": 952,
"role": "assistant",
"parts": [{ "type": "text", "text": "…" }],
"token_count": 128,
"metadata": {},
"inserted_at": "2025-01-24T12:00:15Z"
}
]
}

Within each page, messages are returned in chronological order (oldest to newest). An empty array indicates that you have reached the beginning of history.

Pagination pattern:

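// `id` is the context ID; displayMessages is your own rendering function.
let offset = 0;
const limit = 100;

while (true) {
  const res = await fetch(`/v1/contexts/${id}/tail?offset=${offset}&limit=${limit}`);
  if (!res.ok) throw new Error(`tail request failed: ${res.status}`);
  const { messages } = await res.json();

  if (messages.length === 0) break; // Reached the beginning of history

  // Each page is chronological, so render it above the pages already shown.
  displayMessages(messages);
  offset += messages.length;
}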

LLM context

GET /v1/contexts/:id/context

Returns the current LLM context (the slice you send to your LLM).

Query parameters:

  • budget_tokens (optional) – temporarily override the configured budget.
  • if_version (optional) – fail with 409 if the snapshot changed since the supplied version.
Response
{
"version": 84,
"messages": [...],
"used_tokens": 702134,
"needs_compaction": true,
"segments": [
{ "type": "summary", "from_seq": 1, "to_seq": 340 },
{ "type": "live", "from_seq": 341, "to_seq": 384 }
]
}
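
The sketch below shows one way to build the request URL with the optional query parameters, plus a local check that mirrors the needs_compaction flag. The threshold rule is an assumption: the reference does not state how the server computes the flag, but the sample values are consistent with it (a budget of 1000000 times a trigger_ratio of 0.7 gives 700000, and used_tokens is 702134). The names contextUrl and reachesTrigger are hypothetical.

```javascript
// Build the URL for GET /v1/contexts/:id/context with optional query parameters.
function contextUrl(baseUrl, id, { budgetTokens, ifVersion } = {}) {
  const params = new URLSearchParams();
  if (budgetTokens !== undefined) params.set("budget_tokens", String(budgetTokens));
  if (ifVersion !== undefined) params.set("if_version", String(ifVersion));
  const query = params.toString();
  const path = `${baseUrl}/v1/contexts/${encodeURIComponent(id)}/context`;
  return query ? `${path}?${query}` : path;
}

// ASSUMPTION: the flag flips once used_tokens reaches token_budget * trigger_ratio.
// Prefer the server's needs_compaction field; this only illustrates the arithmetic.
function reachesTrigger(usedTokens, tokenBudget, triggerRatio = 0.7) {
  return usedTokens >= tokenBudget * triggerRatio;
}
```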

Compaction

POST /v1/contexts/:id/compact

Rewrite the LLM context snapshot in full. The raw message log remains untouched for replay/audit. This is an all-or-nothing replacement of the current LLM window.

Request body
{
"replacement": [
{
"role": "system",
"parts": [
{ "type": "text", "text": "One-paragraph summary of prior context…" }
]
},
{ "role": "user", "parts": [{ "type": "text", "text": "The most recent question." }] }
],
"if_version": 83
}
Response
{ "version": 84 }

Partial ranges are not supported. To preserve a tail, include those messages in replacement.
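
Because the replacement is all-or-nothing, a client that wants to keep recent messages has to assemble them itself. A minimal sketch, assuming the tail messages come from GET /v1/contexts/:id/tail and that only role and parts need to be carried over (the helper name buildCompaction is hypothetical):

```javascript
// Build a compaction request body: one summary message followed by the
// messages that should survive, in chronological order.
function buildCompaction(summaryText, tailMessages, ifVersion) {
  const summary = {
    role: "system",
    parts: [{ type: "text", text: summaryText }],
  };
  // ASSUMPTION: server-assigned fields (seq, inserted_at, ...) are dropped,
  // matching the shape of the request example above.
  const tail = tailMessages.map(({ role, parts }) => ({ role, parts }));
  const body = { replacement: [summary, ...tail] };
  if (ifVersion !== undefined) body.if_version = ifVersion;
  return body;
}
```

Passing the version returned by the context endpoint as ifVersion lets the server reject the compaction with 409 if new messages arrived in the meantime.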

Context metadata

PATCH /v1/contexts/:id/metadata

Upsert custom metadata associated with the context. Metadata is stored alongside the snapshot and returned by GET /v1/contexts/:id.

{ "metadata": { "customer": "acme-corp", "priority": "gold" } }

Health endpoints

  • GET /health/live – returns {"status":"ok"} when the node is accepting traffic.
  • GET /health/ready – returns {"status":"ok"} when the node has joined the cluster and can serve requests.
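
A deployment script might poll the readiness endpoint before sending traffic. The following is a sketch under assumptions: the function name waitUntilReady, the retry count, the delay, and the injectable fetchImpl parameter are all illustrative choices rather than documented behaviour.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll GET /health/ready until it reports {"status":"ok"} or attempts run out.
// Resolves to true when the node is ready, false otherwise.
async function waitUntilReady(
  baseUrl,
  { attempts = 10, delayMs = 500 } = {},
  fetchImpl = fetch
) {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetchImpl(`${baseUrl}/health/ready`);
      if (res.ok) {
        const body = await res.json();
        if (body.status === "ok") return true;
      }
    } catch (err) {
      // Connection refused or similar: the node is not up yet, keep polling.
    }
    if (i < attempts - 1) await sleep(delayMs);
  }
  return false;
}
```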

Error codes

Status  Meaning          Notes
400     Invalid payload  Schema or validation failure
401     Unauthorized     Missing/invalid API key (when enabled)
404     Not found        Context does not exist
409     Conflict         Version guard failed or context tombstoned
429     Rate limited     Per-node rate limiting (configurable)
500     Internal error   Unexpected server failure
503     Unavailable      No Raft quorum available (retry with backoff)

Errors follow a consistent shape:

{
"error": "conflict",
"message": "Context version changed (expected 83, found 84)"
}
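
The error shape and the retry guidance for 503 could be handled by a small wrapper like the one sketched here. The helper names, the exponential backoff constants, and the decision to retry only on 503 are assumptions for illustration; the reference only says that 503 should be retried with backoff.

```javascript
// Only 503 is documented as retryable; other statuses are surfaced immediately.
function isRetryable(status) {
  return status === 503;
}

// Exponential backoff: baseMs, 2x, 4x, ... capped at capMs (illustrative values).
function backoffDelay(attempt, baseMs = 100, capMs = 5000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Issue a request, retrying on 503. On failure, throw an Error carrying the
// HTTP status and the "error" code from the documented error body.
async function requestWithRetry(url, init, { maxAttempts = 5 } = {}, fetchImpl = fetch) {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl(url, init);
    if (res.ok) return res.json();
    const err = await res.json().catch(() => ({}));
    if (!isRetryable(res.status) || attempt >= maxAttempts - 1) {
      throw Object.assign(new Error(err.message || `HTTP ${res.status}`), {
        status: res.status,
        code: err.error,
      });
    }
    await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
  }
}
```

A caller can then branch on the thrown error's code, for example re-reading the context when code is "conflict".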