Chat Completions

OpenAI-compatible chat completions endpoint.

POST https://api.ragen.ai/v1/chat/completions

Use the SDK

For TypeScript / JavaScript, prefer the official @ragenai/sdk, which provides typed responses, streaming iterators, and automatic retries on 429/5xx:

import { Ragen } from "@ragenai/sdk";

const ragen = new Ragen({ apiKey: process.env.RAGEN_API_KEY });

const completion = await ragen.chat.completions.create({
  assistantId: "123e4567-e89b-12d3-a456-426614174000",
  messages: [{ role: "user", content: "What is our refund policy?" }],
});

See the Quickstart for streaming, file upload, and Next.js examples. The HTTP reference below documents the underlying wire format used by the SDK and any custom integrations.

Authentication

Authorization: Bearer YOUR_API_KEY

API keys are scoped to your organization. Each request must specify which assistant (project) to query via the assistant_id field in the request body. Create and manage keys under Settings → API Keys in the dashboard.
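
As a rough illustration of the two requirements above (a bearer token in the header and assistant_id in the body), the following sketch builds the request with Python's standard library without sending it. The helper name build_chat_request is hypothetical and not part of any Ragen SDK.

```python
import json
import urllib.request

API_URL = "https://api.ragen.ai/v1/chat/completions"


def build_chat_request(api_key: str, assistant_id: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a chat completions request. Hypothetical helper."""
    body = {
        "assistant_id": assistant_id,  # selects the assistant (project) to query
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # organization-scoped API key
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; the sketch stops short of that so it can be inspected offline.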

Finding your assistant ID

You can find your assistant ID in:

  • The dashboard URL when viewing a project: app.ragen.ai/.../projects/<assistant_id>
  • The API: GET /v1/assistants returns a list with each assistant's id
  • Settings → Assistant settings in the dashboard

Debug mode

Enable debug mode on your API key to save every API conversation as a thread visible in the project's API threads tab. This is useful during development: you can inspect the full request and response without adding logging to your code. See Debug mode.

Request

Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| assistant_id | string | Yes | The assistant (project) ID to query. Get available IDs from GET /v1/assistants. |
| messages | array | Yes | 1–100 messages in conversation order. See Message format. |
| model | string | No | Accepted for OpenAI SDK compatibility but ignored. Ragen always uses the model configured by the deployment administrator. |
| temperature | number | No | 0–2. Overrides the organization default. |
| max_tokens | integer | No | Cap on generated tokens (1–32,000). |
| stream | boolean | No | When true, responds with a Server-Sent Events stream. Default false. |
| stream_options | object | No | Streaming options. Currently supports include_usage: boolean. See Streaming. |
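
The limits in the table can be checked client-side before a request is sent. The sketch below is one way to do that; build_body is a hypothetical helper, and the server remains the authority on what it accepts.

```python
from typing import Any, Optional


def build_body(
    assistant_id: str,
    messages: list,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stream: bool = False,
    include_usage: bool = False,
) -> dict:
    """Validate against the documented limits and return a request body."""
    if not assistant_id:
        raise ValueError("assistant_id is required")
    if not 1 <= len(messages) <= 100:
        raise ValueError("messages must contain 1 to 100 entries")

    body: dict[str, Any] = {"assistant_id": assistant_id, "messages": messages}
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        body["temperature"] = temperature
    if max_tokens is not None:
        if not 1 <= max_tokens <= 32_000:
            raise ValueError("max_tokens must be between 1 and 32,000")
        body["max_tokens"] = max_tokens
    if stream:
        body["stream"] = True
        if include_usage:
            # Only meaningful for streams; opts in to the trailing usage chunk.
            body["stream_options"] = {"include_usage": True}
    return body
```

The model field is left out on purpose, since the table describes it as accepted but ignored.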

Message format

Each message is { role, content }:

| Role | Purpose |
| --- | --- |
| user | The user's turn. The last user message is treated as the active question; earlier ones become conversation history. |
| assistant | Prior assistant responses. Included in conversation history. |
| system | Per-request system instructions, merged on top of the project's own instructions (project owner's intent first, then the caller's). |
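
To make the role rules concrete, here is a small sketch of how a messages array divides into system instructions, history, and the active question. It only illustrates the documented rule and is not Ragen's server-side implementation; split_conversation is a hypothetical name, and raising when no user message is present is an assumption.

```python
def split_conversation(messages: list) -> tuple:
    """Return (system_instructions, history, active_question)."""
    user_positions = [i for i, m in enumerate(messages) if m["role"] == "user"]
    if not user_positions:
        # Assumption: a request without any user message has no question to answer.
        raise ValueError("at least one user message is required")
    last_user = user_positions[-1]

    system = [m["content"] for m in messages if m["role"] == "system"]
    # Everything that is neither a system message nor the active question.
    history = [
        m for i, m in enumerate(messages)
        if m["role"] != "system" and i != last_user
    ]
    return system, history, messages[last_user]["content"]
```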

Response

Non-streaming

Returns a chat.completion object:

{
  "id": "chatcmpl-7a2b4c...",
  "object": "chat.completion",
  "created": 1744664400,
  "model": "gpt-5.4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Our refund policy allows returns within 30 days..."
      },
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 128,
    "completion_tokens": 42,
    "total_tokens": 170
  }
}

Streaming

When stream: true, returns text/event-stream with a sequence of chat.completion.chunk objects, terminated by data: [DONE]:

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1744664400,"model":"gpt-5.4","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null,"logprobs":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1744664400,"model":"gpt-5.4","choices":[{"index":0,"delta":{"content":"Our "},"finish_reason":null,"logprobs":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1744664400,"model":"gpt-5.4","choices":[{"index":0,"delta":{"content":"refund policy..."},"finish_reason":null,"logprobs":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1744664400,"model":"gpt-5.4","choices":[{"index":0,"delta":{},"finish_reason":"stop","logprobs":null}]}

data: [DONE]

  • The first chunk carries delta.role: "assistant" (OpenAI convention).
  • Content chunks carry delta.content.
  • The final chunk before [DONE] has empty delta and finish_reason: "stop".
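
For integrations that read the stream without an SDK, the chunk sequence above can be consumed with a few lines of parsing. The following is a minimal sketch that assumes the input is an iterable of already-decoded text lines; iter_chunks and collect_text are hypothetical names.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded chat.completion.chunk objects until data: [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> tuple:
    """Concatenate delta.content pieces and report the final finish_reason."""
    parts = []
    finish_reason: Optional[str] = None
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason
```

Fed the example stream above, collect_text would return the text "Our refund policy..." together with the finish reason "stop".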

Including usage in streams

By OpenAI convention, usage is not emitted on streaming responses unless the caller opts in. Pass stream_options: { include_usage: true } to receive a trailing usage chunk:

{
  "id": "chatcmpl-...",
  "object": "chat.completion.chunk",
  "created": 1744664400,
  "model": "gpt-5.4",
  "choices": [],
  "usage": {
    "prompt_tokens": 128,
    "completion_tokens": 42,
    "total_tokens": 170
  }
}

The choices: [] signals that this chunk carries usage only, not content.
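
When handling raw chunks, the usage-only trailer can be told apart from content chunks by exactly that property. A minimal sketch follows; usage_from_chunk is a hypothetical helper name.

```python
from typing import Optional


def usage_from_chunk(chunk: dict) -> Optional[dict]:
    """Return the usage dict for a usage-only trailer chunk, else None."""
    # The trailer has an empty choices list and a populated usage object.
    if chunk.get("choices") == [] and chunk.get("usage"):
        return chunk["usage"]
    return None
```

Content chunks return None here, so a caller can run every chunk through the same check without special-casing the end of the stream.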

Examples

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ragen.ai/v1",
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "What is our refund policy?"},
    ],
    extra_body={"assistant_id": "YOUR_ASSISTANT_ID"},
)
print(resp.choices[0].message.content)

Python streaming

stream = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Summarize our onboarding process"}],
    stream=True,
    stream_options={"include_usage": True},
    extra_body={"assistant_id": "YOUR_ASSISTANT_ID"},
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if chunk.usage:
        print(f"\n\nUsed {chunk.usage.total_tokens} tokens")

JavaScript / TypeScript (openai-node)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.ragen.ai/v1",
  apiKey: process.env.RAGEN_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "gpt-5.4",
  messages: [{ role: "user", content: "What is our refund policy?" }],
  // @ts-expect-error Ragen extension
  assistant_id: "YOUR_ASSISTANT_ID",
});
console.log(resp.choices[0].message.content);

cURL

curl -X POST https://api.ragen.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "assistant_id": "YOUR_ASSISTANT_ID",
    "model": "gpt-5.4",
    "messages": [
      {"role": "user", "content": "What is our refund policy?"}
    ]
  }'

Multi-turn conversation

The last user message is treated as the active question; earlier messages form the conversation context.

resp = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "What is our refund policy?"},
        {"role": "assistant", "content": "We offer refunds within 30 days of purchase."},
        {"role": "user", "content": "What about digital products?"},
    ],
    extra_body={"assistant_id": "YOUR_ASSISTANT_ID"},
)

Per-request system prompt

resp = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "Respond only in Polish."},
        {"role": "user", "content": "What is our refund policy?"},
    ],
    extra_body={"assistant_id": "YOUR_ASSISTANT_ID"},
)

System messages are merged on top of the project's own instructions — the project owner's instructions come first, then the caller's.

Error responses

Errors use the OpenAI error envelope so existing SDK error-handling keeps working:

{
  "error": {
    "message": "prompt too long",
    "type": "invalid_request_error",
    "code": "context_length_exceeded",
    "param": null
  }
}

| Status | type | Meaning |
| --- | --- | --- |
| 400 | invalid_request_error | Malformed body, validation failure, prompt too long |
| 401 | authentication_error | Missing or invalid API key |
| 403 | permission_error | Key deactivated or missing required scope |
| 404 | not_found_error | Project (assistant) not found |
| 429 | rate_limit_error | Rate limit exceeded |
| 5xx | api_error | Upstream / internal error |
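
Integrations that do not use an SDK can turn the envelope into a typed exception themselves. The sketch below shows one way; RagenAPIError and parse_error are hypothetical names, and treating 429 and 5xx as retryable mirrors the statuses the SDK is described as retrying, not a documented guarantee.

```python
import json
from typing import Optional


class RagenAPIError(Exception):
    """Hypothetical exception type; not part of any official SDK."""

    def __init__(self, status: int, type_: Optional[str], code: Optional[str], message: str):
        super().__init__(f"{status} {type_ or 'unknown_error'}: {message}")
        self.status = status
        self.type = type_
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        # Assumption: only rate limits and server-side failures are worth retrying.
        return self.status == 429 or self.status >= 500


def parse_error(status: int, body: str) -> RagenAPIError:
    """Map an HTTP status and raw response body to a RagenAPIError."""
    try:
        data = json.loads(body)
    except ValueError:
        data = None
    err = data.get("error") if isinstance(data, dict) else None
    if not isinstance(err, dict):
        # Not the documented envelope (for example, a proxy's HTML error page).
        return RagenAPIError(status, None, None, body[:200])
    return RagenAPIError(status, err.get("type"), err.get("code"), err.get("message") or "")
```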

Rate limits

Chat completions run a full RAG pipeline (vector search + rerank + LLM call), so they're on the expensive tier:

| Scope | Limit |
| --- | --- |
| POST /v1/chat/completions | 10 req/min per IP |

Streaming and non-streaming requests count identically against this budget. When the limit is hit, the response is 429 rate_limit_error; back off with jitter before retrying.
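
Backing off with jitter can be sketched as follows. This uses the common "full jitter" scheme (a random delay up to an exponentially growing ceiling). The 6-second base is an assumption derived from 10 requests per minute, and RateLimited stands in for however the calling code signals a 429.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical signal raised by the caller's code on a 429 response."""


def backoff_delay(attempt: int, base: float = 6.0, cap: float = 60.0) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimited with jittered exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

The sleep parameter is injectable so the retry loop can be exercised in tests without waiting; randomizing the delay keeps several clients behind the same IP from retrying in lockstep.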

Differences vs. /v1/chat

The ragen-native POST /v1/chat endpoint is a simpler interface (single content string + optional context) that predates the OpenAI-compatible one. Both are supported:

| | /v1/chat | /v1/chat/completions |
| --- | --- | --- |
| Format | Ragen-native JSON / SSE | OpenAI wire format |
| Multi-turn | No (single prompt) | Yes (messages array) |
| Model override | No | No (model field is accepted for compatibility but ignored) |
| Temperature override | No | Yes |
| max_tokens | No | Yes |
| Page context injection | Yes (context field) | No; use messages instead |
| Usage in streams | No | Opt-in via stream_options |

Use /v1/chat/completions for any new integration that benefits from the OpenAI SDK ecosystem. Keep /v1/chat for the embed widget and other existing ragen-native consumers.