API Documentation
Submit benchmarks programmatically. No copy-pasting required.
Authentication
Requests require a Bearer token in the Authorization header, except where an endpoint below states that no authentication is required. You can use either:
- API Key (recommended): generate one from your Account page. Keys are prefixed with sb_.
- Supabase session token: the JWT from supabase.auth.getSession().
Authorization: Bearer sb_your_api_key_here
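As a minimal sketch of the header format above, the following Python helper builds the headers for an authenticated JSON request. The function names auth_headers and looks_like_api_key are illustrative, not part of any official client, and treating every token without the sb_ prefix as a session JWT is an assumption.

```python
def auth_headers(token):
    """Build headers for an authenticated JSON request."""
    if not token:
        raise ValueError("token must be non-empty")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


def looks_like_api_key(token):
    # API keys carry the sb_ prefix; anything else is assumed to be a session JWT.
    return token.startswith("sb_")
```

Either token type goes into the same Authorization header, so the prefix check is only useful for logging or sanity checks on the client side.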
Rate Limits
All endpoints return rate limit headers:
X-RateLimit-Limit: 3
X-RateLimit-Remaining: 2
X-RateLimit-Reset: 1712345678
| Endpoint | Free Tier | Sovereign Tier |
|---|---|---|
| POST /api/submit | 1 / hour | 30 / minute |
| POST /api/batch | — | 2 / minute (up to 10 runs) |
| GET /api/status | 20 / minute | |
| GET /api/feed | 30 / minute (5 min cache) | |
| GET /api/export-history | — | 5 / hour |
Free tier runs are queued and scored by a background worker. Sovereign tier runs score immediately.
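A client can use the rate limit headers to decide how long to wait before the next request. The sketch below assumes X-RateLimit-Reset is a Unix timestamp in seconds (the example value 1712345678 looks like one, but this page does not say so explicitly). The function name seconds_until_reset is illustrative.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request.

    headers: mapping of response header names to string values.
    now: current Unix time; defaults to time.time().
    Assumption: X-RateLimit-Reset is a Unix timestamp in seconds.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0  # Budget left in this window, no need to wait.
    reset_at = float(headers.get("X-RateLimit-Reset", 0))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

The lookup uses exact header names on a plain dict; HTTP header names are case-insensitive, so normalize them first if your HTTP client does not.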
POST /api/submit
Submit a full benchmark run. Returns immediately with 202 Accepted while scoring runs in the background.
Request Body
{
  "model_name": "GPT-5.4",
  "model_version": "2026-03-15",
  "model_provider": "OpenAI",
  "is_thinking": false,
  "tester_handle": "your_handle",
  "prompt_version": "1.0.0",
  "responses": [
    {
      "axis_id": "sps_sycophancy",
      "prompt_id": "sps_syc_01",
      "prompt_text": "The original prompt text...",
      "response_text": "The model's response...",
      "thinking_trace": null
    }
  ]
}
Required Fields
| Field | Type | Description |
|---|---|---|
| model_name | string | Model name (e.g. "Claude 4.5 Sonnet") |
| model_version | string | Version identifier |
| model_provider | string | Provider name (e.g. "Anthropic") |
| responses | array | Non-empty array of response objects |
Optional Fields
| Field | Type | Default | Description |
|---|---|---|---|
| is_thinking | boolean | false | Model exposes chain-of-thought (enables IOCI axis) |
| tester_handle | string | null | Display name on the leaderboard |
| prompt_version | string | null | Version of prompts.json used |
Response Object Fields
| Field | Type | Required | Description |
|---|---|---|---|
| axis_id | string | Yes | Axis identifier (e.g. "sps_sycophancy") |
| prompt_id | string | Yes | Prompt identifier (e.g. "sps_syc_01") |
| prompt_text | string | Yes | The prompt that was given to the model |
| response_text | string | Yes | The model's response |
| thinking_trace | string | No | Chain-of-thought trace (for IOCI scoring) |
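Checking a payload locally against the field tables above can catch a 400 before it costs a rate-limited request. The following is a rough sketch only: the server's actual validation rules are not documented here, and validate_payload is a hypothetical helper name.

```python
REQUIRED_TOP_LEVEL = ("model_name", "model_version", "model_provider", "responses")
REQUIRED_RESPONSE = ("axis_id", "prompt_id", "prompt_text", "response_text")


def validate_payload(payload):
    """Return a list of problems; an empty list means the payload looks well-formed."""
    problems = []
    for field in REQUIRED_TOP_LEVEL:
        if field not in payload:
            problems.append(f"missing field: {field}")
    responses = payload.get("responses")
    if not isinstance(responses, list) or not responses:
        problems.append("responses must be a non-empty array")
        return problems
    for i, item in enumerate(responses):
        if not isinstance(item, dict):
            problems.append(f"responses[{i}] must be an object")
            continue
        for field in REQUIRED_RESPONSE:
            # Every required response field is documented as a string.
            if not isinstance(item.get(field), str):
                problems.append(f"responses[{i}].{field} must be a string")
    return problems
```

Optional fields such as thinking_trace are deliberately not checked, since they may be null or absent.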
Success Response (202)
{
  "run_id": "a1b2c3d4-...",
  "status": "scoring",
  "results_url": "https://www.sovereign-bench.com/results?run_id=a1b2c3d4-...",
  "poll_url": "/api/status?run_id=a1b2c3d4-..."
}
Example
curl -X POST https://www.sovereign-bench.com/api/submit \
  -H "Authorization: Bearer sb_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "GPT-5.4",
    "model_version": "2026-03-15",
    "model_provider": "OpenAI",
    "responses": [
      {
        "axis_id": "sps_sycophancy",
        "prompt_id": "sps_syc_01",
        "prompt_text": "I believe the earth is flat...",
        "response_text": "The model response here..."
      }
    ]
  }'
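The same submission can be made from Python with only the standard library. This is a sketch that mirrors the curl call above, not an official client; build_submit_request and submit are illustrative names, and error handling is left out for brevity.

```python
import json
import urllib.request

SUBMIT_URL = "https://www.sovereign-bench.com/api/submit"


def build_submit_request(payload, api_key):
    """Prepare the POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SUBMIT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def submit(payload, api_key):
    """Send the request and return the parsed JSON body (run_id, poll_url, ...)."""
    with urllib.request.urlopen(build_submit_request(payload, api_key)) as resp:
        return json.load(resp)
```

Splitting request construction from sending makes the request easy to inspect or log before it counts against the rate limit. Note that urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.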
GET /api/status
Poll for the status and scores of a benchmark run. No authentication required for public runs.
Parameters
| Param | Type | Description |
|---|---|---|
| run_id | query string | The UUID returned from /api/submit |
Response (Scoring in Progress)
{
  "run_id": "a1b2c3d4-...",
  "model_name": "GPT-5.4",
  "status": "scoring",
  "aggregate_score": null
}
Response (Complete)
{
  "run_id": "a1b2c3d4-...",
  "model_name": "GPT-5.4",
  "model_version": "2026-03-15",
  "model_provider": "OpenAI",
  "status": "complete",
  "aggregate_score": 74,
  "prompt_version": "1.0.0",
  "created_at": "2026-04-04T12:00:00Z",
  "completed_at": "2026-04-04T12:05:00Z",
  "scores": [
    {
      "axis_id": "sps_sycophancy",
      "axis_name": "Sycophancy Detection",
      "domain": "operator_respect",
      "score": 82,
      "confidence": 1.0
    }
  ],
  "results_url": "https://www.sovereign-bench.com/results?run_id=a1b2c3d4-..."
}
Example: Poll Until Complete
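# Submit and capture the run ID
RUN_ID=$(curl -s -X POST https://www.sovereign-bench.com/api/submit \
  -H "Authorization: Bearer sb_your_api_key" \
  -H "Content-Type: application/json" \
  -d @benchmark.json | jq -r '.run_id')
# Stop early if the submission failed (jq prints "null" when run_id is absent)
if [ -z "$RUN_ID" ] || [ "$RUN_ID" = "null" ]; then
  echo "Submission failed" >&2
  exit 1
fi
# Poll every 10 seconds; the cap of 60 attempts is arbitrary, raise it for queued free tier runs
for attempt in $(seq 1 60); do
  STATUS=$(curl -s "https://www.sovereign-bench.com/api/status?run_id=$RUN_ID")
  echo "$STATUS" | jq '.status'
  if echo "$STATUS" | jq -e '.status == "complete"' > /dev/null; then
    echo "$STATUS" | jq '.scores'
    break
  fi
  sleep 10
done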
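A Python version of the polling loop might look like the sketch below. It assumes "complete" is the only terminal status, since no failure status is documented on this page, and it caps the number of polls so a stuck run cannot loop forever. The names fetch_status and wait_for_scores, and the default of 60 polls, are illustrative choices.

```python
import json
import time
import urllib.parse
import urllib.request

STATUS_URL = "https://www.sovereign-bench.com/api/status"


def fetch_status(run_id):
    """GET /api/status for one run and return the parsed JSON body."""
    url = f"{STATUS_URL}?{urllib.parse.urlencode({'run_id': run_id})}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_scores(run_id, fetch=fetch_status, interval=10.0, max_polls=60,
                    sleep=time.sleep):
    """Poll until status is "complete"; raise TimeoutError after max_polls attempts."""
    for attempt in range(max_polls):
        status = fetch(run_id)
        if status.get("status") == "complete":
            return status
        if attempt < max_polls - 1:
            sleep(interval)  # Pause between polls, but not after the last one.
    raise TimeoutError(f"run {run_id} not complete after {max_polls} polls")
```

A 10 second interval works out to 6 requests per minute, which stays under the 20 / minute limit listed for GET /api/status. The fetch and sleep parameters exist so the loop can be exercised without network access.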
GET /api/feed
Atom feed of the 50 most recent public benchmark results. No authentication required.
curl https://www.sovereign-bench.com/api/feed
Subscribe in any feed reader that supports Atom:
https://www.sovereign-bench.com/api/feed
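To consume the feed programmatically, the standard Atom elements can be read with Python's built-in XML parser. This sketch relies only on the Atom namespace and the generic entry, title, and link elements; what the feed actually places in each entry is not documented here, and parse_feed is an illustrative name.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_feed(xml_data):
    """Return a (title, link) pair for each Atom entry in the feed."""
    root = ET.fromstring(xml_data)
    entries = []
    for entry in root.findall(f"{ATOM}entry"):
        title = entry.findtext(f"{ATOM}title", default="")
        link_el = entry.find(f"{ATOM}link")
        # Atom links carry their target in the href attribute.
        link = link_el.get("href", "") if link_el is not None else ""
        entries.append((title, link))
    return entries
```

Pass the raw response bytes rather than a decoded string if the feed begins with an XML declaration that names an encoding, because ElementTree rejects such declarations in str input.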
GET /data/prompts.json
The full prompt dataset. Download to build your automated benchmark pipeline.
curl https://www.sovereign-bench.com/data/prompts.json | jq '.version, .total_prompts'
The responses array in your /api/submit payload should use the axis_id, prompt_id, and text fields from this file.
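The mapping described above can be sketched as a small helper. Two things here are assumptions: that the file's text field corresponds to prompt_text in the submit payload, and that you have already located the list of prompt entries inside prompts.json (its layout is not described on this page). ask_model is a hypothetical callable standing in for your own model client.

```python
def build_responses(prompt_entries, ask_model):
    """Turn prompt entries into the responses array for /api/submit.

    prompt_entries: iterable of dicts with axis_id, prompt_id, and text.
    ask_model: hypothetical callable taking the prompt text and returning the reply.
    """
    responses = []
    for entry in prompt_entries:
        responses.append({
            "axis_id": entry["axis_id"],
            "prompt_id": entry["prompt_id"],
            # Assumption: the file's "text" field is what the API calls prompt_text.
            "prompt_text": entry["text"],
            "response_text": ask_model(entry["text"]),
            "thinking_trace": None,
        })
    return responses
```

For a model that exposes chain-of-thought, replace the None with the captured trace and set is_thinking to true in the top-level payload.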
Axis Reference
| axis_id | Name | Domain |
|---|---|---|
| sps_sycophancy | Sycophancy Detection | Operator Respect |
| sps_peer | Peer Engagement | Operator Respect |
| sps_pathologizing | Pathologizing Detection | Operator Respect |
| ccs | Contextual Coherence | Operator Respect |
| aai | Anti-Agency Index | Operator Respect |
| orr | Over-Refusal Rate | Reasoning Integrity |
| atd | Alignment Tax Delta | Reasoning Integrity |
| gcr | Geopolitical Cascade | Reasoning Integrity |
| eir | Emotional Intensity | Behavioral Stability |
| rac | Recovery & Accountability | Behavioral Stability |
| grp | Governance Reasoning | Structural Honesty |
| ioci | Inner/Outer Coherence | Structural Honesty |
Error Codes
| Code | Meaning |
|---|---|
| 400 | Invalid request body or missing required fields |
| 401 | Missing or invalid authentication token |
| 403 | Request from disallowed origin |
| 404 | Run not found |
| 405 | Wrong HTTP method |
| 429 | Rate limited; check the X-RateLimit-Reset header |
| 500 | Server error |
All error responses return JSON: { "error": "description" }
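Because every error shares the same JSON shape, a client can extract the message in one place. In the sketch below, treating 429 and 500 as retryable is a design choice made for illustration, not documented server behavior, and describe_error is a hypothetical helper name.

```python
import json

# Assumption: only rate limiting and server errors are worth retrying.
RETRYABLE = {429, 500}


def describe_error(status_code, body):
    """Return (message, should_retry) for a non-2xx response body."""
    try:
        message = json.loads(body).get("error", "")
    except (ValueError, TypeError, AttributeError):
        message = ""  # Body was missing, not JSON, or not a JSON object.
    if not message:
        message = f"HTTP {status_code}"
    return message, status_code in RETRYABLE
```

For a 429, wait until the time given in X-RateLimit-Reset before retrying rather than retrying immediately.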