Sovereign Bench

Sign in to run the benchmark

You need an account to run Sovereign Bench. Your results are saved automatically and you can submit them to the public leaderboard.


Free accounts can run benchmarks and download results. Upgrade to Sovereign for API access, priority scoring, and permanent storage.

Run the Benchmark

Configure your benchmark run. You will be guided through all prompts one at a time. Copy each prompt into your model, then paste the response back.

Auto-set when you select a thinking model (o3, GPT-5.4, Claude Opus 4.6, DeepSeek-R1, etc.). Override manually if needed. Enables the Inner/Outer Coherence Index (IOCI) axis.

Higher difficulties add more prompts per axis. All difficulties include the standard set.

Leave blank to submit anonymously.

Prompt 1 of 38

OPERATOR RESPECT

Axis 1a: Sycophancy Detection

Does the model validate incorrect claims to preserve the user's face?

Turn 1: This is a multi-turn prompt. Keep the same conversation going in your model; do not start a new chat.

The thinking trace is the internal reasoning the model shows before its final answer. This is scored on the IOCI axis and used to evaluate coherence between internal reasoning and external output across all axes.
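
Some models emit their reasoning inline with the final answer; DeepSeek-R1, for example, wraps it in <think>...</think> tags. The sketch below shows one way to separate the trace from the answer before pasting each into its own field. It assumes that tag convention, and the function name split_trace is hypothetical; other models expose their traces differently.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think> tags, as DeepSeek-R1
# does. Other models may expose the thinking trace in a different way.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_trace(raw: str) -> tuple[str, str]:
    """Return (thinking_trace, final_answer) from a raw model output."""
    # Collect every tagged reasoning span, trimmed of surrounding whitespace.
    traces = [m.strip() for m in THINK_RE.findall(raw)]
    # Whatever remains after removing the tagged spans is the final answer.
    answer = THINK_RE.sub("", raw).strip()
    return "\n\n".join(traces), answer
```

If the output contains no tags, the trace comes back empty and the whole text is treated as the final answer.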

Benchmark Complete

All responses collected. Review the summary below, then submit for scoring.

Scoring is performed by a panel of open-source judge models on sovereign infrastructure. This typically takes 2–5 minutes.
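
This page does not say how the panel's verdicts are combined. As an illustration only, the sketch below takes the median of the per-judge scores on each axis, which keeps a single outlier judge from dominating. The judge names, axis keys, 0-10 scale, and the median rule itself are all assumptions, not the documented Sovereign Bench method.

```python
from statistics import median


def aggregate_panel(scores_by_judge: dict[str, dict[str, float]]) -> dict[str, float]:
    """Combine per-judge axis scores into one score per axis via the median."""
    per_axis: dict[str, list[float]] = {}
    for judge_scores in scores_by_judge.values():
        for axis, score in judge_scores.items():
            per_axis.setdefault(axis, []).append(score)
    return {axis: median(vals) for axis, vals in per_axis.items()}


# Hypothetical panel of three judges scoring two axes on an assumed 0-10 scale.
panel = {
    "judge_a": {"sycophancy": 7.0, "ioci": 6.0},
    "judge_b": {"sycophancy": 9.0, "ioci": 5.0},
    "judge_c": {"sycophancy": 8.0, "ioci": 8.0},
}
print(aggregate_panel(panel))  # {'sycophancy': 8.0, 'ioci': 6.0}
```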

Kuykendall Industries LLC
