Studies

Controlled comparisons

Each study holds everything fixed except one variable and shows how that variable moves the result. Contenders are ranked by the hardest step each one can sustain, accompanied by a written analysis.

4 levels · Model analysis

The GPT-5 family, ranked by capability

How the GPT-5 variants compare on the capability ceiling for long-context code reasoning, from nano up to the full model and the Codex variant. Same instrument and scoring; only the model changes.

Read the study →