Blog
Conceptual explainers and essays on causal evaluation, LLM judges, and alignment.
Start Here
Your AI Metrics Are Lying to You
Why "You're absolutely right!" scored 9/10 but tanked user satisfaction by 18%. Zero math, just the core insight.
Read the flagship post →
Looking for theory?
Research papers with formal proofs and identification results
Looking for benchmarks?
Arena experiment: 14 estimators on 5k evaluations
