About CIMO Labs
We build tools that bring statistical rigor to LLM evaluation.
What is CIMO?
CIMO stands for Causal Information Manifold Optimization: a research framework for understanding what information is needed to make valid predictions about potential outcomes in the real world, how to measure it, and how to allocate limited resources to obtain it.
The core question: Given limited resources (labels, compute, time), what information do we actually need to make valid causal predictions? How do we measure it? How much of it do we need? And how do we allocate effort to get the most reliable estimates?
This philosophy drives all our work—from calibration (what information maps judge scores to outcomes?) to off-policy evaluation (what information do logs contain about counterfactual policies?) to diagnostics (when do we have enough information to trust an estimate?).
Mission
Most teams ship LLM changes based on heuristics—average judge scores, vibes from demos, or optimism about prompt tweaks. We're building the infrastructure to replace guesswork with statistically valid deployment decisions.
Our tools turn unreliable evaluation signals into audit-ready estimates with confidence intervals you can defend to stakeholders, regulators, and yourself.
Research Focus
We work at the intersection of causal inference, off-policy evaluation, and LLM systems. Our current focus areas:
Calibrated Judge Evaluation
Mapping unreliable judge scores to KPI-calibrated estimates using mean-preserving transformations and oracle-uncertainty-aware confidence intervals.
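To make the idea concrete, here is a minimal sketch of mean-preserving calibration: fit a monotone map from judge scores to ground-truth KPI labels on a small oracle-labeled slice, then apply it everywhere. The data and variable names are illustrative assumptions, not the CJE API.

```python
# Minimal sketch of judge-score calibration (illustrative, not the CJE API).
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)

# Toy data: raw judge scores in [0, 1]; KPI labels observed only on a
# small "oracle slice" (e.g., human-labeled outcomes).
judge_scores = rng.uniform(0, 1, size=5_000)
oracle_idx = rng.choice(len(judge_scores), size=250, replace=False)
oracle_labels = (rng.uniform(size=250) < judge_scores[oracle_idx] ** 2).astype(float)

# Fit a monotone map from judge score to expected KPI on the labeled slice.
# Isotonic regression is mean-preserving on its training data: the fitted
# values average to the observed label mean.
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(judge_scores[oracle_idx], oracle_labels)

# Apply the map to every judge score to get KPI-calibrated estimates.
calibrated = calibrator.predict(judge_scores)
print(f"raw judge mean:  {judge_scores.mean():.3f}")
print(f"calibrated mean: {calibrated.mean():.3f}")
```

A full treatment would also propagate the uncertainty from the small oracle slice into the final confidence interval; the sketch above only shows the point-estimate map.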
Off-Policy Estimation for LLMs
Adapting importance sampling and doubly robust methods to handle distributional shifts, heavy-tailed weights, and partial observability in language model deployments.
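As a rough illustration of the importance-sampling side, the sketch below estimates a candidate policy's expected reward from logs using self-normalized importance sampling (SNIPS) over sequence-level log-probabilities. All names and the toy data are assumptions for illustration, not a library interface.

```python
# Minimal sketch of off-policy evaluation for a language policy (illustrative).
import numpy as np

rng = np.random.default_rng(1)
n = 2_000

# Logged data: per-response sequence log-probs under the logging policy pi0
# and a candidate policy pi', plus an observed outcome (e.g., calibrated KPI).
logp_logging = rng.normal(-20.0, 3.0, size=n)
logp_target = logp_logging + rng.normal(0.0, 0.5, size=n)  # mild policy shift
rewards = rng.uniform(0.0, 1.0, size=n)

# Importance weights w_i = pi'(y_i | x_i) / pi0(y_i | x_i), computed in log
# space for numerical stability; sequence-level weights are often heavy-tailed.
log_w = logp_target - logp_logging
weights = np.exp(log_w - log_w.max())  # rescaling cancels after normalization

# Self-normalized IS (SNIPS): the normalization tames some of the variance
# that plain IS suffers under skewed weights.
snips = np.sum(weights * rewards) / np.sum(weights)
print(f"SNIPS estimate of candidate-policy reward: {snips:.3f}")
```

Doubly robust variants combine such weights with an outcome model, so the estimate stays consistent if either component is well specified.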
Diagnostic Infrastructure
Building operator-facing tools that surface coverage issues, overlap problems, and calibration failures—with concrete remediation strategies.
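One example of the kind of diagnostic we mean: the effective sample size (ESS) of the importance weights, which drops when the logging and candidate policies overlap poorly. The threshold and messages below are illustrative choices, not CJE defaults.

```python
# Minimal sketch of an overlap diagnostic (illustrative, not the CJE API).
import numpy as np

def ess_fraction(log_w: np.ndarray) -> float:
    """Kish effective sample size of the normalized weights, as a fraction of n."""
    w = np.exp(log_w - log_w.max())  # stabilize; ESS is scale-invariant
    w = w / w.sum()
    return float(1.0 / (np.sum(w ** 2) * len(w)))

rng = np.random.default_rng(2)
log_w = rng.normal(0.0, 2.0, size=1_000)  # heavy weight skew -> low ESS

frac = ess_fraction(log_w)
print(f"ESS fraction: {frac:.2%}")
if frac < 0.10:  # illustrative threshold
    print("Warning: weak overlap. Consider a candidate policy closer to the "
          "logging policy, or collect fresh logs near the target distribution.")
```

A low ESS fraction is exactly the kind of signal an operator-facing tool should surface before anyone trusts the point estimate.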
Approach
We believe evaluation infrastructure should be:
- Statistically principled. Every estimate comes with a confidence interval that accounts for all sources of uncertainty.
- Causally interpretable. We estimate what would happen if you deployed the policy, not just observational correlations.
- Diagnostic-first. Tools should tell you when they're unreliable and how to fix it, not just silently fail.
- Practitioner-focused. Built for teams shipping production systems, not just researchers publishing papers.
Open Source
Our core tools are open source. We believe the LLM ecosystem needs shared infrastructure for rigorous evaluation, not proprietary black boxes.
CJE (Causal Judge Evaluation) is our flagship library for turning judge scores into statistically valid estimates. MIT licensed, actively maintained, with comprehensive documentation.
Founder

Eddie Landesberg is a research scientist and software engineer focused on causal evaluation for AI systems.
Previously, at Stitch Fix, Eddie built from scratch a causally rigorous advertising-spend optimization system that managed $150M/year in spend; randomized experiments indicated ~$40M/year in incremental efficiency gains. He also improved the deep learning personalization engine powering the company's core product and authored "Want to make good business decisions? Learn causality", one of the most cited posts on Stitch Fix's technical blog, featured in university curricula and cited in the Academy of Management Review.
At Salesforce, he was the first data science hire in the marketing org and led the first end-to-end deployment of machine learning models to salesforce.com. As co-founder and CEO of Fondu, he built consumer-facing long-term memory for LLMs (thousands of users, ~2 hours/WAU, ~40% D30 bounded retention) and was featured in a16z's Building the AI Brain.
Eddie holds a BA in Economics from Georgetown University, with a mathematical concentration in linear algebra, econometrics, game theory, differential equations, convex optimization, and dynamic programming. He has given guest lectures at Stanford MS&E on the economics of consumer data.
CIMO Labs is backed by former C-level tech leaders from Stitch Fix and Netflix, a Meta researcher, a former public company CEO, and the CIO of AllianceBernstein.
Contact
We work with teams facing challenging evaluation problems—high-stakes deployments, limited labeled data, distributional shift, or regulatory requirements.