Design-by-Projection: A General Principle for Structure-Aware Estimation
The principle: When you know structural properties of your problem (outcomes increase with judge scores, importance weights should average to 1), but don't know the exact functional form, project your empirical data onto the set of functions satisfying those constraints. This reduces variance while explicit mean constraints keep estimates on the right scale.
The framework: Design-by-Projection (DbP) unifies AutoCal-R (reward calibration via isotonic regression) and SIMCal-W (weight stabilization via isotonic projection) under a single principle. Both methods project onto convex constraint sets, but AutoCal-R operates on judge scores → oracle outcomes, while SIMCal-W operates on importance weights.
Rather than estimating unconstrained functions that overfit small samples or imposing rigid parametric forms (linear, logistic) that misspecify, Design-by-Projection finds the closest function (in least-squares sense) that satisfies what you know must be true. The result: automatic variance reduction, bias-variance trade-offs that favor small oracle samples, and interpretable output.
In Arena: DbP instantiates as AutoCal-R (reward calibration with a mean constraint and covariates) and SIMCal-W (mean-one monotone weight calibration). Together they explain why Direct and DR work well—and why IPS alone fails under overlap scarcity.
DbP Assumptions at a glance
AutoCal-R: Monotonicity of the oracle outcome's conditional mean $\mathbb{E}[Y \mid T]$ in a judge-based risk index $T$; mean-preservation enforced on the oracle slice.
SIMCal-W: Stabilized weights are a monotone function of the judge score $S$ (or the risk index $T$), nonnegative, unit-mean; projection cannot create overlap—diagnose ESS and tails.
The Problem: Balancing Flexibility and Structure
Suppose you're calibrating an LLM judge to predict oracle outcomes. You have:
- $n$ observations: judge scores $S_1, \dots, S_n$ and oracle labels $Y_1, \dots, Y_n$
- Goal: Learn $f(s) \approx \mathbb{E}[Y \mid S = s]$ to predict oracle outcomes from judge scores
- Known constraint: Higher judge scores should predict no worse oracle outcomes (monotonicity)
The Goldilocks problem
Too flexible (unconstrained regression): Learns arbitrary non-monotone wiggles, overfits noise in small samples, produces calibrated predictions that invert judge rankings.
Too rigid (linear regression): Forces $f(s) = \alpha + \beta s$, misspecifies when the true relationship is nonlinear (e.g., saturation at high scores, floor effects at low scores).
Just right (isotonic regression): Flexible enough to capture nonlinearity, constrained enough to avoid overfitting. Learns piecewise-constant monotone function that fits data while respecting known structure.
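To make the Goldilocks trade-off concrete, here is a minimal sketch (synthetic data and illustrative names, not the CJE implementation) where the true judge-to-oracle relationship saturates at high scores: the linear fit misspecifies the curve, while isotonic regression tracks it without inverting rankings.

```python
# Sketch: linear vs. isotonic calibration on a synthetic saturating relationship.
# Everything here (data-generating curve, noise level) is an illustrative assumption.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
n = 200                                               # small oracle sample
s = rng.uniform(0, 10, n)                             # judge scores
truth = lambda t: 1 / (1 + np.exp(-(t - 5)))          # saturating true calibration curve
y = np.clip(truth(s) + rng.normal(0, 0.15, n), 0, 1)  # noisy oracle labels

linear = LinearRegression().fit(s.reshape(-1, 1), y)
iso = IsotonicRegression(increasing=True, out_of_bounds="clip").fit(s, y)

grid = np.linspace(0, 10, 500)
print("linear MSE vs truth:  ", np.mean((linear.predict(grid.reshape(-1, 1)) - truth(grid)) ** 2))
print("isotonic MSE vs truth:", np.mean((iso.predict(grid) - truth(grid)) ** 2))
```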
The Design-by-Projection Principle
Core idea: Encode what you know (or assume) as a convex constraint set $\mathcal{C}$, then project your empirical estimate onto $\mathcal{C}$.
Projection formula
$$\hat f \;=\; \Pi_{\mathcal{C}}(\tilde f) \;=\; \arg\min_{g \in \mathcal{C}} \|\, g - \tilde f \,\|_2^2$$
Find the function in the constraint set $\mathcal{C}$ that is closest (in $L_2$ norm) to the unconstrained empirical estimate $\tilde f$.
Why does this work?
When $\mathcal{C}$ is convex and contains the true function, projection has three key properties:
- Bias–variance trade-off: Projection onto a correct constraint set is a contraction that typically reduces variance and can reduce MSE; it does not generally preserve finite-sample unbiasedness. For reward calibration we recover the right mean by explicit mean-preservation; for weight calibration we enforce unit-mean weights.
- Variance reduction: Projection is a smoothing operation. By ruling out functions that violate known constraints, you reduce the effective degrees of freedom, lowering variance. For cones that contain the origin (e.g., the monotone cone), projection weakly reduces the $L_2$ norm. For general convex sets, projection minimizes distance to $\mathcal{C}$, not necessarily the norm.
- Interpretability: The output respects structural knowledge (monotonicity, mean preservation, boundedness), making results easier to validate and debug. You can't get perverse predictions that violate domain knowledge.
Projection in Hilbert spaces (intuition)
For a closed convex set $\mathcal{C}$, the metric projection $\Pi_{\mathcal{C}}(x)$ is unique and is characterized by the variational inequality $\langle x - \Pi_{\mathcal{C}}(x),\, c - \Pi_{\mathcal{C}}(x)\rangle \le 0$ for all $c \in \mathcal{C}$. For cones containing 0, projection weakly reduces norm; in general it minimizes distance to $\mathcal{C}$. This orthogonality-type condition explains why imposing structure reduces variance without overfitting.[1,2]
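The monotone-cone projection behind these results can be computed with the pool-adjacent-violators algorithm. Below is a minimal standalone PAV sketch (illustrative only; in practice use a vetted implementation such as sklearn.isotonic.IsotonicRegression) showing how violating adjacent blocks get pooled into their weighted mean.

```python
# Minimal pool-adjacent-violators (PAV) sketch: L2 projection of y (already
# ordered by the score) onto the nondecreasing cone. Amortized O(n) after sorting.
import numpy as np

def pav(y, w=None):
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    blocks = []  # each block: [weighted mean, total weight, length]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool backwards while adjacent blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    return np.concatenate([np.full(nb, m) for m, _, nb in blocks])

print(pav([1.0, 3.0, 2.0, 4.0, 2.5]))  # -> [1.   2.5  2.5  3.25 3.25]
```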
Application 1: AutoCal-R (Reward Calibration)
Problem: LLM judge scores are on an arbitrary scale. You need to map them to oracle outcomes for downstream estimation.
Constraint set: $\mathcal{C} = \{ f : f \text{ is nondecreasing} \}$. Monotonicity is the minimal assumption: better judge scores shouldn't predict worse outcomes.
Monotone mode
Directly project judge scores to oracle outcomes via isotonic regression:
$$\hat f \;=\; \arg\min_{f \text{ nondecreasing}} \sum_{i=1}^{n} \big( Y_i - f(S_i) \big)^2$$
This is isotonic regression: least-squares fit subject to monotonicity.[3,4] The solution is a piecewise-constant function computed efficiently via the Pool Adjacent Violators (PAV) algorithm in O(n) time on sorted scores (or O(n log n) including the initial sort).[5,6]
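A minimal sketch of the monotone mode under illustrative assumptions (a 10% oracle slice, synthetic scores and labels, generic variable names rather than the cje.calibration.AutoCal API): fit isotonic regression on the labeled slice, predict calibrated rewards for every logged sample, then apply the mean-preserving shift discussed under "Why isotonic regression?" below.

```python
# Monotone-mode sketch: calibrate judge scores to oracle outcomes using only a
# small labeled oracle slice, then score the full logged dataset.
# Illustrative; not the cje.calibration.AutoCal implementation.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(1)
n = 5000
judge = rng.uniform(0, 1, n)                                     # judge scores, all samples
oracle_idx = rng.choice(n, size=int(0.10 * n), replace=False)    # 10% oracle coverage
y_oracle = np.clip(judge[oracle_idx] ** 2 + rng.normal(0, 0.1, oracle_idx.size), 0, 1)

iso = IsotonicRegression(increasing=True, out_of_bounds="clip")
iso.fit(judge[oracle_idx], y_oracle)

rewards = iso.predict(judge)                                     # calibrated rewards
shift = y_oracle.mean() - iso.predict(judge[oracle_idx]).mean()  # mean-preserving shift
rewards = np.clip(rewards + shift, 0.0, 1.0)
```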
Two-stage mode (with covariates)
When judge scores have systematic bias (e.g., response length affects scores independent of quality):
- Stage 1 (risk index): Learn a risk index $T = g(S, X)$ (e.g., a spline on the judge score $S$ and covariates $X$ such as response length)
- Stage 2 (isotonic): Fit $\hat f$ by isotonic regression of $Y$ on $T$, then apply a constant shift so that the oracle-slice mean of $\hat f(T)$ matches that of $Y$
This corrects systematic judge bias (e.g., verbosity preference) while retaining monotonicity in the risk index $T$.
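A sketch of the two-stage mode, with stand-in choices that are assumptions (ridge in place of the spline for the risk index, response length as the only covariate, synthetic labels):

```python
# Two-stage sketch: Stage 1 learns a risk index T = g(S, X) on the oracle slice;
# Stage 2 fits isotonic regression of Y on T, then applies the mean-preserving shift.
# Illustrative stand-ins (ridge instead of a spline); not the CJE implementation.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(2)
n = 5000
score = rng.uniform(0, 1, n)                          # judge scores
length = rng.uniform(50, 800, n)                      # response length (covariate)
oracle_idx = rng.choice(n, size=int(0.10 * n), replace=False)
# Hypothetical oracle labels: quality tracks the score but verbosity inflates scores.
y_oracle = np.clip(score[oracle_idx] - 0.0003 * length[oracle_idx]
                   + rng.normal(0, 0.1, oracle_idx.size), 0, 1)

X = np.column_stack([score, length])
g = Ridge(alpha=1.0).fit(X[oracle_idx], y_oracle)     # Stage 1: risk index T = g(S, X)
T = g.predict(X)

iso = IsotonicRegression(increasing=True, out_of_bounds="clip").fit(T[oracle_idx], y_oracle)
rewards = iso.predict(T)                              # Stage 2: isotonic in T
rewards += y_oracle.mean() - iso.predict(T[oracle_idx]).mean()   # mean-preserving shift
rewards = np.clip(rewards, 0.0, 1.0)
```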
Why isotonic regression?
- Mean preservation (how we enforce it): Vanilla isotonic is an $L_2$ projection onto the monotone cone and does not by itself match the oracle mean.[3,4] In AutoCal-R we enforce mean preservation via a constant shift: $\hat f_c(t) = \hat f(t) + \big(\bar Y_{\text{oracle}} - \overline{\hat f(T)}_{\text{oracle}}\big)$, which preserves monotonicity and puts the calibrator on the oracle scale. (Clip to [0,1] if needed.)
- Minimal assumptions: Only requires monotonicity, not linearity or parametric form
- Small-sample efficiency: Works with 5-25% oracle coverage (50-1250 labels)
- Adaptive complexity: For isotonic regression, the degrees of freedom equal the number of constant blocks in the fit and adapt to signal complexity;[4] in practice this is far smaller than $n$, yielding substantial variance reduction.
Application 2: SIMCal-W (Weight Stabilization)
Problem: Off-policy importance weights are often extreme, leading to high variance and poor effective sample size (ESS).[11,12]
Constraint set: $\mathbb{E}_{\pi_0}[w] = 1$, i.e., unit mean under the logger.[9,10] Calibrated weights should be nonnegative, monotone in a risk index, and preserve unbiasedness.
The stacked isotonic projection
SIMCal-W builds two candidate weight functions:
- Increasing candidate: Isotonic regression of the raw weights $W$ on the judge score $S$ (higher scores → higher weights)
- Decreasing candidate: Antitonic regression of $W$ on $S$ (higher scores → lower weights; isotonic under reversed order)
After smoothing, we enforce nonnegativity and unit mean: rescale $\hat w \leftarrow \hat w / \overline{\hat w}$ after clipping negative values to zero.[9,13] Stacking uses cross-fitted out-of-fold influence functions to tune the mixture $\hat w_\lambda = \lambda\,\hat w_{\uparrow} + (1-\lambda)\,\hat w_{\downarrow}$ by minimizing estimated variance:[14,15]
$$\hat\lambda \;=\; \arg\min_{\lambda \in [0,1]} \widehat{\mathrm{Var}}\big(\hat V(\hat w_\lambda)\big)$$
Why stacking?
By considering both directions (increasing and decreasing), SIMCal-W avoids having to assert which direction the monotone relationship should go. The data tells you: if increasing weights better stabilize the estimate, λ → 1; if decreasing weights are better, λ → 0. This makes the method robust to misspecification of the monotone direction.
No new overlap: Stabilization prevents numerical degeneracy but cannot create support where the logger has none.[16] Always report ESS, max/median weight, and a tail index before/after smoothing.[11,12] In LLM OPE, raw weights $W$ often come from teacher-forced sequence likelihoods; these can be noisy or structurally misspecified.[21] Stabilization helps variance, but cannot fix overlap or propensity misspecification.
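Below is a simplified sketch of the stacked projection. It assumes raw importance weights and judge scores are at hand, builds the two monotone candidates, rescales to unit mean, and picks λ on a grid by minimizing the variance of the weighted-reward terms; this replaces the cross-fitted influence-function criterion with a cruder stand-in and is not the cje.calibration.SIMCal implementation.

```python
# Simplified SIMCal-W sketch: increasing and decreasing isotonic candidates,
# unit-mean rescaling, and a grid search over the mixture weight lambda.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def unit_mean(w):
    w = np.maximum(w, 0.0)          # nonnegativity
    return w / w.mean()             # unit mean over the logged sample

def ess_fraction(w):
    return (w.sum() ** 2) / (len(w) * (w ** 2).sum())

rng = np.random.default_rng(3)
n = 5000
score = rng.uniform(0, 1, n)                            # judge scores
raw_w = np.exp(rng.normal(0, 1.5, n))                   # heavy-tailed raw weights (synthetic)
rewards = np.clip(score + rng.normal(0, 0.1, n), 0, 1)  # calibrated rewards (synthetic)

w_inc = unit_mean(IsotonicRegression(increasing=True).fit(score, raw_w).predict(score))
w_dec = unit_mean(IsotonicRegression(increasing=False).fit(score, raw_w).predict(score))

lambdas = np.linspace(0.0, 1.0, 21)
variances = [np.var(unit_mean(lmb * w_inc + (1 - lmb) * w_dec) * rewards) for lmb in lambdas]
lam = lambdas[int(np.argmin(variances))]
w_cal = unit_mean(lam * w_inc + (1 - lam) * w_dec)

print(f"lambda = {lam:.2f}")
print(f"ESS before: {ess_fraction(unit_mean(raw_w)):.1%}  after: {ess_fraction(w_cal):.1%}")
```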
Theoretical Guarantees
1. Projection theorem (convex analysis)
For any closed convex set $\mathcal{C}$ in a Hilbert space, the projection $\Pi_{\mathcal{C}}(x)$ exists, is unique, and satisfies:
$$\langle x - \Pi_{\mathcal{C}}(x),\, c - \Pi_{\mathcal{C}}(x)\rangle \le 0 \quad \text{for all } c \in \mathcal{C}.$$
For a closed subspace this is exact orthogonality of the residual to the constraint set (the Pythagorean theorem in Hilbert space); for a general convex $\mathcal{C}$ the obtuse-angle condition above plays the same role: projecting moves the estimate no farther from any point in $\mathcal{C}$, including the truth, so it cannot inflate error.
2. Monotone projection bounds variance
For isotonic regression on $n$ observations, the degrees of freedom satisfy $\mathrm{df} = \mathbb{E}[K]$, where $K$ is the number of constant blocks in the fitted function. For smooth monotone signals, the fitted isotonic has $O(n^{1/3})$ constant pieces and risk of order $n^{-2/3}$,[7,8] far fewer degrees of freedom than unconstrained fits, delivering substantial variance reduction.
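A quick empirical illustration of the adaptive-complexity claim, under synthetic assumptions (a smooth monotone signal with Gaussian noise): count the constant blocks in the isotonic fit and compare them to n.

```python
# Sketch: the isotonic fit's effective complexity is the number of constant
# blocks K, typically far smaller than n for smooth monotone signals.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(4)
n = 2000
s = np.sort(rng.uniform(0, 1, n))
y = np.clip(np.sqrt(s) + rng.normal(0, 0.2, n), 0, 1)   # smooth monotone signal + noise

fit = IsotonicRegression(increasing=True).fit(s, y).predict(s)
K = 1 + int(np.sum(np.diff(fit) > 1e-12))                # number of constant blocks
print(f"n = {n}, constant blocks K = {K}")               # K grows roughly like n^(1/3)
```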
3. Dispersion reduction (SIMCal-W)
The mean-one isotonic projection reduces dispersion (ESS↑) and typically improves tail metrics (max/median, tail index),[9,11,13] though strict Lorenz dominance is not guaranteed without additional conditions. For repeated refits (OUA jackknife), isotonic's $O(n)$ complexity after sorting keeps total runtime modest.
Connection to Other Methods
| Method | Constraint Set | DbP Perspective |
|---|---|---|
| Isotonic regression | Monotone functions | Project onto monotone cone |
| Platt scaling | Logistic link functions | Parametric constrained fit; not a convex projection |
| Lasso | Sparse coefficients ($\|\beta\|_1 \le t$) | Project onto $\ell_1$ ball |
| Ridge regression | Small coefficients ($\|\beta\|_2 \le t$) | Project onto $\ell_2$ ball |
| Constrained MLE | Valid probability distributions | Bregman projection (KL) onto simplex |
| Survey calibration | Weights match moment constraints | Bregman projection minimizing divergence from design weights |
Many classical statistical methods can be viewed as projections onto constraint sets. DbP makes this perspective explicit and extensible: define your constraints (monotonicity, sparsity, smoothness, bounds), construct the convex set, and project.
Beyond Euclidean projection: Many calibration problems are more natural in a Bregman divergence (e.g., KL for probabilities). DbP extends beyond $L_2$: raking/calibration estimators in survey sampling (e.g., Deville–Särndal) are Bregman projections[17,18,19,20] that match moments while staying close to the starting weights—conceptually adjacent to SIMCal-W. DbP is just constrained empirical risk minimization viewed through the lens of projections onto convex sets; the lens is useful because it yields general variance-reduction and stability intuitions.
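As a concrete instance of the Bregman view, here is a minimal raking sketch under illustrative assumptions (one covariate, a known population mean, uniform design weights): exponentially tilt the starting weights so the weighted covariate mean matches the target, which is the KL-type projection onto the moment constraint.

```python
# Minimal raking sketch: tilt starting weights d_i to w_i = d_i * exp(lam * x_i)
# so the weighted mean of x matches a known population value. Exponential tilting
# is the KL (Bregman) projection onto the moment constraint, staying close to d.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(5)
n = 1000
x = rng.normal(0, 1, n)          # covariate observed in the sample
d = np.ones(n)                   # starting (design) weights
target_mean = 0.25               # assumed known population mean of x

def gap(lam):
    w = d * np.exp(lam * x)
    return np.average(x, weights=w) - target_mean

lam = brentq(gap, -10.0, 10.0)   # one-dimensional moment equation
w = d * np.exp(lam * x)
w *= n / w.sum()                 # normalize to mean-one weights
print(f"lambda = {lam:.3f}, weighted mean of x = {np.average(x, weights=w):.3f}")
```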
Implementation in CJE
Design-by-Projection is implemented in the CJE package via:
- AutoCal-R: `cje.calibration.AutoCal` for reward calibration
- SIMCal-W: `cje.calibration.SIMCal` for weight stabilization
Choosing calibration methods
- AutoCal-R (monotone): Default for most cases. Minimal assumptions, works with small samples (5-25% oracle coverage).
- AutoCal-R (two-stage): When you have covariates that create non-monotone bias (response length, prompt difficulty).
- SIMCal-W: For off-policy estimators (IPS, DR) when raw importance weights have low ESS (< 10-20%).
Inference
When DbP is learned from a partial oracle slice, we include OUA (Oracle Uncertainty Accounting)—delete-one-fold jackknife over oracle folds[22,23,24]—to account for calibrator learning variance in standard errors. This ensures that confidence intervals reflect both sampling uncertainty and the uncertainty from estimating the calibration function $\hat f$.
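A sketch of the delete-one-fold jackknife idea, assuming K oracle folds and a caller-supplied routine that refits the calibrator without a given fold and returns the point estimate; the exact fold handling and variance combination in CJE may differ.

```python
# Delete-one-fold jackknife sketch for oracle-uncertainty accounting (OUA):
# refit with each oracle fold held out, recompute the estimate, and add the
# jackknife variance of those estimates to the sampling variance.
# Generic illustration; `estimate_without_fold` is a hypothetical callable.
import numpy as np

def oua_standard_error(estimate_without_fold, n_folds, base_se):
    """estimate_without_fold(k) -> point estimate with oracle fold k removed."""
    ests = np.array([estimate_without_fold(k) for k in range(n_folds)])
    jack_var = (n_folds - 1) / n_folds * np.sum((ests - ests.mean()) ** 2)
    return float(np.sqrt(base_se ** 2 + jack_var))
```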
When Design-by-Projection Works Best
Ideal scenarios
- You have strong structural knowledge (monotonicity, bounds, sparsity) that's unlikely to be violated
- Sample size is moderate (100-10,000 observations) where unconstrained methods overfit but parametric methods misspecify
- You need interpretable output (e.g., to audit or explain calibration to stakeholders)
- Oracle labels are expensive, so you want maximum efficiency from 5-25% coverage
When to consider alternatives
- Abundant data (n > 10,000) + known parametric form: Use parametric calibration (Platt scaling, Beta calibration) for lower variance
- Structural assumptions violated: If monotonicity fails in reality, isotonic regression will impose it anyway. Test on holdout data.
- Very small samples (n < 50): Consider Bayesian methods with informative priors instead of projection
Caution: Monotonicity violations
If monotonicity fails materially (e.g., adversarial judge artifacts), DbP will enforce it anyway. Use holdout residuals by policy/domain/length to detect such failures, and either expand the risk index $T$ to include the violating covariates or switch to a richer judge.
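One way to operationalize the holdout check, under illustrative assumptions (a labeled holdout slice and response length as the suspect covariate): bucket the holdout by the covariate and look for systematic calibration residuals.

```python
# Holdout diagnostic sketch: if residuals (oracle - calibrated reward) trend with
# a covariate such as response length, monotonicity in the raw judge score is
# likely violated and the covariate belongs in the risk index T.
# Bucketing and any thresholds are arbitrary illustrative choices.
import numpy as np

def residuals_by_bucket(y_holdout, reward_holdout, covariate, n_buckets=5):
    resid = np.asarray(y_holdout) - np.asarray(reward_holdout)
    edges = np.quantile(covariate, np.linspace(0, 1, n_buckets + 1))
    buckets = np.clip(np.digitize(covariate, edges[1:-1]), 0, n_buckets - 1)
    return {b: float(resid[buckets == b].mean()) for b in range(n_buckets)}
# Large, systematic differences across buckets suggest expanding the risk index.
```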
Related Work and Extensions
Design-by-Projection builds on several classical results:
- Isotonic regression: Barlow et al. (1972), Ayer et al. (1955) - foundational work on monotone regression
- Shape-constrained estimation: Groeneboom & Jongbloed (2014) - comprehensive treatment of convex, concave, and monotone constraints
- Survey calibration: Deville & Särndal (1992) - calibration estimators that adjust weights to match constraints while minimizing divergence
- Calibration for inverse propensity weighting: van der Laan et al. (2025) - isotonic calibration for stabilizing IPW estimators (CLeaR 2025)
Within Causal Learning and Reasoning (CLeaR) / causal-learning circles, DbP fits the broader program of shape-constrained, structure-aware learning that trades small bias for large variance reductions with explicit guarantees.
Extensions under development: Multi-dimensional monotonicity (partial orders), shape constraints beyond monotonicity (convexity, unimodality), adaptive constraint selection via cross-validation.
Conclusion
Design-by-Projection provides a principled framework for incorporating structural knowledge into estimation. By projecting onto convex constraint sets, you get:
- Automatic variance reduction, with the target mean anchored by explicit mean-preservation and unit-mean constraints (when the structural constraints are correct)
- Interpretable output that respects domain knowledge
- Unified treatment of reward calibration (AutoCal-R) and weight stabilization (SIMCal-W)
- Computational efficiency via fast projection algorithms (O(n) after sorting for isotonic regression)
For LLM evaluation, where oracle labels are expensive and judge scores are plentiful, DbP's efficiency with small oracle samples (5-25% coverage) makes it particularly valuable. The framework scales from quick prototypes (monotone mode) to production systems (two-stage with covariates).
Practical takeaway: Before fitting unconstrained or rigidly parametric models, ask: "What do I know must be true?" Encode that knowledge as constraints, project onto them, and let the projection theorem do the work.
References
Cite this work
APA
Eddie Landesberg. (2025, October 10). Design-by-Projection: A General Principle for Structure-Aware Estimation. CIMO Labs Blog. https://cimolabs.com/blog/design-by-projection
BibTeX
@misc{landesberg2025designbyprojection,
author = {Eddie Landesberg},
title = {Design-by-Projection: A General Principle for Structure-Aware Estimation},
howpublished = {\url{https://cimolabs.com/blog/design-by-projection}},
year = {2025},
note = {CIMO Labs Blog}
}