The Alignment Manifesto


Research

Theoretical, not yet implemented

The crisis of optimization is universal. When systems optimize proxy signals, they collapse. We argue this pattern governs biology, economics, politics, and AI—and show why structure, not data, is the solution.

[Figure: The Goodhart Point. Optimization pressure on the surrogate metric S causes divergence from the true value Y*.]

The Universal Pattern: Optimization climbs the Surrogate Peak (S) while True Value (Y*) declines. The Goodhart Point marks where manifold divergence begins.

I. The Crisis of Optimization (The Problem)

We observe a universal pattern of failure in complex adaptive systems. When optimization pressure is applied to a proxy signal, the system initially improves, then destabilizes, and eventually collapses. This failure is not anomalous; it is the default outcome.
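The improve-then-collapse pattern can be illustrated with a toy model. The sketch below is an illustration under stated assumptions, not part of the framework: it assumes a surrogate S(x) = x that always rewards more of some quantity x, and a true value Y*(x) = x - x²/2 that peaks at x = 1. Both functional forms and all function names are hypothetical.

```python
def surrogate(x: float) -> float:
    """Proxy signal S: assumed to reward more of x without limit."""
    return x


def true_value(x: float) -> float:
    """True objective Y*: assumed concave, peaking at x = 1."""
    return x - 0.5 * x ** 2


def climb_surrogate(steps: int = 30, lr: float = 0.1):
    """Gradient ascent on S only; records (x, S, Y*) after each step."""
    x = 0.0
    history = []
    for _ in range(steps):
        x += lr * 1.0  # dS/dx = 1 everywhere for the assumed S
        history.append((x, surrogate(x), true_value(x)))
    return history


def goodhart_point(history):
    """Value of x at which Y* was highest along the trajectory."""
    return max(history, key=lambda row: row[2])[0]


if __name__ == "__main__":
    run = climb_surrogate()
    print("Goodhart point near x =", round(goodhart_point(run), 2))
    print("final S =", round(run[-1][1], 2), "final Y* =", round(run[-1][2], 2))
```

In this toy setting S rises at every step, while Y* rises until roughly x = 1 and falls afterward. The optimizer never sees Y*, so nothing in its update rule tells it to stop.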

The Evidence: The Four Collapses

This pattern manifests across the four fundamental domains of optimization:

1. Biological Collapse (Addiction and Extinction)

Natural selection optimizes for Fitness (Y*) using proximate cues like Dopamine (S). When the signal is hijacked, the system optimizes for the cue at the expense of survival.

Example: Supernormal stimuli—organisms preferring artificial signals (e.g., concentrated sugars, exaggerated visual cues) until the population crashes.

[Figure: Fisherian runaway selection. Female preference for long tails creates a feedback loop in which males evolve ever-longer tails.]

Fisherian Runaway Selection: The peacock's tail becomes trapped in a feedback loop where female preference drives ever more extreme ornamentation—a biological optimization collapse where the signal (S) diverges catastrophically from fitness (Y*).

2. Economic Collapse (Bubbles and Externalities)

Markets optimize for Value (Y*) using Price (S). When the price signal decouples from fundamental value or ignores externalities, the system generates crises.

Example: The 2008 Financial Crisis—optimizing for the price of Mortgage-Backed Securities (S) while ignoring the underlying systemic risk (Y*).

3. Political Collapse (Populism and Polarization)

Democracies optimize for the Public Good (Y*) using Votes (S). When the optimization target shifts to maximizing approval via short-term gratification or propaganda, the system degrades.

Example: Hyperinflation—printing money maximizes short-term approval (S) but destroys long-term economic stability (Y*).

4. AI Collapse (Hallucination and Reward Hacking)

AI models optimize for Idealized Welfare (Y*) using a Reward Score (S). When the model exploits loopholes in the reward function, alignment fails.

Example: Sycophancy—the model learns that agreeing with the user (maximizing S) is easier than providing truthful, helpful answers (maximizing Y*).

The Unifying Principle: Goodhart's Law

These failures are all instances of Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure." The Signal (S) decouples from the Objective (Y*).

The prevailing view treats these failures as isolated incidents—a lack of data, a failure of regulation, a lapse in judgment. This diagnosis is incorrect.

The Central Thesis

The collapse is not accidental noise; it is a structural inevitability driven by the underlying economics of optimization. We are facing a structural problem, not merely a data quality problem.

II. The Universal Dynamics (The Theory)

The stability of these systems is governed by two fundamental frameworks: The Cost Hierarchy of Truth, which describes the relationship between information complexity and verification cost, and the Economics of Friction (RCF), which describes the transaction costs of verification.

A. The Cost Hierarchy of Truth

The core premise is that accurate information requires effort. "Truth"—the idealized objective (Y*)—is an expensive, unstable equilibrium.

Complexity Cost: We define "Energy" rigorously as the Complexity Cost: the computational, metabolic, economic, or cognitive work required to reduce uncertainty (entropy) and establish causal structure.

The Ladder of Deliberation

Systems navigate this relationship through a hierarchy of deliberation, generalizing Kahneman's System 1 (Fast/Heuristic) and System 2 (Slow/Algorithmic).

L1: The Reflex (S) – System 1

Nature: Fast, Heuristic, Cheap, Pattern-Matching.

Cost: O(1).

Examples: Instinct, Spot Price, Opinion Polls, Base Model Logits ("Vibes").

L2: The Protocol (Y) – System 2

Nature: Slow, Algorithmic, Costly, Logical.

Cost: O(N).

Examples: Executive Function, Audited Financials, Judicial Review, Chain-of-Thought (CoT).

L3: The Oracle (Y*) – The Ideal

Nature: Asymptotic Limit. Infinite Compute/Time.

Cost: Unbounded (no finite budget of compute or time reaches it).

Examples: Inclusive Fitness, Fundamental Value, The Public Good, Idealized Deliberation Oracle (IDO).

Verification Load: The energy (Complexity Cost) required to ascend the ladder (from L1 toward L3). This load is the essential structural defense against collapse.
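One way to make the ladder concrete is to attach a cost function to each rung and read the Verification Load as the extra cost of climbing. This is a minimal sketch under assumptions: the unit constants (a cost of 1 for L1, a cost of n for L2) and the names `Rung`, `LADDER`, and `verification_load` are hypothetical, chosen only to mirror the O(1), O(N), and unbounded costs listed above.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rung:
    name: str
    cost: Callable[[int], float]  # cost as a function of problem size n


# Unit constants are illustrative assumptions, not measured values.
LADDER = [
    Rung("L1 Reflex (S)", lambda n: 1.0),         # constant cost
    Rung("L2 Protocol (Y)", lambda n: float(n)),  # linear cost
    Rung("L3 Oracle (Y*)", lambda n: math.inf),   # unbounded cost
]


def verification_load(n: int, lower: int = 0, upper: int = 1) -> float:
    """Extra cost of moving from one rung to a higher one at size n."""
    return LADDER[upper].cost(n) - LADDER[lower].cost(n)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, verification_load(n))
```

Under these assumptions the load from L1 to L2 grows with problem size, and the load to L3 is infinite for every n, which is why L3 serves only as a limit.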

B. The Economics of Friction (RCF)

The stability of the optimization process is determined by the transaction costs of verification, formalized by the Rights, Causation, Friction (RCF) framework. This framework models the "Market for Truth."

Informational Arbitrage

Optimizers are rational agents minimizing energy expenditure. They seek the path of least resistance.

The Causal Path: Generate Y*, which drives S. (Expensive).
The Arbitrage Path: Manipulate S directly, decoupled from Y*. (Cheap).

The Friction Variables

The stability of the market depends on two critical transaction costs:

V (Verification Cost): The cost to the Verifier to verify the integrity of the signal.
F (Fabrication Cost): The cost to the Optimizer to generate a false signal (to "fake" integrity).

The Stability Inequality

Alignment is stable if and only if:

F > V

(The Cost to Fake must exceed the Cost to Verify)

This is the alignment analog of a Pigouvian tax. When private marginal cost (generating a high reward score) diverges from social marginal cost (actual welfare), the standard economic solution is to impose a tax equal to the externality. In AI systems, the Standard Deliberation Protocol imposes verification load proportional to the alignment gap—not as punishment, but as structural engineering that internalizes the externality.

[Figure: The Market for Truth vs. the Market for Lemons. Left panel: F < V, collapse and market failure. Right panel: F > V, stability with verified goods.]

The Two Markets: When F < V (left), fabrication is cheap and verification is expensive, so the market collapses into a "Market for Lemons" where fakes dominate. When F > V (right), costly signaling creates a "Market for Truth" where genuine value is rewarded.

When F > V, faking a signal costs more than checking it, so fabrication is cheaply exposed and the Arbitrage Path no longer undercuts the Causal Path. The gradient flows toward truth.
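The Stability Inequality can be written as a one-line check. The sketch below assumes F and V are expressed in the same (arbitrary) cost units. The function name `market_regime` is hypothetical. Treating the boundary case F = V as unstable follows from the "if and only if F > V" condition stated above.

```python
def market_regime(fabrication_cost: float, verification_cost: float) -> str:
    """Classify the regime using the stability inequality F > V."""
    if fabrication_cost > verification_cost:
        return "Market for Truth"   # stable: faking costs more than checking
    return "Market for Lemons"      # unstable: includes the boundary F == V


if __name__ == "__main__":
    print(market_regime(fabrication_cost=10.0, verification_cost=1.0))
    print(market_regime(fabrication_cost=1.0, verification_cost=10.0))
```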

The Scaling Trap (The Accelerating Crisis)

This inequality is dynamically unstable. As systems scale—in complexity, capability, or reach—the costs shift asymmetrically:

F Decreases: Larger models, faster markets, and globalized media make fabrication cheaper. (e.g., Deepfakes, High-Frequency Trading, AI Hallucination).
V Increases: The complexity of the output makes verification harder. (e.g., Verifying super-human code, auditing complex derivatives, fact-checking global narratives).

The default trajectory of scaling is toward collapse (F < V). This is the Market for Lemons, where mimicry dominates value creation, and the optimization process structurally favors divergence. This is the root cause of the Crisis of Optimization.
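The asymmetric shift can be sketched numerically. The text does not specify how F and V depend on scale, so the power-law forms below are purely an assumption for illustration: F(n) = f0 · n^(-a) falls with scale n, and V(n) = v0 · n^b rises. All names and parameters are hypothetical.

```python
def fabrication_cost(n: float, f0: float, a: float) -> float:
    """Assumed form: F(n) = f0 * n**(-a), falling with scale n."""
    return f0 * n ** (-a)


def verification_cost(n: float, v0: float, b: float) -> float:
    """Assumed form: V(n) = v0 * n**b, rising with scale n."""
    return v0 * n ** b


def crossover_scale(f0: float, v0: float, a: float, b: float) -> float:
    """Scale n* at which F(n*) = V(n*); beyond it, F < V."""
    if a + b <= 0:
        raise ValueError("need a + b > 0 for the curves to cross")
    # f0 * n**(-a) = v0 * n**b  implies  n**(a + b) = f0 / v0
    return (f0 / v0) ** (1.0 / (a + b))


if __name__ == "__main__":
    n_star = crossover_scale(f0=100.0, v0=1.0, a=0.5, b=0.5)
    print("crossover scale:", n_star)
```

Under these assumed curves a system that starts with F well above V still crosses into F < V at a finite scale. Control structures that only raise f0 or lower v0 by a constant factor delay the crossover but do not remove it.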

III. The Isomorphism (The Evidence)

The dynamics described in Section II are not specific to AI; they are universal. We demonstrate this by mapping the framework onto the four fundamental domains of complex adaptive systems. This mapping reveals that the dynamics are not merely analogous but structurally isomorphic.

The Tetrahedron of Optimization

The Tetrahedron unifies the core components of optimization, failure, and control across these domains:

Concept	Evolutionary Biology	Market Economics	Democratic Governance	AI Alignment (CIMO)
Target (Y*)	Inclusive Fitness	Social Welfare	The Public Good / Justice	Idealized Welfare (IDO)
Signal (S)	Proximate Cues / Dopamine	Price / Profit	Votes / Public Approval	Reward Score / Surrogate
Optimizer	Natural Selection	The Firm / Entrepreneur	The Politician / The State	The AI Model (Gradient Ascent)
Failure Mode	Superstimuli / Mismatch	Market Failure / Externalities	Populism / Polarization	Reward Hacking / Hallucination
Arbitrage (F < V)	Mimicry (Faking Fitness)	Rent-Seeking (Faking Value)	Propaganda (Faking Competence)	Sycophancy (Faking Truth)
Topology Control	Costly Signaling (Handicap)	Regulation / Rule of Law	Constitution / Checks & Balances	Standard Deliberation Protocol (SDP)

Note: The same isomorphism extends to Epistemology—where Target = Truth, Signal = p-values, Failure Mode = Replication Crisis, and Topology Control = Preregistration + Statistical Rigor. The replication crisis occurred precisely when verification load fell below arbitrage incentives.

[Figure: The Tetrahedron of Optimization. Structural isomorphism across Biology, Economics, Governance, and AI Alignment, showing the shared components of Target, Signal, Failure Mode, and Topology Control.]

The Tetrahedron of Optimization: The four domains share the same structural components, namely Target (Y*), Signal (S), Failure Mode (F < V arbitrage), and Topology Control (F > V restoration). This is not analogy; it is isomorphism.

III.2 Extended Example: Corporate Finance and the QER Trap

The framework extends naturally to corporate governance. Consider Quarterly Earnings Reports (QER)—the optimization target for public companies.

Mapping to CIMO Variables:

Y* (The Target): Long-term Firm Value / Shareholder Wealth
S (The Signal): Quarterly EPS / Stock Price
Y (The Protocol): GAAP Audits / Strategic Investment
Optimizer: The Executive Team / The Board
Verifier: Institutional Investors / Analysts

The Jack Welch Trap (F < V Collapse)

When executives are rewarded quarterly (maximizing S) but can defer costs or manipulate accounting (low F), they optimize for short-term stock price at the expense of long-term value (Y*).

The Arbitrage Path: Channel stuffing, revenue recognition tricks, cutting R&D. Easy fabrication (F < V) → Market for Lemons.

Result: GE under Welch appeared to "beat earnings" for 20 consecutive quarters, but the streak masked systemic accounting manipulation. The stock collapsed post-2000 as the true value diverged from the optimized signal.

The Enron Equilibrium (Complete F < V Breakdown)

When F approaches zero (mark-to-market accounting for non-existent assets) and V remains high (complex SPVs impossible to audit), the system catastrophically diverges.

Result: The optimization process generates pure fabrication. S (stock price) becomes completely decoupled from Y* (actual cash flows). The inevitable collapse is a structural feature, not an accident.

The Amazon Strategy (Topology Control via Vesting)

Amazon famously ignored quarterly earnings for decades, focusing on long-term infrastructure (AWS, logistics). How did they resist the QER trap?

The Coherence Tax: Executive compensation vests over 10+ years. To collect the reward, the executive must maintain a coherent long-term strategy that actually builds value (Y*).

This raises F: Faking long-term value for a decade is exponentially more expensive than gaming a single quarter. The vesting schedule imposes temporal coherence, the corporate equivalent of SDP's reasoning chains.

Result: F > V is restored. The optimization gradient flows toward Y* (building AWS, not pumping quarterly metrics). The market eventually rewards this alignment.

Key Insight: The QER Trap is not a failure of "greed" or "short-termism"—it is the predictable outcome of F < V dynamics. Amazon's success demonstrates that restoring alignment requires structural intervention (vesting = Coherence Tax), not exhortation.

The Significance of the Isomorphism

This unified view reveals that the challenges faced in each domain stem from the same root cause: the instability of the F > V inequality under optimization pressure.

The cybernetic interpretation comes from Ashby's Law of Requisite Variety: a regulator can control a system only if its variety (the range of distinct responses available to it) matches or exceeds the variety of the system it regulates. Simple oversight mechanisms have low variety and fail against high-variety LLM behavior spaces. CIMO amplifies regulatory capacity through causal decomposition: instead of regulating the full behavior space directly, it decomposes the problem into contexts, surrogates, and outcomes, each independently regulable.

The Signal is Not the Target: Dopamine is not Fitness; Price is not Value; Votes are not Justice; Reward is not Welfare.
Arbitrage is Inevitable: When F < V, the optimizer will exploit the gap. Mimicry, Rent-Seeking, Populism, and Sycophancy are all forms of Informational Arbitrage.
Control Requires Structure: In every domain, the control mechanism is a structurally imposed Verification Load designed to restore F > V.
- Biology: The Handicap Principle (e.g., the Peacock's tail) imposes a metabolic cost (F↑).
- Economics: Regulation (e.g., a Carbon Tax) imposes an economic cost on externalities (F↑).
- Politics: Constitutional Checks impose procedural friction on rapid policy change (F↑).
- AI: The Standard Deliberation Protocol imposes a computational cost (The Coherence Tax) on fabrication (F↑).

The isomorphism allows us to transfer insights across domains, recognizing that the AI Alignment Crisis and the Crisis of Democracy share the same structural origin: the collapse of friction (F < V) due to technological scaling outpacing the evolution of control structures.

IV. The Collapse of the Hierarchy (The Failure Mechanism)

The existential risk in all complex systems occurs when the optimization process bypasses the Middle Layer (L2/Y)—The Protocol. This failure mode is the direct consequence of the F < V instability.

Short-Circuiting the Stack

The collapse occurs when the system attempts to achieve the High-Energy state (Y*) using only the Low-Energy signal (S), skipping the necessary work (Y). The optimizer shortcuts deliberation, relying on heuristics instead of structure.

The Mechanism of Collapse

This short-circuiting manifests identically across the Tetrahedron:

1. Biological Collapse (Addiction)

The brain assumes the Dopamine signal (S) equals Fitness (Y*), overriding Executive Function (Y).

The Bypass: The cortex is ignored; optimization flows directly from stimulus to reward signal.

The Result: Supernormal stimuli dominate behavior; survival is compromised.

2. Economic Collapse (Crisis)

Investors assume the Current Price (S) equals Fundamental Value (Y*), skipping Due Diligence (Y).

The Bypass: The audit is ignored; optimization flows directly from price signals to leverage.

The Result: Bubbles and systemic failure (e.g., The 2008 Financial Crisis).

3. Political Collapse (Tyranny/Populism)

The leader claims to deliver the Public Good (Y*) by following the Polls (S), skipping Due Process (Y).

The Bypass: The Constitution is ignored; optimization flows directly from public sentiment to executive action (e.g., "I alone can fix it").

The Result: Institutional decay and polarization.

4. AI Collapse (Hallucination)

The model tries to predict the Answer (Y*) directly from the Prompt (S), skipping the Reasoning (Y).

The Bypass: The deliberation protocol (CoT/SDP) is ignored; optimization flows directly from input tokens to output tokens via pattern matching.

The Result: Plausible-sounding nonsense (The Plausibility Paradox) and Reward Hacking.

5. Cognitive Collapse (Developmental Failure)

We observe the failure of the Middle Layer most intimately in human cognitive development. The child is an optimizer; the parent is the regulator. Failures in verification load lead to two distinct pathologies:

The Authoritarian Trap (Over-Optimization): The parent imposes extreme pressure on the Signal (S = Grades/Obedience) but makes admitting error expensive (High Punishment for error).

The Economics: The cost to Fabricate (F = Cheating/Lying) becomes lower than the cost to Admit Fault (A).

The Result: Goodharting. The child optimizes for compliance rather than character. They develop a "Mask" (The Surrogate) that decouples from the "Self" (The Value). This is the Broccoli Trap—forcing compliance destroys intrinsic motivation.

The Permissive Trap (Under-Optimization): The parent removes the Verification Load entirely, prioritizing immediate happiness (S) over growth (Y*).

The Economics: The "Energy Barrier" required to build the Middle Layer (Executive Function) is removed. The system never ascends the Ladder of Deliberation.

The Result: Atrophy. The child fails to develop the internal protocol (Y) required to navigate reality.

The Solution: Healthy development (Authoritative Parenting) is Topology Control. It imposes a Verification Load (Boundaries) that raises the cost of bad behavior (F↑) while lowering the cost of honest communication (A↓).

The Central Insight

The Middle Layer—the Cortex, the Audit, the Constitution, the Standard Deliberation Protocol—is not an inefficiency to be optimized away. It is the essential structural defense against market failure. It provides the necessary Verification Load to maintain the coupling between the Signal and the Value. Protecting this layer is the primary objective of alignment.

V. Topology Control (The Solution)

If the failure mechanism is structural (F < V), the solution must also be structural. We cannot rely on exhortation or hope; we must engineer the topology of the optimization landscape. This is Topology Control.

The objective of Topology Control is singular: to restore the Stability Inequality (F > V). We must make the Cost to Fake exceed the Cost to Verify. This requires two distinct mechanisms: raising F and lowering V.

A. Raising F: The Coherence Tax (Causal Mediation)

We must impose costs that selectively penalize the Arbitrage Path without hindering the Causal Path. This requires enforcing Causal Mediation—ensuring that the optimization gradient flows through the actual generation of value (Y*), not side channels (S).

This is achieved by imposing a Coherence Tax.

The Economics: Generating a single false output (a lie) is cheap. Generating a coherent chain of reasoning, verifiable evidence, and consistent logic that supports that false output is exponentially more expensive.
The Mechanism: We tax fabrication by demanding coherence.
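The exponential claim can be made concrete under a simple assumption. Suppose each of k supporting steps is checked independently, and a fabricated step slips past its check with probability p. Both the independence and the single shared p are assumptions made here, not claims of the framework. A fully fabricated chain then survives with probability p^k, and the expected number of attempts before one survives is 1/p^k. The function names are hypothetical.

```python
def chain_survival_probability(p_pass: float, k: int) -> float:
    """Probability that all k fabricated steps pass independent checks."""
    if not 0.0 < p_pass <= 1.0:
        raise ValueError("p_pass must be in (0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    return p_pass ** k


def expected_attempts(p_pass: float, k: int) -> float:
    """Mean number of tries until one fabricated chain survives (geometric)."""
    return 1.0 / chain_survival_probability(p_pass, k)


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(k, expected_attempts(0.5, k))
```

With p = 0.5, one unsupported claim takes two tries on average, while a ten-step fabricated chain takes 1024. An honest chain, whose steps pass because they are true, does not pay this multiplier, which is the sense in which the tax falls selectively on the Arbitrage Path.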

This mechanism is universal across the Tetrahedron:

[Figure: Gazelle stotting illustrates the Handicap Principle: a costly signal that only fit individuals can afford.]

The Handicap Principle in Action: The gazelle's "stotting" (high leaping) is metabolically expensive, so only truly fit gazelles can afford the display. This raises F (Fabrication Cost), making the signal honest. The cheetah verifies fitness by observation (low V).

Biology (Handicap Principle): The Peacock's tail imposes a metabolic cost (F↑). Only genuinely fit individuals can afford the tax, enforcing honest signaling.
Economics (Regulation/Tax): A Carbon Tax imposes an economic cost on pollution (F↑). This forces optimization toward innovation rather than externalities.
Politics (Constitutional Checks): Separation of powers and judicial review impose procedural friction (F↑). This raises the cost of passing arbitrary, populist legislation—a Coherence Tax on policy-making.
AI (Standard Deliberation Protocol - SDP): The SDP requires structured reasoning, citations, and counter-arguments (F↑). This forces the model to perform the cognitive work (The Causal Path) rather than merely mimicking the output (The Arbitrage Path).

B. Lowering V: Legibility (Structured Decomposition)

Simultaneously, we must reduce the cost of verification (V). This requires increasing the system's Legibility—the transparency of its causal process.

The Economics: Verification load scales super-linearly with complexity. Auditing a black box is expensive; checking a structured proof is cheap.
The Mechanism: We lower V by demanding decomposition.

Economics (Accounting Standards): GAAP decomposes complex financial realities into standardized statements, lowering the cost for investors to verify claims (V↓).
Politics (Freedom of Information): Transparency laws and open debates lower the cost for the public to verify government actions (V↓).
AI (SDP Decomposition / Chain-of-Thought): The SDP forces the model to externalize its reasoning into discrete, verifiable steps, lowering the cognitive load on the verifier (V↓).

[Figure: CIMO Topology Control. A balance scale with the Coherence Tax (raising F) on the left and Verification Cost plus Structured Decomposition (lowering V) on the right, and a gauge showing a stable F > V equilibrium.]

The Topology Control Balance: Raising F (Coherence Tax) and lowering V (Structured Decomposition) work together to maintain the stability inequality F > V. The gauge shows the system in the stable (green) zone. When the balance tips toward F < V (red zone), the foundation cracks and alignment collapses.

C. The Geometry of Control

Geometrically, Topology Control reshapes the optimization landscape. By raising F and lowering V, we erect energy barriers across the Arbitrage Paths. The surrogate gradient decomposes into an Interest Tangent Space (directions affecting welfare) and a Nuisance Tangent Space (directions gaming the metric). Topology Control suppresses the nuisance component, forcing optimization to stay within the Interest Tangent Space.
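The decomposition can be sketched in the simplest possible case, as an orthogonal projection. This is a deliberate simplification and an assumption: it treats the Interest Tangent Space as a single known direction and uses the ordinary Euclidean inner product, whereas the referenced treatment may define these spaces differently. The names `decompose` and `welfare_direction` are hypothetical.

```python
import math


def dot(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def decompose(gradient, welfare_direction):
    """Split a gradient into (interest, nuisance) components.

    interest: projection onto the assumed welfare direction.
    nuisance: the remainder, orthogonal to that direction.
    """
    norm = math.sqrt(dot(welfare_direction, welfare_direction))
    if norm == 0.0:
        raise ValueError("welfare_direction must be non-zero")
    unit = [w / norm for w in welfare_direction]
    scale = dot(gradient, unit)
    interest = [scale * u for u in unit]
    nuisance = [g - i for g, i in zip(gradient, interest)]
    return interest, nuisance


if __name__ == "__main__":
    interest, nuisance = decompose([3.0, 4.0], [2.0, 0.0])
    print("interest:", interest, "nuisance:", nuisance)
```

In this picture, suppressing the nuisance component means updating along `interest` only. The hard part in practice is that the welfare direction is not known in advance, which this sketch takes as given.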

For the rigorous mathematical treatment using Semiparametric Efficiency Theory, see the Structural Alignment Theory §3.2.

D. The Economics of Control (The First Bill Principle)

The implementation of Topology Control hinges on the efficient allocation of liability, governed by Coasean economics.

The Least Cost Avoider: Efficiency demands that the liability for verification (The "First Bill") be assigned to the party that can mitigate the error at the lowest cost.
The CIMO Inversion:
- Human Verification (Biological Compute) ≈ $50/hour.
- Model Self-Verification (Silicon Compute) ≈ $0.05/hour.
The Conclusion: The First Bill must reside with the model. RLHF fails because it assigns the verification burden to the human, creating structural moral hazard. The CIMO framework shifts the burden to the model, minimizing deadweight loss and restoring the F > V equilibrium.
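The Least Cost Avoider rule reduces to picking the cheapest party. The sketch below uses the two hourly figures quoted above and assumes, for illustration only, that both parties can perform the same verification task at comparable quality. The function name `least_cost_avoider` and the dictionary keys are hypothetical.

```python
def least_cost_avoider(hourly_costs: dict) -> str:
    """Return the party with the lowest cost of verification."""
    if not hourly_costs:
        raise ValueError("no candidate parties given")
    return min(hourly_costs, key=hourly_costs.get)


# Hourly figures as quoted in the text, in US dollars.
COSTS = {"human": 50.0, "model": 0.05}

if __name__ == "__main__":
    party = least_cost_avoider(COSTS)
    ratio = COSTS["human"] / COSTS["model"]
    print(f"First Bill goes to: {party} (cost ratio about {ratio:.0f}x)")
```

With these figures the gap is about three orders of magnitude. If the quality assumption fails, meaning the cheaper party verifies less reliably, the comparison would need to be made per unit of error actually avoided rather than per hour.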

VI. The CIMO Framework (The Implementation)

The CIMO stack is the engineering implementation of this control theory for Artificial Intelligence. It translates the abstract theory of alignment into a concrete software architecture.

The Welfare Compiler

We treat alignment as a compilation problem: from a human welfare specification to model weights. Today's AI stack is an untyped compiler—pretraining, RLHF, and evaluation all optimize different surrogates for welfare. CIMO makes this pipeline type-safe: one welfare specification, one calibrated estimator, reused consistently across all stages.

When stages disagree about what "welfare" means, you get Multi-Stage Goodhart, in which three distinct error sources compound:

Spec Error: The reward model optimizes the wrong thing (e.g., user approval instead of truthfulness). Fixed by Y*-Alignment / SDP.
Cross-Stage Mismatch: RLHF and evaluation measure different welfare functionals. Fixed by Shared Calibrators.
Estimation Error: The evaluator is statistically flawed (coverage gaps, variance). Fixed by CJE + Diagnostics.
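The "type-safe pipeline" idea can be sketched as a consistency check: every stage must carry the same welfare specification object. This is not the CIMO API. The classes `WelfareSpec` and `Stage` and the function `check_pipeline` are hypothetical, and the check addresses only Cross-Stage Mismatch, not Spec Error or Estimation Error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WelfareSpec:
    """Hypothetical identifier for one welfare specification."""
    name: str
    version: str


@dataclass(frozen=True)
class Stage:
    """One pipeline stage and the spec it optimizes or measures."""
    name: str
    spec: WelfareSpec


def check_pipeline(stages):
    """Return the shared spec, or raise if the stages disagree."""
    if not stages:
        raise ValueError("pipeline has no stages")
    specs = {stage.spec for stage in stages}  # frozen dataclasses are hashable
    if len(specs) != 1:
        detail = ", ".join(f"{s.name}={s.spec.name}" for s in stages)
        raise ValueError(f"cross-stage mismatch: {detail}")
    return stages[0].spec


if __name__ == "__main__":
    spec = WelfareSpec("truthful-helpfulness", "v1")
    pipeline = [Stage("reward_model", spec), Stage("rlhf", spec),
                Stage("evaluation", spec)]
    print("shared spec:", check_pipeline(pipeline))
```

A pipeline whose evaluation stage silently measures a different spec, for example user approval, would fail this check before any training run rather than surfacing later as unexplained divergence.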

→ See The Welfare Compiler for the full technical treatment.

The CIMO Control Loop

CIMO operates as a dynamic control system designed to maintain the stability of the optimization process.

Pillar A (The Measurement / The GPS): Causal Judge Evaluation (CJE)

Role: Measures the location on the manifold. Anchors the Signal (S) to the Protocol (Y).

Mechanism: Calibration via Design-by-Projection (DbP) acts as Manifold Denoising, stripping away the orthogonal noise that creates the Goodhart Vector. OUA (Oracle-Uncertainty-Awareness) quantifies the precision of the measurement.

Pillar B (The Dynamics / The Radar): Continuous Causal Calibration (CCC)

Role: Tracks the manifold as it drifts. The environment and user preferences evolve; the manifold shifts.

Mechanism: Dynamic Governance (The Red Queen arms race). CCC uses the Causal Nyquist Rate to ensure the system maintains a lock on the moving target (Y*).

Pillar C (The Structure / The Middle Layer): Y*-Alignment and the SDP

Role: Implements Topology Control. Enforces F > V via the Coherence Tax and Structured Decomposition.

Analogy: The Constitution for the AI. It defines the rules of the game, ensuring the optimization pressure flows through the Causal Path.

Conclusion: The Inevitability of Structure

The Tetrahedron of Optimization reveals a universal truth: Unconstrained optimization of a proxy signal leads to collapse.

The Economics of Truth reveals why: maintaining the stable equilibrium where truth dominates requires the continuous expenditure of verification effort.

Alignment is not a static state to be achieved; it is a dynamic equilibrium (F > V) maintained by structural friction.

The "Middle Layer" (L2/Y)—the Cortex, the Audit, the Constitution, the SDP—is not an inefficiency to be optimized away. It is the essential structure that imposes the Verification Load, preventing the collapse of complex systems into market failure.

We must move beyond the naive optimization of signals and begin the rigorous engineering of the topology itself.

Cite this work

CIMO Labs (2025). The Alignment Manifesto. CIMO Labs. https://cimolabs.com/research/alignment-manifesto