CIMO Labs

Quick Start

Get CJE running on your data in under 5 minutes.

1. Install

# Using pip
pip install causal-judge-evaluation
# Or clone from GitHub
git clone https://github.com/cimo-labs/cje.git
cd cje && pip install -e .

2. Prepare your data

CJE needs three inputs:

📊 Evaluation logs (required)

Your LLM conversation logs, containing:

  • Prompts and responses
  • Policy identifiers (model/prompt version)
  • Judge scores (GPT-4, Claude, etc.)
logs.parquet or logs.jsonl

🎯 Oracle labels (small sample)

Ground-truth labels for roughly 100 to 200 examples, used for calibration. These can be human preferences, task success, or downstream metrics.

oracle.csv with columns: [id, label]
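As a rough illustration of the two data files, the sketch below writes a toy logs.jsonl and oracle.csv using only the Python standard library. The log field names (id, prompt, response, policy, judge_score) and the idea that oracle ids match log ids are assumptions made for this example, not CJE's documented schema; only the [id, label] columns of oracle.csv come from the text above.

```python
import csv
import json
from pathlib import Path


def write_example_inputs(out_dir):
    """Write a toy logs.jsonl and oracle.csv into out_dir and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Hypothetical log schema (an assumption): one JSON object per line.
    logs = [
        {"id": "ex-001", "prompt": "What is 2 + 2?", "response": "4",
         "policy": "policy_A", "judge_score": 0.9},
        {"id": "ex-002", "prompt": "What is 2 + 2?", "response": "5",
         "policy": "policy_B", "judge_score": 0.2},
    ]
    logs_path = out / "logs.jsonl"
    with logs_path.open("w", encoding="utf-8") as f:
        for row in logs:
            f.write(json.dumps(row) + "\n")

    # oracle.csv uses the [id, label] columns named in the text.
    oracle_path = out / "oracle.csv"
    with oracle_path.open("w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "label"])
        writer.writerow(["ex-001", 1])
        writer.writerow(["ex-002", 0])

    return logs_path, oracle_path
```

Calling write_example_inputs("data") would create both files under data/; a real oracle sample would cover many more examples than the two toy rows shown here.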

⚙️ Config (optional)

Customize estimators, folds, and diagnostics.

config.yaml (uses sensible defaults if omitted)

3. Run evaluation

Command line

cje evaluate \
  --data logs.parquet \
  --oracle oracle.csv \
  --output results/

Python API

from cje import Pipeline

pipeline = Pipeline()
results = pipeline.evaluate(
    data="logs.parquet",
    oracle="oracle.csv",
)
results.summary()

4. Interpret results

CJE produces three key outputs:

📈 Point estimates with confidence intervals

# results/estimates.json
{
  "policy_A": 0.523 ± 0.021,
  "policy_B": 0.487 ± 0.019,
  "difference": 0.036 ± 0.028,
  "p_value": 0.023
}
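To make the "estimate ± margin" notation concrete, here is a minimal sketch that turns such a string into an interval and checks whether it excludes zero. It assumes the value after ± is the half-width of the confidence interval; the page does not state this or the confidence level, and the helper names are hypothetical.

```python
def parse_estimate(text):
    """Split a string like '0.036 ± 0.028' into (center, half_width)."""
    center, half_width = (float(part) for part in text.split("±"))
    return center, half_width


def interval(text):
    """Return (low, high), assuming ± gives the interval half-width."""
    center, half_width = parse_estimate(text)
    return center - half_width, center + half_width


def excludes_zero(text):
    """True when the whole interval lies strictly on one side of zero."""
    low, high = interval(text)
    return low > 0 or high < 0


low, high = interval("0.036 ± 0.028")
difference_is_clear = excludes_zero("0.036 ± 0.028")
```

Under that assumption, the example difference spans roughly 0.008 to 0.064, so it excludes zero, which is consistent with the reported p-value of 0.023.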

🔍 Diagnostics

# results/diagnostics.json
{
  "ESS": 0.946,              // 94.6% effective sample size
  "judge_calibration": 0.89, // R² of calibration
  "overlap": 0.72,           // Policy overlap
  "status": "PASS"           // or "REFUSE-LEVEL"
}
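For intuition about the ESS figure, the sketch below computes an effective sample size fraction from a list of importance weights using Kish's formula, (Σw)² / (n · Σw²), along with the share of total weight held by the largest 1% of weights. Whether CJE computes its diagnostics exactly this way is an assumption; the function names are hypothetical.

```python
import math


def ess_fraction(weights):
    """Kish effective sample size divided by n (1.0 means uniform weights)."""
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    return (total * total) / (sum_sq * len(weights))


def top_share(weights, fraction=0.01):
    """Share of total weight carried by the largest `fraction` of weights."""
    k = max(1, math.ceil(fraction * len(weights)))
    largest = sorted(weights, reverse=True)[:k]
    return sum(largest) / sum(weights)


uniform = [1.0] * 100
skewed = [100.0] + [1.0] * 99  # one sample dominates the rest
```

With the uniform list the ESS fraction is 1.0 and the top 1% holds 1% of the mass. With the skewed list a single sample holds just over half the mass and the ESS fraction drops to about 0.04, the kind of weight concentration that makes an estimate unreliable.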

📊 Visualizations

Auto-generated plots for calibration curves, weight distributions, and PIT diagnostics.

5. Make decisions

✅ Ship when:

  • Status = "PASS" (all diagnostics passed)
  • ESS > 0.5 (50%+ effective samples)
  • Confidence interval for the difference excludes zero
  • p-value < your significance threshold

⚠️ Don't ship when:

  • Status = "REFUSE-LEVEL" (diagnostics failed)
  • ESS < 0.3 (low effective samples)
  • Judge calibration R² < 0.7
  • Extreme weight concentration (top 1% > 50% mass)
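The two checklists above can be sketched as a single function over the diagnostics dictionary. The keys ESS, judge_calibration, and status follow the example diagnostics.json; the top_1pct_weight_share key, the default alpha of 0.05, and the "review" outcome for cases that fall between the two lists (for example ESS between 0.3 and 0.5) are assumptions for illustration, not CJE behavior.

```python
def ship_decision(diagnostics, ci_low, ci_high, p_value, alpha=0.05):
    """Return (decision, reasons); decision is 'ship', 'dont_ship', or 'review'."""
    # Hard blockers from the "Don't ship" list.
    blockers = []
    if diagnostics.get("status") != "PASS":
        blockers.append("status is not PASS")
    if diagnostics["ESS"] < 0.3:
        blockers.append("ESS below 0.3")
    if diagnostics["judge_calibration"] < 0.7:
        blockers.append("judge calibration R^2 below 0.7")
    # Hypothetical key: share of weight mass in the top 1% of samples.
    if diagnostics.get("top_1pct_weight_share", 0.0) > 0.5:
        blockers.append("top 1% of weights hold more than 50% of the mass")
    if blockers:
        return "dont_ship", blockers

    # Remaining conditions from the "Ship" list.
    unmet = []
    if not diagnostics["ESS"] > 0.5:
        unmet.append("ESS not above 0.5")
    if not (ci_low > 0 or ci_high < 0):
        unmet.append("confidence interval includes zero")
    if not p_value < alpha:
        unmet.append("p-value not below threshold")
    if unmet:
        return "review", unmet  # assumption: the page leaves this case open
    return "ship", []


example = {"ESS": 0.946, "judge_calibration": 0.89,
           "overlap": 0.72, "status": "PASS"}
decision, reasons = ship_decision(example, 0.008, 0.064, 0.023)
```

For the example values from step 4, every blocker check passes and every ship condition holds, so the function returns "ship" with no reasons.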

Need help?

Check the GitHub issues or reach out at eddie@cimolabs.com