Your first analysis: 10 minutes
We’ll load one of the 41 built-in datasets, estimate an average treatment effect via inverse-probability weighting, and get bootstrap confidence intervals. This runs offline, with no LLM, no cloud.
Prereqs
ESML installed (esml --version works)
2 minutes
1. Launch the REPL
esml repl
Or directly in Python:
import esml
from esml.data import load_dataset
from esml.fn import ate
2. Load data
df = load_dataset("ccs_2022_2023") # Canadian Cannabis Survey
df.shape
# (18000, ...)
3. Estimate an ATE
result = ate(
    df,
    treatment="cannabis_past_year_use",
    outcome="mental_health_good",
    covariates=["age_group", "province", "sex", "education"],
    method="ipw",
    bootstrap=True,
    n_boot=500,
    seed=42,
)
print(result.summary())
Output:
Average Treatment Effect (IPW)
──────────────────────────────────────────
estimate: -0.0432
SE: 0.0187
95% CI: (-0.0799, -0.0065)
n: 17842
n effective: 12451
method: Inverse probability weighting, bootstrap CI (B=500)
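To make the two ingredients in that summary concrete, here is a minimal, standard-library-only sketch of normalized (Hajek) inverse-probability weighting with a percentile bootstrap. It is not ESML's implementation: the synthetic data, the single categorical `stratum` confounder, and the helper names `simulate`, `ipw_ate`, and `bootstrap_ci` are all assumptions made for illustration, and the propensity score is simply the treated share within each stratum.

```python
import random
from collections import defaultdict

TRUE_ATE = -0.05  # effect built into the synthetic data (illustrative)


def simulate(n, seed):
    """Synthetic confounded data: rows of (stratum, treated, outcome)."""
    rng = random.Random(seed)
    p_treat = [0.2, 0.4, 0.6, 0.8]   # treatment is likelier in later strata
    baseline = [0.8, 0.7, 0.6, 0.5]  # ...which also have lower outcomes
    rows = []
    for _ in range(n):
        s = rng.randrange(4)
        t = 1 if rng.random() < p_treat[s] else 0
        y = baseline[s] + TRUE_ATE * t + rng.gauss(0.0, 0.1)
        rows.append((s, t, y))
    return rows


def ipw_ate(rows):
    """Normalized (Hajek) IPW estimate of the ATE."""
    n_by, treated_by = defaultdict(int), defaultdict(int)
    for s, t, _ in rows:
        n_by[s] += 1
        treated_by[s] += t
    num1 = den1 = num0 = den0 = 0.0
    for s, t, y in rows:
        e = treated_by[s] / n_by[s]  # propensity: treated share in stratum
        if e in (0.0, 1.0):
            continue  # no overlap in this stratum, so skip it
        if t:
            num1 += y / e
            den1 += 1.0 / e
        else:
            num0 += y / (1.0 - e)
            den0 += 1.0 / (1.0 - e)
    return num1 / den1 - num0 / den0


def bootstrap_ci(rows, n_boot=200, seed=42, alpha=0.05):
    """Percentile bootstrap CI: resample rows, re-estimate, take quantiles."""
    rng = random.Random(seed)
    stats = sorted(
        ipw_ate(rng.choices(rows, k=len(rows))) for _ in range(n_boot)
    )
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi


rows = simulate(4000, seed=0)
estimate = ipw_ate(rows)
ci_low, ci_high = bootstrap_ci(rows)
print(f"estimate: {estimate:.4f}  95% CI: ({ci_low:.4f}, {ci_high:.4f})")
```

Because treatment is more common in the strata with lower baseline outcomes, a raw difference in means would overstate the harm; reweighting by the inverse propensity removes that imbalance. Each bootstrap replicate re-estimates the propensities, so the interval also reflects uncertainty in the weights.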
4. Sanity-check with a different estimator
from esml.fn import aipw
result_dr = aipw(
    df,
    treatment="cannabis_past_year_use",
    outcome="mental_health_good",
    covariates=["age_group", "province", "sex", "education"],
)
print(result_dr.summary())
AIPW is doubly robust: the estimate is consistent if either the propensity model or the outcome model is correctly specified. If the IPW and AIPW estimates disagree meaningfully, that's a signal that at least one of the models is misspecified.
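The double-robustness property can be checked on synthetic data. The sketch below is an illustration under stated assumptions, not ESML's `aipw`: the data generator, the single `stratum` confounder, and every helper name are hypothetical. A "right" nuisance model conditions on the stratum; a "wrong" one ignores it.

```python
import random
from collections import defaultdict

TRUE_ATE = -0.05  # effect built into the synthetic data (illustrative)


def simulate(n, seed):
    """Synthetic confounded data: rows of (stratum, treated, outcome)."""
    rng = random.Random(seed)
    p_treat = [0.2, 0.4, 0.6, 0.8]
    baseline = [0.8, 0.7, 0.6, 0.5]
    rows = []
    for _ in range(n):
        s = rng.randrange(4)
        t = 1 if rng.random() < p_treat[s] else 0
        rows.append((s, t, baseline[s] + TRUE_ATE * t + rng.gauss(0.0, 0.1)))
    return rows


def aipw_ate(rows, propensity, outcome):
    """AIPW: outcome-model contrast plus inverse-weighted residuals.

    propensity(s) -> estimated P(T=1 | stratum s)
    outcome(s, t) -> estimated E[Y | stratum s, T=t]
    """
    total = 0.0
    for s, t, y in rows:
        e, m1, m0 = propensity(s), outcome(s, 1), outcome(s, 0)
        total += m1 - m0
        if t:
            total += (y - m1) / e
        else:
            total -= (y - m0) / (1.0 - e)
    return total / len(rows)


rows = simulate(4000, seed=0)

# Sufficient statistics per stratum and per (stratum, arm) cell.
n_by, treated_by = defaultdict(int), defaultdict(int)
cell_sum, cell_n = defaultdict(float), defaultdict(int)
for s, t, y in rows:
    n_by[s] += 1
    treated_by[s] += t
    cell_sum[(s, t)] += y
    cell_n[(s, t)] += 1

share = sum(t for _, t, _ in rows) / len(rows)
arm_mean = {
    t: sum(cell_sum[(s, t)] for s in n_by) / sum(cell_n[(s, t)] for s in n_by)
    for t in (0, 1)
}


def ps_right(s):
    return treated_by[s] / n_by[s]  # treated share within the stratum


def ps_wrong(s):
    return share  # ignores the confounder


def out_right(s, t):
    return cell_sum[(s, t)] / cell_n[(s, t)]  # cell mean


def out_wrong(s, t):
    return arm_mean[t]  # ignores the confounder


both_right = aipw_ate(rows, ps_right, out_right)
ps_only = aipw_ate(rows, ps_right, out_wrong)
outcome_only = aipw_ate(rows, ps_wrong, out_right)
both_wrong = aipw_ate(rows, ps_wrong, out_wrong)
print(both_right, ps_only, outcome_only, both_wrong)
```

The first three estimates should land close to the simulated effect of -0.05, since at least one nuisance model is right in each. With both models wrong, the estimator collapses to the confounded difference in arm means and the bias returns.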
5. Visualize
result.plot_forest() # forest plot of treatment effect
result.plot_weight_diag() # weight diagnostics for IPW
Or, from the terminal:
esml run-module propensity-scores \
    --dataset ccs_2022_2023 \
    --treatment cannabis_past_year_use \
    --outcome mental_health_good
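When reading weight diagnostics, one number worth knowing is the effective sample size, which the step 3 summary reports as "n effective". A common definition is Kish's formula, (sum of weights)^2 / (sum of squared weights). Whether ESML uses exactly this formula is an assumption; the helper `kish_ess` below is hypothetical and only illustrates the idea.

```python
def kish_ess(weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


print(kish_ess([1.0] * 4))             # equal weights -> 4.0
print(kish_ess([1.0, 1.0, 1.0, 9.0]))  # one dominant weight -> about 1.71
```

Equal weights give back the raw sample size, while a few very large weights pull the effective size down sharply. That is why extreme propensity scores near 0 or 1 make IPW estimates noisy.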
Next steps
Dataset-agnostic: every ESML function takes column names as keyword arguments. Swap in your own CSV via load_dataset("path/to/mine.csv").
Causal inference deep dive: see the esml API docs.
Ask Perseus: esml ? "what's the difference between IPW and AIPW?" (requires Ollama installed, or falls back to free-tier FreeAPI).