Why OpenCausality
The past decade of data science was driven by a behavioral revolution: the mass adoption of smartphones and mobile internet — accelerating after the iPhone's launch in 2007 — put billions of people online and generated unprecedented volumes of behavioral data. Every click, purchase, search, and location trace became a row in a table. The cheapest thing to do with all that data was to predict from it. Predictive ML flourished not because prediction is what organizations need most, but because the sheer abundance of data made it the path of least resistance. Correctness — understanding why something happens — was never the optimization target.
But prediction and causation are fundamentally different things. A predictive model learns whatever statistical patterns help it forecast — including spurious correlations and confounded associations — without needing to represent the actual data-generating process. Correlations can arise from confounding, reverse causality, or selection, and a predictive model has no reason to distinguish these from genuine causal effects. Knowing that A predicts B tells you nothing about whether changing A would change B. Causal inference is the discipline of recovering the data-generating process itself: identifying which variables actually affect which, under what assumptions, and what would happen if you intervened. That is the problem that matters when decisions have consequences.
Yet causal reasoning is not a niche academic skill. It is among the most fundamental forms of intelligence: pre-linguistic, rooted in physical interaction with the world, present in infants long before they can speak. Every time a child pushes a block off a table, they are running a causal experiment. This is the reasoning mode that AI systems need to create real-world value — not just correlation-surfing, but genuine understanding of mechanism and intervention.
The bottleneck is not insight. Researchers across economics, epidemiology, and the social sciences already carry rich causal intuitions about their domains. The bottleneck is tooling: the mechanical overhead of translating a causal story into a formal DAG, selecting an appropriate estimator, diagnosing identification failures, and documenting every decision for reproducibility. Researchers spend their time fighting software instead of forming and testing hypotheses.
OpenCausality removes that bottleneck. You describe your causal story — in YAML or plain English — and the framework handles the rest: DAG construction, estimator dispatch, diagnostic checking, issue detection, and audit-trail generation. The human stays where humans are irreplaceable — judging whether the causal narrative makes sense — while the machine handles what machines do well: mechanical, repeatable, auditable computation.
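To make the workflow concrete, here is a hypothetical sketch of what a causal story might look like as a YAML spec. All field names here (`treatment`, `outcome`, `confounders`, `estimator`, and so on) are illustrative assumptions, not OpenCausality's actual schema; see the repository for the real format.

```yaml
# Hypothetical spec -- field names are illustrative, not the actual schema.
# Causal story: does enrolling in a job-training program raise earnings?
name: job_training_effect
treatment: enrolled_in_program
outcome: earnings_next_year
confounders:          # variables believed to affect both treatment and outcome
  - prior_earnings
  - education
  - age
assumptions:
  - no_unmeasured_confounding
estimator: auto       # let the framework dispatch an appropriate estimator
```

From a spec along these lines, the framework would build the DAG, check identification, run diagnostics, and emit the audit trail described above, leaving the researcher to judge whether the encoded story is plausible.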
Who is this for? Researchers, data scientists, and analysts who need to make causal claims from observational data — and need those claims to be auditable, reproducible, and defensible. Whether you are running a randomized experiment in a tech company, estimating treatment effects in a clinical trial, evaluating a policy intervention, or building a macroeconomic transmission model, OpenCausality provides the governance layer that turns ad-hoc analysis into a structured, reviewable process.
GitHub: https://github.com/LEE-CHENYU/OpenCausality