Why did the metric move?

DORA tells you cycle time dropped by 14% last quarter. It doesn't tell you why. Gitrevio's causal layer answers the second question — with a documented DAG, an identified estimand, a refutation test, and a posterior distribution that's calibrated against your own history.

The methods, named

DoWhy + EconML scaffolding
Four-step pipeline — DAG construction, identification, estimation, refutation — wrapped per metric. Every causal claim is backed by a documented graph and a placebo / leave-one-out / random-common-cause refutation test.
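
To make the placebo refutation step concrete, here is a minimal standard-library sketch. It is not Gitrevio's pipeline and does not call DoWhy; the difference-in-means estimator, the simulated data, and the names `diff_in_means` and `placebo_refute` are all assumptions made for illustration. The idea: shuffle the treatment labels and re-estimate. If the real estimate reflects a genuine effect, the placebo estimate should collapse toward zero.

```python
import random
from statistics import mean

def diff_in_means(treated, outcome):
    """Naive effect estimate: mean outcome of treated minus untreated units."""
    t = [y for d, y in zip(treated, outcome) if d]
    c = [y for d, y in zip(treated, outcome) if not d]
    return mean(t) - mean(c)

def placebo_refute(treated, outcome, runs=50, seed=0):
    """Average effect re-estimated under shuffled (placebo) treatment labels."""
    rng = random.Random(seed)
    effects = []
    for _ in range(runs):
        fake = treated[:]          # copy, then break the link to the outcome
        rng.shuffle(fake)
        effects.append(diff_in_means(fake, outcome))
    return mean(effects)

# Simulated data (assumption): treatment shifts the outcome by -2.0 units.
rng = random.Random(42)
treated = [i % 2 == 0 for i in range(2000)]
outcome = [rng.gauss(10.0, 1.0) + (-2.0 if d else 0.0) for d in treated]

estimate = diff_in_means(treated, outcome)
placebo = placebo_refute(treated, outcome)
print(f"estimate={estimate:.2f} placebo={placebo:.2f}")
```

A placebo estimate close to the real one would be a warning sign that the estimator is picking up something other than the treatment.
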
Cause-effect decomposition
A factor registry per metric (DORA + analytics). When a metric moves, the engine attributes the move across registered factors with confidence intervals.
Synthetic control (Abadie 2010)
Counterfactual simulator builds a weighted donor pool to estimate what the metric would have been absent the intervention. Difference-in-differences fallback when donor weights are unstable.
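
The difference-in-differences fallback mentioned above reduces to simple arithmetic. The sketch below is illustrative only: the lead-time values are made up, and `diff_in_diff` is a hypothetical helper, not a Gitrevio function. It subtracts the control group's change from the treated group's change, which is valid only under the usual parallel-trends assumption.

```python
from statistics import mean

def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """DiD estimate: change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical lead times in days, before and after an intervention.
treated_pre, treated_post = [4.0, 4.4, 4.2], [3.4, 3.6, 3.2]
control_pre, control_post = [5.0, 5.2, 4.8], [4.8, 5.0, 4.6]

effect = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
# Counterfactual: where the treated team would be if it had only followed the control trend.
counterfactual = mean(treated_pre) + (mean(control_post) - mean(control_pre))
print(f"effect={effect:.2f}d counterfactual={counterfactual:.2f}d")
```

Here the treated team improved by 0.8 days, but 0.2 days of that also shows up in the control group, so only about 0.6 days is attributed to the intervention.
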
BOCPD (Adams-MacKay 2007)
Bayesian online change-point detection with calibrated run-length posteriors. Catches regime shifts without thresholds you'd have to hand-tune.
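
For readers who want to see the mechanics, the following is a compact sketch of the Adams-MacKay recursion under simplifying assumptions: Gaussian observations with known variance, a conjugate normal prior on the segment mean, and a constant hazard rate. The function name `bocpd`, the parameter values, and the toy series are illustrative and say nothing about how Gitrevio configures its detector.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def bocpd(data, hazard, mu0, var0, obs_var):
    """Return the run-length posterior after the last observation."""
    run_probs = [1.0]                    # P(run length = r), starting at r = 0
    means, precs = [mu0], [1.0 / var0]   # posterior on the segment mean, per run length
    for x in data:
        # Predictive density of x under each run-length hypothesis.
        pred = [normal_pdf(x, m, obs_var + 1.0 / p) for m, p in zip(means, precs)]
        # Either the run grows by one, or a change point resets it to zero.
        growth = [rp * pp * (1 - hazard) for rp, pp in zip(run_probs, pred)]
        change = sum(rp * pp * hazard for rp, pp in zip(run_probs, pred))
        run_probs = [change] + growth
        total = sum(run_probs)
        run_probs = [p / total for p in run_probs]
        # Conjugate update of each hypothesis; run length 0 restarts from the prior.
        new_precs = [p + 1.0 / obs_var for p in precs]
        new_means = [(m * p + x / obs_var) / q for m, p, q in zip(means, precs, new_precs)]
        means, precs = [mu0] + new_means, [1.0 / var0] + new_precs
    return run_probs

# Toy series (assumption): 30 points near 4.2 days, then 30 points near 3.4 days.
data = [4.2 + 0.1 * (-1) ** i for i in range(30)] + [3.4 + 0.1 * (-1) ** i for i in range(30)]
posterior = bocpd(data, hazard=1 / 50, mu0=4.0, var0=1.0, obs_var=0.04)
map_run = max(range(len(posterior)), key=posterior.__getitem__)
print(f"most likely run length: {map_run}, probability {posterior[map_run]:.2f}")
```

The most probable run length at the end of the series points back to where the current regime began, which is how the change point is located without a fixed alert threshold.
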
Cox proportional hazards (1972) + Kaplan-Meier (1958)
Survival analysis on time-to-merge and time-to-resolve. Per-team hazard ratios with proportional-hazards assumption tests.
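
As an illustration of the Kaplan-Meier half of this, here is a small standard-library sketch of the product-limit estimator on time-to-merge data. The durations are invented, and treating a still-open pull request as a censored observation is an assumption about how such data would be encoded.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    observed[i] is True if the event (e.g. merge) happened at durations[i],
    False if the observation was censored (e.g. PR still open at window end).
    """
    events = sorted(zip(durations, observed))
    at_risk = len(events)
    survival, curve = 1.0, []
    i = 0
    while i < len(events):
        t = events[i][0]
        merged = sum(1 for d, e in events if d == t and e)
        leaving = sum(1 for d, _ in events if d == t)
        if merged:
            survival *= 1 - merged / at_risk
            curve.append((t, survival))
        at_risk -= leaving   # merged and censored items both leave the risk set
        i += leaving
    return curve

# Hypothetical hours-to-merge for six pull requests; two are censored.
durations = [2, 4, 4, 6, 9, 12]
observed = [True, True, False, True, False, True]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"t={t:>2}h  P(still open) = {s:.3f}")
```

Censored pull requests still count in the at-risk denominator until they drop out, which is what separates this estimate from a plain average over merged PRs.
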
Doubly-robust AIPW
Cohort lift modelling with propensity-score matching (Rosenbaum-Rubin 1983) and AIPW. The estimate stays consistent if either the outcome model or the propensity model is correctly specified; it does not need both to be right.
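
The AIPW estimator itself is short enough to write out. The sketch below assumes the propensity scores and outcome-model predictions have already been fitted elsewhere and are passed in as plain lists; `aipw_ate` and the toy numbers are illustrative. The example deliberately uses wrong propensity scores with a correct outcome model to show one side of the double-robustness property.

```python
from statistics import mean

def aipw_ate(y, t, e, mu1, mu0):
    """AIPW estimate of the average treatment effect.

    y: outcomes, t: treatment indicators (0/1), e: propensity scores P(t=1 | x),
    mu1 / mu0: outcome-model predictions under treatment / control.
    """
    scores = [
        m1 - m0 + ti * (yi - m1) / ei - (1 - ti) * (yi - m0) / (1 - ei)
        for yi, ti, ei, m1, m0 in zip(y, t, e, mu1, mu0)
    ]
    return mean(scores)

# Toy data (assumption): the outcome model is exactly right, the propensities are not.
y = [3.0, 5.0, 4.0, 6.0]
t = [0, 1, 0, 1]
mu0 = [3.0, 3.0, 4.0, 4.0]
mu1 = [5.0, 5.0, 6.0, 6.0]
bad_propensity = [0.9, 0.2, 0.7, 0.4]

ate = aipw_ate(y, t, bad_propensity, mu1, mu0)
print(f"ATE = {ate:.2f}")
```

Because the outcome model's residuals are zero here, the inverse-propensity correction terms vanish and the misspecified propensities cannot bias the estimate.
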
Bayesian network
Joint posterior over engineering outcomes — delivery, quality, attrition. Lets you query conditional probabilities, not single-point predictions.
LinUCB policy search
Contextual bandit over process recommendations. Recommendations are tracked over time so the policy improves with feedback.
Kalman filter (Joseph-form)
Smoothed metric series with MLE-tuned noise covariance. Joseph-form for numerical stability on long horizons.
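
For reference, this is what the Joseph-form covariance update looks like in the simplest case: a scalar random-walk filter. The noise variances `q` and `r` are hand-picked here for illustration, whereas the text above describes them as tuned by MLE; the function name and series are likewise assumptions.

```python
def kalman_filter(series, q, r, x0, p0):
    """Scalar random-walk Kalman filter with a Joseph-form covariance update.

    q: process noise variance, r: measurement noise variance,
    x0 / p0: initial state estimate and its variance.
    """
    x, p, smoothed = x0, p0, []
    for z in series:
        p = p + q                           # predict: state variance grows by q
        k = p / (p + r)                     # Kalman gain
        x = x + k * (z - x)                 # correct the state with the innovation
        p = (1 - k) ** 2 * p + k ** 2 * r   # Joseph form: a sum of non-negative terms
        smoothed.append(x)
    return smoothed

# Hypothetical series: the metric sits at 3.4 days while the filter starts at 4.2.
series = [3.4] * 50
smoothed = kalman_filter(series, q=0.01, r=0.25, x0=4.2, p0=1.0)
print(f"first={smoothed[0]:.2f} last={smoothed[-1]:.4f}")
```

The textbook update `p = (1 - k) * p` is algebraically equivalent at the optimal gain, but rounding error can push it negative over many steps; the Joseph form cannot go negative because both of its terms are squares times variances.
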
Anomaly → root-cause traceback
Anomaly detection chains through the metric dependency graph to surface the upstream signal that caused the alert.

A worked example

Lead time dropped 18% in March. The cause-effect decomposition partitions the drop across registered factors. The synthetic-control simulator estimates the counterfactual. BOCPD locates the change point.

# skill: cause_effect_decomposition
$ gitrevio skill run cause_effect_decomposition \
    --metric lead_time_p50 --window 2026-03-01..2026-03-31
→ baseline: 4.2d   observed: 3.4d   delta: -18%
→ attributed factors (sum = -18.1%):
    - reviewer_assignment_latency: -9.2% (CI: -11.4, -7.0)
    - pr_size_distribution: -5.1% (CI: -6.8, -3.4)
    - ci_pipeline_p95: -2.4% (CI: -3.1, -1.7)
    - residual: -1.4%
→ change-point: 2026-03-09 (BOCPD posterior 0.94)
→ refutation: placebo p=0.81, LOO stable, common-cause stable

Decisions, not correlations

Correlation dashboards stall in the executive review. "Cycle time and team size both dropped" is not a finding. A causal estimate with a refutation test is.

Every estimate ships with its DAG and its refutation result. Reviewers can challenge the graph; the engine will re-run identification under the alternative.

Calibrated, not pseudo-Bayesian

Posterior intervals are calibrated against your own history. The Kalman filter's noise covariance is tuned by MLE on your data; BOCPD's hazard prior is re-fit each month.

Predictive intervention scoring uses Bayesian inverse-variance pooling against a prior catalog of historical interventions across the customer base. Your own data carries more weight wherever it is informative, and the prior catalog smooths the estimate in low-signal regimes.
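
Inverse-variance pooling is a one-line formula, sketched here under the assumption that both the local estimate and the catalog prior are summarized as a mean and a variance. The function name `pool` and the numbers are illustrative only.

```python
def pool(local_mean, local_var, prior_mean, prior_var):
    """Combine two estimates, weighting each by the inverse of its variance."""
    w_local, w_prior = 1.0 / local_var, 1.0 / prior_var
    pooled_mean = (w_local * local_mean + w_prior * prior_mean) / (w_local + w_prior)
    pooled_var = 1.0 / (w_local + w_prior)
    return pooled_mean, pooled_var

# Hypothetical: local data suggests an 18% drop (variance 0.01),
# the prior catalog suggests a 10% drop (variance 0.04).
pooled_mean, pooled_var = pool(-0.18, 0.01, -0.10, 0.04)
print(f"pooled effect = {pooled_mean:.3f}, variance = {pooled_var:.3f}")
```

With these numbers the local estimate gets four times the weight of the prior, so the pooled effect lands much closer to the local value. A noisier local estimate would be pulled further toward the catalog.
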

Ready to see your engineering work clearly?

Get started free