The second part of any retention program should be determining the causes of churn. Knowing which attributes predict churn is not the same as knowing what actually causes churn or latent attrition.

One example is customer dissatisfaction. We could include a Net Promoter Score (NPS) attribute in the churn prediction model, but what causes the dissatisfaction in the first place?

I will use a recent example: an airline I fly frequently introduced a new "economy" fare that does not let us pick our seats up front. To keep a long story short, this policy caused major consternation for us as a family. If you measured my NPS, it would likely be low. Since this is a non-contractual setting, where customers can leave without formally canceling anything, I will most likely exhibit latent churn.

New pricing scheme -> Dissatisfaction -> Latent Churn 

So if I were this airline, I would pay attention and ask:

  1. What are the possible causes of dissatisfaction? 
  2. Have we introduced changes that could increase dissatisfaction?
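To make the chain concrete, here is a minimal simulation, assuming a made-up structural causal model in which the new pricing scheme raises dissatisfaction and dissatisfaction raises the probability of latent churn. All coefficients and function names are hypothetical. Dissatisfaction (what an NPS attribute would proxy) predicts churn well, but the lever the airline actually controls is the pricing scheme, so the sketch compares churn with the policy switched on and off for the same simulated customers.

```python
import random

# Hypothetical structural causal model for the airline example.
# All coefficients are made-up illustration values, not estimates.
POLICY_EFFECT_ON_DISSATISFACTION = 0.3

def simulate_customer(policy_on: bool, rng: random.Random) -> tuple[float, bool]:
    """Return (dissatisfaction, churned) for one simulated customer."""
    # Baseline dissatisfaction varies between customers; the new pricing scheme adds to it.
    dissatisfaction = 0.5 * rng.random() + (POLICY_EFFECT_ON_DISSATISFACTION if policy_on else 0.0)
    # The probability of latent churn rises with dissatisfaction.
    churned = rng.random() < 0.1 + 0.5 * dissatisfaction
    return dissatisfaction, churned

def churn_rate(policy_on: bool, n: int = 100_000, seed: int = 0) -> float:
    """Churn rate when the policy is set to policy_on for every customer, i.e. do(policy)."""
    rng = random.Random(seed)
    return sum(simulate_customer(policy_on, rng)[1] for _ in range(n)) / n

effect = churn_rate(True) - churn_rate(False)
print(f"Effect of the new pricing scheme on churn: {effect:+.3f}")
```

Because both runs reuse the same seed, each simulated customer faces the same random draws with and without the policy, so the difference isolates the policy's effect (close to 0.15 under these made-up numbers).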

So how do we solve for the causes of churn?

Causation is a deep topic in its own right. It is, however, important to underline that causal machine learning is an active research area. Even though far more work has been done on predictive models than on causal ones, the good news is that organizations have an opportunity to lay the foundation for causal impact analysis, and they can build on tried-and-tested methodologies.

Before getting into recommendations, I want to outline the types of analysis that are possible. One approach is:

  • Explore first and exploit later 

This approach breaks the analysis into two sequential phases. In the explore phase, the data is gathered and analyzed, but no decision is made and no action is taken; acting on the findings is reserved for the later exploit phase. Randomized Controlled Trials (RCTs), widely regarded as the gold standard for causal inference, fall into this category.
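An explore-first analysis of a retention offer might look like the sketch below. It assumes a hypothetical experiment in which customers are randomly assigned to a control arm or a treatment arm (for example, one that receives a retention offer), and it compares the two churn rates with a standard two-proportion z-test. The function names and counts are illustrative only.

```python
import math
import random

def assign_arms(customer_ids: list[str], seed: int = 42) -> dict[str, str]:
    """Randomly assign each customer to "control" or "treatment" (the explore phase)."""
    rng = random.Random(seed)
    return {cid: rng.choice(["control", "treatment"]) for cid in customer_ids}

def two_proportion_z_test(churned_a: int, n_a: int, churned_b: int, n_b: int) -> tuple[float, float]:
    """Compare the churn rates of arm a (control) and arm b (treatment).

    Returns (churn rate of b minus churn rate of a, two-sided p-value),
    using the pooled-variance normal approximation.
    """
    p_a, p_b = churned_a / n_a, churned_b / n_b
    pooled = (churned_a + churned_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_b - p_a, p_value

arms = assign_arms([f"cust-{i}" for i in range(10)])
print(arms)

# Hypothetical results: 300 of 2,000 control customers churned vs 240 of 2,000 treated ones.
diff, p = two_proportion_z_test(300, 2000, 240, 2000)
print(f"Change in churn rate: {diff:+.3f}, p-value: {p:.4f}")
```

With these made-up counts, churn drops by 3 percentage points with a p-value well below 0.01. The random assignment is what licenses a causal reading; the same test applied to self-selected groups would only measure correlation.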

The other is:

  • Explore and exploit together ("learn as you go models")

These models assume that we will make decisions with the data points we have. If new information is introduced, we will adjust accordingly. 
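A common learn-as-you-go technique is the multi-armed bandit. Below is a minimal Beta-Bernoulli Thompson sampling sketch, assuming each arm is a hypothetical retention offer and the reward is whether the customer stays. The retention probabilities are made up; the algorithm never sees them and only observes outcomes one customer at a time.

```python
import random

def thompson_sampling(true_retention: list[float], rounds: int, seed: int = 0) -> list[int]:
    """Beta-Bernoulli Thompson sampling over retention offers (the "arms").

    true_retention holds hypothetical per-offer retention probabilities that are
    hidden from the algorithm; it only observes whether each customer was retained.
    Returns how many times each offer was chosen.
    """
    rng = random.Random(seed)
    k = len(true_retention)
    successes = [0] * k  # customers retained after receiving offer i
    failures = [0] * k   # customers lost after receiving offer i
    pulls = [0] * k
    for _ in range(rounds):
        # Draw a plausible retention rate for each offer from its Beta posterior
        # (uniform Beta(1, 1) prior) and act on the best draw: explore and exploit at once.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        pulls[arm] += 1
        # Observe this customer's outcome and update that offer's posterior immediately.
        if rng.random() < true_retention[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls

pulls = thompson_sampling([0.70, 0.75, 0.85], rounds=5_000)
print(pulls)
```

Early on the posteriors are wide and every offer gets tried; as evidence accumulates, the best-performing offer is chosen more and more often, which is the explore-and-exploit-together behavior described above.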

What does that mean to you, a non-data scientist?

Our recommendation for building a causality-based decision-making culture involves the following stages:

Foundational: strengthening the causal core

  • Start with observational metrics that can give a sense of correlation. But as we all know, we cannot infer causation from correlation.
  • Create a shared culture of diagramming cause and effect. We want to get into the habit of thinking through and documenting the reasons for outcomes, and of practicing counterfactual thinking: asking what would have happened had we not made a given change.
  • Build a Randomized Controlled Trial (RCT) culture: use RCT-based experiments wherever possible.
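One lightweight way to start documenting cause-and-effect diagrams is to write them down as data. The sketch below encodes the airline example as a small directed acyclic graph, with node names made up for illustration, and lists everything upstream of latent churn. NPS sits in the graph as a measurement of dissatisfaction, so it can predict churn without being one of its causes.

```python
# Hypothetical cause-and-effect diagram for the airline example.
# Each key maps to its direct causes (parents); "nps" is a measurement of dissatisfaction.
CAUSES: dict[str, list[str]] = {
    "new_pricing_scheme": [],
    "dissatisfaction": ["new_pricing_scheme"],
    "nps": ["dissatisfaction"],
    "latent_churn": ["dissatisfaction"],
}

def upstream_causes(node: str, graph: dict[str, list[str]]) -> set[str]:
    """Return every ancestor of node, i.e. everything that can causally influence it."""
    seen: set[str] = set()
    stack = list(graph[node])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph[parent])
    return seen

print(sorted(upstream_causes("latent_churn", CAUSES)))
# ['dissatisfaction', 'new_pricing_scheme']
```

A diagram like this also frames counterfactual questions: changing new_pricing_scheme propagates to latent_churn, while nudging the NPS number itself does not.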

Building and toning the causal muscles

  • Build causal muscles that draw on multiple techniques, such as the "learn as you go" models. Again, learn-as-you-go models make a decision with the data points they have, then continue to learn and adjust to reflect the revised state of the world. Examples of such techniques are:
      • Combining Bayesian structural time-series models with synthetic controls and counterfactuals [3, Inferring causal impact using Bayesian structural time-series models]
      • Borrowing analysis from biostatistics. Ascarza et al. highlight analysis from biostatistics in which churn is equated with an event such as death, and the competing causes of the churn event are analyzed to determine which one is responsible. Yes, a macabre analogy, but an applicable one [4].
        [4] Putter H, Fiocco M, Geskus RB (2007) Tutorial in biostatistics: competing risks and multi-state models. Stat Med 26(11):2389-2430
      • Multi-armed bandit approaches
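For the biostatistics route, the basic competing-risks quantity is the cumulative incidence of each cause: the probability of churning from that specific cause by time t, given that other causes can end the relationship first. Below is a minimal sketch of the standard nonparametric estimator (the kind covered in Putter et al. [4]), assuming hypothetical data in which event code 1 is churn attributed to dissatisfaction, 2 is churn from some other cause, and 0 means the customer is still active (censored).

```python
from collections import Counter

def cumulative_incidence(durations: list[float], events: list[int], cause: int) -> list[tuple[float, float]]:
    """Nonparametric cumulative incidence of one cause in the presence of competing causes.

    durations: time until churn or censoring (e.g. months as a customer)
    events: 0 = still active (censored); 1, 2, ... = competing cause that ended the relationship
    Returns (time, cumulative incidence of `cause`) at every time any churn event occurs.
    """
    counts = Counter(zip(durations, events))
    event_times = sorted({t for t, e in zip(durations, events) if e != 0})
    surv = 1.0  # Kaplan-Meier probability of not having churned from ANY cause so far
    cif = 0.0
    curve = []
    for t in event_times:
        n_at_risk = sum(1 for d in durations if d >= t)
        d_cause = counts[(t, cause)]
        d_any = sum(c for (ti, e), c in counts.items() if ti == t and e != 0)
        # Hazard of this cause at t, weighted by the chance of surviving all causes until just before t.
        cif += surv * d_cause / n_at_risk
        surv *= 1 - d_any / n_at_risk
        curve.append((t, cif))
    return curve

durations = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 2, 1, 0, 1, 2, 0, 1]
print(cumulative_incidence(durations, events, cause=1))
print(cumulative_incidence(durations, events, cause=2))
```

By construction, the cause-specific incidences add up to the overall probability of churn (one minus the Kaplan-Meier survival), which is why this estimator is preferred over treating the other causes as censored.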

The key takeaway is to review and refine. Causal analysis and causal ML form an evolving field with exciting research underway, so stay tuned and stay in touch with the state of the art. And yes, some of these problems can be solved with good old statistical modeling tools such as R.

And of course, do not lose sight of the impact on the bottom line and Expected LTV.
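To keep the bottom line in view, it helps to translate a change in churn into a change in expected LTV. The sketch below uses one common textbook simplification, assuming a constant per-period margin, a constant retention rate, and a constant discount rate; all numbers are hypothetical, and a non-contractual setting like the airline example would need a richer model of latent churn.

```python
def expected_ltv(margin: float, retention: float, discount: float) -> float:
    """Expected lifetime value of an active customer under a constant-retention model.

    margin: profit per period from an active customer, received at the start of each period
    retention: probability that an active customer is still active next period
    discount: per-period discount rate
    Closed form of the infinite sum of margin * (retention / (1 + discount)) ** t for t = 0, 1, 2, ...
    """
    if not 0 <= retention < 1 or discount < 0:
        raise ValueError("retention must be in [0, 1) and discount must be non-negative")
    return margin / (1 - retention / (1 + discount))

before = expected_ltv(margin=100, retention=0.80, discount=0.10)
after = expected_ltv(margin=100, retention=0.85, discount=0.10)
print(f"LTV before: {before:.2f}, after: {after:.2f}, uplift per customer: {after - before:.2f}")
```

Under these made-up numbers, cutting per-period churn from 20% to 15% raises expected LTV from about 366.67 to 440.00 per customer, which is the kind of figure that ties a causal finding back to the business case.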