
Typical software organizations constantly prioritize product features. One approach to prioritization is to weigh and classify features based on their impact on customer acquisition, retention and growth.

We also tend to have an idea about the benefits of the features for the end-user. For example, the release will:

reduce the time users spend on an activity
improve task completion rates
improve user satisfaction

Most software organizations use at least one of the following in their design and development processes:

ideation exercises
usability studies
wireframing
paper mockups
focus groups
design sprints

What these methods have in common is that they rely on small samples in a lab or otherwise controlled setting.

These methods are an important step in the process, but the "real world", or "the wild", is full of errors, engaged and disengaged users, lapsed users, quietly delighted users, squeaky-wheel users who take up a lot of time, untrained users, users who do not read instructions, and so on. Last but not least, the wild contains the entire user population: current users, prospects and those who might churn.

Difference between the lab and the wild

Historically, companies have used statistical techniques to extrapolate how results from a small "lab" sample will translate to the population "in the wild". Although this is a valid approach, it leaves gaps.

There are analysis gaps between:

user behavior in the lab and user behavior in the wild
user experiences and customer retention, growth and Expected Life Time Value (LTV)

What is needed is a feedback continuum between the lab and the wild, so that we can analyze user experiences and link them to Expected Life Time Value, churn, retention and adoption metrics.

This post covers:

scenarios that highlight a data-driven design process
predicting churn and retention through usability metrics
tangible steps that you can take within your organization

continuous information flow between the lab and the broader user base

Scenario 1: How long will it take the users to complete the task?

Background and the metric of interest

At my previous company, we tracked the following task completion time metric. The actual values have been changed to protect the innocent, and the numbers are for illustration only:

"We wanted our users to take X minutes to complete a media plan with specific attributes such as target goals, demographics, markets, formats, etc. Let us say that our target was 25 minutes or less. We compared this benchmark to the current state of, say, 60 minutes."

The current-state task completion time came from an extensive current-state analysis, which gave us a baseline to measure against.

The design process

Let us say that we are at the stage where the design team has produced the design. The team has worked with a sample of early design partners who have been providing input on the core functionality.

They have been working with these users to understand the needed tweaks and the points of friction. The feedback loop has been valuable. The team is confident that it has a good design that users will like and that will reduce the time it takes to create a campaign.

Our confidence in a product design shifts with every user interaction. At t=0, when the idea exists only in our heads, our confidence might be around zero, negative, or in some cases through the roof. As we interact more with users, it goes up or comes down.

Driven by the feedback, we make relevant adjustments and refine our approach and product. This is a great start! But the key point is that this is not statistical confidence; it is just how we feel based on anecdotal evidence.

This "gut" or iterative confidence introduces a level of uncertainty that can cause products to fail, even when the problem statement has been properly identified.

Design confidence curve: our confidence goes up or down based on user interactions

Hence an important question we want to ask is:

How confident are we that our assumptions will hold "in the wild"?

The answer should not be based on how we feel; it should be backed by numbers and sound math.

Will users of our proposed design actually complete the task in 25 minutes or less?
Will 25% of users complete the task in 25 minutes or less? Will it be 90%?

We might have design partners who are immersed in our process. They were part of product development.

But what about the vast majority of users who will experience the product without hand-holding?

To answer this question, we need to run users through a task time analysis: pick an appropriate sample size, measure task completion times and determine whether the results will hold true for the larger population.
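To make "pick an appropriate sample size" concrete, here is a minimal Python sketch using the textbook normal-approximation formula n = (z·s/E)², where s is the standard deviation of task times from a small pilot and E is the margin of error we can tolerate. The function name and the pilot numbers are hypothetical. Task times are usually right-skewed, so in practice the same calculation is often applied to log-transformed times; treat this as a starting point, not a prescription.

```python
import math
from statistics import NormalDist


def required_sample_size(pilot_sd: float, margin: float, confidence: float = 0.95) -> int:
    """Hypothetical helper: participants needed so that the confidence interval
    for mean task time has a half-width of about `margin` minutes.

    Uses the normal approximation n = (z * s / E)^2, rounded up.
    """
    if pilot_sd <= 0 or margin <= 0:
        raise ValueError("pilot_sd and margin must be positive")
    # Two-sided critical value, e.g. ~1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * pilot_sd / margin) ** 2)


# Illustrative pilot: task times varied with a standard deviation of ~10 minutes.
print(required_sample_size(pilot_sd=10, margin=3))  # 43
print(required_sample_size(pilot_sd=10, margin=5))  # 16
```

With a pilot standard deviation of 10 minutes, pinning the mean down to ±3 minutes at 95% confidence calls for 43 participants; relaxing the margin to ±5 minutes drops that to 16.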

The answer could go either way: our observations could end up over 25 minutes or under 25 minutes. We could use this analysis to decide whether to invest more engineering time in refining the design or to ship the release with confidence.
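Asking whether 25% or 90% of users will finish within 25 minutes is a question about a proportion. One common way to answer it from a lab sample is the adjusted-Wald (Agresti-Coull) confidence interval, sketched below in Python. The 20 task times are made up for illustration, the 25-minute target is the illustrative one from this scenario, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def share_within_target(times_min, target_min=25.0, confidence=0.95):
    """Observed share of users finishing within `target_min` minutes, plus an
    adjusted-Wald (Agresti-Coull) confidence interval for the population share."""
    n = len(times_min)
    if n == 0:
        raise ValueError("need at least one observation")
    successes = sum(1 for t in times_min if t <= target_min)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Adjusted Wald: add z^2/2 successes and z^2/2 failures, then apply the Wald formula.
    n_adj = n + z * z
    p_adj = (successes + z * z / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return successes / n, max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Made-up lab measurements (minutes) for 20 participants.
task_times = [18, 22, 31, 24, 19, 27, 21, 23, 35, 20,
              26, 17, 24, 22, 29, 21, 25, 19, 33, 23]
point, lo, hi = share_within_target(task_times)
print(f"{point:.0%} within target; 95% CI {lo:.0%} to {hi:.0%}")
```

For this made-up sample, 14 of 20 users (70%) finish within 25 minutes, but the 95% interval runs from roughly 48% to 86%. If the goal were 25% of users under target, we could ship with confidence; if the goal were 90%, the sample suggests the design falls short.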

Task time analysis is standard practice, and there are multiple ways to carry it out. We are primarily interested in the approach where the application itself reports task completion times, whether it is a SaaS product, a point-of-sale interface, a mobile application, etc.

Executing on this successfully requires a team effort. It means designing and engineering the application so that it sends the appropriate metrics back, and having infrastructure that supports collecting and analyzing those metrics. Last but not least, it means having the data/statistical IQ to turn the data into appropriate recommendations.
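As a rough illustration of an application self-reporting task times, the sketch below assumes the client emits a "start" and a "complete" event per task, tagged with a session id, and pairs them to get completion times. The event schema, the `TaskEvent` class and the `create_media_plan` task name are all hypothetical; real instrumentation would flow through whatever analytics pipeline your organization already uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TaskEvent:
    """One instrumentation event sent by the application (hypothetical schema)."""
    session_id: str
    task: str           # e.g. "create_media_plan"
    kind: str           # "start" or "complete"
    timestamp: datetime


def task_durations(events, task):
    """Pair start/complete events per session for one task.

    Returns (completion times in minutes, number of sessions with a start but
    no completion, i.e. abandoned or still in progress)."""
    open_starts, durations = {}, []
    for e in sorted(events, key=lambda ev: ev.timestamp):
        if e.task != task:
            continue
        if e.kind == "start":
            open_starts[e.session_id] = e.timestamp  # a restart replaces the earlier start
        elif e.kind == "complete" and e.session_id in open_starts:
            elapsed = e.timestamp - open_starts.pop(e.session_id)
            durations.append(elapsed.total_seconds() / 60)
    return durations, len(open_starts)


t0 = datetime(2024, 1, 8, 9, 0)
events = [
    TaskEvent("s1", "create_media_plan", "start", t0),
    TaskEvent("s2", "create_media_plan", "start", t0 + timedelta(minutes=5)),
    TaskEvent("s1", "create_media_plan", "complete", t0 + timedelta(minutes=22)),
]
durations, abandoned = task_durations(events, "create_media_plan")
print(durations, abandoned)  # [22.0] 1
```

Keeping the count of unfinished sessions alongside the durations matters: users who give up never show up in the completion times, and ignoring them would make the design look faster than it is.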

Scenario 2: We have competing designs. Which one should we pick?

The second scenario is common; we face this decision in software development all the time. Let us stay with the task time analysis example.

Let us say engineering runs a preliminary analysis of the design and estimates 20 weeks of work. The team feels it needs to consider scaled-down designs, which give us the following options:

Design B: could be developed in half the time (10 weeks) and has the same aesthetic appeal as the original design, but the time-to-complete metric could end up at 45 minutes instead of 25 minutes.
Design C: could be developed in half the time (10 weeks) with the time-to-complete metric at 25 minutes, but might not have the same aesthetic appeal (at least, that is what the team thinks).

Note: As is the case in most scenarios, we start with a hypothesis (whether we call it that or not). If a metric objective has been defined, we will come up with a solution to meet that requirement.

How does the team pick the design?

We will add another dimension to the situation:

How do users feel while completing the tasks? How satisfied are they afterwards?

So now we have three design options: Design A (the original), Design B and Design C.

comparing designs against engineering effort

The task completion time analysis should involve the following steps:

Run a comparative analysis across the three designs to measure task completion times. Depending on the number of participants, you could have every participant try all designs (within-subjects) or assign each participant to a single design (between-subjects). This choice determines which statistical models apply.
Answer the question: How confident are we that users in the wild will actually complete the task within the anticipated time with each design?
Run surveys at the end of the exercise to gauge how users feel when interacting with the product. Maybe Design C works as well as Design A; if so, we do not need to build the most beautiful application when a functional one has the same effect.
Answer the question: How statistically confident are we that users in the wild will feel the same way as the users in the sample?
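For the comparative step, one option that avoids distributional assumptions is a permutation test. The Python sketch below compares two designs tested on different participants (between-subjects). It works on log task times, because task times tend to be right-skewed, and reports the result as a ratio of geometric means. The data and function name are invented for illustration, and this is only one of several suitable models: a within-subjects study would instead call for a paired analysis (for example, randomly flipping the sign of each participant's difference), and comparing all three designs pairwise raises the chance of a false positive unless you adjust for multiple comparisons.

```python
import math
import random
from statistics import fmean


def permutation_test(times_a, times_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on mean log task time for two designs tested
    on different participants (between-subjects).

    Returns (ratio of geometric means, b over a; p-value)."""
    log_a = [math.log(t) for t in times_a]
    log_b = [math.log(t) for t in times_b]
    observed = fmean(log_b) - fmean(log_a)
    pooled = log_a + log_b
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the design labels are exchangeable,
        # so shuffle them and recompute the difference.
        rng.shuffle(pooled)
        diff = fmean(pooled[len(log_a):]) - fmean(pooled[:len(log_a)])
        if abs(diff) >= abs(observed) - 1e-12:  # tolerance for float round-off
            extreme += 1
    # Counting the observed split itself keeps the p-value above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return math.exp(observed), p_value


# Made-up task times (minutes) from two separate groups of participants.
design_a = [20, 22, 24, 21, 23, 25, 19, 22]
design_b = [40, 44, 46, 42, 45, 48, 41, 43]
ratio, p = permutation_test(design_a, design_b)
print(f"Design B takes {ratio:.2f}x as long as Design A (p = {p:.4f})")
```

In this made-up data Design B takes roughly twice as long as Design A, and the small p-value says a gap this large would be very unlikely if the two designs really performed the same.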

Advantages of being data-driven

Both examples underline an important point: even when the initial analysis is done, data collection and analysis do not stop.

Let us say we picked the best option and were statistically confident that Design C was the winner. We build it with the optimal effort and launch. With the right culture, infrastructure and data skill sets, we can then monitor how these metrics perform and decide whether to introduce new or revised features.
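Post-launch monitoring can start very simply. The sketch below takes hypothetical `(day, minutes)` observations, such as those produced by the application's own task-time logging, groups them by ISO week and flags any week whose median exceeds the 25-minute target. Using the median is an assumption made here because it is less sensitive than the mean to the occasional user who leaves a task open for hours; the function name is also hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def weekly_medians(observations, target_min=25.0):
    """Group (day, minutes) task-time observations by ISO week and return
    {(iso_year, iso_week): (median_minutes, exceeds_target)}."""
    by_week = defaultdict(list)
    for day, minutes in observations:
        iso_year, iso_week, _ = day.isocalendar()
        by_week[(iso_year, iso_week)].append(minutes)
    report = {}
    for week, values in sorted(by_week.items()):
        m = median(values)
        report[week] = (m, m > target_min)
    return report


# Illustrative post-launch observations.
observations = [
    (date(2024, 1, 8), 20), (date(2024, 1, 9), 24), (date(2024, 1, 10), 30),
    (date(2024, 1, 15), 28), (date(2024, 1, 16), 31),
]
for week, (med, flagged) in weekly_medians(observations).items():
    print(week, med, "ABOVE TARGET" if flagged else "ok")
```

A flagged week is a prompt to investigate, not a verdict; a real setup would also look at volume and at whether the shift persists.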

It could also turn out that our lab analysis is inconclusive, which can also mean that all three options might have the same impact. In that case, the best choice is the cheapest and quickest option, while we keep learning from the data.

Connecting to churn and Expected Life Time Value (LTV)

So far the discussion has focused on a single feature or capability. The bigger question is the relationship between task completion times and customer retention.

task completion time to retention relationship

Our analysis should help us answer the following:

How well can we use a feature benefit such as improved task completion times to predict retention and churn?

Do the improved task completion times cause improvements in other churn predictors?

feature and retention predictors

Will it cause improved application engagement?
Will it result in improved frequency of transactions?
By extension, do improved task completion times cause improved retention? What about the relationship between task times and the cost to acquire a customer?
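A first step toward linking task completion times to churn is a simple logistic regression of a churn flag on each account's average task time. The stdlib-only Python sketch below fits one by gradient descent. The data, the function names and the learning-rate and epoch settings are illustrative assumptions; in practice you would use a statistics or ML library and add other predictors such as engagement and transaction frequency.

```python
import math
from statistics import fmean, pstdev


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_churn_model(task_minutes, churned, lr=0.1, epochs=5000):
    """Logistic regression of a 0/1 churn flag on average task time, fitted by
    batch gradient descent on the log-loss. Returns (b0, b1, mean, sd)."""
    mu, sd = fmean(task_minutes), pstdev(task_minutes)
    xs = [(t - mu) / sd for t in task_minutes]  # standardize for stable steps
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-loss: mean(p - y) and mean((p - y) * x).
        residuals = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, churned)]
        b0 -= lr * sum(residuals) / n
        b1 -= lr * sum(r * x for r, x in zip(residuals, xs)) / n
    return b0, b1, mu, sd


def churn_probability(model, minutes):
    b0, b1, mu, sd = model
    return sigmoid(b0 + b1 * (minutes - mu) / sd)


# Made-up accounts: average task completion time (minutes) and whether they churned.
avg_task_minutes = [18, 20, 22, 24, 26, 28, 30, 35, 40, 45, 50, 55]
churned = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1]
model = fit_churn_model(avg_task_minutes, churned)
print(f"slope per SD of task time: {model[1]:.2f}")
print(f"P(churn) at 20 min: {churn_probability(model, 20):.2f}")
print(f"P(churn) at 50 min: {churn_probability(model, 50):.2f}")
```

A positive slope only says that, in this made-up data, accounts with longer task times churned more often. On its own it does not establish cause and effect, which is exactly the concern raised next.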

A common first reaction to connecting a product release to churn prediction and Expected Life Time Value (LTV) is that too many variables are involved. Will improved task completion times have a cause-and-effect relationship with improved retention? What about other factors, such as the customer service we provide, or the other features in the product? Do they not also drive retention?

These are all great questions, and this is where a strong statistical culture helps. Many industries have grappled with this class of problems, and there are statistical and ML models that help in such cases. That is a topic for another set of articles.

The good news, and the most appealing part of being data-driven with predictive models, is that we are always fine-tuning our models, establishing cause-and-effect relationships and optimizing for the required objectives. This is a journey that involves a shift in culture and skills.

How can you get started?

Develop the statistical skillset (data IQ) across your design, product, engineering and marketing teams.
Audit and re-evaluate your existing use cases or define a new one with an emphasis on tangible metrics and outcomes.
Identify data types and data sources with a plan to log metrics of interest.
Audit your release and deployment process, and identify the changes needed to establish cause and effect with key business metrics such as Expected Life Time Value (LTV)
Define your high-level data pipeline architecture, from source to processing
Pilot with one use case
Last but not least, be open to failure and change!