# What Is A Root Cause?
A root cause is the underlying or core cause of a problem.

# What Is Root Cause Analysis (RCA)?
- A root cause analysis is a systematic process for identifying the underlying causes of a problem or an issue.
- It is defined as a collective term that describes a wide range of approaches, tools and techniques used to uncover the causes of a problem.
- It involves breaking down the problem into smaller parts, examining each part in detail and identifying the root causes that are contributing to the problem.

The easiest way to understand the root cause of a problem is to think about common problems.

Examples,
- If a person falls ill, they go to a doctor and describe their symptoms, and the doctor in turn finds the root cause of the illness.
- If the car stops working, a mechanic would find the root cause of the problem.
- The same applies to a business: if it is underperforming, a person or a group of people is tasked with finding the root cause of the underperformance.

# Goal Of RCA
- The first goal of RCA is to discover the root cause of a problem or an event.
- The second goal is to fully understand how to fix, compensate or learn from any underlying issues within the root cause.
- The third goal is to apply whatever has been learnt from the analysis to systematically prevent future issues or to repeat successes.

Within an organization, problem solving, incident investigation and root cause analysis are all fundamentally connected by 3 basic questions,
1. What is the problem?
    - Define the issue or the problem by its impact on overall goals.
2. Why did it occur?
    - List the potential factors that could cause the problem.
    - Investigate each one of them and determine the root cause of the problem.
3. What should be done to prevent it from happening again?
    - Prevent any negative impact by selecting the best solution.

# Life Cycle Of RCA
1. Clarify:
    - Ask questions to get enough clarity on the problem.
    - Clarify terms and set up parameters for discussion.
    - Create an outline of the approach that will be followed.
2. Rule out:
    - Explore the possibilities and list some high level causes.
    - Discard issues that seem to be out of scope.
        - Check the underlying data to make sure it is accurate.
        - Make sure there are no technical issues, glitches, bugs or outliers.
    - List the observations and start diagnosing the root cause.
3. Internal factors:
    - Consider any recent features or products that were launched.
    - Look for any recent changes made by other teams.
    - Slice and dice the data into segments based on user demographics.
4. External factors:
    - Look for any bad PR or controversy related to the company.
    - Look for any changes in user behavior or customer trends.
    - Look for macroeconomic or geographical changes.
    - Conduct a competitor analysis.

Based on the analysis, a point of issue or error will eventually be found that can be investigated further based on its type. A conclusion can then be reached on how to address the problem.

Note that all the findings should be captured and reported, irrespective of their size.

# Product Diagnostics (Analyzing A Metric Change)
### Example case
How should the investigation proceed if the percentage of users who clicked on a search result about a Facebook event increased 15% week-over-week?

General framework (CRIED)

### Clarify
Ask clarifying questions and share the reasoning behind them. Below is an example of how to drive a discussion with an interviewer.

What does a search result for an event mean?
- Does it refer to when a user searches for something in the search bar on Facebook and the results produce a Facebook event?
- These search results could belong to different categories like a Facebook event, page or group, and success is defined as a user clicking on an event.

The definition of the metric in the question can also be clarified,
- Success rate = $\frac{\text{Number of users who clicked on the event result after searching}}{\text{Number of users who searched for any keyword}}$. Is the 15% increase measured on this rate?
- 15% week-over-week: Is it a 15% increase in the success rate compared to last week, or a 15% increase accumulated over the past few weeks?
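The distinction between a relative and an absolute change is worth pinning down while clarifying the metric. The sketch below is a minimal illustration, not the actual Facebook definition: the function names and the user counts are hypothetical, chosen so that the relative week-over-week change comes out at 15%.

```python
def success_rate(users_clicked_event: int, users_searched: int) -> float:
    """Share of searching users who clicked on an event result."""
    if users_searched <= 0:
        raise ValueError("users_searched must be positive")
    return users_clicked_event / users_searched


def week_over_week_change(previous_rate: float, current_rate: float):
    """Return (relative change, absolute change in percentage points)."""
    relative = (current_rate - previous_rate) / previous_rate
    absolute_pp = (current_rate - previous_rate) * 100
    return relative, absolute_pp


# Hypothetical counts, used only to illustrate the arithmetic.
last_week = success_rate(users_clicked_event=2_000, users_searched=10_000)
this_week = success_rate(users_clicked_event=2_300, users_searched=10_000)

relative, absolute_pp = week_over_week_change(last_week, this_week)
print(f"relative change: {relative:.1%}, absolute change: {absolute_pp:.1f} pp")
```

With these numbers the rate moves from 20% to 23%, which is a 15% relative increase but only a 3 percentage point absolute increase, so it matters which of the two the question refers to.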

### Rule out
Rule out any change in the metric caused by technical issues, infrastructure glitches, bugs or outliers.
- Has there been any bug in the logging code because of which event clicks have been duplicated?
- Is there a 3rd party software tracking the search result clicks? If so, is there any glitch in that software?
- Are there any failures in the data pipeline?
- Was the metric for the week affected by one day's data alone, or has it been a consistent increase? (Outlier analysis).
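The outlier question can be checked with a leave-one-day-out comparison: recompute the weekly change with each day removed and see whether the increase survives. This is only a sketch of one possible check; the daily rates below are hypothetical and deliberately contain one abnormal day.

```python
# Hypothetical daily success rates (clicked / searched) for two weeks.
previous_week = [0.200, 0.198, 0.202, 0.201, 0.199, 0.200, 0.200]
current_week = [0.201, 0.199, 0.200, 0.410, 0.202, 0.198, 0.200]


def mean(values):
    return sum(values) / len(values)


def relative_change(before, after):
    return (after - before) / before


baseline = mean(previous_week)
overall = relative_change(baseline, mean(current_week))

# Weekly change recomputed with each day of the current week left out in turn.
leave_one_out = [
    relative_change(baseline, mean(current_week[:i] + current_week[i + 1:]))
    for i in range(len(current_week))
]

# The day whose removal shrinks the change the most is the most influential one.
most_influential_day = min(
    range(len(leave_one_out)), key=lambda i: abs(leave_one_out[i])
)

print(f"overall change: {overall:+.1%}")
print(f"change without day {most_influential_day}: "
      f"{leave_one_out[most_influential_day]:+.1%}")
```

If removing a single day brings the change close to zero, as it does here, the increase is driven by an outlier rather than by a consistent shift.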

### Internal data
Explore the internal factors that could have affected the metric.

Acronym: TROPiCS
- Time: Is this 15% increase seasonal, sudden or gradual?
    - Sudden increase: Could mean there is a bug in the logging of a new feature, or that a recently launched update (a ranking change?) is creating problems, so there may be a need to roll back.
    - Gradual increase: May indicate a change in user behavior. Maybe users are starting to prefer live virtual events over physical events due to COVID restrictions.
- Region:
    - Is this change concentrated in a specific region or is it evenly distributed globally?
    - For example, after pandemic restrictions eased, some cities started to reopen. In that case, the rising interest in events may be concentrated only in the cities that have reopened.
- Other related features affected: If an interest in events is going up, is a similar jump seen in Instagram or Facebook stories because users attending these events will have more content to post about?
- Platform:
    - Is this being seen across both Android and iOS devices?
    - Is this being seen across mobile and desktop devices?
    - Is this being seen across Windows and Mac?
    - If only one of them is seeing an increase, investigate a possible engineering bug on that platform.
- Cannibalization: If the metric for a product is decreasing, is it because another product is cannibalizing the engagement? Alternatively, if the metric in question is increasing, is the product where the increase is seen cannibalizing another product?
    - Around the time when the spike in event clicks is seen, is there a decrease in the number of clicks on profiles, pages or groups?
    - Is there a specific category that the current product is cannibalizing from or is it evenly distributed?
    - For instance, is it only users that previously clicked on Groups (not pages) that are clicking on events now?
        - This may indicate that a change is made to the ranking of groups in the search results.
        - Were groups down-ranked, or accidentally removed completely?
- Segmentation: Slice and dice the data to identify the demographic of users this increase has affected.
    - Age: Is the increase only being noticed among teenagers, young adults, middle age or senior users?
    - Gender: Is this increase only among female users or across both genders?
    - Power v. casual users: Is this only observed among the frequent users or casual users or both?
    - New v. existing users: Is the increase only observed among the users that have newly joined the platform or the existing users or both?
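Slicing along any of the TROPiCS dimensions follows the same pattern: compute the metric per segment for both weeks and rank the segments by how much they moved. The sketch below uses platform as the dimension; the segment names and counts are hypothetical and are built so that the overall rate rises 15% while only one segment changes.

```python
# Hypothetical counts: (segment, week) -> (users who clicked an event, users who searched)
counts = {
    ("Android", "previous"): (1_000, 5_000),
    ("Android", "current"): (1_300, 5_000),
    ("iOS", "previous"): (800, 4_000),
    ("iOS", "current"): (800, 4_000),
    ("Desktop", "previous"): (200, 1_000),
    ("Desktop", "current"): (200, 1_000),
}


def rate(clicked, searched):
    return clicked / searched


def change_by_segment(counts):
    """Relative week-over-week change of the success rate for every segment."""
    segments = sorted({segment for segment, _ in counts})
    changes = {}
    for segment in segments:
        before = rate(*counts[(segment, "previous")])
        after = rate(*counts[(segment, "current")])
        changes[segment] = (after - before) / before
    return changes


changes = change_by_segment(counts)

# Largest movers first.
for segment, change in sorted(changes.items(), key=lambda item: -abs(item[1])):
    print(f"{segment:8s} {change:+.1%}")
```

A change concentrated in a single segment, as in this made-up example, points towards a platform-specific cause; an evenly spread change points towards something global.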

### External data
- Did the number of users attending events on Twitter (X) decrease?
- Has a new competitor joined the market?
- Are the competitors changing their offering?
- Has there been any good PR around the company or the product?
- It could also be due to seasonality or a major temporary event. If it is a major temporary event, the KPIs will return to their normal state shortly.

# Case Study: Myntra
### Problem statement
Myntra has observed a decline from 5% to 3% in the conversion rate (confirmed orders per session) over the last 3 days.

RCA has to be performed to help diagnose the issue.

### Data
Clickstream data is the information collected about a user while they browse through a website or use a web browser.

The kind of data collected,
- Whether the individual is a first-time or a repeat visitor to the website.
- The terms an individual plugs into a search engine.
- What page the individual lands on first.
- The amount of time a user spends on a page.
- The feature on the page the user clicks on and engages with.
- When and where an item is added or removed from a cart.
- Where the user goes next.
- When the user uses the back button.

Clickstream data collected from a single session of a user interacting with a website may not be useful. However, an organization can use aggregated data gathered from many visitors to improve its website or service.

For example, if a lot of visitors leave a site after landing on a page with too little information, the organization may need to enhance the page with more valuable information. Likewise, if visitors often land on a page that isn't the website's homepage, then the organization may want to redesign that page to be more inviting, and informative to users.

### Benefits of clickstream data analysis
There are a number of benefits organizations can get from clickstream data, and clickstream analytics. Among them are the following,
- User information: The data collected can include search terms used, pages landed on, webpage features used, and the addition or removal of items from a cart, all of which can lead to more actionable insights.
- User routes: Organizations can use data analysis to view the different routes their online visitors or customers take to reach a page or to make a purchase.
- Customer trends and insights: Collecting and analyzing the clickstreams of a large number of visitors lets an organization identify the following areas,
    - How visitors get to the website?
    - What they do once they are there?
    - How long they stay on a page?
    - The number of page visits visitors make, and,
    - The number of unique and repeat visitors.
- UX: If a majority of users quickly leave a page or website, it could be a sign that the page is poorly optimized or does not contain enough information or the information is not of value. Clickstream data enables an organization to recognize UX shortcomings, enabling them to make necessary changes.
- Digital marketing: Clickstream data can be used to determine the amount of traffic coming from ad banners and campaigns. Such data provides insights into which advertisements are most effective and supports conversion rate optimization. Clickstream analysis can also reveal at what time of the day, month or year a marketing strategy is most effective.

### Establishing basic assumptions
- Question: How are cart additions defined?
    - A user can click on a particular item on the product page and add it to their shopping list, called the "cart".
    - They can later view the items in their cart and also add or remove items.
    - This is called cart addition.
- Question: How are conversion metrics defined?
    - At Myntra, the metric that is observed is the number of orders per session.
    - Orders per session is defined as, $\frac{\text{Number of orders placed in a day}}{\text{Total number of sessions accessed that day}}$.
    - One order can have multiple items.
- Question: How is a session defined?
    - A user session is defined as the browsing period between a login and the respective logout over the application.
    - One session can have multiple orders or cart additions.
- Question: What is a bounce session?
    - A user session in which the user lands only on a single page (the homepage) of the application and then leaves is known as a bounce session.
    - Bounce rate is calculated as, $\frac{\text{Number of bounce sessions in a day}}{\text{Total number of sessions accessed that day}}$.
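The two ratios defined above can be computed directly from per-session records. The following is a minimal sketch under simplifying assumptions: the `Session` class and its fields are hypothetical, and a bounce is approximated as any session with exactly one page view.

```python
from dataclasses import dataclass


@dataclass
class Session:
    pages_viewed: int  # pages the user landed on during the session
    orders: int        # confirmed orders placed during the session


def orders_per_session(sessions):
    """Orders placed in a day divided by sessions accessed that day."""
    return sum(s.orders for s in sessions) / len(sessions)


def bounce_rate(sessions):
    """Share of sessions in which the user saw a single page and left."""
    bounced = sum(1 for s in sessions if s.pages_viewed == 1)
    return bounced / len(sessions)


# Hypothetical sessions for one day.
day = [
    Session(pages_viewed=1, orders=0),  # bounce
    Session(pages_viewed=6, orders=1),
    Session(pages_viewed=3, orders=0),
    Session(pages_viewed=1, orders=0),  # bounce
    Session(pages_viewed=9, orders=2),  # one session can hold several orders
]

print(f"orders per session: {orders_per_session(day):.2f}")
print(f"bounce rate: {bounce_rate(day):.0%}")
```

Because one session can contain several orders, orders per session is not bounded by 1, unlike the bounce rate.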

### Creating a rough estimate
- Total number of sessions = 100.
    - Assuming that the average number of sessions per day after the decline was observed is around 100.
- Bounce rate = 10%.
    - Generally the bounce rate of an app is around 10%.
- Non-bounced sessions = 100 - (10% of 100) = 90.
- Sessions with product page view,
    - Around 1/3rd of the browsing sessions reach the product page = 33% of non-bounced sessions = 30 (approx.).
- Sessions with add to cart or buy now = 15.
    - In 50% of the cases, a user viewing a product adds it to their cart.
- Sessions with checkout = 12.
- Sessions with payments = 6.
- Sessions with confirmed orders = 5.
    - Therefore, the conversion rate is 5%.
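The estimate above is a funnel, and it helps to look at the conversion of each stage relative to the previous one as well as the overall figure. The sketch below uses the session counts from the estimate; the stage labels and the helper name `step_conversion` are illustrative.

```python
# Session counts taken from the rough estimate above.
funnel = [
    ("Sessions", 100),
    ("Non-bounced sessions", 90),
    ("Product page view", 30),
    ("Add to cart or buy now", 15),
    ("Checkout", 12),
    ("Payment", 6),
    ("Confirmed order", 5),
]


def step_conversion(funnel):
    """Conversion of each stage relative to the stage before it."""
    return [
        (name, count / previous_count)
        for (_, previous_count), (name, count) in zip(funnel, funnel[1:])
    ]


step_rates = dict(step_conversion(funnel))
overall_conversion = funnel[-1][1] / funnel[0][1]

for name, stage_rate in step_rates.items():
    print(f"{name:24s} {stage_rate:.0%}")
print(f"{'Overall conversion':24s} {overall_conversion:.0%}")
```

The overall 5% is the product of the step rates, so a drop in any single stage is enough to pull the conversion rate down.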

### Backtracking the user journey
Since the conversion rate is down by 2 percentage points (from 5% to 3%),
- There could be a drop in the total number of user sessions per day.
- There could be an increase in the bounce rate of the platform.
- Users might not proceed past the product page if they do not like the product.
- Users might not be able to add products to their shopping cart.
- Users might be having some issues during the checkout process or while making payments.

Consider the following according to the observations,
- There is no decline in the overall user activity. The user traffic on the platform is almost the same as usual.
- The bounce rate of the platform is also stagnant at 10%.
- For the product page,
    - The average product rating for different categories varies between 3.8 and 4.2.
    - The expected delivery time of the products is around 5 to 7 days.
    - The discount offered across multiple categories is from 20 to 45%.
- These numbers are also unchanged since the previous week.

It means that the issue lies somewhere in the last 3 stages of the user journey i.e.,
1. Add to cart.
2. Checkout.
3. Make payment.
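One way to narrow this down further is to compare the step conversion of the funnel before and after the decline and pick the stage that lost the most. In the sketch below, the "before" counts are the rough estimate from the earlier section, while the "after" counts are purely hypothetical, chosen only so that the funnel ends at 3 confirmed orders per 100 sessions.

```python
stages = [
    "Sessions", "Non-bounced sessions", "Product page view",
    "Add to cart", "Checkout", "Payment", "Confirmed order",
]
before = [100, 90, 30, 15, 12, 6, 5]  # rough estimate from the earlier section
after = [100, 90, 30, 15, 12, 4, 3]   # hypothetical counts after the decline


def step_rates(counts):
    """Conversion of each stage relative to the stage before it."""
    return [current / previous for previous, current in zip(counts, counts[1:])]


def largest_drop(stages, before, after):
    """Stage whose step conversion fell the most, with the size of the fall."""
    drops = [
        (stage, rate_before - rate_after)
        for stage, rate_before, rate_after in zip(
            stages[1:], step_rates(before), step_rates(after)
        )
    ]
    return max(drops, key=lambda item: item[1])


stage, drop = largest_drop(stages, before, after)
print(f"largest drop at '{stage}': {drop * 100:.1f} percentage points")
```

With real clickstream counts in place of the hypothetical ones, the same comparison shows which of the last three stages deserves the first look.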

### Asking clarifying questions
- External factors:
    - Question: Specifically talking about the Indian marketplace, Myntra does have a few e-commerce companies as its competitors, like Ajio. Have any of them announced an upcoming sale or a similar promotion?
        - There is limited knowledge about what the competitors are doing. Hearing from the customers and looking at their social media handles, there is no such activity observed.
        - But this is not entirely clear. There could be some customer targeting or campaigns that a competitor is running.
- Demographic details:
    - Question: Has this decline affected customers from a particular region or city?
        - No, there is no such activity observed in any of the locations.
    - Question: Is the decline seen in a specific gender or age group?
        - No significant deviation is observed in any of the groups.
- Macro-economic changes:
    - Question: Have there been any recent partnership changes, or has any merchant just backed out? Are the products currently out of stock due to some problem in the supply chain?
        - No such event has been observed.
        - There is also enough stock of the majority of the products listed on the platform.
- Product specific changes:
    - Question: Have there been any recent changes or upgrades to the application?
        - Yes, certain upgrades have been deployed to the application in the last few days.

### Diagnosing the root cause
- Question: Is there any change with respect to the "add to cart" button? Are any of these changes done on the checkout page?
    - Due to the version upgrade, the complete shopping flow in the application is changed.
    - The checkout process has been improved to provide a seamless shopping experience to the customers.
    - However, the placement and design of the "add to cart" button are left untouched.
- Question: Have there been any bug reports for the changes that were made?
    - There are not many bug reports seen, since the new update was launched only a few days ago.
- Question: Is there a major payment partner that has not been added to the platform?
    - Myntra facilitates payments through most of the leading payment merchants.
- Question: Are the payment gateways working properly?
    - There have been some complaints from a few customers who have faced certain issues while trying to make payment during checkout.

Explore the payment related issues in detail using the available data and analytics techniques.

![rca_1.png](attachment:rca_1.png)
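A simple way to explore payment issues is to compute the failure rate per payment provider and flag the ones above a threshold. Everything in the sketch below is an assumption made for illustration: the provider names, the attempt records and the 20% threshold are hypothetical and are not Myntra data.

```python
from collections import Counter


def failure_rate_by_provider(attempts):
    """Map each provider to its share of failed payment attempts."""
    total, failed = Counter(), Counter()
    for provider, succeeded in attempts:
        total[provider] += 1
        if not succeeded:
            failed[provider] += 1
    return {provider: failed[provider] / total[provider] for provider in total}


# Hypothetical (provider, succeeded) records for the current week.
attempts = (
    [("Bank A", True)] * 97 + [("Bank A", False)] * 3
    + [("Bank B", True)] * 60 + [("Bank B", False)] * 40
    + [("Bank C", True)] * 55 + [("Bank C", False)] * 45
    + [("Bank D", True)] * 96 + [("Bank D", False)] * 4
    + [("Bank E", True)] * 70 + [("Bank E", False)] * 30
)

FAILURE_THRESHOLD = 0.20  # assumed alert level, not an industry standard

rates = failure_rate_by_provider(attempts)
flagged = sorted(p for p, r in rates.items() if r > FAILURE_THRESHOLD)

for provider, failure_rate in sorted(rates.items()):
    print(f"{provider}: {failure_rate:.0%} failed")
print("flagged:", flagged)
```

Comparing the same rates against the previous week would show whether the failures are new or a long-standing pattern.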

### Listing out the conclusions
- It is observed that 3 out of 5 partnering banking services which provide payment options on Myntra are experiencing frequent issues and high server downtime in the current week.
- This validates the complaints registered by customers about incomplete transactions and payment failures.
- If a payment fails multiple times, customers are unlikely to proceed with the transaction because of the risk of losing their money.
- This results in a decrease in the number of confirmed orders received over the ongoing week.
- However, as people are still adding items to their carts, there appears to be no critical issue with the application itself, and the metric can be expected to return to normal once the server issues are resolved.

# Case Study: Uber
### Problem statement
- Scope: City to airport and airport to city transfers. There have been 2 complaints from customers,
    1. High cancellation rate: Uber finds a driver, but the driver declines the ride request.
    2. Non-availability: Uber is not able to find a driver.
- Objective: Understand the reason behind both cancellation and non-availability.

### Analysis approach
1. Clarify (Define metrics):
    - $\text{Cancellation rate} = \frac{\text{Trips not completed}}{\text{Total trips requested}}$.
    - A trip is counted as not completed if the driver declined the request or no cab was found.
2. Rule out:
    - Driver level trend: Is a specific group of drivers responsible for the high cancellation rate?
    - Daily cancellation rate: See whether the high cancellation rate is a one-off or a daily trend.
3. Internal data:
    - TROPiCS framework:
        - Time: Since when has the cancellation rate been high?
        - Region: Region is not applicable.
        - Other related features: No information available.
        - Platforms: No information available.
        - Cannibalization: No information available.
        - Segment:
            1. Is the driver declining or were there no cabs found?
                - Out of all the trips requested, only 42% are filled.
                - 58% trips are not filled:
                    - No cabs were found in 39% cases.
                    - Drivers declined the request in 19% of the cases.
            2. In what part of the day is the highest cancellation rate observed, and why?
4. Hypothesis formulation:
    - Cab unavailable: Supply demand gap (more requests than actual number of cabs).
    - Drivers declining: Drop location might be very far off and the driver might have to return with an empty cab.
5. Findings from the analysis:
    - Cab demand is high between 5 AM and 9 AM and then between 6 PM and 9 PM.
    - However, most of the rides are being cancelled in the mornings.
    - In the evenings, drivers are unavailable.

A plausible explanation: people book city-to-airport rides in the morning, when airport-to-city demand is low. A driver fears having to return from the airport with an empty cab and losing money, so the request is declined. Because few cabs travel from the city to the airport in the morning, no driver is available at the airport when demand picks up in the evening.

### Code
Uber has received complaints from customers facing ride cancellations by drivers and non-availability of cars for a specific route in the city.

The uneven supply-demand gap for cabs from the city to the airport and vice-versa is hurting the customer experience.

The aim of the analysis is to identify the root cause of the problem and recommend ways to tackle the situation.

The following is what will be observed,
- Frequency of booking requests getting cancelled each hour.
- Pickup and destination of the cancelled booking requests.
- Days of the week in which the cancellation rate is maximum.
- Time of day during which the cancellation rate is at peak.
- Time of the day when the demand is highest, and supply is low.
- Time of the day when the cabs are available, but the demand is low.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
```

```python
# silence warnings and use a light grid for all plots
warnings.simplefilter("ignore")
sns.set_style("whitegrid")
```

```python
# loading the data
df = pd.read_csv("uber.csv", parse_dates=[4, 5], dayfirst=True, na_values="NA")
df.head()
```

| | Request id | Pickup point | Driver id | Status | Request timestamp | Drop timestamp |
|---|---|---|---|---|---|---|
| 0 | 619 | Airport | 1.0 | Trip Completed | 11/7/2016 11:51 | 11/7/2016 13:00 |
| 1 | 867 | Airport | 1.0 | Trip Completed | 11/7/2016 17:57 | 11/7/2016 18:47 |
| 2 | 1807 | City | 1.0 | Trip Completed | 12/7/2016 9:17 | 12/7/2016 9:58 |
| 3 | 2532 | Airport | 1.0 | Trip Completed | 12/7/2016 21:08 | 12/7/2016 22:03 |
| 4 | 3112 | City | 1.0 | Trip Completed | 13-07-2016 08:33:16 | 13-07-2016 09:25:47 |


```python
# shape of the data
print("Number of rows:{}".format(df.shape[0]))
print("Number of columns:{}".format(df.shape[1]))
```

```
Number of rows:6745
Number of columns:6
```


```python
# checking the data type
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6745 entries, 0 to 6744
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   Request id         6745 non-null   int64
 1   Pickup point       6745 non-null   object
 2   Driver id          4095 non-null   float64
 3   Status             6745 non-null   object
 4   Request timestamp  6745 non-null   object
 5   Drop timestamp     2831 non-null   object
dtypes: float64(1), int64(1), object(4)
memory usage: 316.3+ KB
```
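The `df.info()` output lists both timestamp columns as `object`, which suggests that `parse_dates` did not convert them, plausibly because the file mixes two layouts (`11/7/2016 11:51` and `13-07-2016 08:33:16`). One possible way to handle this, assuming pandas 2.0 or later, is to convert the columns explicitly with `pd.to_datetime` and `format="mixed"`. The sketch below demonstrates the idea on two values copied from the preview rather than on the full file.

```python
import pandas as pd

# Two of the timestamp layouts visible in the data preview.
raw = pd.Series(["11/7/2016 11:51", "13-07-2016 08:33:16"])

# format="mixed" (pandas 2.0+) infers the layout of each value separately;
# dayfirst=True reads "11/7/2016" as 11 July rather than 7 November.
parsed = pd.to_datetime(raw, format="mixed", dayfirst=True)

print(parsed.dtype)
print(parsed.dt.day.tolist(), parsed.dt.month.tolist(), parsed.dt.hour.tolist())
```

Applied to the real data, the same call would be made on `df["Request timestamp"]` and `df["Drop timestamp"]`; whether every row parses cleanly would still need to be checked.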


```python
# checking for null values (percentage of missing values per column)
df.isnull().sum() / len(df) * 100
```

```
Request id            0.000000
Pickup point          0.000000
Driver id            39.288362
Status                0.000000
Request timestamp     0.000000
Drop timestamp       58.028169
dtype: float64
```
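The missing-value percentages line up with the 42% / 39% / 19% split quoted in the segment analysis. The arithmetic below reproduces that split from the non-null counts reported by `df.info()`. It rests on an assumption that is not verified here: a missing driver id means no cab was found, and a request with a driver but no drop timestamp was declined by the driver.

```python
# Counts reported by df.info().
total_requests = 6745
with_driver = 4095      # non-null "Driver id"
with_drop_time = 2831   # non-null "Drop timestamp"

# Assumption: no driver id -> no cab found;
# driver id but no drop timestamp -> the driver declined the request.
no_cab = total_requests - with_driver
not_completed = total_requests - with_drop_time
declined = not_completed - no_cab

shares = {
    "completed": with_drop_time / total_requests,
    "no cab found": no_cab / total_requests,
    "driver declined": declined / total_requests,
}

for outcome, share in shares.items():
    print(f"{outcome:16s} {share:.1%}")
```

Checking these derived counts against the `Status` column would confirm or reject the assumption.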

```python
# checking for duplicate rows
print("Number of duplicate rows: ", df.duplicated().sum())
```

```
Number of duplicate rows:  0
```
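A natural next step towards the observations listed at the start of this section is to break the request status down by time of day. The sketch below runs on a small hand-made sample rather than on `uber.csv`. The status labels `Cancelled` and `No Cars Available` and the time-slot boundaries are assumptions made for illustration; only `Trip Completed` appears in the preview above.

```python
import pandas as pd

# Hand-made sample that mimics the columns of the dataset.
sample = pd.DataFrame({
    "Pickup point": ["City", "City", "City", "Airport", "Airport", "Airport"],
    "Status": [
        "Cancelled", "Cancelled", "Trip Completed",
        "No Cars Available", "No Cars Available", "Trip Completed",
    ],
    "Request timestamp": pd.to_datetime([
        "2016-07-11 05:30", "2016-07-11 06:10", "2016-07-11 08:45",
        "2016-07-11 18:20", "2016-07-11 19:05", "2016-07-11 20:40",
    ]),
})


def time_slot(hour):
    """Assumed slots, following the demand peaks noted in the findings."""
    if 5 <= hour < 9:
        return "Morning"
    if 18 <= hour < 21:
        return "Evening"
    return "Other"


sample["Request hour"] = sample["Request timestamp"].dt.hour
sample["Time slot"] = sample["Request hour"].map(time_slot)

# Share of each status within every time slot (each row sums to 1).
status_share = pd.crosstab(sample["Time slot"], sample["Status"], normalize="index")
print(status_share.round(2))
```

On the real data the same cross-tabulation, optionally split by `Pickup point`, would show whether cancellations cluster in the morning and unavailability in the evening.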
