In [None]:
# %pip install pandas numpy openpyxl nbformat ipykernel plotly

In [None]:
from os import getcwd, path

import pandas as pd
import plotly.express as px

In [None]:
df = pd.read_csv(path.join(path.dirname(getcwd()), "data", "cancellations.csv"))

Display the data and a summary of its structure (column types and non-null counts).

In [None]:
display(df)
df.info()  # prints its own summary and returns None, so display() is not needed

Correct "errors" and remove information that is irrelevant to the main
analysis.

* The CustomerID column holds randomly generated identifiers, so it cannot influence the number of cancellations. It is therefore removed.
* Rows with missing values can distort the main analysis. They are therefore removed as well.

In [None]:
df = df.drop(columns="CustomerID")
df = df.dropna()

df.info()  # prints its own summary and returns None, so display() is not needed
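
The effect of the cleaning step above can be checked with a small sketch. The table below is made up for illustration (it is not the real `cancellations.csv`), and the names `toy`, `missing_per_column`, `cleaned`, and `rows_removed` are hypothetical. It counts the missing values per column and then reports how many rows the same two operations discard.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real file has more columns and rows.
toy = pd.DataFrame(
    {
        "CustomerID": [1, 2, 3, 4],
        "dias_atraso": [5.0, np.nan, 30.0, 12.0],
        "cancelou": [0.0, 1.0, 1.0, np.nan],
    }
)

# Missing values per column, before any cleaning.
missing_per_column = toy.isna().sum()

# Same two steps as in the notebook: drop the ID column, then incomplete rows.
cleaned = toy.drop(columns="CustomerID").dropna()
rows_removed = len(toy) - len(cleaned)

print(missing_per_column)
print(f"rows removed: {rows_removed}")
```

Running the same count on the real `df` before calling `dropna()` would show whether the dropped rows are a small fraction of the data or a large one.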

Start the main analysis with the number of cancellations, both as counts and as proportions.

In [None]:
display(df["cancelou"].value_counts())
display(df["cancelou"].value_counts(normalize=True))

For each column in the table, plot a histogram colored by the "cancelou"
column to show how that column relates to cancellations.  
The plots offer a practical way to map out the main causes of
cancellations.

In [None]:
for column in df.columns:
    if column == "cancelou":
        continue  # skip the target column itself
    # One histogram per column, split by cancellation status
    graphic = px.histogram(df, x=column, color="cancelou")
    graphic.show()
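
The histograms show counts. As a complementary numeric view, the cancellation rate per category can be computed with a `groupby`. The following is a minimal sketch on made-up data: it assumes "cancelou" is coded as 0/1, and the labels "Annual" and "Quarterly" are assumptions (only "Monthly" appears in this notebook).

```python
import pandas as pd

# Hypothetical toy data; "cancelou" is assumed to be 0 (stayed) or 1 (cancelled).
toy = pd.DataFrame(
    {
        "duracao_contrato": [
            "Monthly", "Monthly", "Annual", "Annual", "Quarterly", "Quarterly",
        ],
        "cancelou": [1, 1, 0, 1, 0, 0],
    }
)

# The mean of a 0/1 column is the share of ones, i.e. the cancellation rate.
rate_by_contract = toy.groupby("duracao_contrato")["cancelou"].mean()

print(rate_by_contract.map("{:.1%}".format))
```

A category whose rate is far above the overall rate is a candidate factor, which is the same reading the histograms give visually.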

**Main factors for cancellations:**

1. All customers with monthly contracts
2. Customers who call the call center more than four times
3. Customers whose payment is more than twenty days late

**Possible solutions:**

1. Offer discounts on quarterly and annual plans
2. Resolve the customer's problem within the first four calls to the call center
3. Collect payment before the delay exceeds twenty days

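**Implementing the solutions:**

The filters below simulate the solutions by keeping only the customers who fall outside the three risk groups.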

In [None]:
df = df[df["duracao_contrato"] != "Monthly"]
df = df[df["ligacoes_callcenter"] <= 4]
df = df[df["dias_atraso"] <= 20]

display(df["cancelou"].value_counts())
display(df["cancelou"].value_counts(normalize=True))
display(df["cancelou"].value_counts(normalize=True).map("{:.1%}".format))

Before the filters, the cancellation rate was 56.6%. Among the customers who
remain after applying them, it drops to 18.4%.
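
The before/after comparison can also be written as one reusable step. This is a minimal sketch on made-up data, not the notebook's actual result: the helper `cancellation_rate` and the table `toy` are hypothetical, and "cancelou" is assumed to be coded as 0/1.

```python
import pandas as pd


def cancellation_rate(frame: pd.DataFrame) -> float:
    """Share of rows where cancelou == 1 (assumes a 0/1 column)."""
    return float(frame["cancelou"].mean())


# Hypothetical toy data using the column names from this notebook.
toy = pd.DataFrame(
    {
        "duracao_contrato": [
            "Monthly", "Annual", "Annual", "Quarterly", "Annual", "Quarterly",
        ],
        "ligacoes_callcenter": [1, 6, 2, 3, 0, 1],
        "dias_atraso": [3, 10, 25, 5, 8, 12],
        "cancelou": [1, 1, 1, 0, 0, 1],
    }
)

# Combine the three filters into a single boolean mask.
keep = (
    (toy["duracao_contrato"] != "Monthly")
    & (toy["ligacoes_callcenter"] <= 4)
    & (toy["dias_atraso"] <= 20)
)

before = cancellation_rate(toy)
after = cancellation_rate(toy[keep])

print(f"before: {before:.1%}, after: {after:.1%}")
```

Keeping the unfiltered frame and applying a mask, rather than overwriting `df` three times, leaves both rates available for comparison in the same cell.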