<a href="https://colab.research.google.com/github/urielgutieco/challenge_telecomx/blob/main/Challenge_Telecom_X.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Telecom X**

##**Customer Churn Analysis**

Challenges:
- Identify the factors that lead to customer loss, given the company's cancellation rate.
- Collect data
- Process data
- Analyze data
- Use different libraries
- Build predictive models
- Develop strategies to reduce the cancellation rate

Skills to practice:
- Import and manipulate data from an API efficiently.
- Apply ETL (Extract, Transform, Load) concepts when preparing the data.
- Create targeted visualizations to identify patterns and trends.
- Perform an Exploratory Data Analysis (EDA) and produce a report with relevant insights.



###**Data Dictionary**

- customerID: unique identification number of each customer
- Churn: whether or not the customer left the company
- gender: gender (male or female)
- SeniorCitizen: whether the customer is 65 years of age or older
- Partner: whether the customer has a partner
- Dependents: whether the customer has dependents
- tenure: months of the customer's contract
- PhoneService: phone service subscription
- MultipleLines: subscription to more than one phone line
- InternetService: subscription to an internet provider
- OnlineSecurity: additional online security subscription
- OnlineBackup: additional online backup subscription
- DeviceProtection: additional device protection subscription
- TechSupport: additional technical support subscription, with shorter wait times
- StreamingTV: cable TV subscription
- StreamingMovies: movie streaming subscription
- Contract: contract type
- PaperlessBilling: whether the customer prefers to receive the bill online
- PaymentMethod: payment method
- Charges.Monthly: total for all of the customer's services per month
- Charges.Total: total spent by the customer
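
The dictionary above can also serve as a simple schema check once the data is loaded. The following is only a sketch: `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names that do not appear in the original notebook, and the toy frame stands in for the real dataset.

```python
import pandas as pd

# Column names copied from the data dictionary above.
EXPECTED_COLUMNS = [
    "customerID", "Churn", "gender", "SeniorCitizen", "Partner", "Dependents",
    "tenure", "PhoneService", "MultipleLines", "InternetService",
    "OnlineSecurity", "OnlineBackup", "DeviceProtection", "TechSupport",
    "StreamingTV", "StreamingMovies", "Contract", "PaperlessBilling",
    "PaymentMethod", "Charges.Monthly", "Charges.Total",
]


def missing_columns(frame: pd.DataFrame) -> list:
    """Return the dictionary columns absent from `frame` (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in frame.columns]


# Toy frame holding only two of the expected columns, for illustration.
toy = pd.DataFrame({"customerID": ["0002-ORFBO"], "Churn": ["No"]})
print(missing_columns(toy))
```

An empty list from `missing_columns` would suggest that the loaded frame covers every field described in the dictionary.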

##**Data Extraction**


- Load the data directly from the API using Python.
- Convert the data to a Pandas DataFrame to make it easier to manipulate.

In [2]:
# Install packages that may be missing from the runtime (notebook shell commands)
!pip install pydataset
!pip install scipy
!pip install scikit-learn

# Data handling and visualization
import numpy as np
import pandas as pd
import scipy
import seaborn as sns
import plotly.express as px
from matplotlib import pyplot as plt
from pydataset import data

# Modeling utilities (duplicate imports from the original cell merged)
from sklearn import linear_model
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, roc_auc_score,
    mean_absolute_error, mean_squared_error, r2_score,
)



In [3]:
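# Read the raw JSON file from the repository; pd.read_json returns a DataFrame
data_frame = pd.read_json("https://raw.githubusercontent.com/ingridcristh/challenge2-data-science-LATAM/refs/heads/main/TelecomX_Data.json")
# Keep a separate handle on the raw (still nested) data for the flattening step
df = pd.DataFrame(data_frame)
# Preview the first rows; the last four columns hold nested dictionaries
data_frame.head()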

Unnamed: 0,customerID,Churn,customer,phone,internet,account
0,0002-ORFBO,No,"{'gender': 'Female', 'SeniorCitizen': 0, 'Partner': 'Yes', 'Dependents': 'Yes', 'tenur...","{'PhoneService': 'Yes', 'MultipleLines': 'No'}","{'InternetService': 'DSL', 'OnlineSecurity': 'No', 'OnlineBackup': 'Yes', 'DeviceProte...","{'Contract': 'One year', 'PaperlessBilling': 'Yes', 'PaymentMethod': 'Mailed check', '..."
1,0003-MKNFE,No,"{'gender': 'Male', 'SeniorCitizen': 0, 'Partner': 'No', 'Dependents': 'No', 'tenure': 9}","{'PhoneService': 'Yes', 'MultipleLines': 'Yes'}","{'InternetService': 'DSL', 'OnlineSecurity': 'No', 'OnlineBackup': 'No', 'DeviceProtec...","{'Contract': 'Month-to-month', 'PaperlessBilling': 'No', 'PaymentMethod': 'Mailed chec..."
2,0004-TLHLJ,Yes,"{'gender': 'Male', 'SeniorCitizen': 0, 'Partner': 'No', 'Dependents': 'No', 'tenure': 4}","{'PhoneService': 'Yes', 'MultipleLines': 'No'}","{'InternetService': 'Fiber optic', 'OnlineSecurity': 'No', 'OnlineBackup': 'No', 'Devi...","{'Contract': 'Month-to-month', 'PaperlessBilling': 'Yes', 'PaymentMethod': 'Electronic..."
3,0011-IGKFF,Yes,"{'gender': 'Male', 'SeniorCitizen': 1, 'Partner': 'Yes', 'Dependents': 'No', 'tenure':...","{'PhoneService': 'Yes', 'MultipleLines': 'No'}","{'InternetService': 'Fiber optic', 'OnlineSecurity': 'No', 'OnlineBackup': 'Yes', 'Dev...","{'Contract': 'Month-to-month', 'PaperlessBilling': 'Yes', 'PaymentMethod': 'Electronic..."
4,0013-EXCHZ,Yes,"{'gender': 'Female', 'SeniorCitizen': 1, 'Partner': 'Yes', 'Dependents': 'No', 'tenure...","{'PhoneService': 'Yes', 'MultipleLines': 'No'}","{'InternetService': 'Fiber optic', 'OnlineSecurity': 'No', 'OnlineBackup': 'No', 'Devi...","{'Contract': 'Month-to-month', 'PaperlessBilling': 'Yes', 'PaymentMethod': 'Mailed che..."


In [4]:
# Flatten the nested 'customer' column
customer_df = pd.json_normalize(df['customer'])

# Flatten the nested 'phone' column
phone_df = pd.json_normalize(df['phone'])

# Flatten the nested 'internet' column
internet_df = pd.json_normalize(df['internet'])

# Flatten the nested 'account' column
account_df = pd.json_normalize(df['account'])

# Combine the top-level columns with the flattened DataFrames
data_frame = pd.concat([df[['customerID', 'Churn']], customer_df, phone_df, internet_df, account_df], axis=1)

# df_t is an alias for the flattened DataFrame (same object, not a copy)
df_t = data_frame
# Show the first rows of the transformed DataFrame
display(data_frame.head())

Unnamed: 0,customerID,Churn,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,...,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,Charges.Monthly,Charges.Total
0,0002-ORFBO,No,Female,0,Yes,Yes,9,Yes,No,DSL,...,Yes,No,Yes,Yes,No,One year,Yes,Mailed check,65.6,593.3
1,0003-MKNFE,No,Male,0,No,No,9,Yes,Yes,DSL,...,No,No,No,No,Yes,Month-to-month,No,Mailed check,59.9,542.4
2,0004-TLHLJ,Yes,Male,0,No,No,4,Yes,No,Fiber optic,...,No,Yes,No,No,No,Month-to-month,Yes,Electronic check,73.9,280.85
3,0011-IGKFF,Yes,Male,1,Yes,No,13,Yes,No,Fiber optic,...,Yes,Yes,No,Yes,Yes,Month-to-month,Yes,Electronic check,98.0,1237.85
4,0013-EXCHZ,Yes,Female,1,Yes,No,3,Yes,No,Fiber optic,...,No,No,Yes,Yes,No,Month-to-month,Yes,Mailed check,83.9,267.4
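
To illustrate how the flattening step produces dotted names such as `Charges.Monthly`, here is a minimal sketch on made-up records. The customer IDs are invented, and the nested `Charges` dictionary inside `account` is an assumption inferred from the column names, since the preview of the raw data is truncated.

```python
import pandas as pd

# Made-up records shaped like the raw preview; the nested "Charges" dict is an
# assumption inferred from the dotted column names, not confirmed by the source.
toy = pd.DataFrame({
    "customerID": ["A-1", "B-2"],
    "account": [
        {"Contract": "One year", "Charges": {"Monthly": 65.6, "Total": "593.3"}},
        {"Contract": "Month-to-month", "Charges": {"Monthly": 59.9, "Total": "542.4"}},
    ],
})

# json_normalize expands each dict into columns; nested keys are joined with "."
account_flat = pd.json_normalize(toy["account"].tolist())

# Both frames share the default 0..n-1 index, so concat aligns them row by row
flat = pd.concat([toy[["customerID"]], account_flat], axis=1)
print(flat.columns.tolist())
```

Because `pd.concat(..., axis=1)` aligns on the index, this pattern relies on the original frame keeping its default integer index. If rows had been filtered or reordered beforehand, resetting the index first would likely be needed.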


In [5]:
df_t.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7267 entries, 0 to 7266
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7267 non-null   object 
 1   Churn             7267 non-null   object 
 2   gender            7267 non-null   object 
 3   SeniorCitizen     7267 non-null   int64  
 4   Partner           7267 non-null   object 
 5   Dependents        7267 non-null   object 
 6   tenure            7267 non-null   int64  
 7   PhoneService      7267 non-null   object 
 8   MultipleLines     7267 non-null   object 
 9   InternetService   7267 non-null   object 
 10  OnlineSecurity    7267 non-null   object 
 11  OnlineBackup      7267 non-null   object 
 12  DeviceProtection  7267 non-null   object 
 13  TechSupport       7267 non-null   object 
 14  StreamingTV       7267 non-null   object 
 15  StreamingMovies   7267 non-null   object 
 16  Contract          7267 non-null   object 


In [6]:
df_t.dtypes

Unnamed: 0,0
customerID,object
Churn,object
gender,object
SeniorCitizen,int64
Partner,object
Dependents,object
tenure,int64
PhoneService,object
MultipleLines,object
InternetService,object


In [8]:
pd.unique(df_t['Charges.Total'])

array(['593.3', '542.4', '280.85', ..., '742.9', '4627.65', '3707.6'],
      dtype=object)
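
The `dtype=object` in the output above indicates that `Charges.Total` is stored as strings rather than numbers. One possible way to convert it is sketched below with `pd.to_numeric`. The first three values are taken from the output above; the blank entry is an assumption, added only to show how unparseable values behave.

```python
import pandas as pd

# Sample values from the output above, plus a blank string added as an
# assumption to show what happens to entries that are not numeric.
totals = pd.Series(["593.3", "542.4", "280.85", " "], name="Charges.Total")

# errors="coerce" turns anything that cannot be parsed into NaN instead of raising
totals_num = pd.to_numeric(totals, errors="coerce")

print(totals_num.dtype)         # numeric dtype after conversion
print(totals_num.isna().sum())  # number of entries that could not be parsed
```

Counting the resulting NaN values before overwriting the column helps decide whether the affected rows should be dropped, imputed, or inspected individually.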

##**Load and Analysis**

##**Final Report**