Atalov S. (TSI AUCA)

---

# Telecom Churn Prediction

This assignment walks through a complete, end-to-end solution to a machine learning problem.

<div>
    <img src="https://media.licdn.com/dms/image/C5612AQG0Gilk9mJpxw/article-cover_image-shrink_720_1280/0/1621963349834?e=2147483647&v=beta&t=S-3_jI-4xWu14OSyh7RJg93TQmLc_QWTZAW5Gd_aL8s" width="400"/>
</div>


---
## 0. Problem Statement

#### About Company
DreamTelecom is a leading telecom company that is experiencing a customer churn problem.

#### Problem
Build a model that predicts which customers are likely to churn, so that the company can offer them attractive services in time.

In [1]:
# read the data file
import pandas as pd
# GridSearchCV is used later for the hyperparameter search
from sklearn.model_selection import GridSearchCV
df = pd.read_csv("train.csv")
# list the distinct values in each column
for column in df.columns:
    print(column, df[column].unique())

ClientPeriod [72 13 33 18  1 68  8 48 15 53  7  6 61 43 69 11 17 26 62 22 59 41 60 19
 50 31  9  3 65 12 36  2  5 27 14 34 49 21 44 67 66  4 46 38 47 70 71  0
 58 20 56 52 64 25 10 29 57 63 39 54 35 45 32 55 42 23 24 37 16 30 28 51
 40]
MonthlySpending [98.65 84.9  44.   ... 62.55 62.3  97.65]
TotalSpent [7129.45 6065.3   659.35 ... 2354.8  6687.85 4156.8 ]
Sex ['Female' 'Male']
IsSeniorCitizen [0 1]
HasPartner ['Yes' 'No']
HasChild ['Yes' 'No']
HasPhoneService ['Yes' 'No']
HasMultiplePhoneNumbers ['Yes' 'No' 'No phone service']
HasInternetService ['Fiber optic' 'DSL' 'No']
HasOnlineSecurityService ['No' 'Yes' 'No internet service']
HasOnlineBackup ['Yes' 'No' 'No internet service']
HasDeviceProtection ['Yes' 'No' 'No internet service']
HasTechSupportAccess ['Yes' 'No' 'No internet service']
HasOnlineTV ['No' 'Yes' 'No internet service']
HasMovieSubscription ['Yes' 'No' 'No internet service']
HasContractPhone ['Two year' 'Month-to-month' 'One year']
IsBillingPaperless ['Yes' 'No']
Paym

## 1. Data Preprocessing

In [2]:
# one-hot encode the categorical columns that have more than two values
df = pd.get_dummies(df, columns=["HasMultiplePhoneNumbers", "HasInternetService", "HasOnlineSecurityService", "HasOnlineBackup", "HasDeviceProtection", "HasTechSupportAccess", "HasOnlineTV", "HasMovieSubscription", "HasContractPhone", "PaymentMethod"])
# encode the binary columns as 0/1
df["Sex"] = df["Sex"].map({"Female": 0, "Male": 1})
yes_no = {"No": 0, "Yes": 1}
for column in ["HasPartner", "HasChild", "HasPhoneService", "IsBillingPaperless"]:
    df[column] = df[column].map(yes_no)

In [3]:
# treat a missing TotalSpent as 0
df["TotalSpent"] = df["TotalSpent"].fillna(0)

In [4]:
from sklearn.model_selection import train_test_split as tts
# separate the target column from the features
X = df.copy()
y = X.pop("Churn")
# hold out 10% of the rows as a test set
X_train, X_test, y_train, y_test = tts(X, y, test_size=.1)
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((3600, 40), (400, 40), (3600,), (400,))
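
The split above draws a different random partition on every run and does not keep the churn rate equal in both parts. A minimal sketch of a seeded, stratified split follows. The synthetic arrays, the 26% positive rate, and the seed `42` are assumptions chosen for illustration, not values taken from the dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data: 4000 rows with about 26% positives (an assumption).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(4000, 5))
y_demo = (rng.random(4000) < 0.26).astype(int)

# random_state makes the split repeatable; stratify keeps the class ratio in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print(round(y_tr.mean(), 3), round(y_te.mean(), 3))
```

With a fixed seed, scores reported later in the notebook can be reproduced from run to run.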

## 2. Modeling

### Logistic Regression

In [10]:
# fit the model
from sklearn.linear_model import LogisticRegressionCV
estimator = LogisticRegressionCV()
# search range for max_iter: the initial wide range, then the narrower one used for this run
l, r = 1, 3000
l, r = 126, 129
# penalties supported by each solver (the grid below still tries every pair)
compatibility = {"lbfgs": ["l2", None],
                 "liblinear": ["l1", "l2"],
                 "newton-cg": ["l2", None],
                 "newton-cholesky": ["l2", None],
                 "sag": ["l2", None],
                 "saga": ["elasticnet", "l1", "l2", None]}
# four evenly spaced max_iter candidates between l and r
params = {"penalty": ["elasticnet", "l1", "l2", None],
          "solver": list(compatibility.keys()),
          "max_iter": [round(l), round((2*l+r)/3), round((l+2*r)/3), round(r)]}
# incompatible solver/penalty pairs fail to fit and are scored as nan
grid_search = GridSearchCV(estimator=estimator, param_grid=params, n_jobs=-1)
grid_search.fit(X_train, y_train)

320 fits failed out of a total of 480.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
20 fits failed with the following error:
Traceback (most recent call last):
  File "c:\Users\akylo\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\model_selection\_validation.py", line 686, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "c:\Users\akylo\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\linear_model\_logistic.py", line 1762, in fit
    solver = _check_solver(self.solver, self.penalty, self.dual)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\akylo\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\linear_model\_logistic.py", li
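
The grid above crosses 4 penalties with 6 solvers and 4 `max_iter` values. That gives 96 candidates, or 480 fits with the default 5-fold cross-validation, and the log shows that a large share of them fail. One possible way to skip the invalid pairs is to pass `GridSearchCV` a list of sub-grids, one per solver, built from the solver-to-penalty table in the cell above. This is a sketch only: the names `compatibility`, `max_iters`, and `param_grid` are illustrative, and the estimator may still reject some of the listed pairs (for example, a penalty that it does not accept or one that needs extra arguments).

```python
from sklearn.model_selection import ParameterGrid

# Solver -> penalties table, as listed in the cell above.
compatibility = {
    "lbfgs": ["l2", None],
    "liblinear": ["l1", "l2"],
    "newton-cg": ["l2", None],
    "newton-cholesky": ["l2", None],
    "sag": ["l2", None],
    "saga": ["elasticnet", "l1", "l2", None],
}
max_iters = [126, 127, 128, 129]  # the four max_iter candidates for l, r = 126, 129

# One sub-grid per solver, so only the listed (solver, penalty) pairs are generated.
param_grid = [
    {"solver": [solver], "penalty": penalties, "max_iter": max_iters}
    for solver, penalties in compatibility.items()
]

n_candidates = len(ParameterGrid(param_grid))
print(n_candidates)  # 56 candidates instead of 96
```

The resulting `param_grid` list can be passed to `GridSearchCV(param_grid=...)` in place of the single dictionary.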

In [11]:
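# repeat until the max_iter grid already spans exactly [l, r], i.e. the range has stopped changing
while l!=params["max_iter"][0] or r!=params["max_iter"][-1]:
    params["max_iter"] = [round(l), round((2*l+r)/3), round((l+2*r)/3), round(r)]
    bests = []
    # rerun the grid search until the same best parameter set has been seen 3 times
    while max({bests.count(best) for best in bests}|{0})<3:
        grid_search = GridSearchCV(estimator=estimator,param_grid=params,n_jobs=-1)
        grid_search.fit(X_train, y_train)
        bests.append(tuple(grid_search.best_params_.values()))
    best = grid_search.best_params_
    # position of the best max_iter within the four-point grid
    m = params["max_iter"].index(best["max_iter"])
    print(grid_search.best_estimator_.score(X_test, y_test), best)
    if abs(m-1.5)<1:
        # best is an interior point: shrink the range to its two neighbours
        l, _, r = params["max_iter"][m-1:m+2]
    else:
        # best is on an edge: re-centre the range on that edge value
        l, r = sorted([params["max_iter"][abs(m-1)], 2*params["max_iter"][m]-params["max_iter"][abs(m-1)]])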
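
The loop above narrows the `max_iter` range by evaluating four evenly spaced candidates and then shrinking the range around the best one. The sketch below shows the same idea on a toy score function, so it runs without any model. It is a simplified variant, not the notebook's exact rule: when the best point is on an edge it keeps the outermost sub-interval instead of re-centring the window, and the names `four_point_grid`, `narrow`, and `toy_score` are hypothetical.

```python
def four_point_grid(lo, hi):
    """Four roughly evenly spaced integers from lo to hi."""
    return [round(lo), round((2 * lo + hi) / 3), round((lo + 2 * hi) / 3), round(hi)]


def narrow(lo, hi, score):
    """Shrink [lo, hi] around the best grid point until the range stops changing."""
    while True:
        grid = four_point_grid(lo, hi)
        best = max(grid, key=score)
        m = grid.index(best)
        if m in (1, 2):
            # interior best: keep its two neighbours as the new range
            new_lo, new_hi = grid[m - 1], grid[m + 1]
        elif m == 0:
            # best on the left edge: keep the leftmost sub-interval
            new_lo, new_hi = grid[0], grid[1]
        else:
            # best on the right edge: keep the rightmost sub-interval
            new_lo, new_hi = grid[2], grid[3]
        if (new_lo, new_hi) == (lo, hi):
            return best
        lo, hi = new_lo, new_hi


def toy_score(n):
    """Toy score with a single peak at 127, standing in for a cross-validated score."""
    return -(n - 127) ** 2


print(narrow(1, 3000, toy_score))  # 127
```

This kind of narrowing only finds the optimum reliably when the score has a single peak over the range. Cross-validated scores are noisy, so in practice the result is a heuristic.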

In [12]:
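# refit with the best parameters and show the scores on train and test data
log_model = LogisticRegressionCV(**grid_search.best_params_)
# note: fitting on the full X, y includes the test rows, so the test score below is optimistic
log_model.fit(X, y)
log_model.score(X_train, y_train), log_model.score(X_test, y_test)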


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


(0.8097222222222222, 0.815)
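
The convergence warning above suggests scaling the data. A minimal sketch of that approach follows, using a pipeline so that the scaler is fitted on the training rows only. The synthetic data from `make_classification`, the inflated first column, and the plain `LogisticRegression` with `max_iter=1000` are assumptions for illustration. They are not the notebook's data or its tuned model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn features (an assumption, not the real data).
X_demo, y_demo = make_classification(n_samples=1000, n_features=10, random_state=0)
# Inflate one column to mimic a large-valued feature next to 0/1 flags.
X_demo[:, 0] *= 1000

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.1, random_state=0)

# The scaler sits inside the pipeline, so it learns its statistics from the training rows only.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_tr, y_tr)

train_score = scaled_model.score(X_tr, y_tr)
test_score = scaled_model.score(X_te, y_te)
print(train_score, test_score)
```

Fitting on the training split alone also keeps the test score an honest estimate, since the model never sees the held-out rows.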

### Decision Tree Classifier

In [13]:
# fit the model




In [14]:
# show score on train data



In [15]:
# show score on test data



### Random Forest Classifier

In [16]:
# fit the model



In [17]:
# show score on train data



In [18]:
# show score on test data



### XGBoost Classifier

In [6]:
from xgboost import XGBClassifier
estimator = XGBClassifier()
# initial [low, high] search bounds for each numeric hyperparameter
bounds = {
    "max_depth": [0, 10],
    "n_estimators": [0, 200],
    "max_leaves": [0, 10],
    "gamma": [0, 100],
    "min_child_weight": [0, 300],
    "max_delta_step": [0, 5],
}
params = {
    "booster": ["gbtree", "gblinear", "dart"],
}
# placeholders; the real candidates are filled in inside the loop
params.update({param: [-1, -1, -1, -1] for param in bounds})
def expand(l, r):
    # four evenly spaced integer candidates between l and r
    return [round(x) for x in [l, (2*l+r)/3, (l+2*r)/3, r]]
# keep narrowing until every parameter's bounds stay within 2 of the previous grid's end points
while not all(all(abs(bounds[param][i]-params[param][i])<2 for i in [0, -1]) for param in bounds):
    # clip negative bounds to 0
    bounds = {param: [max(bound[0], 0), max(bound[1], 0)] for param, bound in bounds.items()}
    for param in bounds:
        params[param] = expand(*bounds[param])
    grid_search = GridSearchCV(estimator=estimator, param_grid=params, n_jobs=-1, cv=5)
    print(params)
    grid_search.fit(X_train, y_train)
    print(grid_search.best_params_)
    # re-centre each parameter's bounds on its best value
    for param in bounds:
        best = grid_search.best_params_[param]
        gap = min(bounds[param][-1]-best, best-bounds[param][0])
        if not gap:
            # best is on an edge: use a third of the old width instead
            gap = max(bounds[param][-1]-best, best-bounds[param][0])/3
        bounds[param] = [best-gap, best+gap]
    print(bounds)

{'booster': ['gbtree', 'gblinear', 'dart'], 'max_depth': [0, 3, 7, 10], 'n_estimators': [0, 67, 133, 200], 'max_leaves': [0, 3, 7, 10], 'gamma': [0, 33, 67, 100], 'min_child_weight': [0, 100, 200, 300], 'max_delta_step': [0, 2, 3, 5]}
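
The candidate lists printed above come from spreading four integers across each parameter's bounds, and the loop then re-centres the bounds on the best value. A small pure-Python sketch of those two steps follows, with no model involved. The helper name `narrow` is hypothetical and folds the re-centring and the clipping to 0 into one function, whereas the notebook does them in separate places.

```python
def expand(lo, hi):
    """Four roughly evenly spaced integer candidates between lo and hi."""
    return [round(x) for x in [lo, (2 * lo + hi) / 3, (lo + 2 * hi) / 3, hi]]


def narrow(bound, best):
    """Re-centre the bounds on the best value and clip them at 0."""
    lo, hi = bound
    gap = min(hi - best, best - lo)
    if not gap:
        # best sits on an edge: fall back to a third of the old width
        gap = max(hi - best, best - lo) / 3
    return [max(best - gap, 0), max(best + gap, 0)]


print(expand(0, 10))        # [0, 3, 7, 10]
print(expand(0, 200))       # [0, 67, 133, 200]
print(narrow([0, 10], 3))   # [0, 6]
print(narrow([0, 10], 7))   # [4, 10]
```

Because every numeric parameter contributes four candidates, the grid has 3 × 4⁶ = 12288 combinations per pass, so each pass of the loop is expensive with 5-fold cross-validation.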


## 3. Hyperparameter Tuning (Find Best Parameters)

(0.8172222222222222, 0.7825)

In [None]:
model

## 4. Predict Test Data

### Read and Prepare test data in the same way as was done above
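
One detail to watch when repeating the preprocessing is that `pd.get_dummies` only creates columns for the values it sees, so the test frame can end up with fewer columns than the training frame. A minimal sketch of one way to align them follows. The toy frames and their values are hypothetical stand-ins for the real files.

```python
import pandas as pd

# Toy frames standing in for the training and test files (hypothetical values).
train_raw = pd.DataFrame({
    "ClientPeriod": [72, 13, 33],
    "HasContractPhone": ["Two year", "Month-to-month", "One year"],
})
test_raw = pd.DataFrame({
    "ClientPeriod": [5, 40],
    "HasContractPhone": ["Month-to-month", "Month-to-month"],
})

train_enc = pd.get_dummies(train_raw, columns=["HasContractPhone"])
test_enc = pd.get_dummies(test_raw, columns=["HasContractPhone"])

# Add the dummy columns the test rows never produced and match the training column order.
test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)
print(list(test_enc.columns))
```

After this step the test frame has the same columns, in the same order, as the frame the model was trained on.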

### Make a prediction using your best model:

### Save predictions as `YourName.csv` and submit csv file and this notebook in ecourse

HINT: Use `df.to_csv('YourName.csv', index=False)`
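
A short sketch of the saving step from the hint follows. The prediction values and the `Churn` column name are assumptions. In the notebook the predictions would come from the chosen model's `predict` method, and the required column layout depends on the course instructions.

```python
import os
import tempfile

import pandas as pd

# Hypothetical predictions; in the notebook these would come from model.predict(...).
predictions = [0, 1, 0, 0]
submission = pd.DataFrame({"Churn": predictions})

# A temporary directory keeps this sketch free of side effects; the file name follows the hint.
out_path = os.path.join(tempfile.mkdtemp(), "YourName.csv")
submission.to_csv(out_path, index=False)

# Read the file back to confirm that no index column was written.
saved = pd.read_csv(out_path)
print(saved.columns.tolist())
```

Passing `index=False` keeps the row index out of the file, so the CSV contains only the prediction column.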

In [None]:
import requests
# download the page and keep only the contents of its <body> element
response = requests.get("https://tsiauca.edupage.org/?")
html = response.text
html = html[html.find("<body")-1:html.find("</body>")-1]

In [None]:
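# positions of every "<" in the page, with -1 appended as an end marker
l = [html.find("<")]
while (i := html.find("<", l[-1]+1))!=-1:
    l.append(i)
l.append(-1)
# positions of every ">", with 0 prepended as a start marker
r = [0]
while (i := html.find(">", r[-1]+1))!=-1:
    r.append(i)
# keep only the text that sits directly before a closing </script> tag
text = ""
for rr, ll in zip(l, r):
    text += html[ll+1:rr]*(html[rr:rr+9]=="</script>")
text += html[-1]
list(zip(l, r))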

[(1, 0),
 (37, 31),
 (57, 56),
 (93, 92),
 (222, 218),
 (256, 244),
 (260, 259),
 (302, 289),
 (306, 305),
 (326, 310),
 (330, 329),
 (370, 359),
 (374, 373),
 (394, 378),
 (398, 397),
 (427, 421),
 (431, 430),
 (451, 435),
 (455, 454),
 (494, 482),
 (498, 497),
 (516, 502),
 (526, 520),
 (594, 593),
 (662, 661),
 (756, 755),
 (879, 878),
 (981, 980),
 (1031, 1030),
 (1119, 1118),
 (1344, 1343),
 (1412, 1411),
 (1481, 1480),
 (1551, 1549),
 (1558, 1557),
 (1565, 1564),
 (1571, 1570),
 (1575, 1574),
 (1580, 1579),
 (1630, 1629),
 (1698, 1697),
 (1768, 1767),
 (1865, 1857),
 (2509, 2502),
 (2622, 2615),
 (2635, 2628),
 (2683, 2665),
 (2695, 2689),
 (2708, 2698),
 (2746, 2738),
 (2837, 2836),
 (2860, 2859),
 (2865, 2863),
 (2876, 2868),
 (2968, 2967),
 (2991, 2990),
 (2996, 2994),
 (3009, 2999),
 (3100, 3099),
 (3124, 3123),
 (3128, 3127),
 (3138, 3131),
 (3150, 3143),
 (3156, 3155),
 (3255, 3249),
 (3259, 3258),
 (3265, 3264),
 (3351, 3348),
 (3414, 3408),
 (3531, 3530),
 (3592, 3591),
 

In [None]:
html

'\t<body id="docbody" class="   ">\r\n\t\t\t<div class="topDiv"><div class="edubarMainSkin nowmaxL"><div class="skgd composite sgmain" id="skin_main" data-sgid="skin_main"  data-templateid="skin" data-randid="skin_main-88655">\t\t\t<ul class="skip-links">\r\n\t\t\t\t\t\t\t\t\t<li><a href="#skin_FixedHeader_1">Верхнее меню</a></li>\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t<li><a href="#skin_PageContent_2">Содержание</a></li>\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t<li><a href="#skin_Login_1">Логин</a></li>\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t<li><a href="#skin_LegalInfo_2">Текст внизу</a></li>\r\n\t\t\t\t\t\t\r\n\t\t\t</ul>\r\n\t\t\t<div class="skgd composite" id="skin_Div_1" data-sgid="skin_Div_1" ><div class="skgd composite" id="skin_Div_2" data-sgid="skin_Div_2" ><div class="skgd composite skgdContainer" id="skin_Container_3" data-sgid="skin_Container_3" ><section class="skgd composite skgdSection visualEdit inlineEditable sgsp" id="skin_Section_2" data-sgid="skin_Section_2" ><ul class="skgd composite hbox respon

In [None]:
text

'\r\n\t\t\t$j(\'#skin_MenuB_2\').responsiveMenuSkinGadget({"editable":false});\r\n\t\t\t'