## Assignment Title
   Customer Churn Prediction – Mini Machine Learning Project

## Objective
   Build a basic machine learning project to predict whether a customer will leave a service (churn) or not, using a publicly available dataset.

## Data Preparation

## 1. Import Required Libraries

In [31]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix



## 2. Load Dataset

In [32]:

data = pd.read_csv("/content/telecom_customer_churn.csv")
data.head()

Unnamed: 0,Customer ID,Gender,Age,Married,Number of Dependents,City,Zip Code,Latitude,Longitude,Number of Referrals,...,Payment Method,Monthly Charge,Total Charges,Total Refunds,Total Extra Data Charges,Total Long Distance Charges,Total Revenue,Customer Status,Churn Category,Churn Reason
0,0002-ORFBO,Female,37,Yes,0,Frazier Park,93225,34.827662,-118.999073,2,...,Credit Card,65.6,593.3,0.0,0,381.51,974.81,Stayed,,
1,0003-MKNFE,Male,46,No,0,Glendale,91206,34.162515,-118.203869,0,...,Credit Card,-4.0,542.4,38.33,10,96.21,610.28,Stayed,,
2,0004-TLHLJ,Male,50,No,0,Costa Mesa,92627,33.645672,-117.922613,0,...,Bank Withdrawal,73.9,280.85,0.0,0,134.6,415.45,Churned,Competitor,Competitor had better devices
3,0011-IGKFF,Male,78,Yes,0,Martinez,94553,38.014457,-122.115432,1,...,Bank Withdrawal,98.0,1237.85,0.0,0,361.66,1599.51,Churned,Dissatisfaction,Product dissatisfaction
4,0013-EXCHZ,Female,75,Yes,0,Camarillo,93010,34.227846,-119.079903,3,...,Credit Card,83.9,267.4,0.0,0,22.14,289.54,Churned,Dissatisfaction,Network reliability


## 3. Handle Missing Values

In [33]:

print(data.columns)

Index(['Customer ID', 'Gender', 'Age', 'Married', 'Number of Dependents',
       'City', 'Zip Code', 'Latitude', 'Longitude', 'Number of Referrals',
       'Tenure in Months', 'Offer', 'Phone Service',
       'Avg Monthly Long Distance Charges', 'Multiple Lines',
       'Internet Service', 'Internet Type', 'Avg Monthly GB Download',
       'Online Security', 'Online Backup', 'Device Protection Plan',
       'Premium Tech Support', 'Streaming TV', 'Streaming Movies',
       'Streaming Music', 'Unlimited Data', 'Contract', 'Paperless Billing',
       'Payment Method', 'Monthly Charge', 'Total Charges', 'Total Refunds',
       'Total Extra Data Charges', 'Total Long Distance Charges',
       'Total Revenue', 'Customer Status', 'Churn Category', 'Churn Reason'],
      dtype='object')


## 4. Encode Categorical Variables

In [34]:
label_encoder = LabelEncoder()

for column in data.columns:
    if data[column].dtype == 'object':
        data[column] = label_encoder.fit_transform(data[column])


In [35]:
data = pd.get_dummies(data, drop_first=True)


## 5. Split Features and Target


In [36]:
X = data.drop('Customer Status', axis=1)
y = data['Customer Status']

In [37]:
data = data.drop('Customer ID', axis=1)

## 6. Train-Test Split

In [38]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

## Model Building

## 1. Logistic Regression

In [39]:
from sklearn.preprocessing import StandardScaler

# Identify numerical columns that might have NaNs and impute them
numerical_cols = X_train.select_dtypes(include=np.number).columns

for col in numerical_cols:
    if X_train[col].isnull().any():
        mean_value = X_train[col].mean()
        X_train.loc[:, col] = X_train[col].fillna(mean_value)
        X_test.loc[:, col] = X_test[col].fillna(mean_value) # Impute test set with train mean

# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LogisticRegression(
    max_iter=5000,
    class_weight='balanced',
     C=0.5
)
model.fit(X_train_scaled, y_train);
 # Increased max_iter to ensure convergence

## Model Evaluation

## Prediction

In [45]:
from sklearn.metrics import accuracy_score, confusion_matrix

y_pred = model.predict(X_test_scaled)

## Accuracy

In [46]:
print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 0.9687721788502484


## Confusion Matrix

In [47]:
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))


Confusion Matrix:
 [[371   0   2]
 [  0  97   0]
 [  0  42 897]]


## Conclusion : This mini machine learning project successfully predicts customer churn using Logistic Regression.
It follows clean coding practices, uses standard ML steps, and produces interpretable results suitable for a fresher-level assignment.