In [1]:
import tensorflow as tf
from tensorflow.keras.models import load_model
import pickle 
import pandas as pd
import numpy as np

🔹 Step-by-Step Explanation:

* import tensorflow as tf
➡️ Loads TensorFlow — needed to use Keras models (for prediction).


* from tensorflow.keras.models import load_model
➡️ Imports the load_model function to load your trained ANN model saved earlier.


* import pickle
➡️ Loads the pickle module — used to load saved encoders or scaler (LabelEncoder, OneHotEncoder, StandardScaler, etc.).


* import pandas as pd
➡️ Imports Pandas to work with data (CSV, DataFrame, etc.).


* import numpy as np
➡️ Imports NumPy, used for numerical operations (arrays, reshaping input data, etc.).



In [8]:
## Load the trained model
model=load_model('model.h5')

## Load the encoder and scaler
with open('onehot_encoder_geo.pkl','rb') as file:
    label_encoder_geo=pickle.load(file)

with open('label_encoder_gender.pkl','rb') as file:
    label_encoder_gender=pickle.load(file)

with open('scaler.pkl','rb') as file:
    scaler=pickle.load(file)




✅ Code Explanation:

* model = load_model('model.h5')
➡️ Loads your trained ANN model (previously saved) from the file model.h5.
You’ll use this to make predictions.


* with open('onehot_encoder_geo.pkl','rb') as file:

    label_encoder_geo = pickle.load(file)
➡️ Loads the saved OneHotEncoder for the "Geography" column.
(It should actually be named onehot_encoder_geo, not label_encoder_geo — this may be a typo.)


* with open('label_encoder_gender.pkl','rb') as file:

    label_encoder_gender = pickle.load(file)
➡️ Loads the LabelEncoder used to encode "Gender".


* with open('scaler.pkl','rb') as file:

    scaler = pickle.load(file)
➡️ Loads the StandardScaler used to scale input features, so that new input data is processed the same way.

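⚠️ Minor Fix:
The OneHotEncoder is loaded into a variable called label_encoder_geo, which is misleading because the object is not a LabelEncoder:


label_encoder_geo = pickle.load(file)  # ❌ Misleading name
A clearer name would be:


onehot_encoder_geo = pickle.load(file)  # ✅ Matches the file name onehot_encoder_geo.pkl

The code still runs with the original name. If you rename the variable, rename it as well in the later cell that calls label_encoder_geo.transform(...) and label_encoder_geo.get_feature_names_out(...).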
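
For context, a .pkl file is an ordinary pickle dump, so loading it returns an object with the same state it had when it was saved. The round trip can be sketched with the standard library alone. The dictionary below is only a hypothetical stand-in for a fitted encoder, and demo_encoder.pkl is a made-up file name.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted encoder: the learned classes are its "state".
fitted_state = {"classes_": ["Female", "Male"]}

# Use a temporary directory so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_encoder.pkl")  # made-up file name

    # Saving: presumably what the training notebook did with pickle.dump.
    with open(path, "wb") as file:
        pickle.dump(fitted_state, file)

    # Loading: same pattern as the cell above; note the binary mode 'rb'.
    with open(path, "rb") as file:
        restored_state = pickle.load(file)

# The restored object is an equal copy of what was saved.
print(restored_state == fitted_state)
```

Only load pickle files from sources you trust, because unpickling can execute arbitrary code.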

In [4]:
## Example input data
input_data = {
    'CreditScore': 600,
    'Geography': 'France',
    'Gender': 'Male',
    'Age': 40,
    'Tenure': 3,
    'Balance': 60000,
    'NumOfProducts': 2,
    'HasCrCard': 1,
    'IsActiveMember': 1,
    'EstimatedSalary': 50000
}

In [5]:
## One-hot encode 'Geography' 
geo_encoded=label_encoder_geo.transform([[input_data['Geography']]]).toarray()
geo_encoded_df=pd.DataFrame(geo_encoded,columns=label_encoder_geo.get_feature_names_out(['Geography']))
geo_encoded_df



Unnamed: 0,Geography_France,Geography_Germany,Geography_Spain
0,1.0,0.0,0.0


In [6]:
input_df=pd.DataFrame([input_data])
input_df

Unnamed: 0,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary
0,600,France,Male,40,3,60000,2,1,1,50000


🧠 What It Does:
* Converts your Python dictionary (input_data) into a DataFrame

* The outer [ ] ensures it's treated as one row of data

* input_df now looks like one row of a table — ready for preprocessing



In [10]:
## Encode categorical variables
input_df['Gender']= label_encoder_gender.transform(input_df['Gender'])
input_df

Unnamed: 0,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary
0,600,France,1,40,3,60000,2,1,1,50000


🔍 What It Does:
It uses the LabelEncoder you trained earlier to convert:

'Male' → 1

'Female' → 0

This makes the 'Gender' column numerical, so it can be used by the model.

✅ Output:
Your input_df now has "Gender" as a number, which is what your model expects.
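
To see why 'Female' maps to 0 and 'Male' to 1, here is a pure-Python sketch of label encoding. It is not scikit-learn's implementation, and training_labels is a hypothetical column. The idea is that the classes are sorted and each one is replaced by its position in that order.

```python
# Pure-Python sketch of label encoding; an illustration, not scikit-learn's code.
training_labels = ["Male", "Female", "Female", "Male"]  # hypothetical training column

# Classes are sorted, and each class is mapped to its position in that order.
classes = sorted(set(training_labels))
label_to_code = {label: code for code, label in enumerate(classes)}


def encode(labels):
    """Map each label to its integer code; fail loudly on labels never seen in training."""
    unseen = [label for label in labels if label not in label_to_code]
    if unseen:
        raise ValueError(f"unseen labels: {unseen}")
    return [label_to_code[label] for label in labels]


print(label_to_code)
print(encode(["Male"]))
```

A value that was not present during training (for example a differently spelled 'male') is rejected rather than silently encoded, which is also how LabelEncoder.transform behaves.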



In [11]:
## Concatenate with the one-hot encoded columns
input_df = pd.concat([input_df.drop("Geography", axis=1), geo_encoded_df], axis=1)
input_df

Unnamed: 0,CreditScore,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Geography_France,Geography_Germany,Geography_Spain
0,600,1,40,3,60000,2,1,1,50000,1.0,0.0,0.0


Step-by-Step Meaning:
Code Part:
* input_df	---> Your original DataFrame with one row (customer input).
* input_df.drop("Geography", axis=1)	---> Removes the 'Geography' column — because we just encoded it into numbers separately.
* .drop(...)	---> Used to remove a column or row from a DataFrame.
* axis=1	---> Means "drop column" (axis 0 = row, axis 1 = column).
* geo_encoded_df	---> This holds the one-hot encoded version of 'Geography' (e.g., France, Germany...).
* pd.concat([...], axis=1)	---> Joins two DataFrames side-by-side (column-wise).
* [input_df.drop(...), geo_encoded_df]	---> This list contains the two DataFrames we want to join together.
* input_df = ...	---> Updates input_df with the new combined version (without old Geography, with new encoded).

💡 Why We Do This:
We remove the original string column (Geography) because machine learning models can't work with text directly. Instead, we replace it with the numerical columns (e.g., Geography_France, Geography_Germany, Geography_Spain) created from one-hot encoding.

✅ Result:
Now input_df is fully numeric and has the same structure as your training data.
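
The encode-and-concatenate steps can be sketched in pure Python to make the result concrete. This is an illustration only: one_hot is a hypothetical helper, not the OneHotEncoder API, and the column order is copied from the DataFrame output above. The final check matters because the model receives a plain array, so features are matched by position.

```python
# Pure-Python sketch: one-hot encode a category and build a row in a fixed column order.
# The column order is copied from the DataFrame output shown above.
expected_columns = [
    "CreditScore", "Gender", "Age", "Tenure", "Balance", "NumOfProducts",
    "HasCrCard", "IsActiveMember", "EstimatedSalary",
    "Geography_France", "Geography_Germany", "Geography_Spain",
]
geographies = ["France", "Germany", "Spain"]  # categories seen in the output above


def one_hot(value, categories, prefix):
    """Hypothetical helper: return {prefix_category: 1.0 or 0.0} with exactly one 1.0."""
    if value not in categories:
        raise ValueError(f"unknown category: {value}")
    return {f"{prefix}_{c}": float(c == value) for c in categories}


# The customer after label-encoding Gender and dropping the Geography string.
customer = {"CreditScore": 600, "Gender": 1, "Age": 40, "Tenure": 3,
            "Balance": 60000, "NumOfProducts": 2, "HasCrCard": 1,
            "IsActiveMember": 1, "EstimatedSalary": 50000}
customer.update(one_hot("France", geographies, "Geography"))

# Verify the feature set, then build the row in the expected order.
missing = [c for c in expected_columns if c not in customer]
extra = [c for c in customer if c not in expected_columns]
assert not missing and not extra, (missing, extra)
row = [customer[c] for c in expected_columns]
print(row)
```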





In [12]:
## Scaling the input data
input_scaled= scaler.transform(input_df)
input_scaled

array([[-0.53598516,  0.91324755,  0.10479359, -0.69539349, -0.25781119,
         0.80843615,  0.64920267,  0.97481699, -0.87683221,  1.00150113,
        -0.57946723, -0.57638802]])

🔍 Word-by-Word Breakdown:
Part:
* input_scaled	--> New variable to store the scaled version of input data.
* =	--> Assigns the result (on the right side) to the variable (on the left).
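* scaler	--> This is the StandardScaler object you loaded with pickle. It contains the mean and std of each feature from the training data.
* .transform(...)	--> Scales the new data with those stored training statistics: (value - mean) / std for each feature. It does not recompute the mean or std from the new data.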
* input_df	---> The one-row customer data that is now fully numeric and encoded.

💡 Why We Do This:
During training, you scaled all input features so that they are on the same scale (this helps the ANN learn better). Now, we apply the same scaling to the new input before prediction.
If you skip this step, the prediction will be unreliable, because raw values such as Balance = 60000 lie far outside the range of scaled values the model saw during training.
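
The scaling rule itself is small enough to sketch in pure Python. The numbers below are hypothetical and not taken from the real dataset. The sketch uses the population standard deviation, which is what StandardScaler computes.

```python
import statistics

# Hypothetical training values for one feature (e.g. Age); not the real dataset.
training_ages = [30, 40, 50, 60]

# "Fit": learn the mean and the population standard deviation from training data.
mean = statistics.mean(training_ages)
std = statistics.pstdev(training_ages)


def standardize(value):
    """Apply the stored training statistics to a new value: z = (x - mean) / std."""
    return (value - mean) / std


print(standardize(45))  # 45 equals the training mean, so the result is 0.0
print(standardize(60))  # above the mean, so the result is positive
```

The key point is that mean and std come from the training data only. A single new row is shifted and divided by those stored values, so its scaled features are generally not 0.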

In [13]:
##Predict churn
prediction=model.predict(input_scaled)
prediction

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 242ms/step


array([[0.15085508]], dtype=float32)

In [14]:
prediction_prob = prediction[0][0]
prediction_prob

0.15085508

In [15]:
if prediction_prob > 0.5:
    print('The customer is likely to churn.')
else:
    print('The customer is not likely to churn.')

The customer is not likely to churn.
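
The final two cells can be wrapped in a small helper so the cut-off is explicit and adjustable. This is a minimal sketch: classify_churn and its threshold parameter are hypothetical names, and it assumes the model output is a probability between 0 and 1, as the notebook treats it.

```python
def classify_churn(probability, threshold=0.5):
    """Turn a predicted churn probability into a flag and a message."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    will_churn = probability > threshold
    message = ("The customer is likely to churn." if will_churn
               else "The customer is not likely to churn.")
    return will_churn, message


# model.predict returned a 2-D array of shape (1, 1); a nested list stands in for it here.
prediction = [[0.15085508]]
print(classify_churn(prediction[0][0]))
print(classify_churn(prediction[0][0], threshold=0.1))
```

A lower threshold flags more customers as likely to churn, which may be preferable when missing a churner costs more than a false alarm.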
