##  *Client Subscription Prediction*

## Business Understanding 
##### *Problem Statement*

The goal is to improve the success rate of direct marketing campaigns by identifying factors that influence a client's decision to subscribe to a term deposit. This involves understanding client demographics, campaign characteristics, and historical engagement.

##### *Business Objective*

Increase Conversion Rates: Identify clients more likely to subscribe to a term deposit to optimize marketing efforts.

Resource Allocation: Minimize unnecessary calls to unlikely prospects, reducing costs.

Customer Insights: Gain a deeper understanding of client profiles and preferences for future campaigns.

##### *Key Business Questions*

1. Which client demographics (e.g., age, job, marital status, education) are associated with a higher likelihood of subscribing to a term deposit?
2. How do campaign-related factors (e.g., number of calls, duration of contact) impact conversion?
3. What role does previous engagement and its outcome play in current campaign success?
4. Can we predict the likelihood of subscription based on client and campaign data?

#### *Hypothesis Statement*:
#### H₀ (Null Hypothesis):
Client characteristics (e.g., age, job, marital status, education), financial details, and campaign attributes do not significantly influence the likelihood of subscribing to a term deposit.

#### H₁ (Alternative Hypothesis):
Client characteristics, financial details, and campaign attributes significantly influence the likelihood of subscribing to a term deposit.
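
One way to examine these hypotheses is feature by feature: a chi-square test of independence for a categorical feature against `y`, and a Mann-Whitney U test for a numeric feature split by `y`. The sketch below is illustrative only. The toy data, the two columns chosen, and the 0.05 significance level are assumptions, not values or results from the dataset.

```python
import pandas as pd
from scipy.stats import chi2_contingency, mannwhitneyu

# Toy data invented for illustration; it is NOT taken from bank-full.csv.
toy = pd.DataFrame({
    "marital": ["married"] * 40 + ["single"] * 40,
    "balance": list(range(0, 40)) + list(range(100, 140)),
    "y": ["no"] * 35 + ["yes"] * 5 + ["no"] * 10 + ["yes"] * 30,
})

ALPHA = 0.05  # assumed significance level

# Categorical feature vs. target: chi-square test of independence
# on the contingency table of category counts.
contingency = pd.crosstab(toy["marital"], toy["y"])
chi2, p_cat, dof, _ = chi2_contingency(contingency)

# Numeric feature vs. target: Mann-Whitney U test comparing the
# distribution of the feature between the two target classes.
yes_balance = toy.loc[toy["y"] == "yes", "balance"]
no_balance = toy.loc[toy["y"] == "no", "balance"]
u_stat, p_num = mannwhitneyu(yes_balance, no_balance, alternative="two-sided")

print(f"marital vs y: p = {p_cat:.4g}, reject H0: {p_cat < ALPHA}")
print(f"balance vs y: p = {p_num:.4g}, reject H0: {p_num < ALPHA}")
```

A p-value below the chosen threshold is evidence against H₀ for that single feature; it does not by itself say how strong or how useful the relationship is.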

## *Data Understanding*

### *Key Features*

#### Client Demographics:

Age (numeric): Provides insight into the age distribution of subscribers.
Job (categorical): Employment type may influence financial behavior.
Marital Status (categorical): May reflect how household financial decisions are made.
Education Level (categorical): May reflect financial literacy and potential income levels.

#### Financial Information:

Default (binary): Indicates credit risk.
Balance (numeric): Average yearly balance offers a financial capability metric.
Housing Loan and Personal Loan (binary): Reflects current financial obligations.

#### Campaign Details:

Contact Method (categorical): Effectiveness of communication channels.
Day and Month of Contact: Timing trends for successful engagements.
Duration of Last Contact (numeric): Tends to be strongly associated with success, since longer calls often indicate interest; note that it is only known once the call has ended.
Number of Contacts in Campaign (numeric): May indicate persistence levels.

#### Historical Campaign Data:

Days Since Last Contact (pdays): Recency of prior engagements.
Previous Contacts (previous): Frequency of earlier interactions.
Outcome of Previous Campaign (poutcome): Past success or failure trends.

#### Target Variable:

y (binary): Indicates whether the client subscribed to the term deposit (yes or no).
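
Because `y` is stored as `yes`/`no`, it usually needs to be mapped to numbers before modelling, and the share of each class is worth checking at the same time. A minimal sketch, using a small invented series in place of the real column (the values and the 0/1 mapping are assumptions for illustration):

```python
import pandas as pd

# Invented stand-in for the real target column.
toy_y = pd.Series(["no", "no", "yes", "no", "yes", "no", "no", "no"], name="y")

# Map the labels to integers: 1 = subscribed, 0 = did not subscribe.
y_encoded = toy_y.map({"no": 0, "yes": 1})

# Proportion of each class, to see how balanced the target is.
class_share = toy_y.value_counts(normalize=True)

print(y_encoded.tolist())
print(class_share)
```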


#### *Success Criteria*  
A predictive model with high accuracy, precision, and recall for identifying potential subscribers.
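
The three metrics named above can be computed with `sklearn.metrics`. The sketch below uses hand-written labels (an assumption for illustration, not model output) to compare a hypothetical model against a baseline that always predicts "no".

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hand-written labels for illustration: 1 = subscribed, 0 = did not subscribe.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
predictions = {
    "model": [0, 0, 0, 0, 0, 0, 0, 1, 1, 0],  # hypothetical model output
    "baseline": [0] * len(y_true),            # always predicts "no"
}

scores = {}
for name, y_pred in predictions.items():
    scores[name] = {
        "accuracy": accuracy_score(y_true, y_pred),
        # zero_division=0 avoids a warning when nothing is predicted positive.
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred),
    }
    print(name, scores[name])

# Confusion matrix for the hypothetical model, unpacked in sklearn's order.
tn, fp, fn, tp = confusion_matrix(y_true, predictions["model"]).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```

In this toy case both predictors reach the same accuracy, yet the baseline never identifies a subscriber, which is why precision and recall are tracked alongside accuracy.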

###  Import Libraries

In [1]:
# Data manipulation libraries
import pandas as pd
import numpy as np

# Data visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Statistics libraries
import statistics as stat
from scipy import stats
from scipy.stats import chi2_contingency, mannwhitneyu

# Machine learning libraries
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.preprocessing import (
    StandardScaler, LabelEncoder, RobustScaler, OneHotEncoder,
    FunctionTransformer, PowerTransformer,
)
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix, auc, roc_curve

# Over-sampling utilities (imbalanced-learn)
from imblearn.over_sampling import SMOTE, RandomOverSampler
from imblearn.pipeline import Pipeline as imbPipeline

# Model persistence and file handling
import joblib
import os

# Hide warnings
import warnings
warnings.filterwarnings('ignore')



In [None]:
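# Load dataset (the absolute path below is specific to the author's machine; point it at your own copy of bank-full.csv)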
train_df = pd.read_csv('C:/Users/HP-PC/OneDrive - Azubi Africa/my azubi africa career accelerator projects/Predictive-Analysis-for-Client-Subscription/Data/bank-full.csv')
train_df 
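
Whether the file loads into separate columns depends on its delimiter, so it can help to fail early when expected columns are missing. The sketch below is an assumption-laden illustration: `load_bank_data` is a hypothetical helper, the in-memory sample is invented and uses a semicolon delimiter only as an example, and the required columns are limited to the names mentioned explicitly in the text (`pdays`, `previous`, `poutcome`, `y`).

```python
import io

import pandas as pd

# Column names that the text above mentions explicitly.
REQUIRED_COLUMNS = {"pdays", "previous", "poutcome", "y"}


def load_bank_data(source, sep=","):
    """Hypothetical helper: read the CSV and fail early if key columns are missing."""
    df = pd.read_csv(source, sep=sep)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(
            f"Missing columns {sorted(missing)}; check the delimiter (sep)."
        )
    return df


# Tiny in-memory stand-in for the real file (values invented for illustration).
sample_csv = io.StringIO(
    "age;pdays;previous;poutcome;y\n"
    "58;-1;0;unknown;no\n"
    "33;151;3;failure;yes\n"
)
sample_df = load_bank_data(sample_csv, sep=";")
print(sample_df.shape)
print(sample_df["y"].tolist())
```

With the real file, the same helper would be called with the file path in place of `sample_csv` and whichever `sep` matches that copy of the data.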
