# Bank Marketing

The <b>bank-marketing.csv</b> data comes from direct marketing campaigns run by a Portuguese banking institution. The campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (a bank term deposit) would be subscribed. The ultimate goal is to predict whether a client will subscribe to a term deposit (variable y). This is a classic binary classification problem: the two classes are clients who subscribe and clients who don't.

Dataset reference:
- S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM'2011, pp. 117-121, Guimarães, Portugal, October 2011. EUROSIS.

#### Variable description:

- 1 age (numeric)
- 2 job : type of job (categorical: "admin.","unknown","unemployed","management","housemaid","entrepreneur","student", "blue-collar","self-employed","retired","technician","services") 
- 3 marital : marital status (categorical: "married","divorced","single"; note: "divorced" means divorced or widowed)
- 4 education (categorical: "unknown","secondary","primary","tertiary")
- 5 default: has credit in default? (binary: "yes","no")
- 6 balance: average yearly balance, in euros (numeric) 
- 7 housing: has housing loan? (binary: "yes","no")
- 8 loan: has personal loan? (binary: "yes","no")
   
#### Related to the last contact of the current campaign:
- 9 contact: contact communication type (categorical: "unknown","telephone","cellular") 
- 10 day: last contact day of the month (numeric)
- 11 month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
- 12 duration: last contact duration, in seconds (numeric)

#### Other attributes:
- 13 campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
- 14 pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)
- 15 previous: number of contacts performed before this campaign and for this client (numeric)
- 16 poutcome: outcome of the previous marketing campaign (categorical: "unknown","other","failure","success")

Output variable (desired target):
- 17 y: has the client subscribed to a term deposit? (binary: "yes","no")
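Since `y` is the prediction target and `pdays` uses -1 as a sentinel for "not previously contacted", a modelling step would usually encode both explicitly. A minimal sketch, assuming a tiny made-up frame in place of `bank-marketing.csv`; the helper `prepare_target_and_pdays` and the new column `previously_contacted` are hypothetical names, not part of the dataset:

```python
import numpy as np
import pandas as pd


def prepare_target_and_pdays(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a 0/1 target and the pdays sentinel made explicit."""
    out = df.copy()
    # Binary target: 1 if the client subscribed to a term deposit, else 0
    out["y"] = out["y"].map({"yes": 1, "no": 0})
    # pdays == -1 means "not previously contacted"; keep that as its own flag
    out["previously_contacted"] = out["pdays"] != -1
    # Replace the sentinel with NaN so -1 is not treated as a real day count
    out["pdays"] = out["pdays"].replace(-1, np.nan)
    return out


# Hypothetical toy rows shaped like the dataset (not real data)
toy = pd.DataFrame({"pdays": [-1, 339, 330, -1], "y": ["no", "no", "yes", "yes"]})
prepared = prepare_target_and_pdays(toy)
print(prepared)
```

Working on a copy leaves the original frame untouched, so the raw `pdays` values stay available for the questions below.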

### Read the dataset and answer the following questions.

In [6]:
import pandas as pd

In [19]:
df = pd.read_csv('bank-marketing.csv')
df

| | age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30 | unemployed | married | primary | no | 1787 | no | no | cellular | 19 | oct | 79 | 1 | -1 | 0 | unknown | no |
| 1 | 33 | services | married | secondary | no | 4789 | yes | yes | cellular | 11 | may | 220 | 1 | 339 | 4 | failure | no |
| 2 | 35 | management | single | tertiary | no | 1350 | yes | no | cellular | 16 | apr | 185 | 1 | 330 | 1 | failure | no |
| 3 | 30 | management | married | tertiary | no | 1476 | yes | yes | unknown | 3 | jun | 199 | 4 | -1 | 0 | unknown | no |
| 4 | 59 | blue-collar | married | secondary | no | 0 | yes | no | unknown | 5 | may | 226 | 1 | -1 | 0 | unknown | no |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4516 | 33 | services | married | secondary | no | -333 | yes | no | cellular | 30 | jul | 329 | 5 | -1 | 0 | unknown | no |
| 4517 | 57 | self-employed | married | tertiary | yes | -3313 | yes | yes | unknown | 9 | may | 153 | 1 | -1 | 0 | unknown | no |
| 4518 | 57 | technician | married | secondary | no | 295 | no | no | cellular | 19 | aug | 151 | 11 | -1 | 0 | unknown | no |
| 4519 | 28 | blue-collar | married | secondary | no | 1137 | no | no | cellular | 6 | feb | 129 | 4 | 211 | 3 | other | no |


### Question

Extract all column names. Count the number of columns (using code).

In [18]:
# All column names
print(df.columns)

# Number of columns
len(df.columns)

Index(['age', 'job', 'marital', 'education', 'default', 'balance', 'housing',
       'loan', 'contact', 'day', 'month', 'duration', 'campaign', 'pdays',
       'previous', 'poutcome', 'y'],
      dtype='object')


17

### Question

Is the data in the correct format? That is, do you see integers and floats where you expect them? If not, convert them into the correct format. Also make sure that all entries are non-null.

In [63]:
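# Boolean mask: True wherever a value is present
n_null = df.notna()
print(n_null)

# Report every column that contains at least one missing value
for column in n_null.columns:
    if not n_null[column].all():
        print("Missing values in:", column)

# Data types: numeric columns should be int64, text columns object
print(df.dtypes)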

       age   job  marital  education  default  balance  housing  loan  \
0     True  True     True       True     True     True     True  True   
1     True  True     True       True     True     True     True  True   
2     True  True     True       True     True     True     True  True   
3     True  True     True       True     True     True     True  True   
4     True  True     True       True     True     True     True  True   
...    ...   ...      ...        ...      ...      ...      ...   ...   
4516  True  True     True       True     True     True     True  True   
4517  True  True     True       True     True     True     True  True   
4518  True  True     True       True     True     True     True  True   
4519  True  True     True       True     True     True     True  True   
4520  True  True     True       True     True     True     True  True   

      contact   day  month  duration  campaign  pdays  previous  poutcome  \
0        True  True   True      True      True
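The question also asks to convert columns into the correct format where needed. A minimal sketch of one way to do that, assuming the seven numeric columns listed in the variable description and a small made-up frame; the helper `check_and_convert` and the constant `NUMERIC_COLS` are hypothetical, and turning text columns into pandas categoricals is an optional choice rather than a requirement:

```python
import pandas as pd

# Numeric columns according to the variable description; all others are categorical
NUMERIC_COLS = ["age", "balance", "day", "duration", "campaign", "pdays", "previous"]


def check_and_convert(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric columns coerced and text columns as categoricals."""
    out = df.copy()
    for col in NUMERIC_COLS:
        # errors="coerce" turns unparsable strings into NaN instead of raising
        out[col] = pd.to_numeric(out[col], errors="coerce")
    for col in out.columns.difference(NUMERIC_COLS):
        out[col] = out[col].astype("category")
    # isna().sum() counts missing values per column (including coerced failures)
    missing = out.isna().sum()
    if missing.any():
        raise ValueError(f"missing values found:\n{missing[missing > 0]}")
    return out


# Hypothetical toy rows: "age" was read as text to show the conversion
toy = pd.DataFrame({
    "age": ["30", "33"], "balance": [1787, 4789], "day": [19, 11],
    "duration": [79, 220], "campaign": [1, 1], "pdays": [-1, 339],
    "previous": [0, 4], "job": ["unemployed", "services"], "y": ["no", "no"],
})
clean = check_and_convert(toy)
print(clean.dtypes)
```

Coercing with `errors="coerce"` and then checking `isna()` catches both genuinely missing values and values that could not be parsed as numbers.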

### Question

Provide general summary statistics for the entire dataset. The describe() method is what you would want to use.

In [56]:
df.describe()


| | age | balance | day | duration | campaign | pdays | previous |
|---|---|---|---|---|---|---|---|
| count | 4521.0 | 4521.0 | 4521.0 | 4521.0 | 4521.0 | 4521.0 | 4521.0 |
| mean | 41.170095 | 1422.657819 | 15.915284 | 263.961292 | 2.79363 | 39.766645 | 0.542579 |
| std | 10.576211 | 3009.638142 | 8.247667 | 259.856633 | 3.109807 | 100.121124 | 1.693562 |
| min | 19.0 | -3313.0 | 1.0 | 4.0 | 1.0 | -1.0 | 0.0 |
| 25% | 33.0 | 69.0 | 9.0 | 104.0 | 1.0 | -1.0 | 0.0 |
| 50% | 39.0 | 444.0 | 16.0 | 185.0 | 2.0 | -1.0 | 0.0 |
| 75% | 49.0 | 1480.0 | 21.0 | 329.0 | 3.0 | -1.0 | 0.0 |
| max | 87.0 | 71188.0 | 31.0 | 3025.0 | 50.0 | 871.0 | 25.0 |
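By default `describe()` summarizes only the numeric columns, which is why the table above has seven columns rather than seventeen. Passing `include="all"` also reports `count`, `unique`, `top` and `freq` for the text columns, which is closer to a summary of the entire dataset. A sketch on a made-up three-row frame (not the real data):

```python
import pandas as pd

# Hypothetical toy frame with one numeric and one categorical column
toy = pd.DataFrame({"age": [30, 33, 35], "job": ["services", "services", "management"]})

numeric_only = toy.describe()              # default: numeric columns only
everything = toy.describe(include="all")   # numeric plus object/categorical columns
print(everything)
```

In the combined table, statistics that do not apply to a column (for example `mean` for `job`) are shown as NaN.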


### Question

Columns such as job, marital and education hold categorical data. List all the different categories in the job column.

In [62]:
# Each distinct job category with its number of clients
df['job'].value_counts()

management       969
blue-collar      946
technician       768
admin            478
services         417
retired          230
self-employed    183
entrepreneur     168
unemployed       128
housemaid        112
student           84
unknown           38
Name: job, dtype: int64
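`value_counts()` lists each category together with its frequency; the output above shows 12 job categories, including the placeholder `unknown`. If only the distinct labels are needed, `unique()` returns them in order of first appearance and `nunique()` counts them. A sketch on a made-up sample of the column:

```python
import pandas as pd

# Hypothetical sample of the job column (not the real data)
jobs = pd.Series(["management", "blue-collar", "management", "unknown", "student"], name="job")

categories = sorted(jobs.unique())   # distinct labels, sorted alphabetically
n_categories = jobs.nunique()        # number of distinct labels (NaN excluded)
print(categories, n_categories)
```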

### Question

In one line of code, provide a count of the number of people who were unemployed and owned a home (here taken as having a housing loan, i.e. `housing == 'yes'`).

In [119]:

len(df.loc[(df['job'] == 'unemployed') & (df['housing'] == 'yes')])


58
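Each comparison produces a boolean Series, so the same one-line count can also be written by summing the combined mask (each `True` counts as 1), which avoids building an intermediate DataFrame. `pd.crosstab` gives the counts for every job and housing combination at once. A sketch on made-up rows:

```python
import pandas as pd

# Hypothetical toy rows (not the real data)
toy = pd.DataFrame({
    "job": ["unemployed", "unemployed", "services", "unemployed"],
    "housing": ["yes", "no", "yes", "yes"],
})

# True counts as 1 when summed, so this is the number of matching rows
count = ((toy["job"] == "unemployed") & (toy["housing"] == "yes")).sum()

# Full job x housing contingency table of counts
table = pd.crosstab(toy["job"], toy["housing"])
print(count)
print(table)
```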

### Question

What is the education level of a typical blue-collar worker? Explore the value_counts() method.

In [110]:
df.loc[df['job'] == 'blue-collar', 'education'].value_counts()



secondary    524
primary      369
unknown       41
tertiary      12
Name: education, dtype: int64
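The typical level is the most frequent one, i.e. the mode: in the output above, 524 of the 946 blue-collar workers (about 55%) have secondary education. `mode()` returns that level directly, and `value_counts(normalize=True)` expresses each level as a share of the group. A sketch on made-up rows:

```python
import pandas as pd

# Hypothetical toy rows (not the real data)
toy = pd.DataFrame({
    "job": ["blue-collar", "blue-collar", "blue-collar", "technician"],
    "education": ["secondary", "primary", "secondary", "tertiary"],
})

blue = toy.loc[toy["job"] == "blue-collar", "education"]
typical = blue.mode().iloc[0]               # most frequent education level
shares = blue.value_counts(normalize=True)  # fraction of blue-collar workers per level
print(typical)
print(shares)
```

`mode()` returns a Series because several levels can tie; `.iloc[0]` simply takes the first one.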

### Question

How many unemployed clients have an outstanding personal loan? Is that percentage higher than for employed clients?

In [138]:
# Boolean masks for the groups being compared
unemployed = df['job'] == 'unemployed'
employed = ~unemployed  # every other job value, including retired, student and unknown
has_loan = df['loan'] == 'yes'

print("Total records:", len(df))
print("Unemployed clients with an outstanding loan:", (unemployed & has_loan).sum())
print("Employed clients with an outstanding loan:", (employed & has_loan).sum())

# Percentages are taken within each group, not over all records
unemp = (unemployed & has_loan).sum() / unemployed.sum() * 100
emp = (employed & has_loan).sum() / employed.sum() * 100
print(f"% of unemployed clients with an outstanding loan: {unemp:.2f}")
print(f"% of employed clients with an outstanding loan: {emp:.2f}")

print("Is the unemployed percentage higher than the employed one?", unemp > emp)


Total records: 4521
Unemployed clients with an outstanding loan: 13
Employed clients with an outstanding loan: 678
% of unemployed clients with an outstanding loan: 10.16
% of employed clients with an outstanding loan: 15.43
Is the unemployed percentage higher than the employed one? False
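A fair comparison uses within-group proportions: clients with a loan divided by the size of their own group, rather than by all 4,521 records. `pd.crosstab(..., normalize="index")` computes these row-wise proportions for both groups in one call. A sketch on made-up rows (not the real data):

```python
import pandas as pd

# Hypothetical toy rows (not the real data)
toy = pd.DataFrame({
    "job": ["unemployed", "unemployed", "services", "technician", "services"],
    "loan": ["yes", "no", "no", "yes", "no"],
})

# Row-normalised table: each row sums to 100, so cells are within-group percentages
rates = pd.crosstab(toy["job"] == "unemployed", toy["loan"], normalize="index") * 100
rates = rates.rename(index={True: "unemployed", False: "employed"})
print(rates)
```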


### Question

What percent of clients subscribed to the term deposit (column y)? 

In [142]:
print("% of clients who subscribed to the term deposit (column y):")
len(df[(df['y'] == 'yes')]) / len(df) * 100


% of clients who subscribed to the term deposit (column y):


11.523999115239992
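The mean of a boolean mask is the fraction of `True` values, so the same percentage can be written without `len`, and `value_counts(normalize=True)` shows both classes at once. With roughly 11.5% positives the classes are imbalanced: always predicting "no" would already be right about 88.5% of the time, a useful baseline for the classification goal stated at the top. A sketch on a made-up target column:

```python
import pandas as pd

# Hypothetical toy target column (not the real data)
y = pd.Series(["no", "no", "yes", "no"], name="y")

pct_yes = (y == "yes").mean() * 100        # share of True values, as a percentage
shares = y.value_counts(normalize=True)    # fraction of each class
print(pct_yes)
print(shares)
```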

### Question

What percentage of married clients subscribed to the term deposit? Is that more or less than for single clients?

In [151]:
married_yes = df[(df['marital'] == 'married') & (df['y'] == 'yes')]
# Note: "!= 'married'" counts single AND divorced clients (divorced includes widowed)
not_married_yes = df[(df['marital'] != 'married') & (df['y'] == 'yes')]

print("Total records:", len(df))

print("\nMarried clients who subscribed:", len(married_yes))
# Denominator is len(df), so this is a share of ALL clients, not of married clients
married_pct = len(married_yes) / len(df) * 100
print("% of all clients who are married and subscribed:", married_pct)

print("\nNot-married (single or divorced) clients who subscribed:", len(not_married_yes))
not_married_pct = len(not_married_yes) / len(df) * 100
print("% of all clients who are not married and subscribed:", not_married_pct)

if married_pct > not_married_pct:
    print("\nMore married clients ({} %) subscribed than not-married clients ({} %)".format(married_pct, not_married_pct))
else:
    print("\nFewer married clients ({} %) subscribed than not-married clients ({} %)".format(married_pct, not_married_pct))


Total records: 4521

Married clients who subscribed: 277
% of all clients who are married and subscribed: 6.126963061269631

Not-married (single or divorced) clients who subscribed: 244
% of all clients who are not married and subscribed: 5.397036053970361

More married clients (6.126963061269631 %) subscribed than not-married clients (5.397036053970361 %)
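The percentages in this cell are shares of all 4,521 clients, and `!= 'married'` groups divorced clients together with single ones. Comparing subscription rates instead means dividing by the size of each marital group; taking the mean of a boolean column per group does exactly that. A sketch on made-up rows, so the numbers are illustrative only:

```python
import pandas as pd

# Hypothetical toy rows (not the real data)
toy = pd.DataFrame({
    "marital": ["married", "married", "married", "single", "single", "divorced"],
    "y": ["yes", "no", "no", "yes", "no", "no"],
})

# Percentage of clients in each marital group who subscribed
rates = (toy["y"] == "yes").groupby(toy["marital"]).mean() * 100
print(rates)

married_rate, single_rate = rates["married"], rates["single"]
print("more" if married_rate > single_rate else "less")
```

Grouping on `marital` keeps `single` and `divorced` separate, which matches the wording of the question.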


### Question

<p>Ask an interesting question of this data set and provide a solution to answer that.</p>
<p>The goal is to help teach your fellow students all possible questions we can collectively ask of this dataset. Your question should be as clear as possible and as short as possible. Try to avoid asking questions that are too trivial or obvious.</p>

<p>This is a bonus point question. The only way not to get full points is if you do either of the following:
<ul>
<li>You do not perform this task.
<li>You do not include a solution. 
</ul>
</p>
<p>If the solution you provide is incorrect, you will still receive full points. But you must make an honest effort to get it right.
</p>

Write your question here.

What percentage of people with a stable life (higher-paying job such as management, tertiary education, married) have a housing loan (`housing == 'yes'`, used here as a proxy for owning a house), compared with people with a less stable life (lower-paying job such as blue-collar, primary education, not married)?

In [180]:
# Write your solution here.
# "Stable" profile: management job, married, tertiary education, with a housing loan
stable = df[(df['job'] == 'management') & (df['marital'] == 'married')
            & (df['education'] == 'tertiary') & (df['housing'] == 'yes')]
# "Less stable" profile: blue-collar job, not married, primary education, with a housing loan
less_stable = df[(df['job'] == 'blue-collar') & (df['marital'] != 'married')
                 & (df['education'] == 'primary') & (df['housing'] == 'yes')]

print("Total records:", len(df))

print("\nStable-life clients with a housing loan:", len(stable))
# Denominator is len(df), so this is a share of ALL clients, not of the profile
stable_pct = len(stable) / len(df) * 100
print("% of all clients:", stable_pct)

print("\nLess-stable-life clients with a housing loan:", len(less_stable))
less_stable_pct = len(less_stable) / len(df) * 100
print("% of all clients:", less_stable_pct)

print("\nIs the percentage higher for the stable profile?", stable_pct > less_stable_pct)


Total records: 4521

Stable-life clients with a housing loan: 225
% of all clients: 4.97677504976775

Less-stable-life clients with a housing loan: 61
% of all clients: 1.3492590134925901

Is the percentage higher for the stable profile? True
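The percentages in this cell divide by all 4,521 records, so they partly reflect how many clients fall into each profile rather than how likely each profile is to have a housing loan. A rate within each profile divides by the profile's own size. A minimal sketch, assuming the same two profile definitions; the helper `housing_rate` is hypothetical and the rows are made up, so the toy result says nothing about the real data:

```python
import pandas as pd


def housing_rate(df: pd.DataFrame, profile: pd.Series) -> float:
    """Percentage of clients matching the boolean mask `profile` who have a housing loan."""
    group = df[profile]
    if group.empty:
        return float("nan")  # no clients match the profile
    return (group["housing"] == "yes").mean() * 100


# Hypothetical toy rows (not the real data)
toy = pd.DataFrame({
    "job": ["management", "management", "blue-collar", "blue-collar", "blue-collar"],
    "marital": ["married", "married", "single", "divorced", "married"],
    "education": ["tertiary", "tertiary", "primary", "primary", "primary"],
    "housing": ["yes", "no", "yes", "yes", "no"],
})

# Profile masks only; the housing condition is measured, not filtered on
stable = (toy["job"] == "management") & (toy["marital"] == "married") & (toy["education"] == "tertiary")
less_stable = (toy["job"] == "blue-collar") & (toy["marital"] != "married") & (toy["education"] == "primary")

stable_rate = housing_rate(toy, stable)
less_stable_rate = housing_rate(toy, less_stable)
print(stable_rate, less_stable_rate, stable_rate > less_stable_rate)
```

Keeping the housing condition out of the profile masks is the key design choice: it lets the function measure the housing-loan rate inside each profile instead of counting loan holders.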
