# Data Analysis with Pandas (Part 1)

**Outline:**

* [Intro to Pandas](#Intro-to-Pandas)
* [Pandas Data Structures](#Pandas-Data-Structures)
  * [Python List](#Python-List)
  * [Series](#Series)
  * [DataFrame](#DataFrame)
* [Pandas Data Types](#Pandas-Data-Types)
* [Knowing Basic Stats](#Knowing-Basic-Stats)
* [Dealing with Files](#Dealing-with-Files)
  * [Reading Data from File](#Reading-Data-from-File)
  * [Writing Data to File](#Writing-Data-to-File)
* [Dealing with Columns](#Dealing-with-Columns)
  * [Renaming Columns](#Renaming-Columns)
  * [Adding New Columns](#Adding-New-Columns)
  * [Removing Existing Columns](#Removing-Existing-Columns)

## Intro to Pandas

In [1]:
from IPython.display import HTML
HTML('<iframe src="http://pandas.pydata.org" width="800" height="350"></iframe>')

In [2]:
import pandas as pd

## Pandas Data Structures

### Python List

In [3]:
data = [113, 1463, 95, 33]
data[2]

95

### Series

In [4]:
pd.Series()

Series([], dtype: float64)

In [5]:
series_data = pd.Series([113, 1463, 95, 33])
series_data

0     113
1    1463
2      95
3      33
dtype: int64

In [6]:
type(series_data)

pandas.core.series.Series

In [7]:
series_data[2]

95

In [8]:
series_data = pd.Series({'a': 113, 'b': 1463, 'c': 95, 'd': 33})
series_data

a     113
b    1463
c      95
d      33
dtype: int64

In [9]:
series_data.iloc[1]  # position-based access; use .iloc when the index holds labels

1463

In [10]:
series_data['b']

1463

In [11]:
series_data = pd.Series({'a': 113, 'b': 1463, 'c': 95, 'd': 33}, index=['b', 'c', 'd', 'e', 'f'])
series_data

b    1463.0
c      95.0
d      33.0
e       NaN
f       NaN
dtype: float64

In [12]:
series_data.isnull()

b    False
c    False
d    False
e     True
f     True
dtype: bool

In [13]:
series_data.isnull().sum()

2

In [14]:
series_data.index

Index(['b', 'c', 'd', 'e', 'f'], dtype='object')

In [15]:
series_data.values

array([ 1463.,    95.,    33.,    nan,    nan])

In [16]:
[1, 2, 3] + [3, 4, 6]

[1, 2, 3, 3, 4, 6]

In [17]:
series_data + series_data

b    2926.0
c     190.0
d      66.0
e       NaN
f       NaN
dtype: float64
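
The two cells above contrast `+` on lists (concatenation) with `+` on Series (element-wise addition). As a minimal sketch, with made-up labels and values, the addition also aligns on index labels rather than positions, so a label present in only one operand yields `NaN`:

```python
import pandas as pd

# Hypothetical values chosen for illustration; not part of the data above.
left = pd.Series({'a': 1, 'b': 2, 'c': 3})
right = pd.Series({'b': 10, 'c': 20, 'd': 30})

# Addition matches on labels: 'b' and 'c' are summed,
# while 'a' and 'd' have no partner and become NaN.
total = left + right
print(total)
```

This is the same mechanism that keeps `e` and `f` as `NaN` in `series_data + series_data`.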

In [18]:
# Series.append was removed in pandas 2.0; pd.concat gives the same result
pd.concat([series_data, pd.Series([113, 1463, 95, 33])])

b    1463.0
c      95.0
d      33.0
e       NaN
f       NaN
0     113.0
1    1463.0
2      95.0
3      33.0
dtype: float64

In [19]:
# Series.append was removed in pandas 2.0; pd.concat gives the same result
series_data = pd.concat([series_data, pd.Series({'b': 99})])

In [20]:
series_data.index

Index(['b', 'c', 'd', 'e', 'f', 'b'], dtype='object')

In [21]:
series_data

b    1463.0
c      95.0
d      33.0
e       NaN
f       NaN
b      99.0
dtype: float64

In [22]:
series_data.iloc[5]  # position-based access; use .iloc when the index holds labels

99.0

In [23]:
series_data['b']

b    1463.0
b      99.0
dtype: float64
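
Because the added entry reuses the label `'b'`, a label lookup now returns a Series instead of a single value. A small sketch, using three of the values above, of how to detect duplicate labels and how position-based access with `.iloc` stays unambiguous:

```python
import pandas as pd

# A Series with a repeated label, built from values shown above.
s = pd.Series([1463.0, 95.0, 99.0], index=['b', 'c', 'b'])

print(s.index.is_unique)   # False, because 'b' appears twice
print(s.loc['b'])          # label lookup returns every 'b' row
print(s.iloc[2])           # position lookup returns exactly one value
```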

### DataFrame

In [24]:
personal_data_dict = {
    'age': [39, 50, 38],
    'education': ['Bachelors', 'Bachelors', 'HS-grad'],
    'occupation': ['Adm-clerical', 'Tech-support', 'Sales'],
    'sex': ['Male', 'Female', 'Female'],
    'capital-gain': [2174, 111, 993]
}
df = pd.DataFrame(personal_data_dict)

In [25]:
df

Unnamed: 0,age,capital-gain,education,occupation,sex
0,39,2174,Bachelors,Adm-clerical,Male
1,50,111,Bachelors,Tech-support,Female
2,38,993,HS-grad,Sales,Female


In [26]:
type(df)

pandas.core.frame.DataFrame

In [27]:
df.shape

(3, 5)

In [28]:
df.index

RangeIndex(start=0, stop=3, step=1)

In [29]:
df.values

array([[39, 2174, 'Bachelors', 'Adm-clerical', 'Male'],
       [50, 111, 'Bachelors', 'Tech-support', 'Female'],
       [38, 993, 'HS-grad', 'Sales', 'Female']], dtype=object)

In [30]:
df.columns

Index(['age', 'capital-gain', 'education', 'occupation', 'sex'], dtype='object')

In [31]:
df.head(2)

Unnamed: 0,age,capital-gain,education,occupation,sex
0,39,2174,Bachelors,Adm-clerical,Male
1,50,111,Bachelors,Tech-support,Female


In [32]:
df.tail()

Unnamed: 0,age,capital-gain,education,occupation,sex
0,39,2174,Bachelors,Adm-clerical,Male
1,50,111,Bachelors,Tech-support,Female
2,38,993,HS-grad,Sales,Female


In [33]:
df['occupation']

0    Adm-clerical
1    Tech-support
2           Sales
Name: occupation, dtype: object

In [34]:
df['age'][1]

50

In [35]:
df['capital-gain']

0    2174
1     111
2     993
Name: capital-gain, dtype: int64

In [36]:
df['name']

KeyError: 'name'

In [37]:
df['age']

0    39
1    50
2    38
Name: age, dtype: int64

In [39]:
df.age.value_counts()

39    1
50    1
38    1
Name: age, dtype: int64

In [40]:
type(df.age)

pandas.core.series.Series
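
`df.age` and `df['age']` return the same Series. Attribute access only works when the column name is a valid Python identifier, so a name such as `capital-gain` needs brackets. A short sketch with a two-column frame built from the values above (the name `df_small` is only for this example):

```python
import pandas as pd

df_small = pd.DataFrame({'age': [39, 50, 38], 'capital-gain': [2174, 111, 993]})

# Both spellings select the same column.
same = df_small.age.equals(df_small['age'])

# 'capital-gain' contains a hyphen, so df_small.capital-gain would be parsed
# as (df_small.capital) - gain; bracket access is required here.
gains = df_small['capital-gain']
print(same, gains.tolist())
```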

## Pandas Data Types

In [41]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 5 columns):
age             3 non-null int64
capital-gain    3 non-null int64
education       3 non-null object
occupation      3 non-null object
sex             3 non-null object
dtypes: int64(2), object(3)
memory usage: 200.0+ bytes


## Knowing Basic Stats

In [42]:
df.describe()

Unnamed: 0,age,capital-gain
count,3.0,3.0
mean,42.333333,1092.666667
std,6.658328,1035.104987
min,38.0,111.0
25%,38.5,552.0
50%,39.0,993.0
75%,44.5,1583.5
max,50.0,2174.0


In [43]:
# Restrict to numeric columns; recent pandas raises on text columns here
df.select_dtypes('number').cov()

Unnamed: 0,age,capital-gain
age,44.333333,-5349.333
capital-gain,-5349.333333,1071442.0


In [44]:
# Restrict to numeric columns; recent pandas raises on text columns here
df.select_dtypes('number').corr()

Unnamed: 0,age,capital-gain
age,1.0,-0.776158
capital-gain,-0.776158,1.0


## Dealing with Files

### Reading Data from File

#### CSV File

UCI Machine Learning Repository: [Adult Data Set](https://archive.ics.uci.edu/ml/datasets/Adult)

In [45]:
adult = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data')

In [46]:
adult.head()

Unnamed: 0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
0,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
1,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
2,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
3,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K
4,37,Private,284582,Masters,14,Married-civ-spouse,Exec-managerial,Wife,White,Female,0,0,40,United-States,<=50K


In [47]:
adult = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data', header=None)

In [48]:
adult.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K


In [49]:
columns = ['age', 'Work Class', 'fnlwgt', 'education', 'education-num', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'capital-gain', 'capital-loss', 'hours-per-week', 'native-country', 'Money Per Year']
adult.columns = columns

In [50]:
adult.head(2)

Unnamed: 0,age,Work Class,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,Money Per Year
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K


In [52]:
adult['age'][0:3]

0    39
1    50
2    38
Name: age, dtype: int64

In [None]:
columns = ['age', 'Work Class', 'fnlwgt', 'education', 'education-num', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'capital-gain', 'capital-loss', 'hours-per-week', 'native-country', 'Money Per Year']
adult = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data', names=columns)

In [None]:
adult.head()

In [None]:
adult['age']

In [56]:
adult.age.value_counts(ascending=True)[0:5]

86    1
87    1
88    3
85    3
83    6
Name: age, dtype: int64

In [58]:
adult[adult.age == adult.age.value_counts().index[0]]['sex'].value_counts()

 Male      611
 Female    287
Name: sex, dtype: int64
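
The labels in the output above carry a leading space (` Male`). A likely cause, assumed here rather than confirmed, is that fields in `adult.data` are separated by a comma followed by a space. One way to strip it while reading is `skipinitialspace=True`. The sketch below uses a reduced three-field sample in an in-memory buffer instead of the real file:

```python
import io
import pandas as pd

# Reduced sample: three fields per row, with the ", " separator assumed above.
sample = "39, State-gov, Male\n50, Self-emp-not-inc, Male\n"
names = ['age', 'workclass', 'sex']

raw = pd.read_csv(io.StringIO(sample), names=names)
clean = pd.read_csv(io.StringIO(sample), names=names, skipinitialspace=True)

print(repr(raw['sex'][0]))    # ' Male' (leading space kept)
print(repr(clean['sex'][0]))  # 'Male'
```

Without this, comparisons such as `adult.sex == 'Male'` would match nothing, because the stored value is `' Male'`.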

#### JSON File

In [59]:
!cat try_series.json

{
    "name": "Kan Ouivirach",
    "email": "kan@prontomarketing.com"
}


In [60]:
series_data = pd.read_json('try_series.json', typ='series')

In [61]:
series_data

email    kan@prontomarketing.com
name               Kan Ouivirach
dtype: object

In [62]:
!cat try_df.json

[
    {
        "name": "Kan Ouivirach",
        "email": "kan@prontomarketing.com"
    },
    {
        "name": "Some Data Scientist",
        "email": "someone@datascience.th"
    }
]


In [63]:
df = pd.read_json('try_df.json')

In [64]:
df

Unnamed: 0,email,name
0,kan@prontomarketing.com,Kan Ouivirach
1,someone@datascience.th,Some Data Scientist


### Writing Data to File

In [65]:
adult.to_json('adult.json')

In [66]:
!ls

adult.csv                       reviews_Digital_Music_5.csv
adult.json                      reviews_Digital_Music_5.json.gz
exercise2.csv                   try_df.json
pandas-01.ipynb                 try_series.json
pandas-02.ipynb


In [67]:
adult = pd.read_json('adult.json')

In [68]:
adult.head(3)

Unnamed: 0,Money Per Year,Work Class,age,capital-gain,capital-loss,education,education-num,fnlwgt,hours-per-week,marital-status,native-country,occupation,race,relationship,sex
0,<=50K,State-gov,39,2174,0,Bachelors,13,77516,40,Never-married,United-States,Adm-clerical,White,Not-in-family,Male
1,<=50K,Self-emp-not-inc,50,0,0,Bachelors,13,83311,13,Married-civ-spouse,United-States,Exec-managerial,White,Husband,Male
10,>50K,Private,37,0,0,Some-college,10,280464,80,Married-civ-spouse,United-States,Exec-managerial,Black,Husband,Male
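
In the output above the third row is labelled 10 rather than 2, which suggests the rows did not come back in their original order after the JSON round trip. If row order matters, one option is to call `sort_index()` after reading; another is to write the rows as a list of records. The sketch below assumes a small made-up frame and an in-memory buffer instead of `adult.json`:

```python
import io
import pandas as pd

# Made-up frame with 12 rows so that labels 10 and 11 exist.
frame = pd.DataFrame({'age': list(range(30, 42)), 'sex': ['Male', 'Female'] * 6})

# orient='records' writes a JSON array of row objects, which keeps row order.
text = frame.to_json(orient='records')
restored = pd.read_json(io.StringIO(text), orient='records')

print(restored.head(3))
```

Note that `orient='records'` does not store the index, so this only suits frames with a default integer index.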


In [None]:
adult.to_csv('adult.csv')

In [None]:
!ls

## Dealing with Columns

### Renaming Columns

In [69]:
adult = pd.read_csv('adult.csv', index_col=0)

In [70]:
adult.head()

Unnamed: 0,age,Work Class,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,Money Per Year
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K


In [71]:
adult_new = adult.rename(columns={'Work Class': 'workclass'})

In [72]:
adult_new.head()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,Money Per Year
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K


In [73]:
adult_new.columns

Index(['age', 'workclass', 'fnlwgt', 'education', 'education-num',
       'marital-status', 'occupation', 'relationship', 'race', 'sex',
       'capital-gain', 'capital-loss', 'hours-per-week', 'native-country',
       'Money Per Year'],
      dtype='object')

In [74]:
adult_new.columns = adult_new.columns.str.lower().str.replace(' ', '-')

In [75]:
adult_new.columns

Index(['age', 'workclass', 'fnlwgt', 'education', 'education-num',
       'marital-status', 'occupation', 'relationship', 'race', 'sex',
       'capital-gain', 'capital-loss', 'hours-per-week', 'native-country',
       'money-per-year'],
      dtype='object')

In [76]:
adult_new.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 32561 entries, 0 to 32560
Data columns (total 15 columns):
age               32561 non-null int64
workclass         32561 non-null object
fnlwgt            32561 non-null int64
education         32561 non-null object
education-num     32561 non-null int64
marital-status    32561 non-null object
occupation        32561 non-null object
relationship      32561 non-null object
race              32561 non-null object
sex               32561 non-null object
capital-gain      32561 non-null int64
capital-loss      32561 non-null int64
hours-per-week    32561 non-null int64
native-country    32561 non-null object
money-per-year    32561 non-null object
dtypes: int64(6), object(9)
memory usage: 4.0+ MB


### Adding New Columns

In [89]:
adult['normalized-age'] = (adult.age - adult.age.mean()) / adult.age.std()

In [90]:
adult.head()

Unnamed: 0,age,Work Class,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,Money Per Year,normalized-age
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K,0.03067
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K,0.837096
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K,-0.042641
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K,1.057031
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K,-0.775756
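
The new column is a z-score: each age minus the mean, divided by the standard deviation. As a quick check on the three ages from the small DataFrame in the DataFrame section (mean about 42.33 and standard deviation about 6.66 according to `describe()`), the standardized values should have mean 0 and standard deviation 1:

```python
import pandas as pd

ages = pd.Series([39, 50, 38])

# Series.std() uses the sample standard deviation (ddof=1) by default.
z = (ages - ages.mean()) / ages.std()

print(z.round(3).tolist())
print(abs(z.mean()) < 1e-9, abs(z.std() - 1) < 1e-9)
```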


In [91]:
adult['normalized-age'] > 1

0        False
1        False
2        False
3         True
4        False
5        False
6        False
7        False
8        False
9        False
10       False
11       False
12       False
13       False
14       False
15       False
16       False
17       False
18       False
19       False
20       False
21        True
22       False
23       False
24        True
25        True
26       False
27        True
28       False
29       False
         ...  
32531    False
32532    False
32533     True
32534    False
32535    False
32536    False
32537    False
32538    False
32539     True
32540    False
32541    False
32542     True
32543    False
32544    False
32545    False
32546    False
32547    False
32548     True
32549    False
32550    False
32551    False
32552    False
32553    False
32554     True
32555    False
32556    False
32557    False
32558     True
32559    False
32560    False
Name: normalized-age, dtype: bool

In [92]:
adult[adult['normalized-age'] > 1]

Unnamed: 0,age,Work Class,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,Money Per Year,normalized-age
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K,1.057031
21,54,Private,302146,HS-grad,9,Separated,Other-service,Unmarried,Black,Female,0,0,20,United-States,<=50K,1.130342
24,59,Private,109015,HS-grad,9,Divorced,Tech-support,Unmarried,White,Female,0,0,40,United-States,<=50K,1.496899
25,56,Local-gov,216851,Bachelors,13,Married-civ-spouse,Tech-support,Husband,White,Male,0,0,40,United-States,>50K,1.276965
27,54,?,180211,Some-college,10,Married-civ-spouse,?,Husband,Asian-Pac-Islander,Male,0,0,60,South,>50K,1.130342
41,53,Self-emp-not-inc,88506,Bachelors,13,Married-civ-spouse,Prof-specialty,Husband,White,Male,0,0,40,United-States,<=50K,1.057031
45,57,Federal-gov,337895,Bachelors,13,Married-civ-spouse,Prof-specialty,Husband,Black,Male,0,0,40,United-States,>50K,1.350276
46,53,Private,144361,HS-grad,9,Married-civ-spouse,Machine-op-inspct,Husband,White,Male,0,0,38,United-States,<=50K,1.057031
67,53,Private,169846,HS-grad,9,Married-civ-spouse,Adm-clerical,Wife,White,Female,0,0,40,United-States,>50K,1.057031
74,79,Private,124744,Some-college,10,Married-civ-spouse,Prof-specialty,Other-relative,White,Male,0,0,20,United-States,<=50K,2.963128


In [93]:
adult[adult['age'] > 80]

Unnamed: 0,age,Work Class,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,Money Per Year,normalized-age
222,90,Private,51744,HS-grad,9,Never-married,Other-service,Not-in-family,Black,Male,0,2206,40,United-States,<=50K,3.769554
918,81,Self-emp-not-inc,136063,HS-grad,9,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,30,United-States,<=50K,3.109751
1040,90,Private,137018,HS-grad,9,Never-married,Other-service,Not-in-family,White,Female,0,0,40,United-States,<=50K,3.769554
1168,88,Self-emp-not-inc,206291,Prof-school,15,Married-civ-spouse,Prof-specialty,Husband,White,Male,0,0,40,United-States,<=50K,3.622932
1935,90,Private,221832,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,45,United-States,<=50K,3.769554
2303,90,Private,52386,Some-college,10,Never-married,Other-service,Not-in-family,Asian-Pac-Islander,Male,0,0,35,United-States,<=50K,3.769554
2891,90,Private,171956,Some-college,10,Separated,Adm-clerical,Own-child,White,Female,0,0,40,Puerto-Rico,<=50K,3.769554
2906,81,Private,114670,9th,5,Widowed,Priv-house-serv,Not-in-family,Black,Female,2062,0,5,United-States,<=50K,3.109751
3211,82,?,29441,7th-8th,4,Widowed,?,Not-in-family,White,Male,0,0,5,United-States,<=50K,3.183063
3537,81,Self-emp-not-inc,137018,HS-grad,9,Widowed,Adm-clerical,Not-in-family,White,Female,0,0,20,United-States,<=50K,3.109751


In [94]:
adult.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 32561 entries, 0 to 32560
Data columns (total 16 columns):
age               32561 non-null int64
Work Class        32561 non-null object
fnlwgt            32561 non-null int64
education         32561 non-null object
education-num     32561 non-null int64
marital-status    32561 non-null object
occupation        32561 non-null object
relationship      32561 non-null object
race              32561 non-null object
sex               32561 non-null object
capital-gain      32561 non-null int64
capital-loss      32561 non-null int64
hours-per-week    32561 non-null int64
native-country    32561 non-null object
Money Per Year    32561 non-null object
normalized-age    32561 non-null float64
dtypes: float64(1), int64(6), object(9)
memory usage: 4.2+ MB


### Removing Existing Columns

In [95]:
adult.drop('normalized-age')

ValueError: labels ['normalized-age'] not contained in axis

By default, `drop` searches the row labels (`axis=0`), so the column name is not found. To drop a column, pass `axis=1`.

In [96]:
adult.drop('normalized-age', axis=1)

Unnamed: 0,age,Work Class,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,Money Per Year
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K
5,37,Private,284582,Masters,14,Married-civ-spouse,Exec-managerial,Wife,White,Female,0,0,40,United-States,<=50K
6,49,Private,160187,9th,5,Married-spouse-absent,Other-service,Not-in-family,Black,Female,0,0,16,Jamaica,<=50K
7,52,Self-emp-not-inc,209642,HS-grad,9,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,45,United-States,>50K
8,31,Private,45781,Masters,14,Never-married,Prof-specialty,Not-in-family,White,Female,14084,0,50,United-States,>50K
9,42,Private,159449,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,5178,0,40,United-States,>50K


In [97]:
adult.head()

Unnamed: 0,age,Work Class,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,Money Per Year,normalized-age
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K,0.03067
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K,0.837096
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K,-0.042641
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K,1.057031
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K,-0.775756


In [98]:
adult = adult.drop('normalized-age', axis=1)

In [99]:
adult.head()

Unnamed: 0,age,Work Class,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,Money Per Year
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K
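
`drop` returns a new DataFrame and leaves the original untouched, which is why the result is assigned back to `adult` above. As a sketch on a made-up frame, and assuming pandas 0.21 or later, `columns=` is an equivalent spelling of `axis=1`:

```python
import pandas as pd

frame = pd.DataFrame({'age': [39, 50], 'normalized-age': [0.03, 0.84]})

by_axis = frame.drop('normalized-age', axis=1)
by_keyword = frame.drop(columns='normalized-age')

print(by_axis.equals(by_keyword))   # True
print(list(frame.columns))          # original still has both columns
```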


In [100]:
adult.drop([0], axis=0)

Unnamed: 0,age,Work Class,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,Money Per Year
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K
5,37,Private,284582,Masters,14,Married-civ-spouse,Exec-managerial,Wife,White,Female,0,0,40,United-States,<=50K
6,49,Private,160187,9th,5,Married-spouse-absent,Other-service,Not-in-family,Black,Female,0,0,16,Jamaica,<=50K
7,52,Self-emp-not-inc,209642,HS-grad,9,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,45,United-States,>50K
8,31,Private,45781,Masters,14,Never-married,Prof-specialty,Not-in-family,White,Female,14084,0,50,United-States,>50K
9,42,Private,159449,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,5178,0,40,United-States,>50K
10,37,Private,280464,Some-college,10,Married-civ-spouse,Exec-managerial,Husband,Black,Male,0,0,80,United-States,>50K
