**Some Pandas**

We use the pandas package to work with flat (rectangular) datasets.
Pandas calls these data frames, a term borrowed from R.
For now we introduce only a few of the operations they support:

- read a .csv file as a pandas data frame
- apply a function to every row to produce a pandas *Series*
- create a new data frame by selecting rows satisfying some criterion


In [1]:
import pandas as pd
df=pd.read_csv("mortgage_data.csv")
print(df.shape)
df.head(10)

(9864, 5)


   location  princ  irate  cscore       result
0  suburban    358    7.0     728      default
1  suburban    637   7.25     675      default
2  suburban    303   7.25     645  non-default
3  suburban    397   7.25     609  non-default
4  suburban    420   7.75     669      default
5  suburban    574   7.25     715      default
6  suburban    744   6.75     716  non-default
7  suburban    390   7.25     681  non-default
8     urban    491    7.5     518      default
9  suburban    451    7.5     641      default
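
Since `mortgage_data.csv` may not be at hand, here is a minimal sketch of the same `read_csv` call on an in-memory string. The layout of `csv_text` (a header row with the five column names and no index column) is an assumption about the file, and `sample` is a hypothetical name; the three data rows are copied from the output above.

```python
import io
import pandas as pd

# Assumed file layout: a header row with the five column names, no index column.
csv_text = """location,princ,irate,cscore,result
suburban,358,7.0,728,default
suburban,637,7.25,675,default
urban,491,7.5,518,default
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)    # (number of rows, number of columns)
print(sample.dtypes)   # column types inferred from the text
```

pandas supplies the integer row labels 0, 1, 2, ... itself when the file has no index column.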


In [2]:
df.iloc[5]

location    suburban
princ            574
irate           7.25
cscore           715
result       default
Name: 5, dtype: object

In [3]:
def f(row):
    return(row.irate>7.2)
df.apply(f,axis=1)

0       False
1        True
2        True
3        True
4        True
        ...  
9859     True
9860     True
9861    False
9862    False
9863     True
Length: 9864, dtype: bool

In [4]:
def f(row):
    return(row.location=="urban")
df.apply(f,axis=1)

0       False
1       False
2       False
3       False
4       False
        ...  
9859    False
9860    False
9861    False
9862    False
9863    False
Length: 9864, dtype: bool

In [5]:
def f(row):
    return(row.location in ["urban","suburban"])
df.apply(f,axis=1)

0       True
1       True
2       True
3       True
4       True
        ... 
9859    True
9860    True
9861    True
9862    True
9863    True
Length: 9864, dtype: bool
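
The three row-wise tests above can also be written as comparisons on a whole column, which avoids calling a Python function once per row. The following is only a sketch: `sample` is a hypothetical frame rebuilt from the ten rows printed by `df.head(10)`, not the full dataset, and the variable names are illustrative.

```python
import pandas as pd

# Hypothetical sample: the ten rows shown by df.head(10) earlier.
sample = pd.DataFrame({
    "location": ["suburban"] * 8 + ["urban", "suburban"],
    "princ": [358, 637, 303, 397, 420, 574, 744, 390, 491, 451],
    "irate": [7.0, 7.25, 7.25, 7.25, 7.75, 7.25, 6.75, 7.25, 7.5, 7.5],
    "cscore": [728, 675, 645, 609, 669, 715, 716, 681, 518, 641],
    "result": ["default", "default", "non-default", "non-default", "default",
               "default", "non-default", "non-default", "default", "default"],
})

# Column-wise versions of the three tests; each yields a boolean Series.
high_rate = sample["irate"] > 7.2
is_urban = sample["location"] == "urban"
known_location = sample["location"].isin(["urban", "suburban"])

# The row-wise apply from In [3] gives the same True/False values.
by_apply = sample.apply(lambda row: row.irate > 7.2, axis=1)
print((high_rate == by_apply).all())
```

Both forms produce a boolean Series indexed like the data frame, so either can be used to select rows.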

In [7]:
def f(row):
    return(row.cscore>750)
res=df.apply(f,axis=1) # True/False Series
print(res)
dfnew1=df.loc[res] # rows of df that give res==True
dfnew2=df.loc[~res] # rows of df that give res==False
print(dfnew1.shape)
print(dfnew2.shape)



0       False
1       False
2       False
3       False
4       False
        ...  
9859    False
9860    False
9861    False
9862     True
9863    False
Length: 9864, dtype: bool
(617, 5)
(9247, 5)
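
The two shapes above add up to the original row count (617 + 9247 = 9864), because a boolean mask and its negation split the rows into two disjoint groups. A small sketch of the same split follows, assuming a hypothetical `sample` frame built from the ten rows printed earlier and an illustrative threshold of 700 (none of those ten rows exceeds 750).

```python
import pandas as pd

# Hypothetical sample: credit scores and outcomes from the ten rows printed earlier.
sample = pd.DataFrame({
    "cscore": [728, 675, 645, 609, 669, 715, 716, 681, 518, 641],
    "result": ["default", "default", "non-default", "non-default", "default",
               "default", "non-default", "non-default", "default", "default"],
})

mask = sample["cscore"] > 700   # boolean Series, one entry per row
high = sample.loc[mask]         # rows where mask is True
low = sample.loc[~mask]         # rows where mask is False

# Every row lands in exactly one of the two frames.
print(high.shape, low.shape)
print(len(high) + len(low) == len(sample))
```

The selected rows keep their original index labels, so `high` and `low` can be traced back to rows of `sample`.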
