# DataFrames

DataFrames are central to working with financial data, so it is important
that we get very familiar with them.

In [36]:
import numpy as np
import pandas as pd

Setting a random seed makes sure we get the same random numbers on every run.

In [37]:
from numpy.random import randn

In [38]:
np.random.seed(101)

Here, the W, X, Y, and Z columns are each a Series, and they all share one
index. A DataFrame is essentially a collection of Series that share the same
index.

In [39]:
df = pd.DataFrame(randn(5,4), index=['A','B','C','D','E'], columns=['W','X','Y','Z'])
df

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509
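
To make the "Series that share an index" idea concrete, here is a minimal sketch that builds a small DataFrame from a dict of Series. The names `idx`, `w`, `x`, and `small` and their values are made up for illustration and are not part of the notebook above.

```python
import pandas as pd

# Hypothetical index labels and values, chosen only for illustration
idx = ['A', 'B', 'C']
w = pd.Series([1.0, 2.0, 3.0], index=idx)
x = pd.Series([10.0, 20.0, 30.0], index=idx)

# Each dict key becomes a column name; the shared index becomes the row labels
small = pd.DataFrame({'W': w, 'X': x})
print(small)

# Pulling a column back out gives a Series carrying the same index
print(type(small['W']))
print(small['W'].index.equals(w.index))
```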


Bracket notation returns the W column, which is a Series.

In [40]:
df['W']

A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64

Let us confirm that this is indeed a Series.

In [41]:
type(df['W'])

pandas.core.series.Series

You can also access the W column through dot notation. However, this is not
recommended, as the column name could be confused with a DataFrame attribute
or method.

In [42]:
df.W

A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64
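
The risk with dot notation shows up when a column name matches something the DataFrame already defines. The sketch below assumes a made-up frame called `clash` with a column named `count`, which is also the name of a DataFrame method.

```python
import pandas as pd

# Hypothetical frame whose first column shares its name with DataFrame.count
clash = pd.DataFrame({'count': [1, 2, 3], 'W': [0.1, 0.2, 0.3]})

# Bracket notation always returns the column as a Series
col = clash['count']
print(type(col))

# Dot notation resolves to the DataFrame.count method, not the column
attr = clash.count
print(callable(attr))
```

Bracket notation also works for column names that contain spaces, which dot notation cannot express at all.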

If you want multiple columns, pass in a list of column names. The result is a
DataFrame rather than a Series.

In [43]:
df[['W','Z']]

          W         Z
A  2.706850  0.503826
B  0.651118  0.605965
C -2.018168 -0.589001
D  0.188695  0.955057
E  0.190794  0.683509
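
The difference between a single label and a list of labels is easy to check directly. This is a small sketch on a made-up frame called `demo`; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical two-column frame for illustration
demo = pd.DataFrame({'W': [1.0, 2.0], 'Z': [3.0, 4.0]}, index=['A', 'B'])

single = demo['W']         # one label: a Series
as_frame = demo[['W']]     # a one-item list: a DataFrame with one column
both = demo[['W', 'Z']]    # a list of labels: a DataFrame with those columns

print(type(single))
print(type(as_frame))
print(list(both.columns))
```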


Looking up df['new'] before it exists raises a KeyError. However, you can
create the column by assigning a value to df['new'] as if it already existed.
We can also perform arithmetic on existing columns while we're at it.

In [44]:
df['new'] = df['W'] + df['Y']
df

          W         X         Y         Z       new
A  2.706850  0.628133  0.907969  0.503826  3.614819
B  0.651118 -0.319318 -0.848077  0.605965 -0.196959
C -2.018168  0.740122  0.528813 -0.589001 -1.489355
D  0.188695 -0.758872 -0.933237  0.955057 -0.744542
E  0.190794  1.978757  2.605967  0.683509  2.796762
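
The read-versus-assign behaviour can be sketched on a tiny frame. The frame `demo` and the flag `missing` are made up for this example.

```python
import pandas as pd

# Hypothetical two-column frame for illustration
demo = pd.DataFrame({'W': [1.0, 2.0], 'Y': [3.0, 4.0]}, index=['A', 'B'])

# Reading a column that does not exist raises KeyError
try:
    demo['new']
    missing = False
except KeyError:
    missing = True
print(missing)

# Assigning to the same key creates the column; the sum is aligned on the index
demo['new'] = demo['W'] + demo['Y']
print(demo)
```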


If you want to remove a column, use df.drop() and pass in the column name.
Calling df.drop('new') by itself raises an error reporting that the label
'new' is not in the axis, because drop looks at the rows (axis 0) by default.
axis must be set to 1 to refer to the columns.

In [45]:
df.drop('new',axis=1)

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509
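
As a rough sketch of the axis behaviour, the example below uses a made-up frame `demo`. It also shows the `columns=` keyword of `drop`, which the notebook does not use but which is an equivalent way to say axis=1. The exception type and message depend on the pandas version, so the sketch catches both types that could be raised.

```python
import pandas as pd

# Hypothetical frame for illustration
demo = pd.DataFrame({'W': [1.0, 2.0], 'new': [3.0, 4.0]}, index=['A', 'B'])

# With the default axis=0, pandas looks for a row labelled 'new' and fails.
# Recent pandas raises KeyError; older releases raised ValueError.
try:
    demo.drop('new')
    raised = False
except (KeyError, ValueError):
    raised = True
print(raised)

# Two equivalent ways to drop the column
by_axis = demo.drop('new', axis=1)
by_keyword = demo.drop(columns='new')
print(list(by_axis.columns))
```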


If we display df again, we will see that the original df was not modified,
because drop returns a new DataFrame. To modify the original df, we need to
set inplace to True. pandas works this way so that you don't accidentally
lose information.

In [46]:
df.drop('new',axis=1,inplace=True)
df

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509
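
The copy-versus-inplace distinction can be checked on a small frame. The names `demo`, `trimmed`, and `result` are made up for this sketch.

```python
import pandas as pd

# Hypothetical frame for illustration
demo = pd.DataFrame({'W': [1.0, 2.0], 'new': [3.0, 4.0]}, index=['A', 'B'])

# drop returns a new DataFrame and leaves demo untouched
trimmed = demo.drop('new', axis=1)
print(list(demo.columns))      # demo still has both columns here
print(list(trimmed.columns))

# With inplace=True the frame itself is modified and the call returns None
result = demo.drop('new', axis=1, inplace=True)
print(result is None)
print(list(demo.columns))
```

Reassigning the result, as in `demo = demo.drop('new', axis=1)`, is a common alternative to inplace=True that has the same visible effect.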


df.drop() can also be used to drop rows. axis is 0 by default, but
I will put it in for clarity.

In [47]:
df.drop('E',axis=0)

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
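
Row drops accept a list of labels as well. The sketch below uses a made-up frame `demo` and also shows the `index=` keyword of `drop`, which the notebook does not use but which is an equivalent spelling of axis=0.

```python
import pandas as pd

# Hypothetical single-column frame for illustration
demo = pd.DataFrame({'W': [1.0, 2.0, 3.0]}, index=['A', 'B', 'C'])

# Drop several rows at once by passing a list of index labels
fewer = demo.drop(['A', 'C'], axis=0)

# The index= keyword does the same thing without an axis argument
same = demo.drop(index=['A', 'C'])

print(fewer)
print(fewer.equals(same))
```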


When selecting columns in a df, you can just pass in the column name, like
this: ```df['Y']```, or a list of names, like this: ```df[['Y','Z']]```.
However, when selecting rows,