# DataFrame

A DataFrame is a rectangular table of data that contains an ordered collection of columns, each of which can hold a different value type (numeric, string, boolean, etc.).

A DataFrame has both a row index and a column index; it can be thought of as a dict of Series all sharing the same index.
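A minimal sketch of the "dict of Series" idea. The names year, pop and frame and the labels 'a' and 'b' are made up for illustration:

```python
import pandas as pd

# Two Series that share the same index labels
year = pd.Series([2001, 2002], index=["a", "b"])
pop = pd.Series([1.5, 1.6], index=["a", "b"])

# The dict keys become the column names
frame = pd.DataFrame({"Year": year, "Pop": pop})

# Pulling a column back out gives a Series with the frame's row index
col = frame["Pop"]
print(type(col).__name__)             # Series
print(col.index.equals(frame.index))  # True
```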

There are many ways to construct a DataFrame; one of the most common is from a dictionary of equal-length lists or NumPy arrays:
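The cells below use lists. As a rough sketch, the same construction from NumPy arrays might look like this (data and df_np are illustrative names). The "equal-length" requirement matters: arrays of different lengths raise a ValueError.

```python
import numpy as np
import pandas as pd

# Dict of equal-length NumPy arrays: one array per column
data = {
    "Year": np.array([2001, 2002, 2003, 2004]),
    "Pop": np.array([1.5, 1.6, 0.9, 1.6]),
}
df_np = pd.DataFrame(data)
print(df_np.shape)  # (4, 2)

# Columns of different lengths are rejected
raised = False
try:
    pd.DataFrame({"a": [1, 2, 3], "b": [1, 2]})
except ValueError as err:
    raised = True
    print("ValueError:", err)
```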

In [1]:
import pandas as pd
import numpy as np

In [2]:
dict1 = {'County': ['Dublin', 'Meath', 'Kerry', 'Cork'],
         'Year': [2001, 2002, 2003, 2004],
         'Pop': [1.5, 1.6, 0.9, 1.6]}

df = pd.DataFrame(dict1)
df

Unnamed: 0,County,Year,Pop
0,Dublin,2001,1.5
1,Meath,2002,1.6
2,Kerry,2003,0.9
3,Cork,2004,1.6


As with Series, the index is assigned automatically (0, 1, 2, ...) and the rows keep the order in which they were entered.

To arrange the columns in a particular order, pass the sequence of column names via the columns argument:

In [6]:
df = pd.DataFrame(dict1, columns=['Year', 'County', 'Pop'])
df

Unnamed: 0,Year,County,Pop
0,2001,Dublin,1.5
1,2002,Meath,1.6
2,2003,Kerry,0.9
3,2004,Cork,1.6


If you pass a column name that is not a key in the original dict, that column appears in the result filled with NaN.

The .head() method returns only the first 5 rows by default.

The .values attribute returns the data as a two-dimensional NumPy array, one row per DataFrame row.

Setting df.index.name and df.columns.name gives a name to the index and to the columns.
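A short sketch of these three features on a made-up 8-row frame (the data and the names 'row' and 'field' are illustrative only):

```python
import pandas as pd

# Hypothetical frame with more than 5 rows so that head() has an effect
frame = pd.DataFrame({
    "Year": list(range(2001, 2009)),
    "Pop": [1.5, 1.6, 0.9, 1.6, 1.7, 1.8, 2.0, 2.1],
})

first_rows = frame.head()   # first 5 rows by default
first_two = frame.head(2)   # or pass the number of rows wanted

arr = frame.values          # two-dimensional NumPy array
print(arr.shape)            # (8, 2)

# Name the row index and the column index
frame.index.name = "row"
frame.columns.name = "field"
print(frame.head(2))
```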

Retrieve a column in a DataFrame by indexing:
- df['County']
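A small sketch of column retrieval on a made-up two-row frame (df_demo is an illustrative name). A single label returns a Series, while a list of labels returns a DataFrame; attribute access also works when the column name is a valid Python identifier.

```python
import pandas as pd

df_demo = pd.DataFrame({"County": ["Dublin", "Meath"], "Pop": [1.5, 1.6]})

county = df_demo["County"]           # single label -> Series
subset = df_demo[["County", "Pop"]]  # list of labels -> DataFrame
same = df_demo.County                # attribute access -> same Series

print(type(county).__name__)   # Series
print(type(subset).__name__)   # DataFrame
print(same.equals(county))     # True
```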

Rows can be retrieved by label with the special .loc attribute, or by integer position with .iloc.

In [7]:
df.loc[1]

Year       2002
County    Meath
Pop         1.6
Name: 1, dtype: object
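With the default 0, 1, 2, ... index, labels and positions coincide, which hides the difference between label-based and position-based selection. A sketch with made-up string labels, using .loc for labels and .iloc for integer positions:

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"County": ["Dublin", "Meath", "Kerry"], "Pop": [1.5, 1.6, 0.9]},
    index=["a", "b", "c"],  # illustrative string labels
)

by_label = df_demo.loc["b"]     # row whose index label is 'b'
by_position = df_demo.iloc[1]   # second row, counting from 0
print(by_label.equals(by_position))  # True

# Label slices include the end point; position slices do not
print(len(df_demo.loc["a":"b"]))  # 2
print(len(df_demo.iloc[0:1]))     # 1
```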

In [8]:
# Columns can be modified by assignment; 'Debt' is not in dict1, so it starts out as NaN
df = pd.DataFrame(dict1, columns=['Year', 'County', 'Pop', 'Debt'])
df

Unnamed: 0,Year,County,Pop,Debt
0,2001,Dublin,1.5,
1,2002,Meath,1.6,
2,2003,Kerry,0.9,
3,2004,Cork,1.6,


In [11]:
df['Debt'] = [30, 24, 55, 43]
df

Unnamed: 0,Year,County,Pop,Debt
0,2001,Dublin,1.5,30
1,2002,Meath,1.6,24
2,2003,Kerry,0.9,55
3,2004,Cork,1.6,43


In [13]:
df['Debt'] = np.arange(4)
df

Unnamed: 0,Year,County,Pop,Debt
0,2001,Dublin,1.5,0
1,2002,Meath,1.6,1
2,2003,Kerry,0.9,2
3,2004,Cork,1.6,3


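Here we create a Series and assign it to the Debt column. Note that the values are matched by index label, not by position: the labels in the Series are aligned with the index of the DF, and any row whose label is missing from the Series (here, row 0) gets NaN.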

In [15]:
val = pd.Series([1, 2, 3], index=[2, 1, 3])
df['Debt'] = val
df

Unnamed: 0,Year,County,Pop,Debt
0,2001,Dublin,1.5,
1,2002,Meath,1.6,2.0
2,2003,Kerry,0.9,1.0
3,2004,Cork,1.6,3.0


Use the del keyword to delete a column from a DF (the column name is case-sensitive):
- del df['Debt']
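A quick sketch of deleting a column on a made-up frame. del changes the frame in place; the drop method is shown only as an alternative that returns a new frame and leaves the original untouched.

```python
import pandas as pd

df_demo = pd.DataFrame({"County": ["Dublin", "Meath"], "Debt": [30, 24]})

# drop returns a new DataFrame; df_demo still has 'Debt' afterwards
dropped = df_demo.drop(columns=["Debt"])
print(list(dropped.columns))  # ['County']
print(list(df_demo.columns))  # ['County', 'Debt']

# del removes the column from df_demo itself
del df_demo["Debt"]
print(list(df_demo.columns))  # ['County']
```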

Use the .copy() method to get an independent copy of a column:
x = df['County'].copy()
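A minimal sketch of why the copy is useful, on a made-up frame: changes to the copied Series do not reach the original DataFrame.

```python
import pandas as pd

df_demo = pd.DataFrame({"County": ["Dublin", "Meath"]})

x = df_demo["County"].copy()  # independent copy of the column
x.loc[0] = "Galway"           # modify the copy only

print(x.tolist())                  # ['Galway', 'Meath']
print(df_demo["County"].tolist())  # ['Dublin', 'Meath']
```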

#### Pass a Nested Dict
If you pass a nested dictionary, the outer keys are interpreted as the columns and the inner keys as the row index:

In [18]:
dict2 = {'Dublin': {'2001': 1, '2002': 3},
         'Meath': {'2001': 1, '2002': 3}}
df2 = pd.DataFrame(dict2)
df2

Unnamed: 0,Dublin,Meath
2001,1,1
2002,3,3


#### Transpose the DF (swap rows and columns) in the same way as a NumPy array:


In [19]:
df.T

Unnamed: 0,0,1,2,3
Year,2001,2002,2003,2004
County,Dublin,Meath,Kerry,Cork
Pop,1.5,1.6,0.9,1.6
Debt,,2,1,3
