# What is a DataFrame?
Before we get into manipulating data, we first need to understand how the container for your data, the DataFrame, works!

In [1]:
import pandas as pd # import pandas module

## Many Series make a DataFrame!
Before we start creating or loading things into a DataFrame, let's look at what a DataFrame is made up of. A Series is essentially a column of data. The term Series may seem awfully academic, but if you're going to find help online you'll need to use the correct terminology. From now on, when you go to say column, pause and remember to say Series.

In [2]:
# create a Series object
some_numbers = pd.Series([22, 50, 59, 80, 100]) # this is how you create one Series of data/a column of data

In [3]:
some_numbers # display the Series that you created

0     22
1     50
2     59
3     80
4    100
dtype: int64

In [4]:
some_numbers.values # display all the values in the Series

array([ 22,  50,  59,  80, 100])

In [5]:
some_numbers.index # display the Series index

RangeIndex(start=0, stop=5, step=1)

In [6]:
# let's create a Series with a custom index
all_the_fruit = pd.Series([14, 19, 10, 39, 393], index=['Orange', 'Banana', 'Grape', 'Blueberry', 'Dragon Fruit'])

In [7]:
all_the_fruit # display the Series that you created

Orange           14
Banana           19
Grape            10
Blueberry        39
Dragon Fruit    393
dtype: int64

In [8]:
all_the_fruit.values # display all the values in the Series

array([ 14,  19,  10,  39, 393])

In [9]:
all_the_fruit.index # display the Series index

Index(['Orange', 'Banana', 'Grape', 'Blueberry', 'Dragon Fruit'], dtype='object')

In [11]:
all_the_fruit['Orange'] # select a row by its index label

14
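Square brackets with a label are one way to pick out a row. As a minimal sketch on a small made-up Series (the name `fruit` and its values are only for illustration), `.loc` selects by index label and `.iloc` selects by integer position:

```python
import pandas as pd

# Hypothetical example data, mirroring the fruit Series above
fruit = pd.Series([14, 19, 10], index=['Orange', 'Banana', 'Grape'])

by_label = fruit.loc['Banana']  # look up by index label
by_position = fruit.iloc[1]     # look up by integer position (0-based)

print(by_label, by_position)    # both refer to the same row
```

Being explicit with `.loc` and `.iloc` helps avoid confusion when the index labels are themselves integers.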

How can we add to or change a Series?

In [12]:
all_the_fruit['Apple'] = 278 # adding a new row to your Series

In [13]:
all_the_fruit # display the new addition to the Series

Orange           14
Banana           19
Grape            10
Blueberry        39
Dragon Fruit    393
Apple           278
dtype: int64

In [18]:
all_the_fruit['Apple'] = 2378 # change the value

In [19]:
all_the_fruit # display the updated value in the Series

Orange            14
Banana            19
Grape             10
Blueberry         39
Dragon Fruit     393
Apple           2378
dtype: int64

## Let's Get on to the DataFrame!
Now that we've got the basics of dealing with a single Series, we can move on to multiple Series, i.e. a DataFrame.
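To make the "many Series" idea concrete, here is a minimal sketch that builds a DataFrame from two Series sharing the same index. The names `supplier_a`, `supplier_b` and `suppliers` are made up for illustration:

```python
import pandas as pd

labels = ['Orange', 'Banana', 'Grape']

# Two hypothetical Series that share the same index labels
supplier_a = pd.Series([14, 19, 10], index=labels)
supplier_b = pd.Series([83, 92, 94], index=labels)

# Each Series becomes one column of the DataFrame
suppliers = pd.DataFrame({'Supplier A': supplier_a, 'Supplier B': supplier_b})

# Selecting a column hands back a Series again
column = suppliers['Supplier A']
print(type(column).__name__)  # Series
print(column.name)            # Supplier A
```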

In [49]:
dict_veggie_suppliers = {'City Veg': [14, 19, 10, 39, 393], 'Other Fruit Company': [83, 92, 94, 202, 101]}

In [50]:
veggie_suppliers = pd.DataFrame(dict_veggie_suppliers, index=['Orange', 'Banana', 'Grape', 'Blueberry', 'Dragon Fruit'])

In [51]:
veggie_suppliers

              City Veg  Other Fruit Company
Orange              14                   83
Banana              19                   92
Grape               10                   94
Blueberry           39                  202
Dragon Fruit       393                  101


In [52]:
veggie_suppliers.values # display all the values as a 2D array

array([[ 14,  83],
       [ 19,  92],
       [ 10,  94],
       [ 39, 202],
       [393, 101]])

In [53]:
veggie_suppliers.columns # display the column labels

Index(['City Veg', 'Other Fruit Company'], dtype='object')

In [54]:
list(veggie_suppliers) # iterating over a DataFrame yields its column labels

['City Veg', 'Other Fruit Company']

In [55]:
veggie_suppliers['City Veg'] # select a single column, which comes back as a Series

Orange           14
Banana           19
Grape            10
Blueberry        39
Dragon Fruit    393
Name: City Veg, dtype: int64

In [56]:
veggie_suppliers.loc['Orange'] # select a row by its index label

City Veg               14
Other Fruit Company    83
Name: Orange, dtype: int64

In [57]:
del veggie_suppliers['Other Fruit Company'] # delete a column in place

In [58]:
veggie_suppliers

              City Veg
Orange              14
Banana              19
Grape               10
Blueberry           39
Dragon Fruit       393


In [59]:
veggie_suppliers['New Veg Supplier'] = [832, 202, 39, 23, 22] # add a new column; the list length must match the number of rows

In [60]:
veggie_suppliers

              City Veg  New Veg Supplier
Orange              14               832
Banana              19               202
Grape               10                39
Blueberry           39                23
Dragon Fruit       393                22


## What's an Index and Why Is It Important?
The Index is the key that holds all the data together. When you sort, it's the element that stops all your data from getting mixed up: each label stays attached to its own row of values.
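A minimal sketch of that idea, using a small made-up DataFrame (the name `stock` and its values are only for illustration): after sorting by one column, every label still points at the same row of values.

```python
import pandas as pd

# Hypothetical example data
stock = pd.DataFrame(
    {'City Veg': [14, 19, 10], 'New Veg Supplier': [832, 202, 39]},
    index=['Orange', 'Banana', 'Grape'],
)

# sort_values returns a new DataFrame; the index labels travel with their rows
by_city_veg = stock.sort_values('City Veg')

print(by_city_veg.index.tolist())                       # ['Grape', 'Orange', 'Banana']
print(by_city_veg.loc['Banana', 'New Veg Supplier'])    # 202, same as before sorting
```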

In [61]:
index = veggie_suppliers.index

In [62]:
index

Index(['Orange', 'Banana', 'Grape', 'Blueberry', 'Dragon Fruit'], dtype='object')

In [63]:
index[2:3] # slice the Index by position; the result is still an Index

Index(['Grape'], dtype='object')

In [64]:
index_labels = pd.Index([1, 2, 3, 4, 5]) # build a new Index of integer labels

In [65]:
veggie_suppliers.set_index(index_labels, inplace=True) # replace the row labels in place

In [66]:
veggie_suppliers

   City Veg  New Veg Supplier
1        14               832
2        19               202
3        10                39
4        39                23
5       393                22


In [67]:
veggie_suppliers.set_index([['Orange', 'Banana', 'Grape', 'Blueberry', 'Dragon Fruit']], inplace=True) # put the fruit labels back
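Relabelling rows does not have to go through `set_index` with `inplace=True`. As a minimal sketch on a small made-up DataFrame (the name `stock` is only for illustration), assigning to `.index` directly also replaces the row labels, as long as the number of labels matches the number of rows:

```python
import pandas as pd

# Hypothetical example data with the default 0, 1, 2 index
stock = pd.DataFrame({'City Veg': [14, 19, 10]})

# Assign new row labels directly
stock.index = ['Orange', 'Banana', 'Grape']

print(stock.index.tolist())  # ['Orange', 'Banana', 'Grape']
```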

## Asking the DataFrame some questions about itself

In [68]:
'Banana' in veggie_suppliers.columns

False

In [69]:
'New Veg Supplier' in veggie_suppliers.columns

True

In [70]:
'Banana' in veggie_suppliers.index

True

In [71]:
'New Veg Supplier' in veggie_suppliers.index

False
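One thing worth knowing when asking these questions: `in` applied to a DataFrame itself checks the column labels, while `in` applied to a Series checks the index labels, not the values. A minimal sketch with a made-up DataFrame called `stock`:

```python
import pandas as pd

# Hypothetical example data
stock = pd.DataFrame({'City Veg': [14, 19]}, index=['Orange', 'Banana'])

column_in_frame = 'City Veg' in stock            # True: DataFrame checks column labels
label_in_frame = 'Banana' in stock               # False: row labels are not checked
label_in_series = 'Banana' in stock['City Veg']  # True: Series checks index labels
value_in_series = 19 in stock['City Veg'].values # True: check values via .values

print(column_in_frame, label_in_frame, label_in_series, value_in_series)
```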

## Re-indexing data

In [72]:
more_index_vals = pd.Index(['Orange', 'Banana', 'Grape', 'Blueberry', 'Dragon Fruit', 'Apples', 'Ginger'])

In [73]:
veggie_suppliers = veggie_suppliers.reindex(more_index_vals, fill_value=0) # add rows for the new labels, filling their values with 0

In [74]:
veggie_suppliers

              City Veg  New Veg Supplier
Orange              14               832
Banana              19               202
Grape               10                39
Blueberry           39                23
Dragon Fruit       393                22
Apples               0                 0
Ginger               0                 0


In [75]:
new_stockists = pd.Index(['City Veg', 'New Veg Supplier', 'The Cheap Fruit Company', 'The Expensive Fruit Company'])

In [76]:
veggie_suppliers = veggie_suppliers.reindex(columns=new_stockists, fill_value=0) # add the new columns the same way

In [77]:
veggie_suppliers

              City Veg  New Veg Supplier  The Cheap Fruit Company  The Expensive Fruit Company
Orange              14               832                        0                            0
Banana              19               202                        0                            0
Grape               10                39                        0                            0
Blueberry           39                23                        0                            0
Dragon Fruit       393                22                        0                            0
Apples               0                 0                        0                            0
Ginger               0                 0                        0                            0
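The `fill_value=0` argument matters. As a minimal sketch on a small made-up Series (the name `stock` is only for illustration), reindexing without it fills the new labels with `NaN`, which also turns an integer Series into a float one:

```python
import pandas as pd

# Hypothetical example data
stock = pd.Series([14, 19], index=['Orange', 'Banana'])

new_labels = ['Orange', 'Banana', 'Ginger']

with_gap = stock.reindex(new_labels)             # 'Ginger' becomes NaN
filled = stock.reindex(new_labels, fill_value=0) # 'Ginger' becomes 0

print(with_gap.dtype)  # float64, because NaN is a float
print(filled.dtype)    # int64
```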
