# Creating Pandas Data Frames

```python
import pandas as pd
```

```python
# Create a dictionary of Pandas Series
items = {
    'Bob': pd.Series(data=[245, 65, 77], index=['bike', 'pants', 'watch']),
    'Alice': pd.Series(data=[40, 100, 200, 55], index=['bike', 'glasses', 'book', 'pants'])
}
print(type(items))
```

```
<class 'dict'>
```


```python
# Create a Pandas DataFrame by passing it a dictionary of Pandas Series
shopping_carts = pd.DataFrame(items)
shopping_carts
```

|         | Bob   | Alice |
|---------|-------|-------|
| bike    | 245.0 | 40.0  |
| book    | NaN   | 200.0 |
| glasses | NaN   | 100.0 |
| pants   | 65.0  | 55.0  |
| watch   | 77.0  | NaN   |


There are several things worth pointing out here. First, DataFrames are displayed in tabular form, much like an Excel spreadsheet, with the row and column labels in bold. Second, the row labels of the DataFrame are built from the union of the index labels of the two Pandas Series we used to construct the dictionary, while the column labels are taken from the keys of the dictionary. Third, the rows are arranged alphabetically rather than in the order in which the labels appear in each Series; the columns, on the other hand, keep the order of the dictionary keys (Bob, then Alice). We will see later that this sorting does not happen when we load data into a DataFrame from a data file.

Finally, some NaN values appear in the DataFrame. NaN stands for Not a Number, and it is Pandas' way of indicating that it doesn't have a value for a particular row and column. For example, the Alice column has NaN in the watch row. Looking at the dictionary we created at the beginning, we can see why: Alice's Series has no item labeled watch. So whenever a DataFrame is created, if a particular column doesn't have a value for a particular row label, Pandas puts a NaN there. If we were to feed this data into a machine learning algorithm, we would first have to remove these NaN values. In a later lesson we will learn how to deal with NaN values and clean our data; for now, we will leave them in our DataFrame.
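A quick way to confirm these observations is to inspect the DataFrame's attributes directly. The sketch below rebuilds the same `shopping_carts` DataFrame and checks that the row index is the sorted union of the two Series' indexes, that the columns follow the dictionary keys, and where the NaN values are. It assumes a recent pandas release on Python 3.7 or later, where dictionary insertion order is preserved.

```python
import pandas as pd

# Rebuild the same dictionary of Series used above
items = {
    'Bob': pd.Series(data=[245, 65, 77], index=['bike', 'pants', 'watch']),
    'Alice': pd.Series(data=[40, 100, 200, 55], index=['bike', 'glasses', 'book', 'pants']),
}
shopping_carts = pd.DataFrame(items)

# Row labels: the union of both Series indexes, sorted alphabetically
print(list(shopping_carts.index))

# Column labels: the dictionary keys, in insertion order
print(list(shopping_carts.columns))

# Number of NaN values in each column
print(shopping_carts.isna().sum())

# Alice has no 'watch' entry, so that cell is NaN
print(pd.isna(shopping_carts.loc['watch', 'Alice']))

# Both columns contain NaN, so both are stored as float64 (245 is shown as 245.0)
print(shopping_carts.dtypes)
```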

In the above example we created a Pandas DataFrame from a dictionary of Pandas Series that had clearly defined indexes. If we don't provide index labels to the Pandas Series, Pandas uses a default integer index (0, 1, 2, ...) for the rows when it creates the DataFrame.

```python
# Series created without index labels get a default integer index
data = {
    'Bob': pd.Series(data=[245, 65, 77]),
    'Alice': pd.Series(data=[40, 100, 200, 55])
}

df = pd.DataFrame(data)
df
```

|   | Bob   | Alice |
|---|-------|-------|
| 0 | 245.0 | 40    |
| 1 | 65.0  | 100   |
| 2 | 77.0  | 200   |
| 3 | NaN   | 55    |
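This output also shows how the rows are matched when no labels are given: each Series gets a default integer index starting at 0, and the DataFrame's rows are the union of those integer labels. Bob's Series has only three values, so row 3 is NaN for Bob. Because NaN is a floating-point value, Bob's column is stored as floats (245.0), while Alice's column, which has no gaps, keeps its integer values. The minimal sketch below rebuilds the same `data` dictionary and checks this behavior.

```python
import pandas as pd

# Same dictionary as above: Series without explicit index labels
data = {
    'Bob': pd.Series(data=[245, 65, 77]),
    'Alice': pd.Series(data=[40, 100, 200, 55]),
}
df = pd.DataFrame(data)

# Default row labels: the union of both Series' integer indexes
print(list(df.index))

# Bob has no value at label 3, so that cell is NaN
print(df.loc[3, 'Bob'])

# NaN forces Bob's column to float64; Alice's column stays an integer dtype
print(df.dtypes)
```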
