# Pandas Overview
By the end of this notebook, we'll have walked through the core pandas workflows: creating, reading, selecting, summarizing, grouping, and combining data.


In [1]:
import pandas as pd


## Creating, Reading and Writing
Pandas has two core objects, **DataFrame** and **Series**.

### DataFrame
A DataFrame is a table. It contains an array of individual *entries*, each of which has a certain *value*. Each entry corresponds to a row (or *record*) and a *column*.

For example, consider the following simple DataFrame:

In [2]:
pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

   Yes   No
0   50  131
1   21    2


Here's another example showing strings.

In [3]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})

             Bob           Sue
0    I liked it.  Pretty good.
1  It was awful.        Bland.


We are using the ```pd.DataFrame()``` constructor to generate these DataFrame objects. The syntax for declaring a new one is a dictionary whose keys are the column names (*Bob* and *Sue* in this example) and whose values are lists of entries. This is the standard way of constructing a new DataFrame, and the one you are most likely to encounter.


The dictionary-list constructor assigns values to the *column labels*, but just uses an ascending count from 0 (0, 1, 2, 3, ...) for the *row labels*. Sometimes this is OK, but oftentimes we will want to assign these labels ourselves.

The list of row labels used in a DataFrame is known as an **Index**. We can assign values to it by using an ```index``` parameter in our constructor:

In [8]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
              'Sue': ['Pretty good.', 'Bland.']},
              index=['Product A', 'Product B'])

                     Bob           Sue
Product A    I liked it.  Pretty good.
Product B  It was awful.        Bland.


### Series
A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list. 

In [9]:
pd.Series([1, 2, 3, 4, 5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

A Series is, in essence, a single column of a DataFrame. So you can assign row labels to the Series the same way as before, using an ```index``` parameter. However, a Series does not have a column name; it only has one overall ```name```:

In [10]:
pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')

2015 Sales    30
2016 Sales    35
2017 Sales    40
Name: Product A, dtype: int64

The Series and the DataFrame are intimately related. It's helpful to think of a DataFrame as actually being just a bunch of Series "glued together". We'll see more of this below.
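
As a rough illustration of the "glued together" idea, the sketch below builds two Series and joins them side by side with ```pd.concat()```. The variable names and the *Product B* figures are made up for this example.

```python
import pandas as pd

# Two hypothetical Series that share the same row labels.
labels = ['2015 Sales', '2016 Sales', '2017 Sales']
sales_a = pd.Series([30, 35, 40], index=labels, name='Product A')
sales_b = pd.Series([21, 34, 11], index=labels, name='Product B')

# Glue the Series together column-wise; each Series name becomes a column label.
products = pd.concat([sales_a, sales_b], axis=1)
print(products)

# Selecting a single column hands back a Series again.
column = products['Product A']
print(type(column).__name__)
```

Going the other way, pulling one column out of a DataFrame gives back a Series whose ```name``` is the column label.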

### Reading data files
Being able to create a DataFrame or Series by hand is handy. But, most of the time, we won't actually be creating our own data by hand. Instead, we'll be working with data that already exists.

Data can be stored in any of a number of different forms and formats. By far the most basic of these is the humble CSV file. When you open a CSV file you get something that looks like this:
```
Product A, Product B, Product C
30,21,9
35,34,1
41,11,11
```
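
To see how text like this maps onto a DataFrame, here is a minimal sketch that parses the same snippet from an in-memory string instead of a file. Using ```io.StringIO``` and the ```skipinitialspace``` option is a choice made for this illustration (the header above has spaces after its commas); the variable names are hypothetical.

```python
import io
import pandas as pd

# The CSV text from above, held in a string rather than a file on disk.
csv_text = """Product A, Product B, Product C
30,21,9
35,34,1
41,11,11
"""

# skipinitialspace=True drops the blank that follows each comma in the header,
# so the column labels come out as 'Product B' rather than ' Product B'.
toy = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(toy)
```

The first line of the file becomes the column labels, and each following line becomes one row.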
Let's now set aside our toy datasets and read a real dataset into a DataFrame. We'll use ```pd.read_csv()``` to do this.

In [11]:
ign_scores = pd.read_csv("./datasets/data-vis/ign_scores.csv")

We can use the ```shape``` attribute to check how large a DataFrame is, and the ```head()``` method to peek at the first five rows.

In [14]:
print(ign_scores.shape)
ign_scores.head()

(21, 13)


Unnamed: 0,Platform,Action,"Action, Adventure",Adventure,Fighting,Platformer,Puzzle,RPG,Racing,Shooter,Simulation,Sports,Strategy
0,Dreamcast,6.882857,7.511111,6.281818,8.2,8.34,8.088889,7.7,7.0425,7.616667,7.628571,7.272222,6.433333
1,Game Boy Advance,6.373077,7.507692,6.057143,6.226316,6.970588,6.532143,7.542857,6.657143,6.444444,6.928571,6.694444,7.175
2,Game Boy Color,6.272727,8.166667,5.307692,4.5,6.352941,6.583333,7.285714,5.897436,4.5,5.9,5.790698,7.4
3,GameCube,6.532584,7.608333,6.753846,7.422222,6.665714,6.133333,7.890909,6.852632,6.981818,8.028571,7.481319,7.116667
4,Nintendo 3DS,6.670833,7.481818,7.414286,6.614286,7.503448,8.0,7.719231,6.9,7.033333,7.7,6.388889,7.9
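
A small sketch of the same two tools on a made-up DataFrame (the ```numbers``` frame below is hypothetical): ```shape``` is a ```(rows, columns)``` tuple that can be unpacked, and ```head()``` accepts an optional row count.

```python
import pandas as pd

# A hypothetical single-column DataFrame with eight rows.
numbers = pd.DataFrame({'value': range(8)})

# shape is a plain tuple: (number of rows, number of columns).
n_rows, n_cols = numbers.shape
print(n_rows, n_cols)

# head() returns the first five rows by default; pass a number to change that.
first_five = numbers.head()
first_two = numbers.head(2)
print(first_two)
```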


The ```pd.read_csv()``` function is feature-rich, with over 30 optional parameters. For example, ```index_col``` tells pandas which column of the file to use as the index.
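
As a hedged sketch of ```index_col```, the example below reads a tiny in-memory CSV twice, once with the default index and once using the *Platform* column as the index. The CSV text is a hypothetical excerpt with rounded values, not the real file.

```python
import io
import pandas as pd

# Hypothetical two-row excerpt in the spirit of the IGN scores table.
csv_text = """Platform,Action,Adventure
Dreamcast,6.88,6.28
GameCube,6.53,6.75
"""

# Default behaviour: rows are labelled 0, 1, ... and Platform is an ordinary column.
default_index = pd.read_csv(io.StringIO(csv_text))

# With index_col, the Platform values become the row labels instead.
platform_index = pd.read_csv(io.StringIO(csv_text), index_col='Platform')

print(default_index.shape)
print(platform_index.shape)
print(list(platform_index.index))
```

Because the index is not counted as a column, the second DataFrame has one column fewer than the first.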

## Indexing, Selecting, & Assigning


## Summary Functions and Maps


## Grouping and Sorting


## Data Types and Missing Values


## Renaming and Combining
