<center> <img src="https://github.ccs.neu.edu/caglar/DS3000/blob/master/img/ds3000.png?raw=true"> </center>

<center> <h1> Week 3 - Day 2 </h1> </center>

<center> <h2> Part 3: pandas DataFrames</h2></center>

## Outline
1. <a href='#1'>Defining DataFrames</a>
2. <a href='#2'>Customizing a `DataFrame`’s Indices</a>
3. <a href='#3'>Indexing DataFrames</a>
4. <a href='#4'>Slicing DataFrames</a>
5. <a href='#5'>DataFrame Calculation Methods</a>
6. <a href='#6'>Accessing a Specific `DataFrame` Cell</a>

## DataFrames
* Enhanced two-dimensional `array`
* Can have custom row and column indices
* Offer additional operations and capabilities that make them more convenient than plain arrays for many data-science tasks
* Support missing data
* Each column in a `DataFrame` is a `Series`
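
The last two points can be illustrated with a minimal sketch. It assumes a small hypothetical table in which one grade is missing; the `None` entry and the name `partial` are illustrative only and are not part of the data used in the rest of this notebook.

```python
import pandas as pd

# Hypothetical data: Ron's second grade is missing
partial = pd.DataFrame({"Harry": [85, 81, 89], "Ron": [80, None, 85]})

# Each column of a DataFrame is a Series
print(type(partial["Harry"]))

# pandas stores the missing grade as NaN and can count missing entries per column
print(partial.isna().sum())

# Statistics such as mean() skip missing values by default
print(partial["Ron"].mean())
```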

<a id="1"></a>

## 1. Defining DataFrames
* Can define DataFrames from dictionaries
* DataFrames present data in tabular format, with the dictionary keys as column names and each key's list of values as that column's entries (one per row)

In [None]:
import pandas as pd #Remember to import pandas first

In [None]:
student_dict = {"Harry": [85, 81, 89], "Hermione": [95, 90, 100], "Ron": [80, 75, 85], "Ginny": [91, 92, 87]}

In [None]:
grades = pd.DataFrame(student_dict)

In [None]:
grades

* Pandas displays `DataFrame`s in tabular format with indices *left aligned* in the index column and the remaining columns’ values *right aligned*
* Dictionary keys become column names
* Each key's list of values fills that column, one value per row
* By default, row indices are sequential integers.
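
As a quick check of the points above, the following sketch rebuilds the same `grades` table and inspects its `columns`, `index`, and `shape` attributes rather than relying on the rendered table.

```python
import pandas as pd

student_dict = {"Harry": [85, 81, 89], "Hermione": [95, 90, 100],
                "Ron": [80, 75, 85], "Ginny": [91, 92, 87]}
grades = pd.DataFrame(student_dict)

print(list(grades.columns))  # column names come from the dictionary keys
print(list(grades.index))    # default row indices are sequential integers
print(grades.shape)          # (number of rows, number of columns)
```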

<a id="2"></a>

## 2. Customizing a `DataFrame`’s Indices 
* Can use the **`index` attribute** to change the `DataFrame`’s indices from sequential integers to labels
* **Must provide a one-dimensional collection that has the same number of elements as there are _rows_ in the `DataFrame`**

In [None]:
grades.index = ['Potion1', 'Potion2', 'Potion3']

In [None]:
grades

* Can specify the indices when defining the DataFrame too

In [None]:
grades = pd.DataFrame(student_dict, index = ['Potion1', 'Potion2', 'Potion3'])
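
The length requirement above can be seen in a small sketch: assigning two labels to a three-row `DataFrame` is rejected with a `ValueError`, while three labels are accepted. The flag `mismatch_rejected` is only an illustrative helper.

```python
import pandas as pd

student_dict = {"Harry": [85, 81, 89], "Hermione": [95, 90, 100],
                "Ron": [80, 75, 85], "Ginny": [91, 92, 87]}
grades = pd.DataFrame(student_dict)

try:
    # Only two labels for three rows, so pandas refuses the assignment
    grades.index = ["Potion1", "Potion2"]
except ValueError as err:
    mismatch_rejected = True
    print("Rejected:", err)
else:
    mismatch_rejected = False

# One label per row works
grades.index = ["Potion1", "Potion2", "Potion3"]
print(list(grades.index))
```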

<a id="3"></a>

## 3. Indexing DataFrames
* DataFrames are useful for data processing and manipulation, which requires frequent indexing operations
* Can access columns, rows, subsets of columns and rows, and specific items

### 3.1. Selecting a `DataFrame`’s Columns 
* Use **DataFrame_Name[ColumnName]** notation, the same bracket notation used with lists and Series
* Returns the selected column as a `Series`

* Let's retrieve Hermione's grades:

In [None]:
grades["Hermione"]

* If a `DataFrame`’s column-name strings are valid Python identifiers, you can use them as attributes

In [None]:
grades.Hermione
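
The sketch below compares the two notations and shows why bracket notation is the more general one. The renamed column `"Ron Weasley"` is a hypothetical example of a name that is not a valid Python identifier; it does not appear elsewhere in this notebook.

```python
import pandas as pd

student_dict = {"Harry": [85, 81, 89], "Hermione": [95, 90, 100],
                "Ron": [80, 75, 85], "Ginny": [91, 92, 87]}
grades = pd.DataFrame(student_dict, index=["Potion1", "Potion2", "Potion3"])

# Both notations return the same Series
print(grades["Hermione"].equals(grades.Hermione))

# Hypothetical column name containing a space: not a valid identifier
renamed = grades.rename(columns={"Ron": "Ron Weasley"})
print("Ron Weasley".isidentifier())

# Attribute access cannot express this name, but bracket notation still works
print(renamed["Ron Weasley"])
```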

### 3.2. Selecting a `DataFrame`’s Rows
* Pandas provides two optimized attributes to access a `DataFrame`’s rows
* Can access a row by its label or index number

#### 3.2.1. Access by Label using `loc` attribute
* Access a row by its label via the `DataFrame`’s **`loc` attribute**
* Use **DataFrame_Name.loc[RowName]** notation
* Returns the row as a Series

* Let's retrieve all the grades for Potion1

In [None]:
grades.loc["Potion1"]

#### 3.2.2. Access by Integer Indices using `iloc` attribute
* Access rows by integer zero-based indices using the **`iloc` attribute** 
    * (the `i` in `iloc` means that it’s used with integer indices)
* Use **DataFrame_Name.iloc[RowIndex]** notation
* Returns the row as a Series

In [None]:
grades.iloc[0]

#### 3.2.3. Selecting Specific Rows
* Specify the rows with a list

In [None]:
#returns Potion1 and Potion3 rows only
grades.loc[['Potion1', 'Potion3']]

In [None]:
#returns Potion1 and Potion3 rows only
grades.iloc[[0,2]]
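
One detail worth checking is the type of the result. In the sketch below, a single label returns a `Series`, while a list of labels (even a list with one element) returns a `DataFrame`. The variable names are illustrative.

```python
import pandas as pd

student_dict = {"Harry": [85, 81, 89], "Hermione": [95, 90, 100],
                "Ron": [80, 75, 85], "Ginny": [91, 92, 87]}
grades = pd.DataFrame(student_dict, index=["Potion1", "Potion2", "Potion3"])

one_row = grades.loc["Potion1"]        # single label: a Series
one_row_df = grades.loc[["Potion1"]]   # list with one label: a one-row DataFrame

print(type(one_row).__name__)
print(type(one_row_df).__name__, one_row_df.shape)

# Selecting by label list and by position list picks the same rows here
same = grades.loc[["Potion1", "Potion3"]].equals(grades.iloc[[0, 2]])
print(same)
```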

<a id="4"></a>

## 4. Slicing DataFrames
* Can slice DataFrames using the **[:]** notation

### 4.1.  Selecting Rows via Slices

#### 4.1.1. Slicing with Row Labels
* Use **DataFrame.loc[start:end]** notation

In [None]:
#returns all columns with the specified slice of rows
grades.loc["Potion1":"Potion2"]

* When using slices containing **labels** with `loc`, the range specified **includes** the high index (`'Potion2'`):

#### 4.1.2. Slicing with Row Indices
* Use **DataFrame.iloc[start:end]** notation

In [None]:
grades.iloc[0:2]

* When using slices containing **integer indices** with `iloc`, the range you specify **excludes** the high index (`2`):
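
The difference between the two slice rules can be sketched side by side. With the `grades` table used here, the label slice `"Potion1":"Potion2"` and the integer slice `0:2` select the same two rows, because the label slice includes its end while the integer slice stops before it.

```python
import pandas as pd

student_dict = {"Harry": [85, 81, 89], "Hermione": [95, 90, 100],
                "Ron": [80, 75, 85], "Ginny": [91, 92, 87]}
grades = pd.DataFrame(student_dict, index=["Potion1", "Potion2", "Potion3"])

by_label = grades.loc["Potion1":"Potion2"]   # end label is included
by_position = grades.iloc[0:2]               # end position is excluded

print(list(by_label.index))
print(list(by_position.index))

# An integer slice ending at 1 returns only the first row
print(list(grades.iloc[0:1].index))
```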

### 4.2. Selecting Subsets of the Rows and Columns 
* Use **DataFrame.loc[RowLabels, ColumnLabels]** or **DataFrame.iloc[RowIndices, ColumnIndices]** notation
* When selecting specific rows or columns, provide a **list** of rows or columns in brackets []
* When selecting a range of rows or columns, use the slicing notation **'[:]'**.

#### 4.2.1. Selecting specific rows with a list of specific columns
* Let's retrieve only `Harry`’s and `Ginny`’s grades for `Potion1` and `Potion2`

In [None]:
grades

In [None]:
grades.loc[["Potion1","Potion2"], ["Harry", "Ginny"]]

In [None]:
grades.iloc[[0,1],[0,3]]

#### 4.2.2. Selecting a range of rows with a list of specific columns

In [None]:
grades.loc["Potion1":"Potion3", ["Harry", "Ginny"]]

In [None]:
grades.iloc[0:3, [0,3]]

#### 4.2.3. Selecting specific rows with a range of specific columns
* Let's get the grades for Harry, Hermione, and Ron on Potion1 and Potion3

In [None]:
grades

In [None]:
grades.loc[["Potion1","Potion3"], "Harry":"Ron"]

In [None]:
grades.iloc[[0,2], 0:3]

#### 4.2.4. Selecting a range of rows with a range of columns
* Let's retrieve grades for Hermione through Ginny for Potions 1 through 3

In [None]:
grades.loc["Potion1":"Potion3", "Hermione":"Ginny"]

In [None]:
grades.iloc[0:3, 1:4]
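
When translating a label-based selection into a position-based one, the column positions do not have to be counted by hand. The sketch below uses `Index.get_loc` to look them up and confirms that both selections match; the `+ 1` accounts for `iloc` excluding the end of an integer slice.

```python
import pandas as pd

student_dict = {"Harry": [85, 81, 89], "Hermione": [95, 90, 100],
                "Ron": [80, 75, 85], "Ginny": [91, 92, 87]}
grades = pd.DataFrame(student_dict, index=["Potion1", "Potion2", "Potion3"])

# Look up the integer positions of the first and last column labels
first = grades.columns.get_loc("Hermione")
last = grades.columns.get_loc("Ginny")
print(first, last)

by_label = grades.loc["Potion1":"Potion3", "Hermione":"Ginny"]
by_position = grades.iloc[0:3, first:last + 1]

print(by_label.equals(by_position))
```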

<a id="5"></a>

## 5. DataFrame Calculation Methods
* pandas provides various methods, such as `min()`, `max()`, `mean()`, `sum()`, and `describe()`, to calculate descriptive statistics.
* Statistics are calculated by column

In [None]:
grades.describe()

* Can control the precision and other default settings with pandas’ **`set_option` function**

In [None]:
pd.set_option('display.precision', 2)  # full option name; the short form 'precision' is ambiguous in recent pandas versions

In [None]:
grades.describe()
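
The precision option only affects how numbers are displayed, not the values that are stored. A minimal sketch of this, assuming you prefer a temporary setting: `pd.option_context` applies the option inside a `with` block and restores the previous value afterwards.

```python
import pandas as pd

student_dict = {"Harry": [85, 81, 89], "Hermione": [95, 90, 100],
                "Ron": [80, 75, 85], "Ginny": [91, 92, 87]}
grades = pd.DataFrame(student_dict, index=["Potion1", "Potion2", "Potion3"])

default_precision = pd.get_option("display.precision")

# Inside the block, floats are displayed with 2 decimal places
with pd.option_context("display.precision", 2):
    print(grades.describe())

# Outside the block the previous setting is back in effect
print(pd.get_option("display.precision") == default_precision)

# The stored statistics keep full precision either way
stats = grades.describe()
print(stats.loc["std", "Ginny"])
```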

* Can calculate the average grade for each student simply by calling `mean` on the `DataFrame`

In [None]:
grades.mean()

In [None]:
grades.min()

In [None]:
grades.max()

In [None]:
grades.sum()

### 5.1. Transposing the `DataFrame` with the `T` Attribute
* Can quickly **transpose** rows and columns
* The rows become the columns, and the columns become the rows

In [None]:
grades.T

* Suppose you want the average grade by potion rather than by student
* Call **`mean()`** on `grades.T`

In [None]:
grades.T.mean()
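
Transposing is one way to get per-potion averages. An alternative, sketched below, is the `axis` parameter of `mean`: `axis=1` computes the statistic across the columns of each row, so no transpose is needed.

```python
import pandas as pd

student_dict = {"Harry": [85, 81, 89], "Hermione": [95, 90, 100],
                "Ron": [80, 75, 85], "Ginny": [91, 92, 87]}
grades = pd.DataFrame(student_dict, index=["Potion1", "Potion2", "Potion3"])

by_student = grades.mean()        # default: one average per column
by_potion = grades.mean(axis=1)   # one average per row

print(by_student)
print(by_potion)
```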

<a id="6"></a>

## 6. Accessing a Specific `DataFrame` Cell
* The `DataFrame` **`at`** and **`iat`** attributes get a single value from a `DataFrame`: `at` takes row and column labels, `iat` takes integer positions

In [None]:
grades

In [None]:
grades.at["Potion3", "Ron"]

In [None]:
grades.iat[2,2]

### 6.1. Updating Cell Values
* Can assign new values to specific elements

In [None]:
grades.at["Potion3", "Ron"] = 90

In [None]:
grades

In [None]:
grades.iat[2,2] = 85

In [None]:
grades
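
Because `at` and `iat` address the same underlying cells, an update made through one is visible through the other. The following sketch starts from a fresh copy of `grades` and checks this.

```python
import pandas as pd

student_dict = {"Harry": [85, 81, 89], "Hermione": [95, 90, 100],
                "Ron": [80, 75, 85], "Ginny": [91, 92, 87]}
grades = pd.DataFrame(student_dict, index=["Potion1", "Potion2", "Potion3"])

# Update Ron's Potion3 grade by label
grades.at["Potion3", "Ron"] = 90

# Row 2, column 2 is the same cell, so the new value shows up by position too
print(grades.iat[2, 2])

# Neighboring cells are untouched
print(grades.at["Potion2", "Ron"])
```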

### 6.2. Updating Columns
* Select the column and provide a list of new values.
* The list must match the size of the column (# of rows)

In [None]:
grades["Hermione"] = [100, 100, 100]

In [None]:
grades

### 6.3. Updating Rows
* Select the row and provide a list of new values.
* The list must match the size of the row (# of columns)

In [None]:
grades.loc["Potion3"] = [0,0,0,0]

In [None]:
grades
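
The size requirement for list assignments can be checked with a short sketch: a list of the wrong length is rejected with a `ValueError` and leaves the column unchanged. The last step goes slightly beyond the cells above by assigning a single scalar, which pandas applies to every cell in the selected row. The flag `length_rejected` is only an illustrative helper.

```python
import pandas as pd

student_dict = {"Harry": [85, 81, 89], "Hermione": [95, 90, 100],
                "Ron": [80, 75, 85], "Ginny": [91, 92, 87]}
grades = pd.DataFrame(student_dict, index=["Potion1", "Potion2", "Potion3"])

try:
    # Two values for a three-row column do not fit
    grades["Hermione"] = [100, 100]
except ValueError as err:
    length_rejected = True
    print("Rejected:", err)
else:
    length_rejected = False

# The failed assignment left the column as it was
print(list(grades["Hermione"]))

# A single scalar is applied to every cell in the row
grades.loc["Potion3"] = 0
print(grades)
```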