## S14a: Lab 2 - Pandas

Pandas plays nicely with NumPy, making for a smooth transition from our previous work with n-dimensional arrays. In this notebook we will look at two structures for holding data: Series and DataFrames.

In this notebook, we offer just a glimpse into Pandas before applying it in an example. It's worth learning more outside of lab: if you've only got 10 minutes, go [here](https://pandas.pydata.org/docs/getting_started/10min.html); otherwise check out the learning [options](https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html). Much of this can be learned piecemeal through practical experience cleaning and pre-processing data.

### Building a data structure: starting with [Series](https://pandas.pydata.org/docs/reference/api/pandas.Series.html?highlight=series#pandas.Series)

In [53]:
# First import libraries

import numpy as np
import pandas as pd

In [54]:
# Scenario 1: Rustle up some simple key-value mock data
keys = ['rad', 'bad', 'sad', 'mad', 'fad']
values = np.random.randint(1, 11, len(keys))

# Wrap the values in a pandas Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)
data_auto_i = pd.Series(values)
print('Default integer index:')
print(data_auto_i)
data_manu_i = pd.Series(values, index=keys)
print('\nCustom label index:')
print(data_manu_i)

Default integer index:
0    1
1    2
2    4
3    7
4    7
dtype: int64

Custom label index:
rad    1
bad    2
sad    4
mad    7
fad    7
dtype: int64
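Since the mock data is key-value shaped, a Series can also be built straight from a dictionary. The following is a minimal sketch, not part of the original lab: the seed, the `score` name, and the variable names are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical seed so the mock values repeat between runs
rng = np.random.default_rng(seed=0)

keys = ['rad', 'bad', 'sad', 'mad', 'fad']

# One random integer in [1, 10] per key
mock = {key: int(rng.integers(1, 11)) for key in keys}

# Dictionary keys become the index labels, in insertion order
data_from_dict = pd.Series(mock, name='score')
print(data_from_dict)
```

This gives the same kind of labelled Series as passing `values` and `keys` separately, with the pairing made explicit in one object.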


In [55]:
# Find particular value by index/key

print(data_auto_i[2])
print(data_manu_i['sad'])

4
4
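Square-bracket lookup works here because the labels are either the default integers or strings. When the labels are themselves integers in a different order, it can be safer to say explicitly whether you mean a label or a position. A small sketch with made-up data (the Series `s` is hypothetical):

```python
import pandas as pd

# Integer labels that deliberately do not match the positions
s = pd.Series(['first', 'second', 'third'], index=[2, 1, 0])

by_label = s.loc[2]      # the row whose label is 2
by_position = s.iloc[2]  # the row at position 2 (the third row)

print(by_label)
print(by_position)
```

`.loc` always reads the index labels, while `.iloc` always counts rows from zero, so the two calls return different rows here.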


In [56]:
# Scenario 2: Rustle up data using dictionaries

arr_dicts = [
    {'name': 'Zona', 'color': 'red', 'intensity': np.random.randint(0, 256)},
    {'name': 'Aleksander', 'color': 'green', 'intensity': np.random.randint(0, 256)},
    {'name': 'Fred', 'color': 'blue', 'intensity': np.random.randint(0, 256)},
    {'name': 'Brian', 'color': 'cyan', 'intensity': np.random.randint(0, 256)},
    {'name': 'Jared', 'color': 'magenta', 'intensity': np.random.randint(0, 256)}
]
teamdata1 = pd.Series(arr_dicts)
teamdata1

0    {'name': 'Zona', 'color': 'red', 'intensity': ...
1    {'name': 'Aleksander', 'color': 'green', 'inte...
2    {'name': 'Fred', 'color': 'blue', 'intensity':...
3    {'name': 'Brian', 'color': 'cyan', 'intensity'...
4    {'name': 'Jared', 'color': 'magenta', 'intensi...
dtype: object
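A Series of dictionaries has dtype `object`, so each record stays an opaque Python dict. Passing the same kind of list to `pd.DataFrame` spreads the keys into columns instead. The sketch below uses fixed, made-up intensities and a deliberately misspelled key (`'intensity:'` with a stray colon) to show how a key typo surfaces as an extra column.

```python
import pandas as pd

# Fixed intensities stand in for random values; all records are illustrative
records = [
    {'name': 'Zona', 'color': 'red', 'intensity': 200},
    {'name': 'Fred', 'color': 'blue', 'intensity': 35},
    {'name': 'Jared', 'color': 'magenta', 'intensity:': 90},  # misspelled key
]

# Each distinct key becomes a column; missing entries are filled with NaN
team_table = pd.DataFrame(records)
print(team_table.columns.tolist())
print(team_table)
```

Checking `columns` right after building a table from dictionaries is a cheap way to catch this kind of mistake.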

### Excel-esque with [DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html?highlight=dataframe#pandas.DataFrame)

In [57]:
# Row and column labels
rows = np.array(['1', '2', '3'])
cols = np.array(['a', 'b', 'c', 'd', 'e'])

# Build a DataFrame of random integers in [1, 100], one per (row, col) pair
dataframe = pd.DataFrame(np.random.randint(1, 101, (len(rows), len(cols))), index=rows, columns=cols)
dataframe

    a   b   c   d   e
1  95  17  49  91  19
2   2  25  40  88  20
3  58  30  80  37  17


In [58]:
# Grab a col

print('COL:')
print(dataframe['c'])

# Grab a value

print('\nPOS:')
print(dataframe['c']['2'])

COL:
1    49
2    40
3    80
Name: c, dtype: int64

POS:
40
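Chaining two sets of brackets, as in `dataframe['c']['2']`, is fine for reading a value. A single `.loc`, `.at`, or `.iloc` call does the same lookup in one step. The sketch below rebuilds the sample table with the values printed above; the name `df` is just for illustration.

```python
import pandas as pd

# Values copied from the sample output above
df = pd.DataFrame(
    [[95, 17, 49, 91, 19], [2, 25, 40, 88, 20], [58, 30, 80, 37, 17]],
    index=['1', '2', '3'],
    columns=['a', 'b', 'c', 'd', 'e'],
)

via_loc = df.loc['2', 'c']   # row label, then column label
via_at = df.at['2', 'c']     # same lookup, scalar-only accessor
via_iloc = df.iloc[1, 2]     # row position, then column position

print(via_loc, via_at, via_iloc)
```

All three print the same value. The single-call forms are also the ones to prefer when assigning, since assignment through chained brackets may not modify the original DataFrame.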


In [59]:
# !!!YOUR TURN!!!

# Slicing! Grab the first 2 rows and first 2 cols from dataframe
print(dataframe[['a', 'b']][0:2])


    a   b
1  95  17
2   2  25
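The same slice can also be written with `.iloc` (by position) or `.loc` (by label). This is one possible alternative rather than the expected answer to the exercise. The table is rebuilt from the printed sample values, and `df` is an illustrative name.

```python
import pandas as pd

# Values copied from the sample output above
df = pd.DataFrame(
    [[95, 17, 49, 91, 19], [2, 25, 40, 88, 20], [58, 30, 80, 37, 17]],
    index=['1', '2', '3'],
    columns=['a', 'b', 'c', 'd', 'e'],
)

# Position-based: the stop index is excluded, as in plain Python slicing
first_two = df.iloc[:2, :2]

# Label-based: both endpoints are included
by_label = df.loc['1':'2', 'a':'b']

print(first_two)
print(first_two.equals(by_label))
```

The difference in endpoint handling is worth remembering: `.iloc[:2]` stops before position 2, while `.loc['1':'2']` includes the row labelled `'2'`.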


### Change gears

There is a lot to learn about pandas, but you will learn more by example in the next notebook.