## Overview
This notebook starts with the very basics of pandas and builds from there. <br />
To use it, you need Python and JupyterLab installed. <br />
The code assumes a basic familiarity with Python syntax and usage.
## Packages Needed
* sys (part of the Python standard library, no install needed)
* pandas
* numpy

## Install & Import

In [None]:
import sys
!{sys.executable} -m pip install numpy
!{sys.executable} -m pip install pandas

'''
Using "!{sys.executable} -m pip install" instead of "!pip install"
ensures that the install runs in the environment of the kernel currently running
the notebook. This is a recommended best practice, and I use it in
notebooks because it is what I would want to see if I were collaborating with
a group.
'''

import numpy as np
import pandas as pd

## Creating Simple Data

### Empty DataFrame

In [None]:
empty_df = pd.DataFrame()
'''
I encourage giving your dataframe a descriptive name.
Using just df can lead to collisions or confusion when collaborating
on larger bodies of code. A _df suffix is still helpful.
'''
empty_df

### From a list

In [None]:
sample_list = [1,2,3,4,5,6]
'''
A list of values can be passed in directly without any
additional parameters or flags
'''
list_df = pd.DataFrame(sample_list)
list_df

In [None]:
'''
To specify column names you would pass a list of
strings into the columns parameter
'''
list_df2 = pd.DataFrame(sample_list, columns=["ints"])
list_df2

### From a dict

In [None]:
sample_dict = {"ints":sample_list}
'''
A simple dictionary can also be passed in directly.
The key names will become column names.
'''
dict_df = pd.DataFrame(sample_dict)
dict_df

In [None]:
alphas_list = ["A", "B", "C", "D"]
sample_dict = {"ints":sample_list, "alphas": alphas_list}
'''
A dictionary with multiple keys can also be used, but the lengths of
the value lists must be consistent
'''
# this will fail with a ValueError because the value lists are not the same length
dict_df2 = pd.DataFrame(sample_dict)
dict_df2
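
The length mismatch above can be sketched as a self-contained example. The variable names here (`ints`, `alphas`, `padded_df`) are illustrative and not part of the notebook. The second half shows one possible workaround, assuming you are happy with missing values: wrapping each list in a `pd.Series` lets pandas align on the index and pad the shorter column with `NaN`.

```python
import pandas as pd

ints = [1, 2, 3, 4, 5, 6]
alphas = ["A", "B", "C", "D"]  # two items short on purpose

# Plain lists of different lengths raise a ValueError.
try:
    pd.DataFrame({"ints": ints, "alphas": alphas})
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Series are aligned on their index, so the shorter column is padded with NaN.
padded_df = pd.DataFrame({"ints": pd.Series(ints), "alphas": pd.Series(alphas)})
print(padded_df)
```

Whether padding is appropriate depends on the data; in many cases a length mismatch signals a real problem that should be fixed at the source.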

In [None]:
alphas_list2 = ["A", "B", "C", "D", "E", "F"] # correct length
sample_dict2 = {"ints":sample_list, "alphas": alphas_list2}
dict_df3 = pd.DataFrame(sample_dict2)
dict_df3

### From the numpy zeros function

In [None]:
'''
numpy zeros is useful for creating zero-filled placeholder datasets.
To use it, pass the shape as a (rows, columns) tuple
to np.zeros()
'''
zeros_matrix = np.zeros((4,3))
zeros_matrix

In [None]:
'''
The output of np.zeros can be passed in with or without
column names
'''
zeros_df = pd.DataFrame(np.zeros((3,3)), columns=["a1", "b2", "c3"])
zeros_df
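
As a small sketch of what `np.zeros` produces: the array is filled with floats unless a `dtype` is given. The name `int_zeros_df` and the choice of `dtype=int` are illustrative assumptions, not something the notebook uses elsewhere.

```python
import numpy as np
import pandas as pd

zeros_matrix = np.zeros((4, 3))  # 4 rows, 3 columns
print(zeros_matrix.shape)        # (4, 3)
print(zeros_matrix.dtype)        # float64

# Passing dtype=int gives integer zeros instead of floats.
int_zeros_df = pd.DataFrame(
    np.zeros((2, 3), dtype=int), columns=["a1", "b2", "c3"]
)
print(int_zeros_df.dtypes)
```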

### From a series

In [None]:
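'''
A series is a 1-dimensional structure similar to a
Python list but with a broader API
'''
# an explicit dtype is passed because the default for an empty Series differs across pandas versions
empty_series = pd.Series(dtype="float64")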
empty_series

In [None]:
# standard python list
print(type(sample_list))
sample_list

In [None]:
sample_series = pd.Series(sample_list)
print(type(sample_series))
sample_series

In [None]:
'''
Unlike a list, a series can have a custom index
that can be used for selection
'''
alpha_indexed_series = pd.Series(sample_list, index=alphas_list2)
print(type(alpha_indexed_series))
alpha_indexed_series

In [None]:
alpha_indexed_series["D"]
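
A minimal sketch of the selection options on a labeled series, using an illustrative `letters_series` that mirrors the one above. It shows label-based lookup with `.loc`, position-based lookup with `.iloc`, and a label slice.

```python
import pandas as pd

letters_series = pd.Series(
    [1, 2, 3, 4, 5, 6], index=["A", "B", "C", "D", "E", "F"]
)

by_label = letters_series.loc["D"]          # lookup by index label
by_position = letters_series.iloc[3]        # lookup by integer position
label_slice = letters_series.loc["B":"D"]   # label slices include the end label

print(by_label, by_position)
print(label_slice.tolist())
```

Using `.loc` and `.iloc` explicitly makes it clear whether a label or a position is meant, which helps once an index contains integers.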

## Using quick looks/descriptors

### Head

In [None]:
# head() returns the first 5 rows by default
dict_df3.head()

In [None]:
dict_df3.head(3)


In [None]:
dict_df3.head(10)
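
The calls above can be summarized in a short sketch on an illustrative six-row frame (`six_row_df` is a stand-in for `dict_df3`). Asking for more rows than exist returns the whole frame rather than raising an error, and a negative `n` returns everything except the last `n` rows.

```python
import pandas as pd

six_row_df = pd.DataFrame(
    {"ints": [1, 2, 3, 4, 5, 6], "alphas": list("ABCDEF")}
)

default_head = six_row_df.head()      # default is n=5
oversized_head = six_row_df.head(10)  # n larger than the frame: all 6 rows
all_but_last_two = six_row_df.head(-2)

print(len(default_head), len(oversized_head), len(all_but_last_two))
```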


### Tail

In [None]:
dict_df3.tail()

In [None]:
dict_df3.tail(1)

### Shape

In [None]:
dict_df3.shape

### info

In [None]:
dict_df3.info()

## Basic Selection Methods

### By Column Names

In [None]:
# for a single column it can be passed in by name
dict_df3["ints"]

In [None]:
# to select multiple columns you need to pass them in as a list
dict_df3[["ints", "alphas"]]

In [None]:
# double brackets also work for a single column, but they return a DataFrame instead of a Series
dict_df3[["ints"]]
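
The difference between single and double brackets can be checked directly. This sketch uses an illustrative `demo_df` rather than the notebook's own frames.

```python
import pandas as pd

demo_df = pd.DataFrame({"ints": [1, 2, 3], "alphas": ["A", "B", "C"]})

single = demo_df["ints"]    # one column name -> Series
double = demo_df[["ints"]]  # list of column names -> DataFrame

print(type(single).__name__, single.shape)  # Series (3,)
print(type(double).__name__, double.shape)  # DataFrame (3, 1)
```

Which form to use depends on what the next step expects: a Series is convenient for element-wise math, while a DataFrame keeps the column name and two-dimensional shape.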

In [None]:
# but you cannot pass in a column position via this method;
# this raises a KeyError because no column is named 0. Use the iloc method below instead
dict_df3[0]
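
A minimal sketch of the failure and two alternatives, again on an illustrative `demo_df`: the bracket lookup raises a `KeyError` because no column is named `0`, while `.iloc` or `columns[0]` reach the first column by position.

```python
import pandas as pd

demo_df = pd.DataFrame({"ints": [1, 2, 3], "alphas": ["A", "B", "C"]})

try:
    demo_df[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

first_column = demo_df.iloc[:, 0]       # first column by position
first_column_name = demo_df.columns[0]  # name of the first column

print(lookup_failed, first_column_name)
```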

### loc

In [None]:
# loc selects rows by index label
dict_df3.loc[1]

In [None]:
# slice syntax works as well, but unlike Python list slices, loc includes the end label
dict_df3.loc[3:]


In [None]:
dict_df3.loc[3:5]


In [None]:
dict_df3.loc[:3]
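
Because `loc` slices by label and `iloc` slices by position, the same slice can return different rows. A short sketch on an illustrative frame with the default integer index:

```python
import pandas as pd

demo_df = pd.DataFrame({"ints": [1, 2, 3, 4, 5, 6]})

loc_rows = demo_df.loc[3:5]    # labels 3, 4 and 5 (end label included)
iloc_rows = demo_df.iloc[3:5]  # positions 3 and 4 (end position excluded)

print(loc_rows.index.tolist())
print(iloc_rows.index.tolist())
```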


In [None]:
# If we set the index to a non-numeric column, its values can be used as labels as well
dict_df3 = dict_df3.set_index("alphas")

In [None]:
dict_df3

In [None]:
dict_df3.loc["A"]
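
With a labeled index, `loc` can return a row, a single value, or a subset of rows depending on what is passed. The sketch below uses an illustrative `indexed_df` built the same way as above.

```python
import pandas as pd

indexed_df = pd.DataFrame(
    {"ints": [1, 2, 3], "alphas": ["A", "B", "C"]}
).set_index("alphas")

row = indexed_df.loc["A"]            # one label -> Series holding that row
value = indexed_df.loc["A", "ints"]  # row label and column name -> single value
subset = indexed_df.loc[["A", "C"]]  # list of labels -> DataFrame

print(row)
print(value)
print(subset)
```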

### iloc

In [None]:
'''
To select a column by position, the iloc method is needed.
I personally try to avoid this, preferring selection by name,
as column positions can shift in real-world data
'''
zeros_df

In [None]:
zeros_df.iloc[:,1]

In [None]:
# selecting multiple columns by passing a list of positions
zeros_df.iloc[:,[1,2]]

In [None]:
# selecting rows from position 2 onward, limited to columns 1 and 2
zeros_df.iloc[2:,[1,2]]

In [None]:
# selecting a single value
zeros_df.iloc[2,1]

In [None]:
dict_df

In [None]:
dict_df.iloc[3,0]

## The end