## Overview
This notebook starts with the very basics of pandas and builds from there. <br />
To use it, you need Python and JupyterLab installed. <br />
The code assumes basic familiarity with Python syntax and usage.
## Packages Needed
* sys
* pandas
* numpy

## Install & Import

In [110]:
import sys
!{sys.executable} -m pip install numpy
!{sys.executable} -m pip install pandas

'''
Using "!{sys.executable} -m pip install" instead of "!pip install"
ensures that the packages are installed into the environment of the kernel
currently running the notebook. This is a recommended practice, and I use it
in notebooks because it is what I would want to see when collaborating
with a group.
'''

import numpy as np
import pandas as pd



## Creating Simple Data

### Empty DataFrame

In [139]:
empty_df = pd.DataFrame()
'''
I encourage giving your DataFrame a descriptive name.
Using just df can lead to collisions or confusion when collaborating
on larger bodies of code. A _df suffix is still helpful.
'''
empty_df
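
To confirm that a DataFrame really is empty, one option is the `.empty` attribute together with `.shape`. This is a minimal sketch; the name `check_df` is only used for illustration here.

```python
import pandas as pd

# hypothetical name used only for this illustration
check_df = pd.DataFrame()

# .empty is True when the DataFrame holds no elements
print(check_df.empty)
# .shape reports (rows, columns)
print(check_df.shape)
```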

### From a list

In [140]:
sample_list = [1,2,3,4,5,6]
'''
A list of values can be passed in directly without any
additional parameters or flags
'''
list_df = pd.DataFrame(sample_list)
list_df

Unnamed: 0,0
0,1
1,2
2,3
3,4
4,5
5,6


In [141]:
'''
To specify column names, pass a list of
strings to the columns parameter
'''
list_df2 = pd.DataFrame(sample_list, columns=["ints"])
list_df2

Unnamed: 0,ints
0,1
1,2
2,3
3,4
4,5
5,6
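
The same idea extends to a list of lists, where each inner list becomes one row. The sketch below uses made-up data and the illustrative names `rows` and `nested_df`, neither of which appears elsewhere in this notebook.

```python
import pandas as pd

# each inner list becomes one row (illustrative data)
rows = [[1, "A"], [2, "B"], [3, "C"]]

# one column name per position in the inner lists
nested_df = pd.DataFrame(rows, columns=["ints", "alphas"])
nested_df
```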


### From a dict

In [142]:
sample_dict = {"ints":sample_list}
'''
A simple dictionary can also be passed in directly.
The keys become the column names.
'''
dict_df = pd.DataFrame(sample_dict)
dict_df

Unnamed: 0,ints
0,1
1,2
2,3
3,4
4,5
5,6


In [121]:
alphas_list = ["A", "B", "C", "D"]
sample_dict = {"ints":sample_list, "alphas": alphas_list}
'''
A dictionary with multiple keys can also be used, but the lengths of
the values must be consistent
'''
dict_df2 = pd.DataFrame(sample_dict)
# this will fail because the value lists are not the same length.
dict_df2

ValueError: All arrays must be of the same length

In [122]:
alphas_list2 = ["A", "B", "C", "D", "E", "F"] # correct length
sample_dict2 = {"ints":sample_list, "alphas": alphas_list2}
dict_df3 = pd.DataFrame(sample_dict2)
dict_df3

Unnamed: 0,ints,alphas
0,1,A
1,2,B
2,3,C
3,4,D
4,5,E
5,6,F
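
If the lists genuinely differ in length, one possible workaround (not used elsewhere in this notebook) is to wrap each list in a `pd.Series` so that pandas aligns on the index and fills the gaps with `NaN`. The names `ints`, `alphas` and `padded_df` below are illustrative.

```python
import pandas as pd

ints = [1, 2, 3, 4, 5, 6]
alphas = ["A", "B", "C", "D"]  # shorter on purpose

# wrapping each list in a Series lets pandas align the values on the index
# and fill the missing positions with NaN instead of raising a ValueError
padded_df = pd.DataFrame({"ints": pd.Series(ints), "alphas": pd.Series(alphas)})
padded_df
```

Whether padding with `NaN` is appropriate depends on the data; often a length mismatch is a sign of an upstream problem worth fixing first.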


### From the numpy zeros function

In [144]:
'''
np.zeros is useful for creating placeholder datasets filled with zeros.
Pass the shape as a (rows, columns) tuple
to np.zeros()
'''
zeros_matrix = np.zeros((4,3))
zeros_matrix

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [145]:
'''
The output of np.zeros can be passed in with or without
column names
'''
zeros_df = pd.DataFrame(np.zeros((3,3)), columns=["a1", "b2", "c3"])
zeros_df

Unnamed: 0,a1,b2,c3
0,0.0,0.0,0.0
1,0.0,0.0,0.0
2,0.0,0.0,0.0
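
The zeros above are floats because that is the NumPy default. As a small sketch, passing `dtype=int` to `np.zeros` gives integer zeros instead; the name `int_zeros_df` is only illustrative.

```python
import numpy as np
import pandas as pd

# dtype=int asks NumPy for integer zeros instead of the default floats
int_zeros_df = pd.DataFrame(np.zeros((2, 3), dtype=int), columns=["a1", "b2", "c3"])

# inspect the resulting column types
int_zeros_df.dtypes
```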


### From a series

In [146]:
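'''
A Series is a one-dimensional structure similar to a
Python list but with a broader API
'''
# passing dtype explicitly avoids the warning that some pandas versions
# emit when an empty Series is created without one
empty_series = pd.Series(dtype="float64")
empty_series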


Series([], dtype: float64)

In [147]:
# standard python list
print(type(sample_list))
sample_list

<class 'list'>


[1, 2, 3, 4, 5, 6]

In [148]:
sample_series = pd.Series(sample_list)
print(type(sample_series))
sample_series

<class 'pandas.core.series.Series'>


0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64

In [149]:
'''
Unlike a list, a Series can carry a custom index
that can be used for selection
'''
alpha_indexed_series = pd.Series(sample_list, index=alphas_list2)
print(type(alpha_indexed_series))
alpha_indexed_series

<class 'pandas.core.series.Series'>


A    1
B    2
C    3
D    4
E    5
F    6
dtype: int64

In [150]:
alpha_indexed_series["D"]

4
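
Beyond a single label, a labeled Series also accepts a list of labels or a label slice through `.loc`. A minimal sketch, rebuilding the same data under the illustrative name `letters_series`:

```python
import pandas as pd

letters_series = pd.Series([1, 2, 3, 4, 5, 6], index=["A", "B", "C", "D", "E", "F"])

# a list of labels returns a smaller Series
subset = letters_series.loc[["A", "F"]]

# a label slice includes the end label, unlike a positional Python slice
middle = letters_series.loc["B":"D"]

print(subset.tolist())
print(middle.tolist())
```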

## Using quick looks/descriptors

### Head

In [90]:
# head
dict_df3.head()

Unnamed: 0,ints,alphas
0,1,A
1,2,B
2,3,C
3,4,D
4,5,E


In [151]:
dict_df3.head(3)


Unnamed: 0,ints,alphas
0,1,A
1,2,B
2,3,C


In [152]:
dict_df3.head(10)


Unnamed: 0,ints,alphas
0,1,A
1,2,B
2,3,C
3,4,D
4,5,E
5,6,F


### Tail

In [91]:
dict_df3.tail()

Unnamed: 0,ints,alphas
1,2,B
2,3,C
3,4,D
4,5,E
5,6,F


In [156]:
dict_df3.tail(1)

Unnamed: 0,ints,alphas
5,6,F
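
Both `head` and `tail` also accept a negative `n`, which means "everything except that many rows". The sketch below uses a small stand-in DataFrame named `demo_df` (an illustrative name) rather than `dict_df3`.

```python
import pandas as pd

demo_df = pd.DataFrame({"ints": [1, 2, 3, 4, 5, 6]})

# head(-2): all rows except the last two
all_but_last_two = demo_df.head(-2)

# tail(-2): all rows except the first two
all_but_first_two = demo_df.tail(-2)

print(all_but_last_two["ints"].tolist())
print(all_but_first_two["ints"].tolist())
```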


### Shape

In [157]:
dict_df3.shape

(6, 2)
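
Since `shape` is a plain tuple of (rows, columns), it can be unpacked directly. A short sketch with a stand-in DataFrame; `shape_df`, `n_rows` and `n_cols` are illustrative names.

```python
import pandas as pd

shape_df = pd.DataFrame({"ints": [1, 2, 3, 4, 5, 6], "alphas": list("ABCDEF")})

# unpack the (rows, columns) tuple
n_rows, n_cols = shape_df.shape
print(n_rows, n_cols)

# len() on a DataFrame returns the number of rows
print(len(shape_df))
```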

### info

In [158]:
dict_df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   ints    6 non-null      int64 
 1   alphas  6 non-null      object
dtypes: int64(1), object(1)
memory usage: 224.0+ bytes
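
Two related quick looks are `dtypes`, which lists only the column types, and `describe()`, which by default summarizes the numeric columns. A minimal sketch on a stand-in DataFrame named `info_df` (illustrative):

```python
import pandas as pd

info_df = pd.DataFrame({"ints": [1, 2, 3, 4, 5, 6], "alphas": list("ABCDEF")})

# column types only
print(info_df.dtypes)

# count, mean, std, min, quartiles and max for numeric columns
summary = info_df.describe()
summary
```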


## Basic Selection Methods

### By Column Names

In [163]:
# a single column can be selected by passing its name; the result is a Series
dict_df3["ints"]

0    1
1    2
2    3
3    4
4    5
5    6
Name: ints, dtype: int64

In [164]:
# to select multiple columns you need to pass them in as a list
dict_df3[["ints", "alphas"]]

Unnamed: 0,ints,alphas
0,1,A
1,2,B
2,3,C
3,4,D
4,5,E
5,6,F


In [165]:
# double brackets always return a DataFrame, even for a single column
dict_df3[["ints"]]

Unnamed: 0,ints
0,1
1,2
2,3
3,4
4,5
5,6
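
The difference between single and double brackets is the type of the result. A quick check, using a stand-in DataFrame called `cols_df` (an illustrative name):

```python
import pandas as pd

cols_df = pd.DataFrame({"ints": [1, 2, 3, 4, 5, 6], "alphas": list("ABCDEF")})

single = cols_df["ints"]    # single brackets give a Series
double = cols_df[["ints"]]  # double brackets give a DataFrame

print(type(single).__name__)
print(type(double).__name__)
```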


In [167]:
# but you cannot pass in an integer column position this way;
# it fails here because no column is labeled 0 (use iloc, covered below)
dict_df3[0]

KeyError: 0
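
If only the position is known, one option is to look up the column name through `.columns` and then select by name. This is a sketch with illustrative names (`pos_df`, `first_col_name`, `first_col`):

```python
import pandas as pd

pos_df = pd.DataFrame({"ints": [1, 2, 3, 4, 5, 6], "alphas": list("ABCDEF")})

# .columns holds the column labels in order, so it can be indexed by position
first_col_name = pos_df.columns[0]

# select by the recovered name
first_col = pos_df[first_col_name]
print(first_col_name)
```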

### Loc

In [175]:
# loc selects rows by index label
dict_df3.loc[1]

ints      2
alphas    B
Name: 1, dtype: object

In [176]:
# slices work as well; note that loc slices are label-based and include the end label
dict_df3.loc[3:]


Unnamed: 0,ints,alphas
3,4,D
4,5,E
5,6,F

In [177]:
dict_df3.loc[3:5]


Unnamed: 0,ints,alphas
3,4,D
4,5,E
5,6,F

In [178]:
dict_df3.loc[:3]


Unnamed: 0,ints,alphas
0,1,A
1,2,B
2,3,C
3,4,D

In [179]:
# If we set the index to a non-numeric column, its labels can be used as well
# (run this cell only once: afterwards "alphas" is the index, no longer a column)
dict_df3 = dict_df3.set_index("alphas")

In [180]:
dict_df3

alphas,ints
A,1
B,2
C,3
D,4
E,5
F,6


In [181]:
dict_df3.loc["A"]

ints    1
Name: A, dtype: int64
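
`loc` also accepts a row selector and a column selector together, and it accepts boolean masks. The sketch below rebuilds an equivalent labeled DataFrame under the illustrative name `labeled_df`.

```python
import pandas as pd

labeled_df = pd.DataFrame(
    {"ints": [1, 2, 3, 4, 5, 6], "alphas": list("ABCDEF")}
).set_index("alphas")

# row label and column label together give a single value
single_value = labeled_df.loc["D", "ints"]

# label slices include both ends
row_range = labeled_df.loc["B":"D", "ints"]

# a boolean mask keeps only the rows where the condition holds
big_rows = labeled_df.loc[labeled_df["ints"] > 4]

print(single_value)
print(row_range.tolist())
print(big_rows.index.tolist())
```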

### iloc

In [184]:
'''
To select a column by position, the iloc method is needed.
I personally try to avoid this, preferring selection by name,
as column positions can shift in real-world data
'''
zeros_df

Unnamed: 0,a1,b2,c3
0,0.0,0.0,0.0
1,0.0,0.0,0.0
2,0.0,0.0,0.0


In [185]:
zeros_df.iloc[:,1]

0    0.0
1    0.0
2    0.0
Name: b2, dtype: float64

In [186]:
# selecting multiple columns by passing a list of positions
zeros_df.iloc[:,[1,2]]

Unnamed: 0,b2,c3
0,0.0,0.0
1,0.0,0.0
2,0.0,0.0


In [187]:
# selecting rows from position 2 onward, restricted to two columns
zeros_df.iloc[2:,[1,2]]

Unnamed: 0,b2,c3
2,0.0,0.0


In [188]:
# selecting a single value
zeros_df.iloc[2,1]

0.0

In [191]:
dict_df

Unnamed: 0,ints
0,1
1,2
2,3
3,4
4,5
5,6


In [192]:
dict_df.iloc[3,0]

4
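
The difference between `loc` and `iloc` is easiest to see when the index labels are integers that do not match the positions. A minimal sketch with a made-up Series named `shuffled_series` (illustrative):

```python
import pandas as pd

# the labels run 2, 1, 0 while the positions run 0, 1, 2
shuffled_series = pd.Series(["a", "b", "c"], index=[2, 1, 0])

by_label = shuffled_series.loc[0]      # the row whose label is 0
by_position = shuffled_series.iloc[0]  # the first row, whatever its label
last_row = shuffled_series.iloc[-1]    # negative positions count from the end

print(by_label, by_position, last_row)
```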

## The end