## Overview

This notebook reviews the core concepts of:
* DataFrames
* Series
* Dtypes

To run it, you need Python and JupyterLab installed.
The code assumes basic familiarity with Python syntax and use.

## Packages Needed
* sys (part of the standard library, nothing to install)
* pandas
* numpy

## Install & Import

In [None]:

import sys

!{sys.executable} -m pip install numpy
!{sys.executable} -m pip install pandas

'''
Using "!{sys.executable} -m pip install" instead of "!pip install"
ensures that packages are installed into the environment of the kernel currently
running the notebook. This is a recommended practice, and I default to it in
notebooks because it is what I would want to see when collaborating with a group.
'''

import numpy as np
import pandas as pd

## Series
A Series is a one-dimensional labeled array and the building block of DataFrame rows and columns.
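
To make "labeled array" concrete, here is a minimal sketch that builds a small Series and looks at its two parts, the index (labels) and the values. The name `temps` and its data are made up for illustration.

```python
import pandas as pd

# Hypothetical data, for illustration only
temps = pd.Series(data=[20.5, 22.0, 19.5], index=["mon", "tue", "wed"], name="temps")

labels = list(temps.index)     # the labels attached to each value
values = temps.to_numpy()      # the underlying NumPy array
by_label = temps["tue"]        # look a value up by its label
by_position = temps.iloc[1]    # look the same value up by integer position

print(labels)
print(values)
print(by_label, by_position)
```

Both lookups return the same element; the label is simply a second way to address it.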

In [None]:
empty_float_series = pd.Series(data=None, index=None, dtype=np.float64, name="Float Series")
empty_float_series.shape

In [None]:
empty_float_series.info()

In [None]:
data = [0.0, 1.1, 2.2, 3.3]
float_series = pd.Series(data=data, index=[0,1,2,3], dtype=np.float64, name="Float Series")
float_series.shape

In [None]:
float_series.info()

In [None]:
# you can easily make a change to every value through
# functions like add
float_series.add(5.5)

In [None]:
# but this does not edit the underlying stored values
# unless it is specifically assigned back to the same variable name
float_series.head()

In [None]:
float_series = float_series.add(5.5)
float_series.head()
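
As a side note on the cells above, the sketch below shows that the `+` operator gives the same result as `.add()`, and that neither changes the original Series. The names `prices` and `other` are hypothetical. It also shows `fill_value`, an option that `.add()` offers and the operator does not.

```python
import pandas as pd

prices = pd.Series([0.0, 1.1, 2.2, 3.3], name="prices")  # hypothetical example data

shifted = prices + 5.5                          # operator form of prices.add(5.5)
same_as_add = shifted.equals(prices.add(5.5))

# The original Series still holds its original values
unchanged = prices.tolist() == [0.0, 1.1, 2.2, 3.3]

# .add() can treat labels missing from the other Series as a chosen value
other = pd.Series([10.0, 20.0], index=[0, 1])
filled = prices.add(other, fill_value=0)

print(same_as_add, unchanged)
print(filled)
```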

In [None]:
# the data parameter can accept several types not just list/array data
data_dict = {"bob":1.1, "jane":2.2, "lance":3.3}
dict_series = pd.Series(data=data_dict, index=None, dtype=np.float64, name="Dictionary Series")
dict_series.head()

In [None]:
# notice the dictionary keys became the index
# This allows label-based lookups such as
dict_series["bob"]

In [None]:
# you can also use numpy ndarrays
np_rand_data = np.random.randint(0,1000,100)
random_series = pd.Series(data=np_rand_data)
random_series.head()

In [None]:
random_series.info()

In [None]:
# you can also repeat a single scalar value across a given index, like

fours_series = pd.Series(data=4., index=["A", "B", "C", "D"])
fours_series.head()

In [None]:
# to add to a series use pd.concat()
not_fours = pd.Series(data=5., index=["E", "F"])
fours_series = pd.concat([fours_series, not_fours])
fours_series.tail()
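
One thing worth knowing when combining Series this way: `pd.concat()` keeps the labels as they are, so overlapping labels end up duplicated. The sketch below uses hypothetical Series `a` and `b` to show this, along with `ignore_index=True` as one way to get a fresh integer index instead.

```python
import pandas as pd

a = pd.Series([4.0, 4.0], index=["A", "B"])  # hypothetical data
b = pd.Series([5.0], index=["B"])            # shares the label "B" with a

combined = pd.concat([a, b])                 # labels are kept as they are
has_dupes = combined.index.has_duplicates

renumbered = pd.concat([a, b], ignore_index=True)  # discard labels, renumber from 0

print(list(combined.index), has_dupes)
print(list(renumbered.index))
```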

## Dtypes
pandas supports the following commonly used dtypes:
* object - text or mixed-type values
* int64 - integers
* float64 - floating-point numbers
* bool - True/False values
* datetime64 - date and time values
* timedelta64[ns] - differences between two datetimes
* category - a fixed, finite set of possible values (often strings)
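
As a quick illustration of the list above, this sketch builds one small Series per dtype family and prints the dtype pandas reports. The sample values are made up, and the exact dtype strings printed (for example the datetime resolution) may differ between pandas versions.

```python
import pandas as pd
from pandas.api import types as ptypes

# Hypothetical sample values, one Series per dtype family
samples = {
    "object": pd.Series(["a", 1, 2.5], dtype=object),
    "int64": pd.Series([1, 2, 3]),
    "float64": pd.Series([1.0, 2.5]),
    "bool": pd.Series([True, False]),
    "datetime64": pd.to_datetime(pd.Series(["2024-01-01", "2024-01-03"])),
    "category": pd.Series(["low", "high", "low"], dtype="category"),
}
# Subtracting one datetime from another yields a timedelta
samples["timedelta64"] = samples["datetime64"] - samples["datetime64"].iloc[0]

for label, series in samples.items():
    print(f"{label:12} -> {series.dtype}")

# pandas.api.types has helper checks that do not depend on the exact dtype string
print(ptypes.is_float_dtype(samples["float64"]))
```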

In [None]:
# Why dtypes matter: each dtype has its own API and functions,
# which can be very useful.

# For example:
obj_series = pd.Series(data=["1.1", "2.2", "3.3"], dtype=str)
print(obj_series.sum())
print("versus")
print(dict_series.sum())
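
To expand on the comparison above, the sketch below shows string-only tools (the `.str` accessor) next to a conversion with `.astype(float)` that makes numeric operations available. The name `text_nums` is hypothetical, and `.astype(float)` assumes every entry parses as a number; it raises an error otherwise.

```python
import pandas as pd

text_nums = pd.Series(["1.1", "2.2", "3.3"])  # strings that look like numbers

lengths = text_nums.str.len()    # .str methods apply to text values
joined = text_nums.str.cat()     # concatenates the strings end to end

as_floats = text_nums.astype(float)  # assumes every entry is a valid number
total = as_floats.sum()              # now a numeric sum

print(lengths.tolist(), joined)
print(total)
```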

In [None]:
# you need to watch for unintended conversions
not_floats = pd.Series(data="5", index=["G", "H"])
fours_series = pd.concat([fours_series, not_floats])
# Will convert from float64 to object
fours_series.info()
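
One possible way to catch and undo a conversion like the one above is to compare dtypes before and after the concat, then convert back with `pd.to_numeric()`. This is a sketch with hypothetical Series `floats` and `strings`, and it assumes every value can be parsed as a number.

```python
import pandas as pd

floats = pd.Series([4.0, 4.0], index=["A", "B"])  # hypothetical data
strings = pd.Series(["5"], index=["G"])

mixed = pd.concat([floats, strings])
dtype_changed = mixed.dtype != floats.dtype  # the result is no longer float64

# Convert back; this assumes every value is numeric or a numeric-looking string
repaired = pd.to_numeric(mixed)

print(floats.dtype, "->", mixed.dtype, "->", repaired.dtype)
```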

## DataFrames

In [None]:
first_df = pd.DataFrame(data=dict_series)
first_df.head()

In [None]:
first_df.info()

In [None]:
first_df["obj_nums"] = ["1.1", "2.2", "3.3"]
first_df.head()

In [None]:
first_df.info()

In [None]:
obj_series.info()

In [None]:
first_df["C"] = obj_series
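# column C comes out as all NaN: assignment aligns on index labels, and
# obj_series is indexed 0, 1, 2 while first_df is indexed by bob, jane, lance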
first_df.head()

In [None]:
first_df["C"] = obj_series.values  # .values drops the index, so values are assigned by position
first_df.head()
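
The behavior in the last two cells comes from index alignment. Here is a self-contained sketch with a hypothetical DataFrame `df` and Series `extra` that shows the aligned assignment, the positional assignment, and a third option of relabeling the Series so that alignment succeeds.

```python
import pandas as pd

df = pd.DataFrame({"score": [1.1, 2.2, 3.3]}, index=["bob", "jane", "lance"])
extra = pd.Series(["a", "b", "c"])  # default index 0, 1, 2

df["aligned"] = extra                 # no shared labels, so every row is missing
df["by_position"] = extra.to_numpy()  # index dropped, values assigned in order

# Alternatively, give the Series the DataFrame's labels so alignment works
df["relabeled"] = extra.set_axis(df.index)

print(df)
```

Assigning by position is only safe when the Series is already in the same row order as the DataFrame.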

In [None]:
first_df.info()