## Overview

This notebook reviews the core concepts of:
* DataFrames
* Series
* Dtypes

To use this notebook, you need Python and JupyterLab installed.
The code assumes basic familiarity with Python syntax and usage.

## Packages Needed
* sys
* pandas
* numpy

In [135]:
## Install & Import
import sys

!{sys.executable} -m pip install numpy
!{sys.executable} -m pip install pandas

'''
Using "!{sys.executable} -m pip install" instead of "!pip install" ensures
that packages are installed into the environment of the kernel currently
running the notebook. I default to this form in notebooks because it is
what I would want to see when collaborating with a group.
'''

import numpy as np
import pandas as pd
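
As a quick sanity check after installing, the sketch below prints which interpreter the kernel is running and which library versions it actually imported. The output depends on your environment, so none is shown here.

```python
import sys

import numpy as np
import pandas as pd

# The interpreter backing the current kernel; the pip commands above
# installed into this interpreter's environment.
print(sys.executable)

# Versions actually imported by this kernel.
print("numpy :", np.__version__)
print("pandas:", pd.__version__)
```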



## Series
A Series is a one-dimensional labeled array and the building block of
DataFrame rows and columns.
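
To make "labeled array" concrete, here is a minimal sketch that pulls a Series apart into its values, its index, and its name. The variable `s` and its contents are made up for illustration.

```python
import pandas as pd

# A Series bundles three things: the values, the index (labels), and a name.
s = pd.Series(data=[10, 20, 30], index=["a", "b", "c"], name="example")

print(s.to_numpy())   # the underlying values as a NumPy array
print(list(s.index))  # the label attached to each value
print(s.name)         # the name, used as the column label in a DataFrame
```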

In [136]:
empty_float_series = pd.Series(data=None, index=None, dtype=np.float64, name="Float Series")
empty_float_series.shape

(0,)

In [137]:
empty_float_series.info()

<class 'pandas.core.series.Series'>
Index: 0 entries
Series name: Float Series
Non-Null Count  Dtype  
--------------  -----  
0 non-null      float64
dtypes: float64(1)
memory usage: 0.0+ bytes


In [138]:
data = [0.0, 1.1, 2.2, 3.3]
float_series = pd.Series(data=data, index=None, dtype=np.float64, name="Float Series")
float_series.shape

(4,)

In [139]:
float_series.info()

<class 'pandas.core.series.Series'>
RangeIndex: 4 entries, 0 to 3
Series name: Float Series
Non-Null Count  Dtype  
--------------  -----  
4 non-null      float64
dtypes: float64(1)
memory usage: 160.0 bytes


In [140]:
# you can easily make a change to every value through
# functions like add
float_series.add(5.5)

0    5.5
1    6.6
2    7.7
3    8.8
Name: Float Series, dtype: float64

In [141]:
# but this does not edit the underlying stored values
# unless it is specifically assigned back to the same variable name
float_series.head()

0    0.0
1    1.1
2    2.2
3    3.3
Name: Float Series, dtype: float64

In [142]:
float_series = float_series.add(5.5)
float_series.head()

0    5.5
1    6.6
2    7.7
3    8.8
Name: Float Series, dtype: float64
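
One reason to reach for `.add()` rather than `+` is its `fill_value` parameter. The sketch below, using two small made-up Series, shows how the two differ when the indexes do not fully overlap.

```python
import pandas as pd

left = pd.Series([1.0, 2.0], index=["x", "y"])
right = pd.Series([10.0], index=["y"])

# The + operator aligns on labels; "x" has no partner, so the result is NaN.
plus_result = left + right

# .add() aligns the same way but can substitute a value for missing labels.
add_result = left.add(right, fill_value=0)

print(plus_result)
print(add_result)
```

As with the cells above, neither call changes `left` or `right`; both return a new Series.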

In [143]:
# the data parameter accepts several types, not just list/array data
data_dict = {"bob":1.1, "jane":2.2, "lance":3.3}
dict_series = pd.Series(data=data_dict, index=None, dtype=np.float64, name="Dictionary Series")
dict_series.head()

bob      1.1
jane     2.2
lance    3.3
Name: Dictionary Series, dtype: float64

In [144]:
# notice the dictionary keys became the index
# This now allows use like
dict_series["bob"]

1.1
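
Beyond plain `[]` lookups, a labeled index supports a few other access patterns. This is a small sketch on a stand-in Series named `scores` (a hypothetical name) with the same keys as above.

```python
import pandas as pd

scores = pd.Series({"bob": 1.1, "jane": 2.2, "lance": 3.3})

print(scores.loc["jane"])             # label-based lookup
print(scores.iloc[0])                 # position-based lookup (first element)
print("lance" in scores)              # membership checks the index, not the values
print(scores.loc[["bob", "lance"]])   # a list of labels returns a Series
```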

In [145]:
# you can also use numpy ndarrays
np_rand_data = np.random.randint(0,1000,100)
random_series = pd.Series(data=np_rand_data)
random_series.head()

0     56
1     42
2    426
3      2
4    548
dtype: int64

In [146]:
random_series.info()

<class 'pandas.core.series.Series'>
RangeIndex: 100 entries, 0 to 99
Series name: None
Non-Null Count  Dtype
--------------  -----
100 non-null    int64
dtypes: int64(1)
memory usage: 928.0 bytes
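
The random values above change on every run. If you want repeatable output, one option is NumPy's seeded `Generator`; the sketch below assumes an arbitrary seed of 42 and is an alternative to, not a copy of, the `np.random.randint` call used above.

```python
import numpy as np
import pandas as pd

# A seeded Generator produces the same draws on every run.
rng = np.random.default_rng(seed=42)
seeded_series = pd.Series(rng.integers(low=0, high=1000, size=100))

# Re-creating the generator with the same seed reproduces the Series.
rng_again = np.random.default_rng(seed=42)
repeat_series = pd.Series(rng_again.integers(low=0, high=1000, size=100))

print(seeded_series.equals(repeat_series))
```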


In [147]:
# a single scalar value is repeated for every label in the given index

fours_series = pd.Series(data=4., index=["A", "B", "C", "D"])
fours_series.head()

A    4.0
B    4.0
C    4.0
D    4.0
dtype: float64

In [148]:
# to add to a series use pd.concat()
not_fours = pd.Series(data=5., index=["E", "F"])
fours_series = pd.concat([fours_series, not_fours])
fours_series.tail()

B    4.0
C    4.0
D    4.0
E    5.0
F    5.0
dtype: float64
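
`pd.concat()` does not check for repeated labels by default. The following sketch, with two small made-up Series, shows how duplicates can slip in and how the `verify_integrity` flag guards against them.

```python
import pandas as pd

first = pd.Series(4.0, index=["A", "B"])
overlap = pd.Series(5.0, index=["B", "C"])  # label "B" appears in both

# By default concat keeps both "B" rows, so the index is no longer unique.
combined = pd.concat([first, overlap])
print(combined.index.is_unique)

# verify_integrity=True raises instead of silently creating duplicates.
try:
    pd.concat([first, overlap], verify_integrity=True)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```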

## Dtypes
pandas supports the following dtypes, among others:
* object - text, str, or mixed types
* int64 - integers
* float64 - floating point numbers
* bool - True/False values
* datetime64[ns] - date and time values
* timedelta64[ns] - difference between two datetimes
* category - a fixed, finite set of values (often strings)
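
As a rough illustration of the list above, this sketch builds one small Series per dtype and prints what pandas reports. The sample values are invented, and the exact datetime/timedelta resolution shown may vary by pandas version.

```python
import pandas as pd

examples = {
    "int64": pd.Series([1, 2, 3], dtype="int64"),
    "float64": pd.Series([1.5, 2.5], dtype="float64"),
    "bool": pd.Series([True, False], dtype="bool"),
    "datetime": pd.to_datetime(pd.Series(["2024-01-01", "2024-01-02"])),
    "category": pd.Series(["low", "high", "low"], dtype="category"),
}
# Subtracting two datetime Series yields a timedelta Series.
examples["timedelta"] = examples["datetime"] - examples["datetime"].shift(1)

for label, series in examples.items():
    print(f"{label:>10}: {series.dtype}")
```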

In [149]:
# Why dtypes matter: each dtype has its own API and functions,
# which can be very useful.

# For example:
obj_series = pd.Series(data=["1.1", "2.2", "3.3"], dtype=str)
print(obj_series.sum())
print("versus")
print(dict_series.sum())

1.12.23.3
versus
6.6
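
If the strings really are numbers, converting the dtype first gives the arithmetic result. This is a minimal sketch using `astype`, with a stand-in Series named `text_numbers` (a hypothetical name) holding the same strings.

```python
import pandas as pd

text_numbers = pd.Series(["1.1", "2.2", "3.3"], dtype=object)

# On an object Series of strings, sum() concatenates.
print(text_numbers.sum())

# Converting to float64 first gives the arithmetic total.
as_floats = text_numbers.astype("float64")
print(as_floats.dtype)
print(round(as_floats.sum(), 1))
```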


In [150]:
# you need to watch for unintended conversions
not_floats = pd.Series(data="5", index=["G", "H"])
fours_series = pd.concat([fours_series, not_floats])
# Will convert from float64 to object
fours_series.info()

<class 'pandas.core.series.Series'>
Index: 8 entries, A to H
Series name: None
Non-Null Count  Dtype 
--------------  ----- 
8 non-null      object
dtypes: object(1)
memory usage: 128.0+ bytes
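
One way to recover from a conversion like this is `pd.to_numeric`, which parses numeric strings back into numbers. The sketch below rebuilds a smaller version of the situation with made-up Series rather than reusing `fours_series`.

```python
import pandas as pd

floats = pd.Series(4.0, index=["A", "B"])
strings = pd.Series("5", index=["G", "H"])

mixed = pd.concat([floats, strings])
print(mixed.dtype)  # no longer float64, because of the string values

# pd.to_numeric parses the strings and returns a numeric Series.
# Passing errors="coerce" would turn unparseable values into NaN
# instead of raising an error.
repaired = pd.to_numeric(mixed)
print(repaired.dtype)
```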


## DataFrames

In [151]:
first_df = pd.DataFrame(data=dict_series)
first_df.head()

       Dictionary Series
bob                  1.1
jane                 2.2
lance                3.3


In [152]:
first_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, bob to lance
Data columns (total 1 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Dictionary Series  3 non-null      float64
dtypes: float64(1)
memory usage: 156.0+ bytes
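
Note that the Series name became the column label and the Series index became the row index. `Series.to_frame()` offers another way to do the same conversion; the sketch below uses a stand-in Series called `named`, and the column name `"score"` is just an illustrative choice.

```python
import pandas as pd

named = pd.Series({"bob": 1.1, "jane": 2.2}, name="Dictionary Series")

# to_frame() gives the same result as pd.DataFrame(named) for one Series.
frame = named.to_frame()
print(list(frame.columns))

# Passing a name overrides the column label.
renamed = named.to_frame(name="score")
print(list(renamed.columns))
```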


In [153]:
first_df["obj_nums"] = ["1.1", "2.2", "3.3"]
first_df.head()

       Dictionary Series  obj_nums
bob                  1.1       1.1
jane                 2.2       2.2
lance                3.3       3.3


In [154]:
first_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, bob to lance
Data columns (total 2 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Dictionary Series  3 non-null      float64
 1   obj_nums           3 non-null      object 
dtypes: float64(1), object(1)
memory usage: 180.0+ bytes


In [155]:
obj_series.info()

<class 'pandas.core.series.Series'>
RangeIndex: 3 entries, 0 to 2
Series name: None
Non-Null Count  Dtype 
--------------  ----- 
3 non-null      object
dtypes: object(1)
memory usage: 152.0+ bytes


In [156]:
first_df["C"] = obj_series
first_df.head()

       Dictionary Series  obj_nums    C
bob                  1.1       1.1  NaN
jane                 2.2       2.2  NaN
lance                3.3       3.3  NaN


In [157]:
first_df["C"] = obj_series.values
first_df.head()

       Dictionary Series  obj_nums    C
bob                  1.1       1.1  1.1
jane                 2.2       2.2  2.2
lance                3.3       3.3  3.3
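
The difference between the last two cells comes from index alignment: assigning a Series matches rows by label, while assigning raw values matches by position. The sketch below reproduces this on a small made-up frame and shows two possible fixes; the column names are hypothetical.

```python
import pandas as pd

frame = pd.DataFrame({"score": [1.1, 2.2, 3.3]}, index=["bob", "jane", "lance"])
labels = pd.Series(["a", "b", "c"])  # default index 0, 1, 2

# Assigning a Series aligns on index labels; none match, so every row is missing.
frame["aligned"] = labels
print(frame["aligned"].isna().all())

# Option 1: drop the index and assign by position.
frame["by_position"] = labels.to_numpy()

# Option 2: relabel the Series so its index matches the frame's.
frame["relabeled"] = labels.set_axis(frame.index)

print(frame)
```

Assigning by position is shorter, but it silently relies on both objects being in the same order; relabeling makes that assumption explicit.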


In [158]:
first_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, bob to lance
Data columns (total 3 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Dictionary Series  3 non-null      float64
 1   obj_nums           3 non-null      object 
 2   C                  3 non-null      object 
dtypes: float64(1), object(2)
memory usage: 204.0+ bytes
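
If columns like `obj_nums` and `C` are meant to hold numbers, `DataFrame.astype` can convert selected columns. This is a minimal sketch on a made-up frame with hypothetical column names, not a change to `first_df`.

```python
import pandas as pd

frame = pd.DataFrame(
    {"as_float": [1.1, 2.2, 3.3], "as_text": ["1.1", "2.2", "3.3"]},
    index=["bob", "jane", "lance"],
)

# astype accepts a {column: dtype} mapping and returns a new DataFrame;
# the original frame keeps its dtypes.
converted = frame.astype({"as_text": "float64"})

print(converted.dtypes)
```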
