## Overview
The core data types supported in pandas objects are float, int, bool, datetime64[ns], timedelta64[ns], and object. The default types for integers and floats are int64 and float64, respectively. This notebook reviews the core concepts of:
* text to numeric
* numeric to text
* float to int
* string (object) to boolean
* int to boolean
* datetimes
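
The default dtypes mentioned above can be checked directly. The following is a minimal sketch; the variable names are only for illustration and are not used elsewhere in the notebook.

```python
import pandas as pd

# Build small Series from plain Python values and inspect the inferred dtypes.
ints = pd.Series([1, 2, 3])
floats = pd.Series([1.5, 2.5, 3.5])
bools = pd.Series([True, False, True])

print(ints.dtype)    # default integer dtype
print(floats.dtype)  # default float dtype
print(bools.dtype)
```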





To use this notebook you need Python and JupyterLab installed.
The code assumes a basic familiarity with Python syntax and use. It also assumes you are familiar with the head, tail, and info functions for Series and DataFrames covered in earlier lessons.
## Packages Needed
* sys
* pandas
* numpy

In [None]:
import sys

!{sys.executable} -m pip install numpy
!{sys.executable} -m pip install pandas

'''
Using "!{sys.executable} -m pip install"   instead of "!pip install"
ensures that the install is done in the context and kernel currently running
the notebook. This is a recommended best practice and I try to use this method within
notebooks as I try to default to what I would want to see if I was collaborating with
a group.
'''

import numpy as np
import pandas as pd

## String and Numerics

In [None]:

text_num = "42"
print(type(int(text_num)))
print(type(pd.to_numeric(text_num)))

In [None]:
text_num = "42.0"
print(type(float(text_num)))
print(type(pd.to_numeric(text_num)))

In [None]:
text_num = "42."
print(type(float(text_num)))
print(type(pd.to_numeric(text_num)))

In [None]:
text_num_series = pd.Series(data=["42", "8675309", "100"])
new_series = pd.to_numeric(text_num_series)
new_series.info()  # call the method; without parentheses only the method object is shown

In [None]:
text_num_series = pd.Series(data=["42.0", "8675309", "100"])
new_series = pd.to_numeric(text_num_series)
new_series.info()  # one float-like string makes the whole Series float64
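
The Series above only contain clean numeric strings. As a hedged sketch of what can be done when some entries are not numbers at all, `pd.to_numeric` accepts `errors="coerce"`, which turns unparseable values into NaN instead of raising. The `messy` Series is made-up data for illustration.

```python
import pandas as pd

# Hypothetical messy input: one entry is not a number at all.
messy = pd.Series(["42", "8675309", "not a number"])

# errors="coerce" replaces unparseable entries with NaN instead of raising.
cleaned = pd.to_numeric(messy, errors="coerce")
print(cleaned)
print(cleaned.dtype)  # float64, because NaN is a float
```

Checking `cleaned.isna()` afterwards is a simple way to find the rows that failed to convert.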

In [None]:
# but we can't do this: the built-in float() only converts a single value
new_series = float(text_num_series)  # raises TypeError

In [None]:
text_num_df = pd.DataFrame(data= text_num_series, columns=["text"])
text_num_df.info()

In [None]:
text_num_df["nums"] = pd.to_numeric(text_num_df["text"])
text_num_df.head()

In [None]:
text_num_df.info()

### astype

In [None]:
# To be specific about the target type we use astype, but it can be confusing.
# Based on the previous examples it looks like both of these should work,
# but only the first one does.
text_num_df["floats"] = text_num_df["text"].astype("float")  # works
text_num_df["ints"] = text_num_df["text"].astype("int")  # fails: "42.0" is not a valid integer string

In [None]:
# but this will work: parse to a number first, then cast to int
text_num_df["ints"] = pd.to_numeric(text_num_df["text"]).astype("int")
text_num_df.head()
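
The two cells above can be combined into one self-contained sketch that catches the failure instead of stopping the notebook. The Series here is rebuilt locally, and `"int64"` is spelled out as an assumption so the result does not depend on the platform's default integer size.

```python
import pandas as pd

text = pd.Series(["42.0", "8675309", "100"])

# Casting the strings straight to int fails on the float-like string "42.0".
try:
    text.astype("int64")
except ValueError as exc:
    print("astype failed:", exc)

# Parsing to a number first and then casting works.
as_ints = pd.to_numeric(text).astype("int64")
print(as_ints)
```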


In [None]:
text_num_df.info()
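
For the float to int case listed in the overview, two details are easy to miss: `astype` truncates rather than rounds, and a float column with missing values cannot become a plain int64 column. The sketch below uses made-up values and the nullable `"Int64"` extension dtype as one possible way to keep the gaps.

```python
import numpy as np
import pandas as pd

floats = pd.Series([42.9, -1.7, 100.0])

# astype truncates toward zero; it does not round.
truncated = floats.astype("int64")
# Round first if nearest-integer behaviour is wanted.
rounded = floats.round().astype("int64")

# A float column with a missing value can use the nullable "Int64" dtype,
# which stores the gap as <NA>.
with_gap = pd.Series([1.0, np.nan, 3.0])
nullable = with_gap.astype("Int64")

print(truncated.tolist(), rounded.tolist())
print(nullable)
```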

In [None]:
# going from numeric to a string is easy peasy
text_num_df["string"] = text_num_df["floats"].astype("str")
text_num_df.head()

In [None]:
text_num_df.info()
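
One thing to watch when going from numeric to text: the string form of a float keeps the trailing `.0`. The sketch below, with illustrative values only, shows that and one possible way to control the formatting using Python's `str.format`.

```python
import pandas as pd

floats = pd.Series([42.0, 8675309.0, 100.0])

# astype("str") keeps the float representation, including the ".0".
plain = floats.astype("str")

# Mapping a format string over the values gives control over the output.
no_decimals = floats.map("{:.0f}".format)

print(plain.tolist())
print(no_decimals.tolist())
```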

In [None]:
# for an existing column there are two options:
text_num_df2 = pd.DataFrame(data= text_num_series, columns=["text"])
text_num_df2["text"] = text_num_df2["text"].astype('float')  # option 1: reassign the column
text_num_df2 = text_num_df2.astype({"text":'float'})  # option 2: dict of column -> dtype (my recommendation)
text_num_df2.info()

## Booleans

In [None]:
booly_things = ["true", "false", "True", "false", True, False,  1,   0, 42, -1, "t", "f", 1.0, 0.0]
booly_df = pd.DataFrame(data=booly_things, columns=["x"])
booly_df.info()

In [None]:
booly_df["y"] = booly_df["x"].astype("bool")
booly_df.head(20)

In [None]:
booly_df.loc[booly_df["y"] == False]
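
The filter above returns fewer rows than you might expect because `astype("bool")` follows Python truthiness: any non-empty string is True, including `"false"` and `"f"`. A minimal sketch with made-up strings:

```python
import pandas as pd

# dtype=object keeps the values as plain Python strings.
strings = pd.Series(["true", "false", "", "f"], dtype=object)

# Only the empty string is falsy; the text "false" still becomes True.
as_bool = strings.astype("bool")
print(as_bool.tolist())
```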

In [None]:
# so what if we only have strings... or something weird
booly_things2 = ["True", "False", "Weird", "Science"]
booly_df2 = pd.DataFrame(data=booly_things2, columns=["x"])
booly_df2["y"] = booly_df2["x"].map({'True': True, 'False': False, 'Weird':True})
# unmapped values will become null/NaN values
booly_df2.head()
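
Because unmapped values become NaN, the mapped column is not a plain bool column. The sketch below rebuilds the same data locally, lists the labels that had no mapping, and converts the result to the nullable `"boolean"` dtype as one possible way to keep the gap explicit.

```python
import pandas as pd

labels = pd.Series(["True", "False", "Weird", "Science"], dtype=object)
mapping = {"True": True, "False": False, "Weird": True}

mapped = labels.map(mapping)           # "Science" has no entry, so it becomes NaN
unmapped = labels[mapped.isna()]       # labels that still need a decision
as_boolean = mapped.astype("boolean")  # nullable boolean dtype, NaN becomes <NA>

print(unmapped.tolist())
print(as_boolean)
```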

In [None]:
booly_df2.info()  # "y" is object dtype because of the unmapped NaN

## Datetimes

In [None]:
dt_df = pd.DataFrame(data=["01/15/23", "01/15/23 12:12:12"], columns=["str_dt"])
dt_df.head()

In [None]:
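# infer_datetime_format is deprecated as of pandas 2.0; format="mixed" (pandas 2.0+)
# parses each value on its own, which is needed here because the two strings differ
dt_df["dt64"] = pd.to_datetime(dt_df["str_dt"], format="mixed")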
dt_df.head()

In [None]:
dt_df.info()
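
When every string follows the same layout, passing an explicit `format` avoids any guessing about month versus day order. This is a small sketch with made-up dates; `errors="coerce"` is shown as one option for turning values that do not match into NaT.

```python
import pandas as pd

date_only = pd.Series(["01/15/23", "02/28/23"])

# An explicit format states that the strings are month/day/two-digit year.
parsed = pd.to_datetime(date_only, format="%m/%d/%y")

# errors="coerce" turns entries that do not match the format into NaT.
mixed = pd.Series(["01/15/23", "not a date"])
coerced = pd.to_datetime(mixed, format="%m/%d/%y", errors="coerce")

print(parsed)
print(coerced)
```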

### More on datetimes in a later standalone lesson

## The End