# Python Libraries
One of the most powerful features of Python is the ability to extend its baseline functionality with libraries. A library is a collection of functions and methods organized around a particular feature or objective. There are libraries for mathematical functions (`math`), scientific computing (`scipy`), dataframes (`pandas`), numerical arrays and matrices (`numpy`), plotting graphs (`matplotlib`), and many other specialized tasks.

To use a library that does not ship with Python, it must first be installed. Python has several ways to install a library, but the most common is the `pip install` command followed by the library name. This only needs to be done once per environment; after that, your code can import the library whenever it needs it.

## `pip install` Library Method
The name pip is a recursive acronym for "Pip Installs Packages". Combined with the `install` command, it installs the library named directly after it. For instance, to install the numerical library `numpy`, we would type `pip install numpy` and execute the cell. pip will find the package, download and unpack the files, install the library contents, and make them available for use. Standard-library modules such as `math` ship with Python and do not need to be installed this way. It is important to note that `pip install` should be run in a cell by itself and without markup, because markup causes the command to fail. Combining multiple installs in one cell can work, but the output moves quickly on to the next install and you may miss essential messages at the end of each one.

Pay attention to the messages printed at the end of the install. Some libraries require you to restart your kernel before the packages are available for use. The exact menu depends on your environment: look for a restart option under the Runtime menu (Google Colab) or the Kernel menu (Jupyter). When you restart the kernel, all variables and their values are discarded, so the cells that created them need to be run again. Typically we install all the libraries in a separate notebook so that our working notebook is not cluttered with installer output.
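Because an install only needs to happen once, it can be handy to check whether a library is already available before reinstalling it. The sketch below uses only the standard library; the helper names `is_installed` and `installed_version` are made up for this illustration. Note that the import name and the pip name can differ (for example `sklearn` versus `scikit-learn`).

```python
import importlib.util
from importlib import metadata


def is_installed(module_name):
    """Return True if Python can locate the module (without importing it)."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(distribution_name):
    """Return the version pip recorded for a package, or None if there is no record."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# math is part of the standard library, so it is importable but has no pip record
for name in ["math", "numpy", "pandas"]:
    print(name, "| importable:", is_installed(name), "| version:", installed_version(name))
```

If a library shows as not importable right after an install, restarting the kernel is the first thing to try.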

## Let's install some libraries.
We will begin with NumPy. NumPy stands for Numerical Python and is used to store, reshape, and compute on arrays of numbers. Many of the other libraries require NumPy as an underlying framework for their advanced calculations because it can manipulate large quantities of numbers quickly.

In [2]:
pip install numpy

Note: you may need to restart the kernel to use updated packages.
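As a small illustration of what NumPy offers once it is installed, the sketch below applies one calculation to a whole array at once and then restructures the same values into a grid. The temperature values are invented for the example.

```python
import numpy as np

# Made-up temperature readings in Celsius
temps_c = np.array([18.5, 21.0, 19.25, 23.0, 20.0, 22.5])

# One expression converts every value; no loop is needed
temps_f = temps_c * 9 / 5 + 32

# Restructure the six readings into 2 rows by 3 columns
grid = temps_c.reshape(2, 3)

print(temps_f)
print(grid.shape)
print(grid.mean(axis=0))  # average of each column
```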


Next we will move on to pandas. This library is used to create and manipulate dataframes. So far we have seen data in lists and dictionaries. Pandas answers the question of what we would do if we had hundreds of lists, each with hundreds of thousands (or many more) values, and wanted to see them all in one place. A dataframe is an organized, table-like representation of the data that can be manipulated and changed to suit the needs of your analysis. Data science relies heavily on NumPy and pandas as the underlying framework for analysis, and many machine learning workflows use them to prepare and pass data.

In [4]:
pip install pandas

Note: you may need to restart the kernel to use updated packages.
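To make the idea of a dataframe concrete, here is a minimal sketch that gathers several lists into one table, adds a calculated column, and filters the rows. The column names and values are invented for the example.

```python
import pandas as pd

# Three separate lists that describe the same four (made-up) people
names = ["a", "b", "c", "d"]
height_cm = [160, 175, 182, 168]
weight_kg = [55.0, 70.0, 90.0, 62.0]

# A dictionary of lists becomes a table with one column per key
df = pd.DataFrame({"name": names, "height_cm": height_cm, "weight_kg": weight_kg})

# New columns are calculated from existing ones, row by row
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2

# Keep only the rows that meet a condition
tall = df[df["height_cm"] > 170]

print(df)
print(tall)
```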


## Visualization Libraries
The libraries we will use most in our analyses are those for generating graphs, or visualizations. Each visualization library has different strengths, and they are frequently used for different applications.

### Matplotlib
Matplotlib is the foundational plotting library, and seaborn is built on top of it. Most plotting goes through its `pyplot` interface. Matplotlib is the most basic of the visualization libraries and generates a simple two-dimensional graph that is static by default. The user cannot interact with the image it renders; it is a snapshot of the data at that exact moment. If the data changes, the plotting code needs to be run again to see the changes. While it is basic, there are many options for how the plots are rendered. The data scientist has control over the colors, the grid, the size of the plot, the size of the markers in the plot, the scaling, axes, titles, and legend.

In [6]:
pip install matplotlib

Note: you may need to restart the kernel to use updated packages.
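The sketch below shows a few of the controls mentioned above (plot size, colors, grid, title, axis labels, and legend) on a simple static plot. The data is generated on the spot and the styling choices are arbitrary.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4, 5]
doubled = [2 * v for v in x]
squared = [v ** 2 for v in x]

# figsize sets the size of the plot in inches
fig, ax = plt.subplots(figsize=(6, 4))

# color, marker, and linestyle control how each series is drawn
ax.plot(x, doubled, color="tab:blue", marker="o", label="2x")
ax.plot(x, squared, color="tab:orange", linestyle="--", label="x squared")

ax.set_title("Two simple curves")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.grid(True)   # show the grid
ax.legend()     # build the legend from the label= values

plt.show()
```

If the lists change, the cell must be run again to redraw the figure.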


### Seaborn
Seaborn is a level up from matplotlib. While its graphs are still static, seaborn expands the types of graphs available to include more complex statistical plots, and it can represent a third dimension in a two-dimensional plot, for example through color. The user cannot interact with these plots, but the options for seaborn are limited only by the skill of the programmer. Seaborn makes it easy to apply color gradients that reveal small changes in the data, which takes more manual work in matplotlib. It also provides advanced statistical graphs such as heatmaps, which let you visualize the correlation between variables and look for relationships.

**(Always remember that correlation does not imply causation; it only shows a relatedness between variables.)**

A favorite feature of seaborn is graph tiling. You can render a grid of related graphs in one call, so that six or more representations of the data appear from a single type of plot.

In [8]:
pip install seaborn

Note: you may need to restart the kernel to use updated packages.
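As an illustration of the correlation heatmap described above, the sketch below builds a small synthetic dataset and colors each pair of variables by its correlation. The column names and the random data are invented; two of the columns are deliberately related and the third is not.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic data: exam_score depends on hours_studied, shoe_size is unrelated noise
rng = np.random.default_rng(seed=0)
hours = rng.uniform(0, 10, size=50)
df = pd.DataFrame({
    "hours_studied": hours,
    "exam_score": 50 + 4 * hours + rng.normal(0, 5, size=50),
    "shoe_size": rng.normal(9, 1, size=50),
})

# Pairwise correlation between every pair of columns
corr = df.corr()

# The color gradient encodes the strength and sign of each correlation
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_title("Correlation between variables")
plt.show()
```

A strong color between two variables shows that they move together, not that one causes the other.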


The advanced libraries allow a user to manipulate the scaling, isolate specific variables, change the axes, manipulate the colors, and even generate three-dimensional plots that the user can spin in many different directions to see insights that may have been hidden when the data was flattened. While these sound amazing (and they are), a data scientist must consider the size of their data and how much computing power it takes to generate those plots. We call this **Computational Cost**, which factors in how powerful the computer must be to make the plots, how much electricity it takes to churn through the data to create them, and how much power the user's device needs to display the visualization.

There are many more visualization libraries to choose from, and more are created every month. A few more to investigate for very advanced plots are:
* *Bokeh*, a Python library that renders its charts with JavaScript in the browser, which allows the charts to be embedded in a web page and adds dynamic capability.
* *Plotly*, a Python library built on a JavaScript plotting engine that generates three-dimensional plots and interactive plots, and opens up even more options than seaborn for how to display the data. The charts are rendered in the coding environment and feature an exploded view, where clicking on a specific data point, bar, or pie slice expands the data to the full chart window, allowing deeper insight into the data. The user can draw a box around data points to isolate a region of the graph in the exploded view and investigate outliers. The three-dimensional plots can be rotated about the x, y, and z axes so that the data can be seen from any angle.
* *Flourish*, a rising data visualization tool that adds functionality through new graph types almost weekly. Note that it is a web-based platform rather than a library installed with pip. It is not unheard of to be working in Flourish and have a new chart type added while you are working. It combines the features of seaborn, plotly, and bokeh with an open-source feel, meaning that if you program a new chart type you can add it for other developers to use. Because of this it is expanding rapidly and often features hybrid plots that would not previously have been used.
* *D3.js*, a JavaScript library (not a Python library) for building custom, interactive visualizations in the browser. It can produce the kinds of charts that Bokeh and Plotly offer and is worth investigating for extremely large data (10 million or more data points) that is also wide (more than 100 columns with many rows of values).

In [10]:
pip install plotly

Note: you may need to restart the kernel to use updated packages.


## Scientific Python
Python has a tremendous ability to handle enormous amounts of numerical data. Scientific processes use numbers to measure just about everything in our natural environment, on this planet and off world. The Python scientific libraries SciPy and scikit-learn (sklearn) provide routines for just about everything we would otherwise have calculated by hand or with advanced calculators. One of SciPy's strengths is that it wraps well-tested numerical routines written in older compiled languages such as C, C++, and Fortran, so they can be called directly from Python. In data science we mostly use it to optimize our algorithms and for advanced math calculations and statistics. The `statsmodels` library, installed below, adds statistical models and tests.

In [12]:
pip install scipy

Note: you may need to restart the kernel to use updated packages.


In [14]:
pip install statsmodels

Note: you may need to restart the kernel to use updated packages.
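As a brief illustration of the two uses mentioned above, optimization and statistics, the sketch below asks SciPy to find the minimum of a simple function and then to compare two small samples. The function `cost` and the sample values are invented for the example.

```python
from scipy import optimize, stats


def cost(x):
    """A simple bowl-shaped function whose lowest point is at x = 3."""
    return (x - 3.0) ** 2 + 1.0


# Optimization: search for the x that gives the smallest cost
result = optimize.minimize_scalar(cost)
print("minimum found near x =", result.x, "with cost", result.fun)

# Statistics: an independent two-sample t-test on made-up measurements
sample_a = [5.1, 4.9, 5.4, 5.0, 5.2]
sample_b = [5.8, 6.1, 5.9, 6.3, 6.0]
t_stat, p_value = stats.ttest_ind(sample_a, sample_b)
print("t statistic:", t_stat, "p-value:", p_value)
```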


## Scikit-learn
This is a machine learning library used for modeling. It features modules to classify items based on their shared properties, to cluster similar items into groups and identify their properties, and to predict the next outcome from the statistics of a sample with regression. It is also widely used for pre-processing data, that is, to scale data or to reduce the number of variables in a machine learning application.

In [4]:
pip install -U scikit-learn

Collecting scikit-learn
  Downloading scikit_learn-1.6.1-cp312-cp312-macosx_12_0_arm64.whl.metadata (31 kB)
Collecting threadpoolctl>=3.1.0 (from scikit-learn)
  Downloading threadpoolctl-3.5.0-py3-none-any.whl.metadata (13 kB)
Downloading scikit_learn-1.6.1-cp312-cp312-macosx_12_0_arm64.whl (11.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.2/11.2 MB 30.5 MB/s eta 0:00:00
Downloading threadpoolctl-3.5.0-py3-none-any.whl (18 kB)
Installing collected packages: threadpoolctl, scikit-learn
  Attempting uninstall: threadpoolctl
    Found existing installation: threadpoolctl 2.2.0
    Uninstalling threadpoolctl-2.2.0:
      Successfully uninstalled threadpoolctl-2.2.0
  Attempting uninstall: scikit-learn
    Found existing installation: scikit-learn 1.4.2
    Uninstalling scikit-learn-1.4.2:
      Successfully uninstalled scikit-learn-1.4.2
Successfully installed scikit-learn-1.6.1 threadpoolctl-3.5.0
Note: you may need to restart the kernel to use updated packages.
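The pre-processing and regression uses described above can be sketched in a few lines. This is only an illustration on synthetic data; the feature scales, the target formula, and the choice of two components are assumptions made for the example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Synthetic data: three features on very different scales, and a target built from two of them
rng = np.random.default_rng(seed=0)
X = rng.normal(loc=[10, 200, -5], scale=[1, 50, 0.1], size=(100, 3))
y = 2.0 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(0, 0.1, size=100)

# Scaling: give every feature a mean of 0 and a standard deviation of 1
X_scaled = StandardScaler().fit_transform(X)

# Reducing the number of variables: keep 2 components instead of 3 features
X_reduced = PCA(n_components=2).fit_transform(X_scaled)

# Regression: fit a model and report how much of the variation it explains
model = LinearRegression().fit(X_scaled, y)
print("reduced shape:", X_reduced.shape)
print("R squared:", model.score(X_scaled, y))
```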

# Importing Libraries
Once we know what type of data we are working with and how the data should be manipulated, we can begin importing libraries. Some libraries, depending on your coding environment, may come preloaded. The `math` module is a great example. It is part of Python's standard library, so it only needs to be imported to use its functions.

There are two ways to work with libraries: import them all at the very beginning of your analysis in a single block, along with any required settings or dependencies, or import them in the blocks where they are needed. For example, with the second approach I would import the visualization libraries when beginning the data visualizations and insights section of my analysis.

Some libraries are very large and contain a lot of methods, which take up memory when you load them. It is possible to import just the part that you need from a library rather than the entire library. This is especially useful if you only need one method. In this case you use the Python keyword `from` to designate the library, followed by the keyword `import` to state which module or method you want to load from it. Python still loads the module that defines the requested name, so the memory saving comes mainly from importing a single submodule of a large package rather than the whole package.
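A minimal sketch of the `from ... import ...` form, using the standard-library `math` module so nothing needs to be installed. The variable names are chosen for the example.

```python
# Import only the two names we need from the math module
from math import sqrt, pi

# The imported names are used directly, without the "math." prefix
radius = 2.0
area = pi * radius ** 2

print(sqrt(16))
print(area)
```

The trade-off is readability: `sqrt(16)` is shorter, while `math.sqrt(16)` makes it obvious where the function comes from.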

In [16]:
# Let's import the math module
import math

# Use the factorial function to find the factorial of 64
# In traditional math we would write this as 64!
# We would then tediously multiply every whole number from 64 down to 1.
math.factorial(64)

126886932185884164103433389335161480802865516174545192198801894375214704230400000000000000

In [18]:
# The math module also holds constants
# How far do you know the value of pi?
math.pi

3.141592653589793

In [20]:
# The math module provides most of the math functions you can think of.
# The help() function prints the documentation for a module, function, or object
help(math)

Help on module math:

NAME
    math

MODULE REFERENCE
    https://docs.python.org/3.12/library/math.html

    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This module provides access to the mathematical functions
    defined by the C standard.

FUNCTIONS
    acos(x, /)
        Return the arc cosine (measured in radians) of x.

        The result is between 0 and pi.

    acosh(x, /)
        Return the inverse hyperbolic cosine of x.

    asin(x, /)
        Return the arc sine (measured in radians) of x.

        The result is between -pi/2 and pi/2.

    asinh(x, /)
        Return the inverse hyperbolic sine of x.

    atan(x, /)
        Return the arc tangent (measured in radians) of x.

        The re

In [22]:
# Let's find the square root of 7.567
math.sqrt(7.567)

2.7508180601413827

In [24]:
# We described ** as the operator that raises a value to a power
# math.pow does the same calculation but always returns a float
print(34**90)
print(math.pow(34,90))

680930102416499986048414095047261177801472171608919559294191318313337145561315486168658363733784220129274771779625390673536375328631422976
6.809301024165e+137
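The two printed results differ because `**` on whole numbers keeps every digit, while `math.pow` works in floating point and keeps only about 15 to 17 significant digits. The sketch below, using the same numbers, makes that difference explicit.

```python
import math

exact = 34 ** 90            # integer arithmetic keeps every digit
approx = math.pow(34, 90)   # floating-point result, rounded

print(type(exact).__name__, type(approx).__name__)

# The rounded float does not convert back to the exact integer
print(exact == int(approx))

# The two values still agree to many significant digits
print(math.isclose(exact, approx, rel_tol=1e-12))
```

When exact whole-number results matter, `**` is the safer choice; `math.pow` is fine when an approximate decimal value is all that is needed.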


In [None]:
# We can import just one part of a library
# The keyword from names the library and the keyword import names the part to load
# Mount the Google Drive (this only works inside Google Colab)
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

In [26]:
# We can place all of the other libraries in a single cell
# Import the libraries
import numpy as np                  # Scientific Computing
import pandas as pd                 # Data Analysis
import matplotlib.pyplot as plt     # Plotting
import seaborn as sns               # Statistical Data Visualization

# Library to suppress warnings
import warnings
warnings.filterwarnings('ignore')
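Ignoring every warning keeps the notebook tidy, but it can also hide messages worth reading. As a hedged alternative, the sketch below silences only one category of warning and still reports the rest. The function `old_helper` and the warning messages are invented for the example.

```python
import warnings


def old_helper():
    """A made-up function that announces it is deprecated."""
    warnings.warn("old_helper is deprecated", DeprecationWarning)
    return 42


# catch_warnings keeps the filter changes local to this block
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")                                  # start by reporting everything
    warnings.filterwarnings("ignore", category=DeprecationWarning)   # then silence one category
    value = old_helper()                                             # this warning is suppressed
    warnings.warn("data looks odd", UserWarning)                     # this one is still recorded

print(value)
print([str(w.message) for w in caught])
```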