# Python Libraries

Commonly used libraries include:

- numpy
- pandas
- scikit-learn
- matplotlib
- seaborn
- jupyter

## NumPy

[NumPy](https://numpy.org/doc/stable/user/whatisnumpy.html) is the fundamental package for scientific computing in Python. At the core of the NumPy package is the `ndarray` object, which encapsulates *n*-dimensional arrays of homogeneous data types. Many operations on these arrays are performed in compiled code for performance.

[Basically](https://numpy.org/doc/stable/user/quickstart.html), NumPy’s main object is the homogeneous multidimensional array: a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called *axes*.

- `[1, 2, 1]` has one axis
  - that axis has a length of 3
- `[[1, 0, 0], [0, 1, 2]]` has two axes
  - the 1st axis has a length of 2
  - the 2nd axis has a length of 3

In [2]:
import numpy as np

In [15]:
# one-dimensional array
a1 = np.array([1, 2, 1])
assert a1.ndim == 1                 # has one axis
assert a1.shape == (3,)             # that axis has a length of 3
assert a1.dtype.name == 'int64'     # int64 is the default integer dtype on most 64-bit platforms

In [14]:
# two-dimensional array
a2 = np.array([
    [1.0, 0, 0],
    [0, 1, 2]
])
assert a2.ndim == 2                 # has two axes
assert a2.shape == (2, 3)           # the 1st axis has a length of 2, the 2nd 3
assert a2.dtype.name == 'float64'   # the type of elements is float64
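
As a minimal sketch of what homogeneous types and axes mean in practice, the example below applies an elementwise operation and two axis-wise reductions to a small array. The variable names (`doubled`, `col_sums`, `row_sums`, `mixed`) are illustrative and not part of the original notebook.

```python
import numpy as np

a = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 2.0],
])

# Elementwise arithmetic applies to the whole array without a Python-level loop.
doubled = a * 2

# Reductions accept an `axis` argument naming the axis to collapse.
col_sums = a.sum(axis=0)    # collapse the 1st axis: one sum per column
row_sums = a.sum(axis=1)    # collapse the 2nd axis: one sum per row

assert doubled.shape == (2, 3)
assert col_sums.tolist() == [1.0, 1.0, 2.0]
assert row_sums.tolist() == [1.0, 3.0]

# Because arrays are homogeneous, mixed ints and floats share one common dtype.
mixed = np.array([1, 2.5])
assert mixed.dtype == np.float64
```

The last assertion also explains why `a2` above is `float64`: a single `1.0` in the input is enough to promote every element to a floating-point type.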

## pandas

[pandas](https://pandas.pydata.org/docs/getting_started/overview.html) aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. It is well suited to many kinds of data, including:

- tabular data with heterogeneously-typed columns, as in an SQL table
- ordered or unordered time series data
- arbitrary matrix data with row and column labels

The two primary data structures of pandas, `Series` (1-dimensional) and `DataFrame` (2-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering.

To load the pandas package and start working with it, import the package as shown below. The community-agreed alias for pandas is `pd`.

In [16]:
import pandas as pd
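
To make the two data structures concrete, here is a small sketch that builds a date-indexed `Series` and a `DataFrame` with row and column labels. The data and all names (`temps`, `scores`, `row_a`, `col_x`, and so on) are made up for illustration.

```python
import pandas as pd

# A Series is a 1-dimensional labeled array; here the labels are dates.
temps = pd.Series(
    [21.5, 23.0, 19.8],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    name="temperature",
)

# A DataFrame is 2-dimensional, with labels on both rows and columns.
scores = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=["row_a", "row_b"],
    columns=["col_x", "col_y"],
)

assert temps.ndim == 1
assert scores.ndim == 2

# Values are looked up by label rather than by integer position.
assert temps.loc[pd.Timestamp("2024-01-02")] == 23.0
assert scores.loc["row_b", "col_x"] == 3
```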

When `ndarray`s are used to store 2- and 3-dimensional data, the user bears the burden of tracking the orientation of the data set when writing functions, because the axes are considered more or less equivalent. In pandas, the axes are intended to lend more semantic meaning to the data; that is, for a particular data set, there is likely to be a “right” way to orient the data. The goal, then, is to reduce the mental effort required to code up data transformations in downstream functions.
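
The difference can be sketched with a small, made-up data set. With a bare `ndarray` the reader has to remember which axis holds which quantity, while the `DataFrame` version names the column directly. The column names `age` and `height_cm` are hypothetical.

```python
import numpy as np
import pandas as pd

# Rows are people, columns are (age, height); nothing in the array says so.
raw = np.array([
    [22, 180],
    [35, 175],
    [58, 160],
])
means_axis0 = raw.mean(axis=0)      # one mean per column; index 0 happens to be age

# The same data with labeled columns: the intent is visible in the code.
people = pd.DataFrame(raw, columns=["age", "height_cm"])
mean_age = people["age"].mean()

# Both approaches compute the same number.
assert np.isclose(means_axis0[0], mean_age)
```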

The following [code](https://pandas.pydata.org/docs/getting_started/intro_tutorials/01_table_oriented.html) creates a `DataFrame` from a Python dictionary of lists, where the dictionary keys will be used as column headers and the values in each list as columns of the `DataFrame`.

In [17]:
titanic = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Allen, Mr. William Henry",
        "Bonnell, Miss. Elizabeth",
    ],
    "Age": [22, 35, 58],
    "Sex": ["male", "male", "female"],
})
titanic

|   | Name | Age | Sex |
|---|------|-----|-----|
| 0 | Braund, Mr. Owen Harris | 22 | male |
| 1 | Allen, Mr. William Henry | 35 | male |
| 2 | Bonnell, Miss. Elizabeth | 58 | female |


When you're interested only in the data in the `Age` column, select that column by its label. The result is a pandas `Series`.

In [18]:
titanic["Age"]

0    22
1    35
2    58
Name: Age, dtype: int64
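
As a short follow-up sketch, the snippet below rebuilds the same `titanic` table and checks what column selection returns. The names `ages`, `oldest`, and `subset` are introduced here for illustration only.

```python
import pandas as pd

titanic = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Allen, Mr. William Henry",
        "Bonnell, Miss. Elizabeth",
    ],
    "Age": [22, 35, 58],
    "Sex": ["male", "male", "female"],
})

# Selecting a single label returns a 1-dimensional Series.
ages = titanic["Age"]
assert isinstance(ages, pd.Series)
assert ages.shape == (3,)

# A Series supports reductions such as max().
oldest = ages.max()
assert oldest == 58

# Selecting a list of labels returns a DataFrame, even for a single column.
subset = titanic[["Age", "Sex"]]
assert isinstance(subset, pd.DataFrame)
assert subset.shape == (3, 2)
```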