# Packages, Modules, & Libraries

> The Python source distribution has long maintained the philosophy of "batteries included" -- having a rich and versatile standard library which is immediately available, without making the user download separate packages. This gives the Python language a head start in many projects.
>
> \- PEP 206

## Applied Review

### Python and Jupyter Overview

- We're working with Python through Jupyter, a notebook environment widely used for data science.

### Fundamentals

- Python's common *atomic*, or basic, data types are:
    - Integers
    - Floats (decimals)
    - Strings
    - Booleans

- These simple types can be combined to form more complex types, including:
    - Lists: Ordered collections
    - Dictionaries: Key-value pairs
    - DataFrames: Tabular datasets

## Packages (aka *Modules*)

So far we've seen several data types that Python offers out-of-the-box.
However, to keep things organized, some Python functionality is stored in standalone *packages*, or libraries of code.
The words "module," "package," and "library" are often used interchangeably; you will hear all three in discussions of Python.

If you want clearer definitions, the three can be thought of this way:

- A **package** refers to source code that is bundled in a way that can be distributed to users
  - You can install packages using `pip install package_name`
- A **library** refers to a package that has been installed to a centralized location on an operating system
  - You can view your libraries using `pip list` or `conda list`
- A **module** refers to code that you can import from outside your current script or notebook
  - To use code from a module, you'll need to `import module_name`

For example, functionality related to the operating system -- such as creating files and folders -- is stored in a package called `os`.
To use the tools in `os`, we *import* the package.

In [1]:
import os

Once we import it, we gain access to everything inside.
With Jupyter's autocomplete, we can view what's available.

In [None]:
# Move your cursor to the end of the line below and press Tab.
os.
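
If Tab completion isn't available (for example, in a plain script), the built-in `dir` function gives a similar listing. The sketch below skips names that begin with an underscore, which by convention are internal; the variable name `public_names` is just an illustrative choice.

```python
import os

# dir() returns the names defined inside a module as a list of strings.
# Names starting with "_" are conventionally internal, so we skip them.
public_names = [name for name in dir(os) if not name.startswith("_")]

print(len(public_names))   # how many public names os exposes (varies by platform)
print(public_names[:10])   # the first few names
```

The exact contents depend on your operating system and Python version, so your output may differ from someone else's.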

Some packages, like `os`, are bundled with every Python install; downloading Python guarantees you'll have these packages.
Collectively, this group of packages is known as the *standard library*.

Other packages must be downloaded separately, either because
- they aren't sufficiently popular to merit inclusion in the standard library
- *or* they change too quickly for the maintainers of Python to keep up

One very commonly used data science package is called `pandas` (the name is derived from *panel data*).
Since `pandas` is specific to data science and is still rapidly evolving, it is not part of the standard library.

**Note:** We'll cover `pandas` in more detail in later modules.

We can download packages like pandas from the internet using a website called PyPI, the *Python Package Index*.
Fortunately, since we are using a pre-built conda environment today, that has been handled for us and pandas is already installed.
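
If you're ever unsure whether a package is available in your environment, one way to check without importing it is `importlib.util.find_spec` from the standard library. This is a minimal sketch; the helper name `is_installed` is hypothetical and only meant for top-level names like `pandas`.

```python
import importlib.util

def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None

print(is_installed("os"))       # part of the standard library
print(is_installed("pandas"))   # True only if pandas is in this environment
```

If the second line prints `False`, the package would need to be installed (for example with `pip install pandas`) before it can be imported.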

It's possible to import packages under an *alias*, or a nickname.
The community has adopted certain conventions for aliases for common packages;
while following them isn't mandatory, it's highly recommended, as it makes your code easier for others to understand.

`pandas` is conventionally imported under the alias `pd`.

In [2]:
import pandas as pd

Importing `pandas` has given us access to the `DataFrame` type, accessible as `pd.DataFrame`.

In [3]:
pd.DataFrame

pandas.core.frame.DataFrame

*Question:*

_What is the type of `pd`? Guess before you run the code below._

In [4]:
type(pd)

module
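
An alias doesn't create a copy of the package; it's just another name for the same module. The sketch below shows this with the standard library's `math` module, so it runs even without pandas. The alias `m` is an arbitrary choice for illustration, not a community convention.

```python
import math
import math as m  # "m" is an arbitrary alias chosen for this example

print(type(m))      # <class 'module'>
print(m is math)    # True: both names refer to the same module object
print(m.__name__)   # math: the module's real name is unchanged
```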

Third-party packages unlock a huge range of functionality that isn't available in native Python; most of Python's data science capabilities come from a handful of packages outside the standard library:

- pandas
- numpy (numerical computing)
- scikit-learn (modeling)
- scipy (scientific computing)
- matplotlib (graphing)

We won't have time to touch on most of these in this training, but if you're interested in one, google it!

### Your Turn

1. Import the `numpy` library, listed above. Give it the alias "np".
2. Using autocomplete, determine what variable or function inside the numpy library starts with "asco". *Hint: remember you'll need to preface the variable name with the package alias, e.g. `np.asco`*

### Dot Notation with Packages

We've seen it a few times already, but now it's time to discuss it explicitly:
things inside packages can be accessed with *dot-notation*.

Dot notation looks like this:
```python
pd.Series
```

or
```python
import numpy as np
np.array
```

You can read this as "the `array` variable, within the NumPy library".

Packages can contain pretty much anything that's legal in Python;
if it's code, it can be in a package.

This flexibility is part of the reason that Python's package ecosystem is so expansive and powerful.
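
To make "pretty much anything" concrete, here is a small sketch using only standard library modules. The particular names inspected (`math.pi`, `math.sqrt`, `fractions.Fraction`, `os.path`) are chosen purely as examples of different kinds of things reachable through dot notation.

```python
import math
import fractions
import os

print(type(math.pi))                         # <class 'float'>: a plain value
print(callable(math.sqrt))                   # True: a function
print(isinstance(fractions.Fraction, type))  # True: a class
print(type(os.path))                         # <class 'module'>: a module inside a module
```

Notice that the same dot syntax reaches all four, whatever kind of thing sits on the right-hand side.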

### Objects and Dot Notation

Dot-notation has another use -- accessing things inside of *objects*.

What's an object? Basically, a variable that contains other data or functionality inside of it that is exposed to users.

For example, `DataFrames` are objects.

**Note:** We'll cover `pandas` and `DataFrames` in far more detail in later modules.

In [6]:
df = pd.DataFrame({'first_name': ['Ethan', 'Brad'], 'last_name': ['Swan', 'Boehmke']})

In [7]:
df

|   | first_name | last_name |
|---|---|---|
| 0 | Ethan | Swan |
| 1 | Brad | Boehmke |


In [8]:
df.shape

(2, 2)

In [9]:
df.describe()

|   | first_name | last_name |
|---|---|---|
| count | 2 | 2 |
| unique | 2 | 2 |
| top | Brad | Boehmke |
| freq | 1 | 1 |


You can see that DataFrames have a `shape` variable and a `describe` function inside of them, both accessible through dot notation.

**Note:** Variables inside an object are often called _attributes_ and functions inside objects are called _methods_.
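
The attribute/method distinction isn't specific to DataFrames. As a small sketch, the standard library's `datetime.date` objects have both; the date used here is arbitrary.

```python
import datetime

day = datetime.date(2024, 1, 15)  # an arbitrary example date

print(day.year)                  # 2024: an attribute, read without parentheses
print(day.isoformat())           # 2024-01-15: a method, called with parentheses
print(callable(day.isoformat))   # True: without parentheses you get the method itself
```

A common slip is forgetting the parentheses on a method, which gives you the method object rather than its result.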

### On Consistency and Language Design

One of the great things about Python is that its creators really cared about internal consistency.

What that means to us, as users, is that syntax is consistent and predictable -- even across uses that appear unrelated at first.

Dot notation reveals something kind of cool about Python: packages are just like other objects, and the variables inside them are just attributes and methods!

This standardization across packages and objects helps us remember a single, intuitive syntax that works for many different things.
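
One way to see this consistency for yourself is to treat a module exactly like any other object. The sketch below uses the built-ins `hasattr` and `getattr`, which work on modules and ordinary objects alike; the string `text` is just an example value.

```python
import math
import types

print(isinstance(math, types.ModuleType))  # True: a module is itself an object
print(hasattr(math, "pi"))                 # True: "pi" is an attribute of that object
print(getattr(math, "pi") == math.pi)      # True: getattr and dot notation agree

text = "batteries included"
print(hasattr(text, "upper"))              # True: a string exposes methods the same way
```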

### Your Turn

Using the `math` library:

1. Find a function that will compute the square root of $14 \times 0.51$
2. Find a function that will compute $3.25^{2.784}$

# Questions

Are there any questions before we move on?