# Packages, Modules, Methods, and Functions
> The Python source distribution has long maintained the philosophy of "batteries included" -- having a rich and versatile standard library which is immediately available, without making the user download separate packages. This gives the Python language a head start in many projects.
>
> \- PEP 206

## Applied Review
### Python and Jupyter Overview
- We're working with Python through Jupyter, one of the most widely used environments for data science.

### Fundamentals
- Python's common *atomic*, or basic, data types are:
    - Integers
    - Floats (decimals)
    - Strings
    - Booleans
- These simple types can be combined to form more complex types, including:
    - Lists: Ordered collections
    - Dictionaries: Key-value pairs
    - DataFrames: Tabular datasets

## Packages (aka Modules)
So far we've seen several data types that Python offers out-of-the-box.
However, to keep things organized, some Python functionality is stored in standalone *packages*, or libraries of code.
Strictly speaking, a *module* is a single file of Python code and a package is a collection of modules, but the two words are often used interchangeably; you will hear both in discussions of Python.

For example, functionality related to the operating system -- such as creating files and directories -- is stored in a package called `os`.
To use the tools in `os`, we *import* the package.

In [1]:
import os

Once we import it, we gain access to everything inside.
With Jupyter's autocomplete, we can view what's available.

In [None]:
# Move your cursor to the end of the line below and press tab.
os.

Some packages, like `os`, are bundled with every Python install; downloading Python guarantees you'll have these packages.
Collectively, this group of packages is known as the *standard library*.
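
As a small illustration, the sketch below uses two standard library packages, `os` and `math`, with no installation step. The particular functions (`os.getcwd`, `os.path.join`, `math.sqrt`) are just examples picked for this note, and the `data` and `planes.csv` path pieces are placeholders.

```python
import math
import os

# Ask the operating system for the current working directory.
current_dir = os.getcwd()

# Build a file path using the right separator for this operating system.
# "data" and "planes.csv" are placeholder names; the file need not exist.
example_path = os.path.join(current_dir, "data", "planes.csv")

# math is another standard library package, holding mathematical functions.
root = math.sqrt(16)

print(example_path)
print(root)
```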

Other packages must be downloaded separately, either because
- they aren't sufficiently popular to merit inclusion in the standard library
- *or* they change too quickly for the maintainers of Python to keep up

The DataFrame type that we saw earlier comes from one such package: `pandas` (short for *Panel Data*).
Since pandas is specific to data science and is still rapidly evolving, it is not part of the standard library.

Packages like pandas can be downloaded from PyPI, the *Python Package Index*, an online repository of Python packages; this is typically done with the `pip` tool.
Fortunately, since we are using Binder today, that has been handled for us and pandas is already installed.
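
If you are ever unsure whether a package is available in your environment, one possible check is sketched below. It relies on `importlib.util.find_spec` from the standard library, which returns `None` when Python cannot find a top-level package with the given name. The names used here (`os` and a deliberately made-up package name) are chosen for illustration only; in practice you would pass a name such as `"pandas"`.

```python
import importlib.util

# "os" is part of the standard library, so Python should always find it.
found = importlib.util.find_spec("os")
print(found is not None)  # True

# A made-up name that is assumed not to be installed anywhere.
missing = importlib.util.find_spec("a_package_that_does_not_exist")
print(missing is None)  # True
```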

It's possible to import packages under an *alias*, or a nickname.
The community has adopted certain conventions for aliases for common packages;
while following them isn't mandatory, it's highly recommended, as it makes your code easier for others to understand.

pandas is conventionally imported under the alias `pd`.

In [2]:
import pandas as pd

In [3]:
# Importing pandas has given us access to the DataFrame, accessible as pd.DataFrame
pd.DataFrame

pandas.core.frame.DataFrame

<font style="color:#008;">
    <strong>Question</strong>:<br><em>What is the type of `pd`? Guess before you run the code below.</em>
</font>

In [4]:
type(pd)

module
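
An alias does not create a copy of the package; it is simply another name for the same module object. The sketch below illustrates this with the standard library's `datetime` package and the alias `dt`, both chosen here only as an example.

```python
import datetime
import datetime as dt

# Both names point at the very same module object.
print(dt is datetime)  # True

# The alias has the same type as any other imported package.
print(type(dt))
```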

Third-party packages unlock a huge range of functionality that isn't available in native Python; many of Python's data science capabilities come from a handful of packages outside the standard library:
- pandas
- numpy (numerical computing)
- scikit-learn (modeling)
- scipy (scientific computing)
- matplotlib (graphing)

We won't have time to touch on most of these in this training, but if you're interested in one, google it!

### Your Turn
1. Import the `numpy` library, listed above. Give it the alias "np".
2. Using autocomplete, determine what variable inside the numpy library starts with "eig". *Hint: remember you'll need to preface the variable name with the package alias, e.g. `np.eig`*

## Functions
On several occasions, we've seen parentheses used to produce a result.
```python
# Get the type of pd.
type(pd)

# Get the first few rows of the planes data.
planes.head()

# Read in the planes.csv file.
pd.read_csv('../data/planes.csv')
```

Expressions using parentheses like this are called "function calls".
The name to the left of the parens (`type`, `planes.head`, `pd.read_csv`) is called the **function**, and any values passed within the parens are called function arguments, or simply **arguments**.

Functions are chunks of Python code stored under a short name.
For example, the name `read_csv` refers to a block of Python code that reads in a data file, turns it into a DataFrame, and *returns* it to the user.

The key idea of functions is **they take inputs and produce outputs**;
usually you don't need to know anything about *how* they do it.
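
To make the input/output idea concrete, here is a minimal sketch using three of Python's built-in functions. The variable names and values are made up for illustration.

```python
# len takes one argument (here a list) and returns how many items it holds.
carriers = ["9E", "AA", "AS"]
count = len(carriers)

# round takes two arguments: a number and how many decimal places to keep.
rounded = round(3.14159, 2)

# max returns the largest of its arguments.
largest = max(4, 9, 2)

print(count, rounded, largest)  # 3 3.14 9
```

In each case we hand the function some inputs and store the output in a variable, without needing to see the code inside the function.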

Functions are integral to using Python, because it's much more efficient to use pre-written code than to always write your own.
If you ever do want to write your own function -- perhaps to share with others, or to make it easier to reuse your work -- it's fairly simple to do so, but beyond the scope of this training.

## Objects and Methods
Up to this point, we've referred to *variables* and *values*.
Variables are the names we give to values.

To be more precise and pythonic, the values stored in variables are **objects**.
Everything in Python is an object: integers, strings, DataFrames, and even modules (like the `pd` variable above).
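
One way to convince yourself of this is the built-in `isinstance` function, which checks whether a value belongs to a given type. The values below are arbitrary examples; each check asks whether the value is an `object`.

```python
import math

print(isinstance(42, object))         # True: integers are objects
print(isinstance("hello", object))    # True: strings are objects
print(isinstance([1, 2, 3], object))  # True: lists are objects
print(isinstance(math, object))       # True: modules are objects
print(isinstance(len, object))        # True: even functions are objects
```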

All objects support **dot-notation**: using a period to access data *within* the object.
For example, we saw this syntax above:
```python
pd.DataFrame
```
This refers to the `DataFrame` attribute of the `pd` object.
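
Dot-notation works the same way on other objects. The sketch below uses two standard library examples picked for illustration: the `pi` attribute of the `math` module, and the `year` and `month` attributes of a date object. The date itself is arbitrary.

```python
import datetime
import math

# The math module stores the constant pi as an attribute.
print(math.pi)

# A date object stores its pieces as attributes (the date is arbitrary).
flight_day = datetime.date(2013, 1, 31)
print(flight_day.year)   # 2013
print(flight_day.month)  # 1
```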

In the Fundamentals notebook, we created a DataFrame called `planes` and used the `head` attribute to view the first 5 rows of the data.
```python
planes.head()
```
As you can see, attributes can be functions (note the parens after `head`);
we call such attributes **methods**.

Methods are just like other functions, except that their relationship to their parent object allows them to use its data.
For example, `head` accesses the first rows of its parent object (a DataFrame) and returns them.
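
Methods are not unique to DataFrames. As a small sketch, the string and list methods below each work with the data held by their parent object; the values are illustrative only.

```python
name = "Delta Air Lines Inc."

# upper uses the text stored in name and returns an upper-case copy.
print(name.upper())              # DELTA AIR LINES INC.

# startswith checks the beginning of the text stored in name.
print(name.startswith("Delta"))  # True

carriers = ["9E", "AA", "AS"]

# index searches the list's own items and returns the position of "AA".
print(carriers.index("AA"))      # 1
```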

### Functions, Objects, and Methods in the Context of DataFrames
DataFrames are a type of Python object, so let's use them to explore the new Python features we've learned.

Using the `read_csv` function from pandas to read in a DataFrame.

In [20]:
df = pd.read_csv('../data/airlines.csv')

Using the `type` function to determine the type of `df`.

In [9]:
type(df)

pandas.core.frame.DataFrame

Using the `head` method of the DataFrame to view some of its rows.

In [10]:
df.head()

|   | carrier | name |
|---|---------|------|
| 0 | 9E | Endeavor Air Inc. |
| 1 | AA | American Airlines Inc. |
| 2 | AS | Alaska Airlines Inc. |
| 3 | B6 | JetBlue Airways |
| 4 | DL | Delta Air Lines Inc. |


Examining the `columns` attribute of the DataFrame to see the names of its columns.

In [12]:
df.columns

Index(['carrier', 'name'], dtype='object')

Inspecting the `shape` attribute to find the *dimensions* (rows and columns) of the DataFrame.

In [16]:
df.shape

(16, 2)
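
Because `shape` is a pair of numbers, it can be split into separate row and column counts. The sketch below assumes pandas is installed and, rather than reading a file, builds a tiny stand-in DataFrame by hand from the first three rows shown by `df.head()`; the name `small` is made up for this example.

```python
import pandas as pd

# A hand-built stand-in for the airlines data (first three rows only).
small = pd.DataFrame({
    "carrier": ["9E", "AA", "AS"],
    "name": ["Endeavor Air Inc.", "American Airlines Inc.", "Alaska Airlines Inc."],
})

# shape is an attribute, not a method, so there are no parens.
n_rows, n_cols = small.shape
print(n_rows, n_cols)  # 3 2
```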

Calling the `describe` method to get a summary of the data in the DataFrame.

In [18]:
df.describe()

|   | carrier | name |
|---|---------|------|
| count | 16 | 16 |
| unique | 16 | 16 |
| top | WN | Alaska Airlines Inc. |
| freq | 1 | 1 |


Now let's combine them: using the `type` function to determine what `df.describe` holds.

In [19]:
type(df.describe)

method

<font style="color:#008;">
    <strong>Question</strong>:<br><em>Does this result make sense? What would happen if you added parens? i.e. <code>type(df.describe())</code></em>
</font>

### Your Turn
Spend some time using autocomplete to explore the methods and attributes of the `df` object we used above.
Remember from the Jupyter lesson that you can use a question mark to see the documentation for a function or method (e.g. `df.describe?`).

# Questions

Are there any questions before we move on?