# Python Charmers 

## Python Fundamentals Lesson 2: Packages

### Lesson Overview
- **Objective:** This lesson gets you started with the basics of installing and using packages in Python.
- **Source materials:** Original content
- **Prerequisites:** [Lesson 1: Variables](./fundamentals-01-variables.ipynb)
- **Duration:** 15 mins

## Python Packages

A package in Python is like a toolbox that contains a variety of tools (called functions and modules) which you can use to perform specific tasks more easily and efficiently. 

These packages are created by other developers and can be installed and used in your own Python programs.

For data analysts, packages are particularly useful because they provide specialized tools and functions that are tailored for data analysis tasks. 

Let's take the example of the **pandas** library/package:
- **Data Handling and Manipulation**: Pandas makes it easy to handle and manipulate data. It provides functions to read data from various file formats (like CSV, Excel, etc.), clean data, and transform it into a useful format.
- **Data Analysis**: With pandas, you can perform complex data analysis tasks with just a few lines of code. It offers functionalities for grouping data, calculating statistics, and performing aggregations.
- **Efficiency**: Writing the code for all these tasks from scratch would be time-consuming and error-prone. Pandas provides a tested and optimized set of functions that can save a lot of time and effort.

### Not all packages are installed by default

Packages that do not ship with your Python installation must be installed before you can import them.

`pip` is the package installer for Python. You can use it to install packages from the Python Package Index (PyPI).

Basic command: `pip install package_name`
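Before installing, it can help to check whether a package is already available in your environment. The sketch below is one way to do that with the standard library; `is_installed` is a hypothetical helper name chosen for this example. Note that the name you import is not always the name you install (for example, `scikit-learn` is imported as `sklearn`).

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    return find_spec(module_name) is not None


# 'json' ships with Python; pandas and numpy may or may not be installed yet
for name in ["json", "pandas", "numpy"]:
    if is_installed(name):
        print(f"{name}: available")
    else:
        print(f"{name}: missing, try 'pip install {name}'")
```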

In [None]:
# Exercise 1: Install Pandas via Terminal

# Open a Terminal window
# - File > New > Terminal

# Enter the following command:
# pip install pandas

# Verify the code below runs successfully and you see some data.
# You may need to re-open this notebook for the install to work. 

In [2]:
import pandas as pd
data = pd.read_csv('../data/2019_Yellow_Taxi_Trip_Data.csv')
print(data.head())

   vendorid     tpep_pickup_datetime    tpep_dropoff_datetime  \
0         2  2019-10-23T16:39:42.000  2019-10-23T17:14:10.000   
1         1  2019-10-23T16:32:08.000  2019-10-23T16:45:26.000   
2         2  2019-10-23T16:08:44.000  2019-10-23T16:21:11.000   
3         2  2019-10-23T16:22:44.000  2019-10-23T16:43:26.000   
4         2  2019-10-23T16:45:11.000  2019-10-23T16:58:49.000   

   passenger_count  trip_distance  ratecodeid store_and_fwd_flag  \
0                1           7.93           1                  N   
1                1           2.00           1                  N   
2                1           1.36           1                  N   
3                1           1.00           1                  N   
4                1           1.96           1                  N   

   pulocationid  dolocationid  payment_type  fare_amount  extra  mta_tax  \
0           138           170             1         29.5    1.0      0.5   
1            11            26             1     

Alternatively, you can install packages directly from the notebook.

In [4]:
# Exercise 2: Install Numpy via Notebook

# The script below will allow you to execute a terminal command here in the notebook
#  - sys is part of Python's standard library and provides access to variables used or maintained by the Python interpreter
#  - sys.executable holds the path to the Python interpreter that your Jupyter notebook is using
#  - '-m' is a flag that tells the Python interpreter to run a module (here, pip) as a script, i.e. to install the package

import sys
!{sys.executable} -m pip install numpy
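The `!` prefix only works inside IPython and Jupyter. Outside a notebook, the same idea can be sketched in plain Python with the standard `subprocess` module. The helper names `pip_command` and `run_pip` are hypothetical and exist only for this example.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command that targets the interpreter running this code."""
    return [sys.executable, "-m", "pip", *args]


def run_pip(*args):
    """Run pip with the given arguments and raise an error if it fails."""
    subprocess.run(pip_command(*args), check=True)


# Show the command that would be executed, without running it
print(pip_command("install", "numpy"))

# To actually install the package, you would call:
# run_pip("install", "numpy")
```

Building the command from `sys.executable` means the package is installed for the same interpreter that runs your code, which avoids a common source of "module not found" errors.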

## Package Versions

Packages in Python, as in most software, have different versions for several important reasons:

- Feature Updates and Improvements
- Bug Fixes
- Security Updates, and many more

Different versions allow users to choose which version best fits their needs. For instance, a user may stick with an older, stable release for a critical application, while another might prefer the latest version for access to the newest features. Managing these versions and understanding the implications of updating (or not updating) is an important aspect of working with Python packages.
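Version numbers such as `1.26.2` are compared part by part, not as plain text. The sketch below is a simplified illustration that assumes purely numeric, dot-separated versions. `parse_version` is a hypothetical helper, not part of pip or any library. For real projects, the third-party `packaging` library offers a more complete `Version` class.

```python
def parse_version(text):
    """Turn a version string like '1.26.2' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


older = "1.9.0"
newer = "1.18.5"

# Comparing tuples of integers gives the expected ordering
print(parse_version(older) < parse_version(newer))  # True

# Comparing the raw strings does not, because '9' sorts after '1'
print(older < newer)  # False
```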

### Check a Package's Version

Python packages often make their version information accessible via a `__version__` attribute.
To check the version of a package, you first need to import it and then access this attribute.

In [6]:
# Check Numpy's version
import numpy
print(numpy.__version__)

1.26.2
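Not every package defines `__version__`. As an alternative, the standard library's `importlib.metadata` module (Python 3.8+) can report the installed version without importing the package. In the sketch below, `installed_version` is a hypothetical helper name used only for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version as a string, or None if not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


for name in ["numpy", "pandas"]:
    print(name, installed_version(name) or "not installed")
```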


### Downgrading a Package

Sometimes, you might need to downgrade a package to ensure compatibility with other packages or specific codebases. To downgrade, specify the package name along with the desired version. For example:

`pip install package_name==1.23.5`

In [None]:
# Exercise: Downgrade numpy to version 1.18.5
# Using:
# - Terminal: pip install numpy==1.18.5
# - Notebook: 

In [None]:
import sys
!{sys.executable} -m pip install numpy==1.18.5
# Remember to restart the Jupyter kernel after downgrading so the notebook uses the newly installed version.

### Upgrading a Package

Upgrading a package ensures that you have the latest features and bug fixes. Use pip with the `--upgrade` flag to update a package. For example:

`pip install --upgrade package_name`

In [None]:
# Exercise: Upgrade Numpy to latest version
# Either in Terminal or in the script below

import sys
!{sys.executable} -m [insert your code here]

## Mini Project: Dependency Detective

When you install a Python package, such as pandas, it may not work in isolation. It often relies on other packages (dependencies) for its operation. These dependencies are automatically installed along with the package you install. Understanding these dependencies is crucial for:

- Compatibility
- Debugging
- Optimization

We can see a package's declared dependencies using the `requires` function from the `importlib.metadata` module, which is part of Python's standard library.
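As a rough illustration of how `requires` behaves, the sketch below looks up a different package, matplotlib, which may or may not be installed in your environment. `requires` returns a list of requirement strings, or `None` when a package declares no dependencies, and raises `PackageNotFoundError` when the package is not installed. The wrapper `list_requirements` is a hypothetical helper for this example.

```python
from importlib.metadata import PackageNotFoundError, requires


def list_requirements(package_name):
    """Return a package's declared dependencies as a list (empty if none)."""
    try:
        # requires() returns None when no dependencies are declared
        return requires(package_name) or []
    except PackageNotFoundError:
        # The package is not installed in this environment
        return []


requirements = list_requirements("matplotlib")
if requirements:
    for requirement in requirements:
        print(requirement)
else:
    print("No dependencies found (or the package is not installed).")
```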

Pandas and numpy are two common packages for data analysis. Can you create a combined list of all their dependencies?

In [1]:
# Import the `requires` function from the `importlib.metadata` module

# Create two variables that contain lists of dependencies for pandas and numpy

# Combine the dependencies lists

# Convert to a set to remove duplicates and then back to a list

# Print combined list of unique dependencies


## Additional Resources
- 📰 **PyPI** - Python Package Index https://pypi.org/
- 📰 **Python Packaging User Guide** - Installing Packages https://packaging.python.org/en/latest/tutorials/installing-packages/
- 📺 **Core Electronics** Python Workshop - Installing Packages https://www.youtube.com/watch?v=SrX5yo4KKGM

## Summary

Packages in Python allow you to perform a whole range of tasks using a few premade functions. You will reuse what you learned here as we build scripts in Python.

Remember, the world of Python packages is ever-evolving, with new packages and updates being released regularly. Staying informed about new package releases will help you increase your capabilities with Python.

## Next Lesson

**[Lesson 3: Data Wrangling](./fundamentals-03-data_wrangling-p1.ipynb)** 
In this lesson we'll be working with pandas and numpy to read, write and manipulate datasets.