# Build your own Python Package

© Explore Data Science Academy

## Learning Objectives
By the end of this train, you should be able to:

* Understand how Python packages work;
* Create your own Python package; and
* Gain experience using Git.

## Outline
In this train we will:

* Build a modular Python package;
* Publish this package to GitHub; and
* Share this package with others.


## Introduction

By now, you will have made use of some very popular Python packages such as `numpy` and `pandas`. You will struggle to find a data scientist who hasn't, at the very least, *heard* of these packages. One of the great features of Python is that it is an open-source programming language with an active community of developers who have helped, and continue to help, make it user-friendly and versatile. NumPy and pandas are just two examples of the useful packages you will come across on your data science journey. There are thousands of Python packages out there, and today we are going to learn how to build our own!

### Requirements

Before you get started, here are a few things you will need to do:

  - Install the [Atom IDE](https://atom.io/)
  - Install [Git](https://git-scm.com/downloads)
  - Sign up for a [free GitHub account](https://github.com/)
  
It is also recommended that you be familiar with GitHub and know how to use Git. So if these words mean nothing to you, then you should take time to familiarize yourself with these tools.

## Let's get started!
Once all of the necessary software has been installed, we will need to set up our project working directory. Let's work through this process step by step to create our file structure and each of the files we will need.

### File structure

Navigate to a familiar location in your computer's file system and create a new folder. You may name the folder whatever you like, but it will be easier to follow this tutorial if you name your folder **`mypackage`**. From here on, we will refer to this new folder as your project's **root folder**.

Note: the naming convention for Python packages is to use short, all-lowercase names. Underscores are permissible, but discouraged.

<img src="https://github.com/Explore-AI/Pictures/blob/master/mypackage.jpg?raw=true" alt="Python package root folder - Windows" style="width: 60%;"/>

If you are using a Mac, your folder will look similar to this:

<img src="https://github.com/Explore-AI/Pictures/blob/master/mypackage-mac.png?raw=true" alt="Python package root folder - Mac" style="width: 60%;"/>

The end goal of this tutorial is to make our package _pip installable_. For this to be possible, we will need to structure our files in a very particular way. 

### Setup files
Now let's create our files. We will do this using the Atom IDE we installed earlier. Open Atom, then click on:    
**`File`** -> **`Add Project Folder...`**. Select the root folder we created in the previous step.

Atom has a built-in file browser that allows you to create new files and folders. You can toggle **Tree View** in Atom by using `Ctrl+\` (Windows) or `Cmd+\` (macOS). This will reveal the file browser on the left of your screen.

Our next step is to create two new folders within your project's root folder, named **`mypackage`** and **`tests`**.

<img src="https://github.com/James-Leslie/example-python-package/blob/master/images/2.0_new_folder.png?raw=true" width="400">

Within the inner **`mypackage`** folder (the one you just created), create two Python files named **`myModule.py`** and **`__init__.py`**. The **`myModule.py`** file is where we will write our function, the task we wish our package to perform. The **`__init__.py`** file tells Python to treat the directory as a package, and it runs whenever the package is imported.

Within the **`tests`** folder, create one Python file named **`test.py`**. This file will be our unit test, to ensure our module is working correctly before we publish our package.

Your project directory should now look like this:   

<img src="https://github.com/James-Leslie/example-python-package/blob/master/images/2.2_file_structure.png?raw=true">
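If you prefer to script the layout rather than click through Atom, the same structure can be sketched with the standard-library `pathlib` module. The helper name `scaffold` is an assumption made for illustration, and the demo builds everything inside a temporary directory so nothing on your machine is touched.

```python
import tempfile
from pathlib import Path


def scaffold(root):
    """Create the tutorial's folders and empty files under `root` (hypothetical helper)."""
    root = Path(root)
    # The inner package folder and the tests folder
    for folder in ("mypackage", "tests"):
        (root / folder).mkdir(parents=True, exist_ok=True)
    # Empty placeholder files, to be filled in during the tutorial
    for relative in ("mypackage/__init__.py", "mypackage/myModule.py", "tests/test.py"):
        (root / relative).touch(exist_ok=True)
    # Report what was created, relative to the root folder
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))


# Demo in a throwaway directory; pass your own root folder path to use it for real
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp) / "mypackage")

print(created)
```

Calling `scaffold` on your real root folder should leave existing files untouched, because `touch(exist_ok=True)` does not overwrite content.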

## Build your package
Now that we have our folder structure set out, we can start writing some code! We will need to do three things:

- Create our function;
- Test our function; and
- Write some documentation for our package.
  
### Create our function
The function we are going to create will return the top n items in an array, in descending order. To do this, we will write an algorithm similar to bubble sort: each pass moves the largest remaining item to the end of the list, and we stop after n passes.

#### Docstrings
All good programmers need to know how to write clean, concise and descriptive [docstrings](https://www.python.org/dev/peps/pep-0257/) for their functions. This is where we will start. Here is an example of a well-documented function:

```python
def fibonacci(n):
    """
    Calculate nth term in fibonacci sequence

    Args:
        n (int): nth term in fibonacci sequence to calculate

    Returns:
        int: nth term of fibonacci sequence,
             equal to sum of previous two terms

    Examples:
        >>> fibonacci(1)
        1
        >>> fibonacci(2)
        1
        >>> fibonacci(3)
        2
    """

    if n <= 1:
        return n

    else:
        return fibonacci(n - 1) + fibonacci(n - 2)
```
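Docstring examples written with the `>>>` prompt can also be checked automatically by the standard-library `doctest` module. The sketch below is one possible way to do that for a single function; the helper name `run_examples` is an assumption made for illustration, and the function is repeated so the block runs on its own.

```python
import doctest


def fibonacci(n):
    """Calculate nth term in fibonacci sequence.

    Examples:
        >>> fibonacci(1)
        1
        >>> fibonacci(2)
        1
        >>> fibonacci(3)
        2
    """
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


def run_examples(func):
    """Run the docstring examples of `func` and return doctest's summary (hypothetical helper)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # The examples need the function itself in their namespace
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


results = run_examples(fibonacci)
print(results.failed, "failed out of", results.attempted)
```

A non-zero `failed` count would suggest that the documented output and the actual behaviour have drifted apart.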

We'll do the same for our function. Add the following code into the **`myModule.py`** file. Having this level of documentation will help anyone who uses your function to properly understand how the function works.

```python
def top_n(items, n):
    """Return the top n items in an array, in descending order.

    Args:
        items (array): list or array-like object containing numerical values.
        n (int): number of top items to return.

    Returns:
        array: top n items, in descending order.

    Examples:
        >>> top_n([8, 3, 2, 7, 4], 3)
        [8, 7, 4]
    """
```

#### Function body
Now add the body of the function just below the docstring.

```python
def top_n(items, n):
    """
    docstring goes here
    """

    for i in range(n):  # Keep sorting until we have the top n items
        for j in range(len(items)-1-i):

            if items[j] > items[j+1]:  # If this item is bigger than the next item...
                items[j], items[j+1] = items[j+1], items[j]  # swap the two!

    # Get the last n items (the n largest, in ascending order)
    top_items = items[-n:]

    # Return in descending order
    return top_items[::-1]
```

```python
# check function works
top_n([8, 3, 2, 7, 4], 3)
```

```
[8, 7, 4]
```

This is what it should look like in Atom. Do you understand what this function is doing?


<img src="https://github.com/James-Leslie/example-python-package/blob/master/images/3_function_body.png?raw=true" alt="View of our code in Atom" style="width: 65%;"/>
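To check your understanding, it can help to compare the function against Python's built-in `sorted` and to look at what happens to the input list. The sketch below repeats the tutorial's logic so that it runs on its own; the variable names `data`, `expected` and `result` are illustrative.

```python
def top_n(items, n):
    """Same logic as the tutorial function, repeated so this sketch is self-contained."""
    for i in range(n):
        for j in range(len(items) - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items[-n:][::-1]


data = [8, 3, 2, 7, 4]

# Reference answer from the built-in sort, computed before top_n touches the list
expected = sorted(data, reverse=True)[:3]

result = top_n(data, 3)

print(result == expected)  # the two approaches agree on this input
print(data)                # the swaps happen in place, so the caller's list is reordered
```

Because the swaps modify the list that was passed in, a caller who needs the original order could pass a copy instead, for example `top_n(list(data), 3)`.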

Now add the following to the **`__init__.py`** file. Ensure that you save your files - you have been spoiled by Jupyter Notebook's autosave functionality!

<br>

```python
from . import myModule
```
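To see what this import line does, the sketch below builds a throwaway package in a temporary directory and imports it. The package name `demo_pkg` and the function `greet` are hypothetical and chosen so they cannot clash with your real `mypackage`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny package: demo_pkg/__init__.py and demo_pkg/myModule.py
    pkg_dir = Path(tmp) / "demo_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "myModule.py").write_text("def greet():\n    return 'hello'\n")
    (pkg_dir / "__init__.py").write_text("from . import myModule\n")

    # Make the temporary folder importable for the duration of the demo
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        # __init__.py ran on import, so the submodule is already attached to the package
        message = demo_pkg.myModule.greet()
    finally:
        sys.path.remove(tmp)

print(message)
```

The same idea applies to the tutorial package: once `__init__.py` imports `myModule`, code that runs `import mypackage` should be able to reach `mypackage.myModule.top_n` without a second import.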

### Testing
You should always write some tests for every function you create. Can you think of why this is good practice?

In **`test.py`**, add the following:

```python
from mypackage import myModule

def test_top_n():
    """
    make sure top_n works correctly
    """

    assert myModule.top_n([8, 3, 2, 7, 4], 3) == [8, 7, 4], 'incorrect'
    assert myModule.top_n([10, 1, 12, 9, 2], 2) == [12, 10], 'incorrect'
    assert myModule.top_n([1, 2, 3, 4, 5], 5) == [5, 4, 3, 2, 1], 'incorrect'
```
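One way to see the value of these assertions is to run them against a deliberately broken implementation. In the sketch below, `top_n` is a stand-in built on `sorted` so the block runs without the package installed, and `broken_top_n`, `check_top_n` and `passes` are hypothetical names introduced for illustration.

```python
def top_n(items, n):
    """Stand-in for myModule.top_n, using sorted() so the sketch is self-contained."""
    return sorted(items, reverse=True)[:n]


def broken_top_n(items, n):
    """Hypothetical buggy version: forgets to reverse into descending order."""
    return sorted(items)[-n:]


def check_top_n(func):
    """The same three assertions as test.py, applied to any implementation."""
    assert func([8, 3, 2, 7, 4], 3) == [8, 7, 4], 'incorrect'
    assert func([10, 1, 12, 9, 2], 2) == [12, 10], 'incorrect'
    assert func([1, 2, 3, 4, 5], 5) == [5, 4, 3, 2, 1], 'incorrect'


def passes(func):
    """Return True if every assertion holds for `func`, False otherwise."""
    try:
        check_top_n(func)
    except AssertionError:
        return False
    return True


print(passes(top_n))         # the stand-in satisfies the tests
print(passes(broken_top_n))  # the buggy version is caught
```

If pytest is installed, running `python -m pytest tests/test.py` from the root folder should also collect and run `test_top_n`.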

### Supporting files

Next, we will need to create another file named **`setup.py`** which describes your package. This setup file is what makes your package installable. In your package's root directory, create **`setup.py`** and add the following code. Replace the 'url', 'author', and 'author_email' value fields with what is relevant to your package.

```python
from setuptools import setup, find_packages

setup(
    name='mypackage',
    version='0.1',
    packages=find_packages(exclude=['tests*']),
    license='MIT',
    description='EDSA example python package',
    long_description=open('README.md').read(),
    install_requires=['numpy'],
    url='https://github.com/<username>/<package-name>',
    author='<Your Name>',
    author_email='<Your Email>'
)
```

Consult the table below for some additional information on the parameters in `setup.py`.

| Parameter | Comments |
|---|---|
| name | The name package managers will use for your project, like `numpy` or `pandas` |
| version | The current version number of your project |
| license | Name of the [license](https://opensource.org/licenses/) you chose |
| description | One-sentence description of your package |
| install_requires | List of all other packages this package depends on; package managers will install these automatically as needed |

Lastly, create a **`README.md`** file in your project's root folder, and add anything you'd like to describe your package in more detail. Go to [this website](https://www.makeareadme.com/) for some helpful info on how to make a proper README file.

<img src="https://github.com/James-Leslie/example-python-package/blob/master/images/4.0_readme.png?raw=true" alt="Completed code in Atom" style="width: 65%;"/>
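The `setup.py` above reads the README with a bare `open('README.md').read()`, which depends on the current working directory and the platform's default encoding. A slightly more defensive variant might look like the sketch below; the helper name `read_long_description` is an assumption made for illustration, and the demo uses a temporary README rather than your real one.

```python
import tempfile
from pathlib import Path


def read_long_description(readme_path):
    """Read a README file as UTF-8 text and close it promptly (hypothetical helper)."""
    return Path(readme_path).read_text(encoding="utf-8")


# Demo with a throwaway README file
with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    readme.write_text("# mypackage\n", encoding="utf-8")
    text = read_long_description(readme)

print(text)
```

In `setup.py` this could be used as `long_description=read_long_description(Path(__file__).parent / "README.md")`. Since the README is written in Markdown, you may also want to pass `long_description_content_type='text/markdown'` to `setup()`.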


## Wrap It Up!
You are now ready to ship your package. Let's package it up and distribute it on GitHub.

### Package it locally
When you are ready, run the following in the command line:

```bash
python setup.py sdist
```

You should see a new folder named **`dist`** that has been created in your project's root directory.

_NOTE:_ You should also see a folder named **`mypackage.egg-info`** that has been created.   

This folder is generated by the build and doesn't need to be committed, so you can add it to a **`.gitignore`** file in the root folder of your project as shown in the image below:

<img src="https://github.com/James-Leslie/example-python-package/blob/master/images/4.1_gitignore.png?raw=true" alt="View of our code in Atom" style="width: 65%;"/>
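If you are curious about what ended up inside the source distribution, the archive in **`dist`** can be inspected with the standard-library `tarfile` module. The sketch below assumes the archive is a gzipped tarball; the helper name `list_sdist_contents` is hypothetical, and the demo builds a small placeholder archive so the block runs on its own.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def list_sdist_contents(archive_path):
    """Return the sorted member names of a .tar.gz archive (hypothetical helper)."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(archive.getnames())


# Demo: build a tiny placeholder archive, then list it
with tempfile.TemporaryDirectory() as tmp:
    demo_archive = Path(tmp) / "demo-0.1.tar.gz"
    with tarfile.open(demo_archive, "w:gz") as archive:
        for name in ("demo-0.1/setup.py", "demo-0.1/demo/__init__.py"):
            payload = b"# placeholder\n"
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    names = list_sdist_contents(demo_archive)

print(names)
```

For the tutorial package, the call would be something like `list_sdist_contents("dist/mypackage-0.1.tar.gz")`. That file name is an assumption based on the name and version in `setup.py`, so check the actual name in your **`dist`** folder.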

### Distribute to GitHub
We now want to publish our package so that anyone else can download and use it! We will do this by publishing the package to GitHub. (There are other ways to publish a Python package. Can you find out how?)

#### Initialize local Git repository
Using any terminal, navigate to your project's root folder and issue the following commands, one line at a time.

```bash
git init
git add .
git commit -m "First commit"
```

#### Create remote repository
Log into GitHub and create a new repository. The following image depicts this process, where the GitHub user 'James-Leslie' is creating a new repository. Ensure that your repository is marked as Public.

<img src="https://github.com/James-Leslie/example-python-package/blob/master/images/4.2_new_repo.png?raw=true" width="700">


#### Push to GitHub
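Copy the URL for the remote repository and issue the following commands. The image below shows where you can obtain the URL. The `-u` flag records `origin/master` as the upstream branch, so that a plain `git push` works for later updates. If your local branch is named `main` rather than `master`, use that name instead.

```bash
git remote add origin <remoteURL>
git push -u origin master
```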

<img src="https://github.com/James-Leslie/example-python-package/blob/master/images/5_new_repo.png?raw=true" width="700">

### Install from GitHub
You can now install your package onto any computer (with internet access)!  

Issue the command below to install your package from GitHub - make sure to replace `your-name` and `your-repo` with the appropriate text.  

```
pip install git+https://github.com/your-name/your-repo.git
```

If you need to install a later version of your package, then use:  

```
pip install --upgrade git+https://github.com/your-name/your-repo.git
```
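After installation, the package should be importable like any other. The sketch below checks for it with `importlib.util.find_spec` before importing, so that it also runs cleanly on a machine where the package is missing; the helper name `is_installed` is an assumption made for illustration.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path (hypothetical helper)."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("mypackage"):
    # Only attempted when the pip install above has succeeded in this environment
    from mypackage import myModule
    print(myModule.top_n([8, 3, 2, 7, 4], 3))
else:
    print("mypackage is not installed in this environment")
```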

## Maintaining your package
You now have a version 0.1 of your first Python package! With this being done, you're in a position to make improvements and expand on its scope.

### Package development workflow
Follow these steps when making changes to your package:

1. Make changes locally
2. Push changes to GitHub
3. Install updated version

We outline these steps briefly below:
    
#### 1) Make changes locally
Your package consists of a number of interdependent files. It is important to keep all of these dependencies in check.   

A likely workflow will look something like this:

- add new functions, or improve existing functions
- update `test.py` if needed
- update `__init__.py` if needed
- update `setup.py` if needed (make sure to update the version number)
    
Once you have tested your functions, and you are happy to push the new version, run the same setup command as before:   
```
python setup.py sdist
```

#### 2) Push changes to GitHub
When you are ready to publish your updated package, follow the commands below:

```
git status
git add .
git commit -m 'make sure to include an appropriate commit message'
git push
```

#### 3) Install updated version
The last step is to install your updated version, using the command below: 

```
pip install --upgrade git+https://github.com/your-name/your-repo.git
```
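To confirm that the upgrade picked up your new version number, the installed version can be queried with the standard-library `importlib.metadata` module (Python 3.8 and later). The helper name `installed_version` is an assumption made for illustration.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the distribution is absent (hypothetical helper)."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Should report the version from setup.py once the package is installed, otherwise None
print(installed_version("mypackage"))
```

If the reported version does not change after an upgrade, it is worth checking that the version number in `setup.py` was updated and pushed.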

## Conclusion

You have now built a modular Python package and published this package to GitHub. You should now understand how Python packages work and have gained more experience using Git. Storing your projects on GitHub is a great way to share your portfolio of work with potential employers. For an example of a working package, check out [this repository](https://github.com/James-Leslie/example-python-package). 

If you are up for a challenge, follow Steps 5 and 6 in this [article](https://towardsdatascience.com/how-to-build-your-first-python-package-6a00b02635c9) to deploy your package to PyPI.

## Appendix

[Packaging Python Projects](https://packaging.python.org/tutorials/packaging-projects/)

[How to Build Your First Python Package](https://towardsdatascience.com/how-to-build-your-first-python-package-6a00b02635c9)