# Testing

We've slowly been moving parts of our code into files outside of notebooks... but *why*?

- **Reuse** – we could copy `my_module.py` and use it in another project if we had need for the functions in there.

- **Testability** – one nice aspect of free-standing Python scripts is that we can write tests for them, checking that the functions inside are reliable and bug-free.

## The Value of Testing

- By running your code on example inputs (for which you know the right output), you can be more confident that it will do what you expect.

- Since you may reuse code in other projects, it's smart to test on not just the data for the current project, but any inputs that your code might reasonably have to deal with.
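
As a small illustration of both points, here is a sketch built around a hypothetical helper, `clean_column_name`, which is not part of `my_module`. It runs the function on inputs whose correct outputs we know in advance, including a couple that the current project might never produce.

```python
def clean_column_name(name):
    """Hypothetical helper (not in my_module): normalize a column name."""
    # Trim outer whitespace, lowercase, and turn spaces and hyphens into underscores.
    return name.strip().lower().replace(' ', '_').replace('-', '_')


# Each key is an input; each value is the output we know to be correct.
examples = {
    'Sale Price': 'sale_price',            # typical of the current project
    '  education-num ': 'education_num',   # messier input from some other dataset
    '': '',                                # an edge case: the empty string
}

for raw, expected in examples.items():
    result = clean_column_name(raw)
    status = 'OK' if result == expected else 'MISMATCH'
    print(repr(raw), '->', repr(result), status)
```

If any line printed `MISMATCH`, we would know the function misbehaves on that kind of input before it caused trouble in a real project.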

At this point, our directory setup is going to become very important, so let's take a quick detour to talk about it.

I'm going to be working with a project that looks something like this:
```
advanced-python-datasci/
├── data/
│   ├── adult-census.csv
│   ├── ames.csv
│   ├── ames_raw.csv
│   └── planes.csv
└── notebooks/
    ├── 01-git.ipynb
    ├── 02-explore_data.ipynb
    ├── 03-first_model.ipynb
    ├── 04-modular_code.ipynb
    ├── 05-feat_eng.ipynb
    ├── 06-model_eval.ipynb
    ├── 07-modularity-pt2.ipynb
    ├── 08-testing.ipynb
    ├── 09-ml_lifecycle_mgt.ipynb
    └── my_module.py
```

What's important:
- At the top level, we have folders for `data` and `notebooks`
- `my_module.py` is in our notebooks folder

Take a few minutes to make sure your project repository is organized similarly. This will make a big difference in this section!

## A Minimal Test

- The easiest way to write a test is in a fresh Python script

- Now that our project is organized, we can just create a new file in the `notebooks/` folder, called `tests.py`

    - Remember that we can do this in Jupyter with File > New > Text File
- Be sure this file appears in your notebooks folder!
    - It's very important that it's in the same place as `my_module.py`

Add the following code to your script:

```python
import my_module

def test_invocation():
    features, target = my_module.get_features_and_target(
        csv_file='../data/adult-census.csv',
        target_col='class'
    )
```

<div class="admonition note alert alert-info">
    <b><p class="first admonition-title" style="font-weight: bold;">Discussion</p></b>
    If we were to run this code on its own with <code>python tests.py</code>, what would happen?
</div>

## Running Our Test
- We're going to invoke our test with **pytest**, a tool we'll discuss more shortly
- Open a terminal session (in Jupyter, File > New > Terminal)
    - Terminal commands differ a bit between Windows and Mac/Linux, so we'll point out the differences as we go.

Run the below command in a notebook to find out what folder you're currently working in:

```python
import os
os.getcwd()
```

```text
'/Users/eswan18/Teaching/advanced-python-datasci/notebooks'
```

Copy that result (including the quotes) and in your terminal, paste it after the `cd` command.

So in my terminal, I would run:
```bash
cd '/Users/eswan18/Teaching/advanced-python-datasci/notebooks'
```
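
The folder you run from matters because `tests.py` refers to the data with a relative path, `'../data/adult-census.csv'`, and relative paths are resolved against the current working directory. The sketch below shows which file that path points to when starting from my `notebooks` folder. It uses `posixpath` only so that this Mac-style example path behaves the same on any operating system; your own folder will differ.

```python
import posixpath

# The working directory from the example above (yours will differ).
cwd = '/Users/eswan18/Teaching/advanced-python-datasci/notebooks'

# The relative path used inside tests.py.
relative_path = '../data/adult-census.csv'

# Join the two and collapse the '..' to see which file is actually meant.
resolved = posixpath.normpath(posixpath.join(cwd, relative_path))
print(resolved)
# /Users/eswan18/Teaching/advanced-python-datasci/data/adult-census.csv
```

Started from any other folder, the same relative path would point somewhere else, and the CSV file would most likely not be found.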

Now...
- Windows users: run `dir`
- Mac/Linux users: run `ls`

This lists the contents of the folder you're currently inside.
You should see `my_module.py` and `tests.py` among the output.

![ls](images/ls.png)
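
If you would rather check from a notebook cell, a rough Python equivalent is sketched below. The helper name `missing_files` is made up for this example; it simply reports which of the expected files are absent from a folder.

```python
import os


def missing_files(folder, expected):
    """Hypothetical helper: return the names in `expected` not found in `folder`."""
    present = set(os.listdir(folder))
    return [name for name in expected if name not in present]


# '.' means the current working directory.
# An empty list means both files are where they should be.
print(missing_files('.', ['my_module.py', 'tests.py']))
```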

Now, we're almost ready to run our test.
The only thing left is to set up our terminal so that it's using the same Conda environment as our notebooks -- because `pytest` is installed in that environment.
- If you took the intermediate class with us, we discussed Conda and environments in more detail then.

In the terminal, run
```bash
conda activate uc-python
```

This should add a "uc-python" prefix to your terminal prompt:
![Terminal prompt with the uc-python prefix](images/prefix-prompt.png)

Note that your prompt will look quite a bit different from mine; all that matters is the folder name and the "uc-python" prefix.

Now we're ready to run our test!

In your terminal, type:
```bash
pytest tests.py
```

You should see some output appear.
The last line should look something like:
```
============ 1 passed in 0.74s ============
```

This means that 1 (test) passed and 0 tests failed, and the whole process took 0.74 seconds.

## Pytest

- Pytest is an automated tool for running sets of tests
  - sets of tests are often called test "suites"
- It expects your tests to be in their own files, and each test needs to be a function
  - The name of each function must start with `test_`, so pytest knows it's a test and not just a regular function.
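
To make the naming rule concrete, here is a simplified sketch of name-based discovery. It is not pytest's actual implementation, whose collection rules are more involved; the `fake_module` dictionary and the `find_tests` function are invented for illustration.

```python
# A stand-in for the contents of a test file: names mapped to objects.
fake_module = {
    'test_invocation': lambda: None,
    'test_return_types': lambda: None,
    'load_helper': lambda: None,               # a regular function, not a test
    'DATA_PATH': '../data/adult-census.csv',   # not callable at all
}


def find_tests(namespace):
    """Return the names of callables whose names start with 'test_'."""
    return sorted(
        name for name, obj in namespace.items()
        if name.startswith('test_') and callable(obj)
    )


print(find_tests(fake_module))
# ['test_invocation', 'test_return_types']
```

Only the two functions with the `test_` prefix are picked up; the helper and the constant are ignored.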

Let's look back at our simple test:

```python
import my_module

def test_invocation():
    features, target = my_module.get_features_and_target(
        csv_file='../data/adult-census.csv',
        target_col='class'
    )
```

Note that our function's name starts with `test_`, so pytest recognizes it as a test and runs it.

What happens if a test fails?
Let's add a bad test just to see.

Add this function to `tests.py`, below `test_invocation`.

```python
def test_without_args():
    # A test we know will fail because we don't provide arguments
    # to the function.
    features, target = my_module.get_features_and_target()
```

Save the file, and then rerun `pytest tests.py` in your terminal.

```text
============== 1 failed, 1 passed in 0.83s ==============
```

Our original test still passes, but this one fails!

Above this line, pytest reports exactly what happened that caused it to fail.
We got an error:
```text
    def test_without_args():
        # A test we know will fail because we don't provide arguments
        # to the function.
>       features, target = my_module.get_features_and_target()

E       TypeError: get_features_and_target() missing 2 required positional arguments: 'csv_file' and 'target_col'

tests.py:12: TypeError
```
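
This `TypeError` comes from Python itself, not from pytest. You can reproduce it without `my_module` by using a stand-in function that has the same two required parameters. The stand-in below is only an assumption about the signature, based on the error message, and it does no real work.

```python
def get_features_and_target(csv_file, target_col):
    """Stand-in with the same two required parameters; does no real work."""
    return None, None


message = None
try:
    # Calling with no arguments, just like test_without_args does.
    get_features_and_target()
except TypeError as error:
    message = str(error)

# Prints a "missing 2 required positional arguments" message like the one above.
print(message)
```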

## What does it mean to "fail"?

- If a test function encounters any kind of unexpected error, that counts as a failure to pytest
- Any test that runs without error "passes"
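
One way to picture this rule is a tiny test runner that calls each function and counts it as failed if an exception escapes. This is a simplified sketch, not how pytest is implemented. The example functions deliberately avoid the `test_` prefix so that pytest itself would ignore them.

```python
def passing_example():
    total = 1 + 1  # runs to the end without raising anything


def failing_example():
    int('not a number')  # raises ValueError


def run_tests(tests):
    """Call each function; an exception counts as a failure."""
    passed, failed = 0, 0
    for test in tests:
        try:
            test()
        except Exception:
            failed += 1
        else:
            passed += 1
    return passed, failed


passed, failed = run_tests([passing_example, failing_example])
print(f'{failed} failed, {passed} passed')
# 1 failed, 1 passed
```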

Let's remove our `test_without_args` test -- it's not something we actually want to verify about our code.

However, one thing we *do* want to check is that the features and target that are returned from our function are a pandas DataFrame and Series, respectively. Let's add a test for that in its place...

```python
import pandas as pd  # You may want to move this import to the top of the file.

def test_return_types():
    features, target = my_module.get_features_and_target(
        csv_file='../data/adult-census.csv',
        target_col='class'
    )
    assert isinstance(features, pd.DataFrame)
    assert isinstance(target, pd.Series)
```

Then we can rerun `pytest tests.py`:
```text
=========== 2 passed in 0.88s ===========
```

Nice! It looks like our function does indeed return a DataFrame and a Series.

## Assert