# Setting Up the Testing Structure in Your Project

In [None]:
project_folder/
    ├── main_code.py          # The main Python code with your functions
    ├── test_main_code.py     # The test file with functions to test main_code.py
    ├── L6_pipelines.ipynb    # Your notebook file
    └── requirements.txt      # Contains the dependencies, including pytest
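
The two Python files in the tree above are not shown in this notebook. Below is a minimal sketch of what they might contain. It assumes that `add_numbers` simply returns the sum of its two arguments and that the test file holds a single test; the test name `test_add_numbers` is hypothetical.

```python
# --- main_code.py (assumed contents) ---
def add_numbers(x, y):
    return x + y


# --- test_main_code.py (assumed contents) ---
# In the real two-file layout this file would start with:
#     from main_code import add_numbers
def test_add_numbers():
    # pytest reports a pass when the assertion holds
    assert add_numbers(3, 4) == 7
```

Both files are combined into one block here only so that the sketch can be run in a single cell.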


In [22]:
import sys
import os
sys.path.insert(0, os.getcwd())


In [24]:
from main_code import add_numbers

# Testing if import works by using the function
print(add_numbers(3, 4))  # Expected output: 7


7


In [26]:
!pytest test_main_code.py


platform win32 -- Python 3.11.9, pytest-7.4.4, pluggy-1.0.0
rootdir: C:\Users\aliso\BIG DATA\GRA4157-main\lectures\06-guest-lecture
plugins: anyio-4.2.0
collected 1 item

test_main_code.py .                                                      [100%]



# From above steps in L6_pipelines.ipynb:
## 1. You successfully imported the function `add_numbers` from main_code.py and got the expected output (7).
## 2. You ran **pytest on test_main_code.py**, and all the tests passed. Running pytest from the Jupyter notebook is a quick way to verify that all your functions work as intended.

# GitHub pipelines

GitHub pipelines (GitHub Actions) are GitHub's answer to CI/CD (continuous integration and continuous delivery/deployment). In this lecture we will first look at a few examples, and then learn how to build an automated pipeline.

## A pipeline in the context of software engineering is a series of automated steps that code goes through from development to production. In simpler terms, it's a workflow that automates processes such as code testing, building, and deploying applications.

## The pipeline is split into 2 major parts:

* CI (Continuous Integration): This involves automatically integrating code changes from multiple developers into a single software project. **The key steps in CI are code testing and integration.**
* CD (Continuous Deployment or Delivery): In Continuous Delivery, the code that has passed the integration phase is deployed to an environment (e.g., testing or staging). **In Continuous Deployment, it's also automatically deployed to production, meaning users will see these updates.**

## In the context of machine learning, CI/CD pipelines are crucial for streamlining the process of developing, testing, and deploying machine learning models.

What a CI/CD pipeline is typically used for:
* **Running Tests**: Automatically test code every time a commit is 'pushed' or a 'pull request' is made.
  - **When new code (e.g., feature engineering, model update, hyperparameter optimization) is pushed to the repository**, the CI pipeline runs **automated tests to check if the new changes work well without breaking anything**.
* **Building and Deploying Code**: Build your applications and deploy them to cloud services or servers.
* **Code Linting and Formatting**: Automatically check for code style and formatting issues.
* **Automation of Routine Tasks**: Trigger workflows for various tasks like **generating reports**, managing issues, or even interacting with other services via APIs.

With a CI/CD pipeline in place, a change to your feature engineering process means the new features are automatically tested with the latest model, minimizing human intervention and errors.
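
The idea of a pipeline as an ordered series of automated steps, where a failing step stops everything after it, can be sketched in plain Python. This is only an illustration of the concept, not how GitHub runs workflows. The function `run_pipeline` and the three step names are hypothetical, and each step is a tiny stand-in command so the sketch stays self-contained.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run each (name, command) step in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, command in steps:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"Step '{name}' failed with exit code {result.returncode}")
            break
        completed.append(name)
    return completed


# Hypothetical steps: in a real project these would be calls to tools
# such as pytest or black rather than one-line Python commands.
steps = [
    ("test", [sys.executable, "-c", "assert 3 + 4 == 7"]),
    ("lint", [sys.executable, "-c", "import sys; sys.exit(1)"]),
    ("deploy", [sys.executable, "-c", "print('deployed')"]),
]

completed = run_pipeline(steps)
print(completed)
```

Because the stand-in "lint" step exits with a non-zero code, the "deploy" step never runs. CI systems rely on the same convention: a non-zero exit code marks a step as failed.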

A configuration file states which actions should be performed in the pipeline. In a GitHub repository, this file is located under `.github/workflows/` and is written in YAML (with a `.yml` extension). Pipelines usually run on Ubuntu, so some familiarity with that platform, e.g. working on the command line, is useful.

## Testing:
### Pytest provides a simple way to run test functions, check output, and even automate this process within CI/CD pipelines.

We have performed unit tests with pytest. Pytest can be included in the pipeline directly, and you can choose which files to run with pytest. Recall that pytest looks for functions whose names start with `test_`, for example

```python
def test_function():
    computed = sum([1, 2, 3])  # call the function under test here
    expected = 6
    assert computed == expected
```

and then we can run 

```bash
pytest name_of_file.py
```

to execute the tests. 

## Building and deploying code

Often your code will run as an application in the cloud. Installing the correct dependencies for the application (build) and updating the application with the most recent code (deployment) are both done in the actions file. We will not cover this part here.

## Linting

We have touched upon coding conventions and styles. Luckily, there are automated tools that make sure collaborating coders follow a common standard. **These tools give you feedback on your code and often auto-correct it when it does not follow standard (linting) conventions.**

#### Black (Black takes care of formatting, coders can focus on the logic of the code)
Black is a popular code formatter for Python that enforces a consistent style for your code. Its primary goal is to automatically format Python code to make it more readable and maintainable, adhering to a consistent set of rules.

First, install `black` in your environment using pip:

```bash
pip install black
```
Now you can check whether Black would reformat your file with:
```bash
python -m black my_file.py --check
```
If we omit `--check`, Black will rewrite my_file.py in place with the needed changes. To run Black (with changes) on all files in the current directory, run:
```bash
python -m black .
```
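
Black changes only the layout of the code, not what it does. The sketch below illustrates this with the standard library's `ast` module. The `after` string is hand-written in the style Black would be expected to produce; it is not actual Black output, and the helper `same_logic` is a hypothetical name.

```python
import ast

# A one-line function with cramped spacing (assumed "before" state).
before = "def add_numbers(x,y): return x+y\n"

# Hand-written "after" state in a Black-like style (an assumption,
# not generated by running Black).
after = "def add_numbers(x, y):\n    return x + y\n"


def same_logic(source_a, source_b):
    """Return True if both sources parse to the same syntax tree."""
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


unchanged = same_logic(before, after)
changed = same_logic(before, "def add_numbers(x, y):\n    return x - y\n")
print(unchanged, changed)
```

The first comparison is true because only whitespace and line breaks differ. The second is false because swapping `+` for `-` changes the logic itself.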
Other linters, such as flake8 and pylint, check your code for compliance with style guides and identify potential bugs:
```bash
flake8 filename.py
pylint filename.py
```

#### Exercise
* Check whether your code adheres to black standards by installing and running black on one of your scripts
* Install a few other linting tools (e.g. flake8 and pylint). Run these as well and compare their feedback with Black's (run them with `flake8 my_file.py` and `pylint my_file.py`).

In [41]:
!pip install black



In [43]:
!python -m black "C:/Users/aliso/BIG DATA/GRA4157-main/lectures/06-guest-lecture/main_code.py" --check

would reformat C:\Users\aliso\BIG DATA\GRA4157-main\lectures\06-guest-lecture\main_code.py

Oh no! 💥 💔 💥
1 file would be reformatted.


In [45]:
!python -m black "C:/Users/aliso/BIG DATA/GRA4157-main/lectures/06-guest-lecture/main_code.py"

reformatted C:\Users\aliso\BIG DATA\GRA4157-main\lectures\06-guest-lecture\main_code.py

All done! ✨ 🍰 ✨
1 file reformatted.


In [59]:
!pip install flake8
!pip install pylint



In [61]:
!flake8 "C:/Users/aliso/BIG DATA/GRA4157-main/lectures/06-guest-lecture/main_code.py"
# This command will display any style issues or warnings directly in the notebook.

In [63]:
!pylint "C:/Users/aliso/BIG DATA/GRA4157-main/lectures/06-guest-lecture/main_code.py"
# This will give you detailed feedback on the code, including style and code quality.

************* Module main_code
main_code.py:1:0: C0114: Missing module docstring (missing-module-docstring)
main_code.py:4:0: C0116: Missing function or method docstring (missing-function-docstring)
main_code.py:4:16: C0103: Argument name "x" doesn't conform to snake_case naming style (invalid-name)
main_code.py:4:19: C0103: Argument name "y" doesn't conform to snake_case naming style (invalid-name)

------------------------------------------------------------------
Your code has been rated at 0.00/10 (previous run: 0.00/10, +0.00)
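
The four messages above point at a missing module docstring, a missing function docstring, and two short argument names. Below is a sketch of how main_code.py might be revised to address them, assuming the function simply adds its arguments. The docstring wording and the names `first_number` and `second_number` are illustrative choices, and whether pylint accepts them depends on its configuration.

```python
"""Helper functions used in the pipelines lecture (assumed module purpose)."""


def add_numbers(first_number, second_number):
    """Return the sum of the two arguments."""
    return first_number + second_number


print(add_numbers(3, 4))
```

Renaming arguments changes the function's keyword interface, so any caller that uses `add_numbers(x=..., y=...)` would need to be updated as well.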



# Routine tasks
Assume that we have a function that creates a set of outputs. This could be a report with a table, or trained parameters for a machine learning model. To get these values (outputs) to the application, the functions are most often run as a step in the automated pipeline. Let's say that we have a script:

In [73]:

def f(x):  # takes one argument, x
    return [2 * x, x + 5, 15 * x]  # returns a list of three values


# f(1)  # returns [2, 6, 15]

The exact content of this function may change, and many developers have worked together (and separately) to update the function. This function is the basis for writing a report in table format. In the pipeline step we can for instance create a script that **saves the results from running the function to the repository**.

```python
from my_file import f

x = 1  # example input (an assumption; choose the value your report needs)
d0, d1, d2 = f(x)  # unpack the returned list into three variables: d0, d1, and d2
with open("results_file.txt", "w") as file:
    content = f"{d0}, {d1}, {d2}"
    file.write(content)
```

We can save this file as **update_results.py** and **in the pipeline** we run:

```bash
python update_results.py
```

**This approach can be used in an automated pipeline to ensure that generated results are automatically saved and available to other parts of a project**
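
Since the function is described as the basis for a report in table format, a pipeline step could also write one row per input value. The sketch below assumes CSV as the table format and uses the input values 1 and 2; the helper `build_report` and the column names are hypothetical. It builds the table in memory so it can be inspected before being written to a file.

```python
import csv
import io


def f(x):
    # Same example function as above, repeated so the sketch is self-contained.
    return [2 * x, x + 5, 15 * x]


def build_report(inputs):
    """Return a CSV table with one row of f(x) results per input value."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["x", "d0", "d1", "d2"])  # header row (assumed column names)
    for x in inputs:
        writer.writerow([x, *f(x)])
    return buffer.getvalue()


report = build_report([1, 2])
print(report)
```

Writing `report` to a file in the repository then takes one more `open(...)` call, as in update_results.py.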

# The structure of the .yml-file

```yaml
name: Pylint

on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.8", "3.9", "3.10"]
    steps:
    - uses: actions/checkout@v4
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v3
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install pylint
        pip install black
        pip install pytest
    - name: Run pytest
      run:  |
        pytest lectures/01-python-summary/lecture-scripts/test_*
    - name: Analysing the code with pylint and black
      run: |
        pylint $(git ls-files 'scripts/*.py')
        black --check scripts/*.py
```
        

## Exercise 

* Write a Python script in the folder "scripts". You can copy one of your scripts from a weekly exercise.

* Add, commit and push the changes to your forked repository.

* To run GitHub Actions on a forked repository, you might need to go to the "Actions" tab on GitHub and enable them.

* Install pylint and black. Run pylint and black on the script that you added to the "scripts" folder.

* Make the necessary changes to the script so that pylint and black finish successfully.

* Make changes to datestamp.txt on your local machine. Try to do `git pull`, then resolve the merge conflict.
