# Lesson 31: Continuous Integration (CI) for AI Engineering

In this notebook, we'll practice the CI essentials covered in Lesson 31. You'll run formatting checks, linting, and tests to see how CI tools maintain code quality.

**Learning Objectives:**

- Understand Brown's CI configuration files
- Practice running formatting and linting checks with Ruff
- Learn to fix code quality issues automatically
- Run unit tests with mocked LLM responses

## 1. Setup

First, let's navigate to the Brown writing agent directory.


In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
%cd ../writing_workflow

/Users/fabio/Desktop/course-ai-agents/lessons/writing_workflow


## 2. Viewing Brown's CI Configuration

Let's examine Brown's actual CI configuration files to understand how CI is set up.

### 2.1 Pre-commit Configuration

The `.pre-commit-config.yaml` file defines Git hooks that run automatically before each commit. These hooks catch issues immediately in your local development environment.

Brown's pre-commit configuration includes three types of hooks:
1. **validate-pyproject** - Validates that `pyproject.toml` is structurally correct
2. **prettier** - Formats YAML and JSON configuration files consistently
3. **ruff-check** and **ruff-format** - Lints and formats Python code

In [None]:
# Show the content of the file
!cat .pre-commit-config.yaml

fail_fast: false

repos:
  - repo: https://github.com/abravalheri/validate-pyproject
    rev: v0.24.1
    hooks:
      - id: validate-pyproject

  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: v3.1.0
    hooks:
      - id: prettier
        types_or: [yaml, json5]

  - repo: https://github.com/astral-sh/ruff-pre-commit
    # Ruff version.
    rev: v0.12.1
    hooks:
      # Run the linter.
      - id: ruff-check
        args: [--fix, --exit-non-zero-on-fix]
      # Run the formatter.
      - id: ruff-format


### 2.2 Ruff Configuration

The `pyproject.toml` file contains Ruff's configuration in the `[tool.ruff]` section. This defines:
- **target-version**: Which Python version to target (py312 for Python 3.12)
- **line-length**: Maximum line length (140 characters for modern screens)
- **select rules**: Which linting rules to enable (F=Pyflakes, E=pycodestyle, I=isort)
- **known-first-party**: Which packages the isort rules treat as first-party when grouping imports (`src` and `tests`)

In [None]:
# Show the content of the file related to ruff
!grep -A 20 "\[tool.ruff\]" pyproject.toml

[tool.ruff]
target-version = "py312"
line-length = 140

[tool.ruff.lint]
select = [
    "F",    # Pyflakes
    "E",    # pycodestyle errors
    "I",    # isort
]

[tool.ruff.lint.isort]
known-first-party = ["src", "tests"]

[tool.pytest.ini_options]
pythonpath = ["src"]


### 2.3 Makefile QA Targets

The Makefile provides convenient shortcuts for running CI commands. Instead of typing long `uv run ruff format --check src/ tests/ scripts/` commands, you can simply run `make format-check`.

The Makefile defines:
- **QA_FOLDERS** - Which directories to check (src/, tests/, scripts/)
- **format-check/format-fix** - Formatting commands
- **lint-check/lint-fix** - Linting commands
- **tests** - Test suite with the correct configuration
- **pre-commit** - Manual pre-commit hook execution

In [None]:
# Show the commands in the Makefile related to QA
!sed -n '/# --- Tests & QA ---/,$p' Makefile | tail -n +2


tests: # Run tests.
	CONFIG_FILE=configs/debug.yaml uv run pytest

pre-commit: # Run pre-commit hooks.
	uv run pre-commit run --all-files

format-fix: # Auto-format Python code using ruff formatter.
	uv run ruff format $(QA_FOLDERS)

lint-fix: # Auto-fix linting issues using ruff linter.
	uv run ruff check --fix $(QA_FOLDERS)

format-check: # Check code formatting without making changes using ruff formatter.
	uv run ruff format --check $(QA_FOLDERS) 

lint-check: # Check code for linting issues without fixing them using ruff linter.
	uv run ruff check $(QA_FOLDERS)


## 3. Pre-commit Hooks (Local Enforcement)

Pre-commit hooks run automatically before each commit, catching issues before they enter version control. Let's run them manually:


In [21]:
!uv run pre-commit run --files ./**/*

Validate pyproject.toml..............................(no files to check)Skipped
prettier.................................................................Passed
ruff check...............................................................Passed
ruff format..............................................................Passed


These hooks will:
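1. Validate your `pyproject.toml` structure (reported as `Skipped` in the run above because none of the files passed via `--files` matched this hook)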
2. Format all YAML/JSON files with prettier
3. Lint Python code with ruff-check (and auto-fix issues)
4. Format Python code with ruff-format

If any hook fails, you'll see the error, fix it, re-stage the files, and commit again. This tight feedback loop keeps code quality high.


## 4. Running Formatting Checks

Now let's practice using Ruff's formatter. Instead of running it on Brown's existing code (which is already formatted), we'll create a simple Python file with formatting issues and fix them.

### 4.1 Create a Test File with Formatting Issues

In [24]:
%%bash

# Create a Python file with various formatting issues
cat > test_formatting.py << 'EOF'
# This file has formatting issues
def  badly_formatted_function(x,y,z):
    result=x+y+z
    my_list=[1,2,3,4,5,6,7,8,9,10]
    my_dict={"key1":"value1","key2":"value2","key3":"value3"}
    if result>10:
        print("Result is greater than 10")
    else:
        print("Result is 10 or less")
    return result

class   BadlyFormattedClass:
    def __init__(self,name,age):
        self.name=name
        self.age=age
    def get_info(self):
        return f"{self.name} is {self.age} years old"
EOF

echo "Created test_formatting.py"

Created test_formatting.py


### 4.2 Check Formatting (Without Fixing)

Let's check if the file has formatting issues without modifying it:

In [25]:
!uv run ruff format --check test_formatting.py

Would reformat: test_formatting.py
1 file would be reformatted


You'll see that Ruff reports the file would be reformatted. The `--check` flag means Ruff only reports issues without changing the file.

### 4.3 Auto-fix Formatting Issues

Now let's fix all the formatting issues automatically:

In [26]:
!uv run ruff format test_formatting.py

1 file reformatted


Ruff will reformat the file to follow consistent style rules. Let's see the result:

In [27]:
cat test_formatting.py

# This file has formatting issues
def badly_formatted_function(x, y, z):
    result = x + y + z
    my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    my_dict = {"key1": "value1", "key2": "value2", "key3": "value3"}
    if result > 10:
        print("Result is greater than 10")
    else:
        print("Result is 10 or less")
    return result


class BadlyFormattedClass:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def get_info(self):
        return f"{self.name} is {self.age} years old"


Notice how Ruff has:
- Fixed spacing around operators (`x+y+z` → `x + y + z`)
- Added proper spacing in function signatures
- Formatted lists and dictionaries consistently
- Fixed class definition spacing and added blank lines between top-level definitions and between methods
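
A formatter only changes layout, not behavior. One way to convince yourself of this is to compare the abstract syntax trees of the code before and after formatting. The sketch below does that with the standard `ast` module on a shortened copy of the function above; it is an illustration of the idea, not a description of how Ruff works internally.

```python
import ast

# Shortened copy of the function before formatting.
before = '''
def  badly_formatted_function(x,y,z):
    result=x+y+z
    if result>10:
        print("Result is greater than 10")
    return result
'''

# The same function after formatting.
after = '''
def badly_formatted_function(x, y, z):
    result = x + y + z
    if result > 10:
        print("Result is greater than 10")
    return result
'''

# ast.dump() omits line and column positions by default,
# so only the program structure is compared.
same_ast = ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))

print(before == after)  # False: the text differs
print(same_ast)         # True: the parsed program is identical
```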

Let's remove the file now:

In [28]:
!rm test_formatting.py

## 5. Running Linting Checks

Linting goes beyond formatting: it checks for bugs, code quality issues, and best practices. Let's create a file with linting issues and fix them.

### 5.1 Create a Test File with Linting Issues

In [37]:
%%bash

cat > test_linting.py << 'EOF'
import os
import sys
import json # Unused import

def calculate_sum(numbers):
    """Calculate sum of numbers."""
    total = 0
    for num in numbers:
        total = total + num
    return total

def process_data(data):
    """Process some data."""
    result = calculate_sum(data)
    print(f"Result: {result}")
    _ = os.getcwd()  # Use os
    _ = sys.argv[0]  # Use sys
    undefined_variable = some_undefined_function()  # Using undefined name
    return result

import sys # Duplicate import
EOF

echo "Created test_linting.py"

Created test_linting.py


### 5.2 Check Linting Issues (Without Fixing)

In [38]:
!uv run ruff check test_linting.py

I001 [*] Import block is un-sorted or un-formatted
 --> test_linting.py:1:1
  |
1 | / import os
2 | | import sys
3 | | import json # Unused import
  | |___________^
4 |
5 |   def calculate_sum(numbers):
  |
help: Organize imports

F401 [*] `json` imported but unused
 --> test_linting.py:3:8
  |
1 | import os
2 | import sys
3 | import json # Unused import
  |        ^^^^
4 |
5 | def calculate_sum(numbers):
  |
help: Remove unused import: `json`

F841 Local variable `undefined_variable` is assigned to but never used
  --> test_linting.py:18:5

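Ruff reports several issues, including:
- **I001**: Unsorted or unformatted import block (marked `[*]`, meaning a fix is available)
- **F401**: Unused import (`json` is imported but never used)
- **F811**: Duplicate import (`sys` is imported twice)
- **F841**: Unused local variable (`undefined_variable` is assigned but never used)
- **F821**: Undefined name (`some_undefined_function` doesn't exist)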
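
To build intuition for what a rule like F401 checks, here is a deliberately simplified sketch of an unused-import detector written with the standard `ast` module. It is not Ruff's implementation (Ruff handles scopes, `__all__`, re-exports, and many other cases); the function name `find_unused_imports` is hypothetical.

```python
import ast

SOURCE = '''
import os
import sys
import json

def process_data():
    _ = os.getcwd()
    _ = sys.argv[0]
'''


def find_unused_imports(source: str) -> list[str]:
    """Return imported names that are never referenced anywhere in the source."""
    tree = ast.parse(source)
    imported: set[str] = set()
    used: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds the top-level name `a`
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            # Every bare identifier, including the `os` in `os.getcwd()`
            used.add(node.id)
    return sorted(imported - used)


print(find_unused_imports(SOURCE))  # ['json']
```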

### 5.3 Auto-fix Linting Issues (Where Possible)

In [39]:
!uv run ruff check --fix test_linting.py

F841 Local variable `undefined_variable` is assigned to but never used
  --> test_linting.py:18:5
   |
16 |     _ = os.getcwd()  # Use os
17 |     _ = sys.argv[0]  # Use sys
18 |     undefined_variable = some_undefined_function()  # Using undefined name
   |     ^^^^^^^^^^^^^^^^^^
19 |     return result
   |
help: Remove assignment to unused variable `undefined_variable`

F821 Undefined name `some_undefined_function`
  --> test_linting.py:18:26
   |
16 |     _ = os.getcwd()  # Use os
17 |     _ = sys.argv[0]  # Use sys
18 |     undefined_variable = some_undefined_function()  # Using undefined name
   |                          ^^^^^^^^^^^^^^^^^^^^^^^
19 |     return result
   |

Found 5 errors 

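Ruff automatically fixes what it can:
- Organizes the import block
- Removes unused imports
- Removes duplicate imports

It leaves the unused variable (F841) and the undefined name (F821) in place, as the output above shows. Both need manual intervention: Ruff cannot know what `some_undefined_function` was meant to be, so this is a logic error for you to resolve.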


In [40]:
cat test_linting.py

import os
import sys


def calculate_sum(numbers):
    """Calculate sum of numbers."""
    total = 0
    for num in numbers:
        total = total + num
    return total

def process_data(data):
    """Process some data."""
    result = calculate_sum(data)
    print(f"Result: {result}")
    _ = os.getcwd()  # Use os
    _ = sys.argv[0]  # Use sys
    undefined_variable = some_undefined_function()  # Using undefined name
    return result
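
The two remaining diagnostics (F841 and F821) both point at the same line, so they need a human decision. One possible manual fix, assuming the call to `some_undefined_function` was never needed, is simply to delete that line. If the function was meant to exist, the right fix would instead be to define or import it.

```python
import os
import sys


def calculate_sum(numbers):
    """Calculate sum of numbers."""
    total = 0
    for num in numbers:
        total = total + num
    return total


def process_data(data):
    """Process some data."""
    result = calculate_sum(data)
    print(f"Result: {result}")
    _ = os.getcwd()  # Use os
    _ = sys.argv[0]  # Use sys
    # The line that called the undefined function has been removed.
    return result


process_data([1, 2, 3])  # prints "Result: 6"
```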



Let's remove the file now:

In [41]:
!rm test_linting.py

## 6. Running Unit Tests

Now let's run Brown's test suite with mocked LLM responses. The tests use fake models instead of real LLMs, making them fast, deterministic, and free.

In [42]:
!CONFIG_FILE=configs/debug.yaml uv run pytest -v

platform darwin -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /Users/fabio/Desktop/course-ai-agents/lessons/writing_workflow/.venv/bin/python
cachedir: .pytest_cache
rootdir: /Users/fabio/Desktop/course-ai-agents/lessons/writing_workflow
configfile: pyproject.toml
plugins: asyncio-1.2.0, anyio-4.11.0, langsmith-0.4.38, opik-1.8.96
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 214 items

tests/brown/data/test_loaders.py::TestMarkdownArticleLoader::test_article_loader_success PASSED [  0%]
tests/brown/data/test_loaders.py::TestMarkdownArticleLoader::test_article_loader_file_not_found PASSED [  0%]
tests/brown/data/test_loaders.py::TestMarkdownArticleGuidelineLoader::test_article_guideline_loader_success PASSED [  1%]
tests/brown/data/test_loaders.py::TestMarkdownArticleGuidelineLoader

The `-v` flag provides verbose output, showing each test as it runs. Setting `CONFIG_FILE=configs/debug.yaml` ensures all tests use fake models instead of real LLMs.

### 6.1 Running Specific Test Files

You can run tests for specific components:

In [43]:
!CONFIG_FILE=configs/debug.yaml uv run pytest tests/brown/domain/ -v

platform darwin -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /Users/fabio/Desktop/course-ai-agents/lessons/writing_workflow/.venv/bin/python
cachedir: .pytest_cache
rootdir: /Users/fabio/Desktop/course-ai-agents/lessons/writing_workflow
configfile: pyproject.toml
plugins: asyncio-1.2.0, anyio-4.11.0, langsmith-0.4.38, opik-1.8.96
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 74 items

tests/brown/domain/test_articles.py::TestArticle::test_article_creation PASSED [  1%]
tests/brown/domain/test_articles.py::TestArticle::test_article_to_context PASSED [  2%]
tests/brown/domain/test_articles.py::TestArticle::test_article_to_markdown PASSED [  4%]
tests/brown/domain/test_articles.py::TestArticle::test_article_str_representation PASSED [  5%]
tests/brown/domain/test_ar

In [44]:
!CONFIG_FILE=configs/debug.yaml uv run pytest tests/brown/nodes/ -v

platform darwin -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /Users/fabio/Desktop/course-ai-agents/lessons/writing_workflow/.venv/bin/python
cachedir: .pytest_cache
rootdir: /Users/fabio/Desktop/course-ai-agents/lessons/writing_workflow
configfile: pyproject.toml
plugins: asyncio-1.2.0, anyio-4.11.0, langsmith-0.4.38, opik-1.8.96
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 37 items

tests/brown/nodes/test_article_reviewer.py::TestArticleReviewer::test_article_reviewer_initialization PASSED [  2%]
tests/brown/nodes/test_article_reviewer.py::TestArticleReviewer::test_article_reviewer_ainvoke_success PASSED [  5%]
tests/brown/nodes/test_article_reviewer.py::TestArticleReviewer::test_article_reviewer_structured_output PASSED [  8%]
tests/brown/nodes/test_article_reviewer.py::TestArt

### 6.2 Understanding What's Being Tested

Brown's test suite includes:
- **Domain tests** (`tests/brown/domain/`): Testing Pydantic models and data structures without any LLM calls
- **Node tests** (`tests/brown/nodes/`): Testing agent nodes like ArticleWriter and ArticleReviewer with mocked LLM responses
- **Utility tests** (`tests/brown/utils/`): Testing helper functions
- **Evaluation tests** (`tests/brown/evals/`): Testing evaluation metrics and dataset handling

The complete test suite runs in under a minute and requires no API keys. Every test is deterministic.
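
To make the idea of a mocked LLM concrete, here is a minimal, self-contained sketch of the pattern. `FakeChatModel` and `TinyReviewer` are hypothetical classes invented for this example and are not Brown's real code; the sketch is also synchronous for simplicity, whereas the node test names above suggest Brown's nodes are async.

```python
class FakeChatModel:
    """Hypothetical stand-in for an LLM client: returns canned replies in order."""

    def __init__(self, responses: list[str]) -> None:
        self._responses = list(responses)
        self.prompts: list[str] = []

    def invoke(self, prompt: str) -> str:
        self.prompts.append(prompt)  # record the call so tests can inspect it
        return self._responses.pop(0)


class TinyReviewer:
    """Hypothetical node that asks a model to review an article."""

    def __init__(self, model) -> None:
        self.model = model

    def invoke(self, article: str) -> str:
        reply = self.model.invoke(f"Review this article:\n{article}")
        return reply.strip()


def test_reviewer_returns_model_reply() -> None:
    model = FakeChatModel(responses=["  Looks good.  "])
    reviewer = TinyReviewer(model)

    result = reviewer.invoke("Draft text")

    # The output depends only on the canned reply, so the test is deterministic.
    assert result == "Looks good."
    # The fake also lets us verify what prompt the node built.
    assert model.prompts == ["Review this article:\nDraft text"]


test_reviewer_returns_model_reply()
print("ok")
```

Because the node receives its model through the constructor, swapping the real client for a fake requires no patching, and the test needs neither network access nor an API key.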