# Unit 5

## Database Integration and Persistence

## Introduction: Why Store Code and Commits in a Database? 💾

Welcome back! So far, you have learned how to scan a codebase, extract commit history, and analyze code using a command-line interface. Until now, all the information about code files and commits has been stored in memory, which means it disappears as soon as your program stops running.

In real-world applications, especially for tools like a code review assistant, it is important to keep this information **organized and persistent**. Storing data in a database allows you to:

  * **Save** code and commit information for later use
  * **Query and analyze** data efficiently
  * **Share** data between different parts of your application

In this lesson, you will learn how to use a database to store code files and commit data using **SQLAlchemy**, a popular Python library for working with databases. This will make your code review assistant more powerful and reliable.

-----

## Setting Up Your Environment

Before we begin working with databases, you need to install the required dependencies. SQLAlchemy is not part of Python's standard library, so it needs to be installed separately.

### Installing SQLAlchemy

If you're working on your own machine, you'll need to install SQLAlchemy. Here are the installation commands:

**Using `pip`:**

```bash
pip install sqlalchemy
```

**Using `pip` with a virtual environment (recommended):**

```bash
# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install SQLAlchemy
pip install sqlalchemy
```

**Using `conda`:**

```bash
conda install sqlalchemy
```

### Additional Dependencies

For this lesson, we'll use **SQLite** as our database. Python's standard library already includes the `sqlite3` driver, so nothing extra needs to be installed. If you want to use other databases like PostgreSQL or MySQL, you would need additional drivers:

```bash
# For PostgreSQL
pip install psycopg2-binary
# For MySQL
pip install pymysql
```

> **Note:** On CodeSignal, the required libraries are already installed, so you do not need to worry about installation here. However, it is good practice to know how to set up your environment on your own device.

-----

## Quick Recall: Data Classes and CLI Integration

Before we dive in, let's quickly remind ourselves of what you have already built:

  * You used **Python data classes** to represent code files and commits.
  * You built a **CLI tool** that scanned a project directory and extracted commit history, displaying useful statistics.

In those lessons, all data was kept in memory using Python objects. Now, we will take the next step and store this data in a database so it can be accessed and updated over time.

-----

## Defining SQLAlchemy Models for Code and Commits

To store data in a database, we need to define the structure of our tables. In SQLAlchemy, this is done by creating Python classes called **models**. Each model represents a table in the database.

Let's start by defining a model for a code file.

```python
from sqlalchemy import Column, Integer, String, Text, DateTime
from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+; older releases import it from sqlalchemy.ext.declarative

Base = declarative_base()

class CodeFile(Base):
    __tablename__ = 'code_files'
    id = Column(Integer, primary_key=True)
    file_path = Column(String, unique=True)
    content = Column(Text)
    language = Column(String)
    last_updated = Column(DateTime)
```

**Explanation:**

  * `Base = declarative_base()` sets up the base class for all our models.
  * `class CodeFile(Base):` defines a table called `code_files`.
  * Each attribute (like `id`, `file_path`, `content`) becomes a **column** in the table.
  * `id` is the **primary key**, which uniquely identifies each row.
  * `file_path` is marked as **unique**, so no two files can have the same path.
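
To make the effect of `unique=True` concrete, here is a minimal sketch that tries to save the same path twice. It assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database; `try_insert` is a hypothetical helper written only for this illustration, and the model is trimmed to the columns the example needs.

```python
from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class CodeFile(Base):
    __tablename__ = 'code_files'
    id = Column(Integer, primary_key=True)
    file_path = Column(String, unique=True)
    content = Column(Text)

# In-memory database so the sketch leaves no file behind
engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(bind=engine)

def try_insert(path, content):
    """Hypothetical helper: True if the row was saved, False if the unique constraint rejected it."""
    with Session(engine) as session:
        session.add(CodeFile(file_path=path, content=content))
        try:
            session.commit()
            return True
        except IntegrityError:
            # The database refused the duplicate path; undo the failed insert
            session.rollback()
            return False

first = try_insert('main.py', 'print("v1")')
second = try_insert('main.py', 'print("v2")')
print(first, second)
```

The second insert is rejected by the database itself, so the constraint holds even if application code forgets to check for duplicates.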

Now, let's define a model for a commit:

```python
class Commit(Base):
    __tablename__ = 'commits'
    id = Column(Integer, primary_key=True)
    hash = Column(String, unique=True)
    message = Column(Text)
    author = Column(String)
    date = Column(DateTime)
```

**Explanation:**

  * This class creates a `commits` table.
  * Each commit has a unique **hash**, a message, an author, and a date.

Finally, we need a way to link files and commits together. For this, we use a third table:

```python
from sqlalchemy import ForeignKey

class FileCommit(Base):
    __tablename__ = 'file_commits'
    id = Column(Integer, primary_key=True)
    file_id = Column(Integer, ForeignKey('code_files.id'))
    commit_id = Column(Integer, ForeignKey('commits.id'))
    diff_text = Column(Text)
```

**Explanation:**

  * `FileCommit` links a code file to a commit, storing the changes (**diff**) made in that commit.
  * `file_id` and `commit_id` are **foreign keys**, meaning they refer to rows in the `code_files` and `commits` tables.
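
As a rough illustration of how the link table is used, the sketch below stores one file, one commit, and one `FileCommit` row, then joins the three tables to list the commits that touched a file. It assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database; the sample values are made up, and the models are trimmed to the columns the example needs.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class CodeFile(Base):
    __tablename__ = 'code_files'
    id = Column(Integer, primary_key=True)
    file_path = Column(String, unique=True)

class Commit(Base):
    __tablename__ = 'commits'
    id = Column(Integer, primary_key=True)
    hash = Column(String, unique=True)
    message = Column(Text)

class FileCommit(Base):
    __tablename__ = 'file_commits'
    id = Column(Integer, primary_key=True)
    file_id = Column(Integer, ForeignKey('code_files.id'))
    commit_id = Column(Integer, ForeignKey('commits.id'))
    diff_text = Column(Text)

engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(bind=engine)

with Session(engine) as session:
    code_file = CodeFile(file_path='main.py')
    commit = Commit(hash='abc123', message='Initial commit')
    session.add_all([code_file, commit])
    session.flush()  # assigns primary keys so they can be used as foreign keys

    session.add(FileCommit(file_id=code_file.id, commit_id=commit.id,
                           diff_text='+print("hi")'))
    session.commit()

    # Start from the link table and join outward to both sides
    rows = (
        session.query(Commit.hash, FileCommit.diff_text)
        .select_from(FileCommit)
        .join(Commit, Commit.id == FileCommit.commit_id)
        .join(CodeFile, CodeFile.id == FileCommit.file_id)
        .filter(CodeFile.file_path == 'main.py')
        .all()
    )
    history = [tuple(row) for row in rows]

print(history)
```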

-----

## Setting Up and Initializing the Database

Now that we have our models, we need to set up the database connection and create the tables.

First, let's set up the connection:

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import os

DATABASE_URL = os.getenv('DATABASE_URL', 'sqlite:///code_review.db')
engine = create_engine(DATABASE_URL)

SessionLocal = sessionmaker(bind=engine)
```

**Explanation:**

  * `DATABASE_URL` tells SQLAlchemy where to find the database. If the environment variable is not set, it uses a local **SQLite** file called `code_review.db`.
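  * `create_engine()` creates an **engine**, the object that manages connections to the database. It connects lazily, the first time a connection is actually needed.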
  * `SessionLocal` is a factory for creating **sessions**, which are used to interact with the database.
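
The environment-variable fallback can be sketched on its own. In the example below, `resolve_database_url` is a hypothetical helper and the PostgreSQL URL is a made-up value; `make_url` only parses the string, so no database driver needs to be installed to run it.

```python
import os
from sqlalchemy.engine.url import make_url

def resolve_database_url(env=os.environ):
    """Hypothetical helper: read DATABASE_URL, falling back to the local SQLite file."""
    return env.get('DATABASE_URL', 'sqlite:///code_review.db')

# No variable set: the SQLite default is used
default_url = make_url(resolve_database_url(env={}))

# Variable set: the same code now points at a different backend
custom_url = make_url(resolve_database_url(
    env={'DATABASE_URL': 'postgresql+psycopg2://reviewer@localhost/reviews'}
))

print(default_url.get_backend_name(), default_url.database)
print(custom_url.get_backend_name(), custom_url.database)
```

Reading the URL from the environment lets the same code run against SQLite during development and another database in deployment without editing the source.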

Next, let's create the tables:

```python
def init_database():
    Base.metadata.create_all(bind=engine)
```

**Explanation:**

  * `Base.metadata.create_all()` creates all tables defined by our models if they do not already exist.

Finally, we need a helper function to get database sessions:

```python
def get_session():
    return SessionLocal()
```

**Explanation:**

  * `get_session()` creates and returns a new database session using the `SessionLocal` factory we defined earlier.
  * This session is used to interact with the database — adding, updating, and querying data.
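
Sessions should be committed on success, rolled back on failure, and closed either way. One possible way to package that pattern is a small context manager; `session_scope` below is a hypothetical helper, not part of the lesson's `database.py`, and the sketch uses an in-memory SQLite database with a trimmed model.

```python
from contextlib import contextmanager
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class CodeFile(Base):
    __tablename__ = 'code_files'
    id = Column(Integer, primary_key=True)
    file_path = Column(String, unique=True)

engine = create_engine('sqlite:///:memory:')
SessionLocal = sessionmaker(bind=engine)
Base.metadata.create_all(bind=engine)

@contextmanager
def session_scope():
    """Hypothetical helper: commit on success, roll back on error, always close."""
    session = SessionLocal()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()

# Write in one unit of work...
with session_scope() as session:
    session.add(CodeFile(file_path='main.py'))

# ...and read it back in another
with session_scope() as session:
    stored_paths = [f.file_path for f in session.query(CodeFile).all()]

print(stored_paths)
```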

-----

## Populating the Database with Repository Data

With the database ready, let's see how to add code files and commits to it. We will use our repository scanner and git history extractor from previous lessons, but now we will store the results in the database.

First, let's scan the repository and add code files:

```python
from models import CodeFile
from database import get_session
from datetime import datetime
# Dummy scanner for demonstration
class RepositoryScanner:
    def scan_repository(self, repo_path):
        from collections import namedtuple
        File = namedtuple('File', ['file_path', 'content', 'language', 'last_updated'])
        return [
            File('main.py', 'print("Hello, World!")', 'Python', datetime.now())
        ]

session = get_session()
scanner = RepositoryScanner()
files = scanner.scan_repository('.')

for file_data in files:
    db_file = CodeFile(
        file_path=file_data.file_path,
        content=file_data.content,
        language=file_data.language,
        last_updated=file_data.last_updated
    )
    session.merge(db_file)

session.commit()
```

**Explanation:**

  * We use a `RepositoryScanner` to get a list of code files.
  * For each file, we create a `CodeFile` object and add it to the session.
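  * `session.merge()` copies the object's state into the session and matches existing rows by **primary key**. Because `id` is not set here, each object is inserted as a new row, so re-running the script against an already populated database would violate the unique constraint on `file_path`.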
  * `session.commit()` saves the changes.
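
Since `merge()` identifies existing rows by primary key rather than by `file_path`, one way to make repeated runs safe is to look the row up by path first. The sketch below shows that idea under stated assumptions: `upsert_code_file` is a hypothetical helper, the database is in-memory SQLite, and SQLAlchemy 1.4 or newer is assumed.

```python
from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class CodeFile(Base):
    __tablename__ = 'code_files'
    id = Column(Integer, primary_key=True)
    file_path = Column(String, unique=True)
    content = Column(Text)

engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(bind=engine)

def upsert_code_file(session, file_path, content):
    """Hypothetical helper: update the row with this path, or insert one if none exists."""
    existing = session.query(CodeFile).filter_by(file_path=file_path).one_or_none()
    if existing is None:
        existing = CodeFile(file_path=file_path)
        session.add(existing)
    existing.content = content
    return existing

with Session(engine) as session:
    # First run inserts the row
    upsert_code_file(session, 'main.py', 'print("v1")')
    session.commit()

    # Second run finds the same path and updates it instead of inserting
    upsert_code_file(session, 'main.py', 'print("v2")')
    session.commit()

    rows = [(f.file_path, f.content) for f in session.query(CodeFile).all()]

print(rows)
```

The table still holds a single row for `main.py` after both runs, and no unique-constraint error is raised.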

Now, let's add commits:

```python
from models import Commit
from datetime import datetime
# Dummy git extractor for demonstration
class GitHistoryExtractor:
    def extract_commits(self, repo_path):
        from collections import namedtuple
        CommitData = namedtuple('CommitData', ['hash', 'message', 'author', 'date'])
        return [
            CommitData('abc123', 'Initial commit', 'Alice <alice@example.com>', datetime.now())
        ]

git_extractor = GitHistoryExtractor()
commits = git_extractor.extract_commits('.')

for commit_data in commits:
    db_commit = Commit(
        hash=commit_data.hash,
        message=commit_data.message,
        author=commit_data.author,
        date=commit_data.date
    )
    session.merge(db_commit)

session.commit()
```

**Explanation:**

  * We use a `GitHistoryExtractor` to get a list of commits.
  * For each commit, we create a `Commit` object and add it to the session.
  * Again, `session.merge()` and `session.commit()` are used to save the data.

**Sample Output:**

```
Database populated successfully!
```

This message, printed by the complete script you will build in the practice exercises, confirms that your code files and commits have been stored in the database.

-----

## Summary and Practice Preview

In this lesson, you learned how to:

  * Install and set up **SQLAlchemy** as a dependency for database operations.
  * Define **SQLAlchemy models** to represent code files, commits, and their relationships.
  * Set up and initialize a database connection.
  * Scan a repository and extract commit history.
  * **Store** code and commit data in a database for persistence.

You are now ready to practice these skills by working with real code and seeing how data is stored and retrieved from the database. In the next exercises, you will get hands-on experience with database integration and persistence, making your code review assistant more robust and useful.

## Completing SQLAlchemy Models for Code Files

Now that you've learned about SQLAlchemy models and their importance for data persistence, let's put that knowledge into practice! In this exercise, you'll complete a partially implemented database model for storing code files.

The `CodeFile` model needs a few important additions to work properly in our database system. Your tasks are to:

  * Add the unique constraint to the `file_path` column to prevent duplicate files in our database.
  * Create a `language` column to store which programming language each file uses.
  * Add a `last_updated` column to track when files were last modified.

These improvements will ensure our database can properly organize and track code files for our review assistant. Completing this model is your first step toward building a robust database system that can store all the code and commit information you'll need for analysis.

```python
from sqlalchemy import Column, Integer, String, Text, DateTime, ForeignKey
from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+; older releases import it from sqlalchemy.ext.declarative

Base = declarative_base()

class CodeFile(Base):
    __tablename__ = 'code_files'
    id = Column(Integer, primary_key=True)
    # TODO: Add the unique constraint to the file_path column
    file_path = Column(String)
    content = Column(Text)
    # TODO: Add a column for storing the programming language of the file
    # TODO: Add a column for tracking when the file was last updated

class Commit(Base):
    __tablename__ = 'commits'
    id = Column(Integer, primary_key=True)
    hash = Column(String, unique=True)
    message = Column(Text)
    author = Column(String)
    date = Column(DateTime)

class FileCommit(Base):
    __tablename__ = 'file_commits'
    id = Column(Integer, primary_key=True)
    file_id = Column(Integer, ForeignKey('code_files.id'))
    commit_id = Column(Integer, ForeignKey('commits.id'))
    diff_text = Column(Text)
```

You're right, defining the model accurately is the foundation of a robust database! To complete the `CodeFile` model, we'll use `unique=True` for the path, and add columns for the language (using `String`) and the last update time (using `DateTime`).

Here is the completed code:

```python
from sqlalchemy import Column, Integer, String, Text, DateTime, ForeignKey
from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+; older releases import it from sqlalchemy.ext.declarative

Base = declarative_base()

class CodeFile(Base):
    __tablename__ = 'code_files'
    id = Column(Integer, primary_key=True)
    # unique=True prevents two rows from sharing the same path
    file_path = Column(String, unique=True)
    content = Column(Text)
    # Programming language of the file
    language = Column(String)
    # When the file was last updated
    last_updated = Column(DateTime)

class Commit(Base):
    __tablename__ = 'commits'
    id = Column(Integer, primary_key=True)
    hash = Column(String, unique=True)
    message = Column(Text)
    author = Column(String)
    date = Column(DateTime)

class FileCommit(Base):
    __tablename__ = 'file_commits'
    id = Column(Integer, primary_key=True)
    file_id = Column(Integer, ForeignKey('code_files.id'))
    commit_id = Column(Integer, ForeignKey('commits.id'))
    diff_text = Column(Text)
```

### Key Changes:

1.  **`file_path`**: Added the constraint `unique=True` to ensure every file path recorded is unique.
2.  **`language`**: Added `language = Column(String)` to store the file's programming language.
3.  **`last_updated`**: Added `last_updated = Column(DateTime)` to track the modification time, which is useful for incremental scanning.

## Fixing Database Initialization Missing Import

Our database initialization isn't working: when we try to run the application, the database tables aren't created, which prevents any data from being stored.

Your task is to examine the `database.py` file and find the missing import. Look carefully at how the `Base` object is referenced in the `init_database()` function.

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import os

DATABASE_URL = os.getenv('DATABASE_URL', 'sqlite:///code_review.db')
engine = create_engine(DATABASE_URL)
SessionLocal = sessionmaker(bind=engine)

def init_database():
    Base.metadata.create_all(bind=engine)

def get_session():
    return SessionLocal()

```
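
In the listing above, `init_database()` refers to `Base`, but nothing in the file defines or imports that name, so calling it raises a `NameError`. Assuming `Base` is defined in `models.py` alongside the models (as in the previous exercise), the likely fix is to add `from models import Base` at the top of `database.py`. The single-file sketch below stands in for both modules so it can run on its own, using an in-memory SQLite database and a trimmed model.

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import declarative_base, sessionmaker

# --- stands in for models.py ---
Base = declarative_base()

class CodeFile(Base):
    __tablename__ = 'code_files'
    id = Column(Integer, primary_key=True)
    file_path = Column(String, unique=True)

# --- stands in for database.py ---
# In the real project this part would begin with:  from models import Base
engine = create_engine('sqlite:///:memory:')
SessionLocal = sessionmaker(bind=engine)

def init_database():
    # Base is now a known name, so its metadata can create the tables
    Base.metadata.create_all(bind=engine)

def get_session():
    return SessionLocal()

init_database()
table_names = inspect(engine).get_table_names()
print(table_names)
```

Importing `Base` from the module that defines the models also ensures the model classes are registered on `Base.metadata` before `create_all()` runs.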

## Storing Code Files in SQLAlchemy Database

Now that you've defined your database models and fixed initialization issues, let's put your knowledge into practice by implementing the actual data storage functionality! In this exercise, you'll complete a function that stores code files in the database.

After scanning a repository, we need to save the file information for later analysis. Your task is to complete the `store_code_files` function, which:

  * Takes file data from our scanner and converts it to `CodeFile` instances
  * Handles potential duplicates using SQLAlchemy's merge functionality
  * Commits the changes to make them permanent
  * Tracks how many files were stored

This function is a key part of our persistence layer, connecting the repository scanner to the database. By completing it, you'll understand the full flow of data from scanning to storage, an essential skill for building data-driven applications.

When you're done, you'll be able to run the code and see confirmation that your files have been successfully stored in the database!

```python
from models import CodeFile
from database import get_session, init_database
from repository_scanner import RepositoryScanner

def store_code_files(files, session):
    """
    Store a list of code files in the database.
    
    Args:
        files: List of file objects with file_path, content, language, and last_updated attributes
        session: SQLAlchemy database session
        
    Returns:
        int: Number of files stored in the database
    """
    count = 0
    
    for file_data in files:
        # TODO: Create a new CodeFile object with attributes from file_data
        
        # TODO: Add the file to the session using merge to handle duplicates
        
        count += 1
    
    # TODO: Commit the changes to the database
    
    return count

if __name__ == "__main__":
    # Initialize the database
    init_database()
    
    # Get a database session
    session = get_session()
    
    # Scan the repository for files
    scanner = RepositoryScanner()
    files = scanner.scan_repository('.')
    
    # Store the files in the database
    num_stored = store_code_files(files, session)
    
    print(f"Successfully stored {num_stored} files in the database.")
```

You're right! My apologies for the oversight in assuming dictionary access. When dealing with objects like those produced by a scanner, attribute access is the correct approach.

Here is the corrected `store_code_files` function using **attribute access** (`.file_path`, etc.) for the `file_data` object:

```python
from models import CodeFile
from database import get_session, init_database
from repository_scanner import RepositoryScanner # Assuming this provides objects with attributes

def store_code_files(files, session):
    """
    Store a list of code files in the database.
    
    Args:
        files: List of file objects with file_path, content, language, and last_updated attributes
        session: SQLAlchemy database session
        
    Returns:
        int: Number of files stored in the database
    """
    count = 0
    
    for file_data in files:
        # CORRECTED: Accessing data using attributes (e.g., file_data.file_path)
        code_file = CodeFile(
            file_path=file_data.file_path,
            content=file_data.content,
            language=file_data.language,
            last_updated=file_data.last_updated
        )
        
        # Add the file to the session; merge() reconciles it with any row that has the same primary key
        session.merge(code_file)
        
        count += 1
    
    # Commit the changes to the database
    session.commit()
    
    return count

if __name__ == "__main__":
    # Initialize the database
    init_database()
    
    # Get a database session
    session = get_session()
    
    try:
        # Scan the repository for files
        scanner = RepositoryScanner()
        files = scanner.scan_repository('.')
        
        # Store the files in the database
        num_stored = store_code_files(files, session)
        
        print(f"Successfully stored {num_stored} files in the database. 🎉")
    except Exception as e:
        # Rollback the transaction on error
        session.rollback()
        print(f"An error occurred: {e}")
    finally:
        # Ensure the session is closed
        session.close()
```

-----

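The core logic remains the same, `session.merge()` followed by a single `session.commit()`, but the object creation now correctly uses `file_data.<attribute_name>`. Keep in mind that `merge()` matches existing rows by primary key, so it only behaves as an upsert when the incoming object carries the `id` of the row it should replace.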

## Integrating Git History with Database Storage

You've made excellent progress with the CodeFile and Commit models and have fixed initialization issues! Now, let's take the next step by integrating both our repository scanner and git history extractor with the database.

In this exercise, you'll complete the populate_database() function that stores both code files and commit history. While the code for storing files is already working, the commit storage part needs your attention.

Your task is to:

  * Import the `GitHistoryExtractor` class
  * Create an instance of the extractor
  * Extract commit data from the repository
  * Store each commit in the database using the existing `Commit` model

This exercise brings together everything you've learned about database persistence. By connecting your scanners to the database, you'll create a complete system that can maintain code and commit information between runs, a crucial feature for any serious code analysis tool.

```python
from models import CodeFile, Commit
from database import get_session, init_database
from repository_scanner import RepositoryScanner
# TODO: Import the GitHistoryExtractor class

def populate_database(repo_path, session):
    """
    Populate the database with code files and commits from a repository.
    
    Args:
        repo_path: Path to the repository
        session: SQLAlchemy database session
    """
    scanner = RepositoryScanner()
    # TODO: Create an instance of GitHistoryExtractor
    
    # Store files
    print("Scanning repository for code files...")
    files = scanner.scan_repository(repo_path)
    for file_data in files:
        db_file = CodeFile(
            file_path=file_data.file_path,
            content=file_data.content,
            language=file_data.language,
            last_updated=file_data.last_updated
        )
        session.merge(db_file)
    session.commit()
    print(f"Stored {len(files)} code files in the database.")
    
    # Store commits
    print("Extracting commit history...")
    # TODO: Extract commits from the repository using the git_extractor
    
    # TODO: Loop through each commit and create a Commit model instance
    
    # TODO: Commit the changes to the database
    
    print("Database populated successfully!")

if __name__ == "__main__":
    init_database()
    session = get_session()
    populate_database('.', session)

```

Thanks for providing the filesystem evidence! This information is crucial for pinpointing the exact import and method names needed.

The evidence shows the following issues with my previous attempt:

1.  **Import Module Name:** The file is named **`git_extractor.py`**, meaning the import should be **`from git_extractor import GitHistoryExtractor`** (not `from git import GitHistoryExtractor`).
2.  **Extraction Method Name:** The class method is named **`extract_commits`** (not `extract` and not `extract_history`).
3.  **Commit Data Fields:** The `namedtuple` returned by the extractor uses fields: **`hash`**, **`message`**, **`author`**, and **`date`** (not `sha`, `email`, etc.). This means the `Commit` model fields must also align with these names.

Here is the final, corrected code based *strictly* on the provided evidence:

```python
from models import CodeFile, Commit
from database import get_session, init_database
from repository_scanner import RepositoryScanner
# CORRECTION 1: Using the correct module name from the filesystem: git_extractor
from git_extractor import GitHistoryExtractor

def populate_database(repo_path, session):
    """
    Populate the database with code files and commits from a repository.
    
    Args:
        repo_path: Path to the repository
        session: SQLAlchemy database session
    """
    scanner = RepositoryScanner()
    # Create the extractor; its constructor takes no arguments, so repo_path is passed to extract_commits() instead
    git_extractor = GitHistoryExtractor()
    
    # Store files (Existing, correct code)
    print("Scanning repository for code files...")
    files = scanner.scan_repository(repo_path)
    for file_data in files:
        db_file = CodeFile(
            file_path=file_data.file_path,
            content=file_data.content,
            language=file_data.language,
            last_updated=file_data.last_updated
        )
        session.merge(db_file)
    session.commit()
    print(f"Stored {len(files)} code files in the database.")
    
    # Store commits
    print("Extracting commit history...")
    # CORRECTION 2: Using the correct method name: extract_commits and passing repo_path
    commits = git_extractor.extract_commits(repo_path) 
    
    num_commits_stored = 0
    for commit_data in commits:
        # CORRECTION 3: Mapping the namedtuple fields (hash, message, author, date) onto the Commit model's columns
        db_commit = Commit(
            hash=commit_data.hash,  # the Commit model defines 'hash', not 'sha'
            message=commit_data.message,
            author=commit_data.author,
            date=commit_data.date
        )
        
        # Use merge() to handle potential duplicates
        session.merge(db_commit)
        num_commits_stored += 1
    
    # Commit the changes to the database
    session.commit()
    
    print(f"Stored {num_commits_stored} commits in the database.")
    print("Database populated successfully! ✅")

if __name__ == "__main__":
    init_database()
    session = get_session()
    
    try:
        # Use '.' to scan the current directory as the repository
        populate_database('.', session)
    except Exception as e:
        session.rollback()
        print(f"An error occurred during database population: {e}")
    finally:
        session.close()
```

## Synchronizing Repository Files with Database

You've been making excellent progress with database models and storage! Now, let's take your skills to the next level by implementing a key feature for any code analysis tool: keeping the database in sync with your repository as files change.

In this exercise, you'll implement a function that synchronizes files between your repository and database. This is crucial for maintaining an accurate code history over time. Your `sync_repository_files` function will:

  * Compare current files from the repository with existing database records
  * Update database records when file content has changed
  * Add new files that don't exist in the database yet
  * Remove records for files that have been deleted from the repository

This exercise builds on everything you've learned about SQLAlchemy queries and database operations. By implementing this synchronization logic, you'll create a robust system that can track code changes over time, an essential capability for any serious code analysis tool.

```python
from models import CodeFile
from database import get_session, init_database
from repository_scanner import RepositoryScanner
from datetime import datetime

def sync_repository_files(repo_path, session):
    """
    Synchronize files from the repository with the database.
    
    This function:
    1. Scans the repository for current files
    2. Updates existing files in the database if their content has changed
    3. Adds new files that don't exist in the database
    4. Optionally handles files that have been deleted from the repository
    
    Args:
        repo_path: Path to the repository
        session: SQLAlchemy database session
        
    Returns:
        dict: Statistics about files added, updated, and deleted
    """
    # Initialize statistics
    stats = {
        'added': 0,
        'updated': 0,
        'deleted': 0
    }
    
    # Scan repository for current files
    scanner = RepositoryScanner()
    current_files = scanner.scan_repository(repo_path)
    
    # TODO: Query the database to get all existing files
    
    # TODO: Create a dictionary mapping file paths to database objects for quick lookup
    
    # TODO: Create a set to track which file paths we've processed in this scan
    
    # Process each file from the current scan
    for file_data in current_files:
        # TODO: Add this file path to the set of processed paths
        
        # TODO: Check if the file already exists in the database
        
            # TODO: If file exists, check if its content has changed
            
                # TODO: If content has changed, update the database record and increment stats['updated']
        
        # TODO: If file doesn't exist in the database, create a new record and increment stats['added']
    
    # TODO: Check for files that have been deleted from the repository
    # (files that exist in the database but weren't in the current scan)
    
    # TODO: Commit all changes to the database
    
    return stats

if __name__ == "__main__":
    # Set up the database with initial data
    from db_setup import setup_database
    setup_database()
    
    # Get a database session
    session = get_session()
    
    # Synchronize repository files with the database
    stats = sync_repository_files('.', session)
    
    # Print statistics
    print(f"Files added: {stats['added']}")
    print(f"Files updated: {stats['updated']}")
    print(f"Files deleted: {stats['deleted']}")
    print(f"Total files processed: {stats['added'] + stats['updated'] + stats['deleted']}")
```

This is a complete implementation of the `sync_repository_files` function, providing the robust synchronization logic required for tracking file changes (add, update, delete) between the filesystem and the database using SQLAlchemy.

```python
from models import CodeFile
from database import get_session, init_database
from repository_scanner import RepositoryScanner
from datetime import datetime

def sync_repository_files(repo_path, session):
    """
    Synchronize files from the repository with the database.
    
    This function:
    1. Scans the repository for current files
    2. Updates existing files in the database if their content has changed
    3. Adds new files that don't exist in the database
    4. Removes records for files that have been deleted from the repository
    
    Args:
        repo_path: Path to the repository
        session: SQLAlchemy database session
        
    Returns:
        dict: Statistics about files added, updated, and deleted
    """
    # Initialize statistics
    stats = {
        'added': 0,
        'updated': 0,
        'deleted': 0
    }
    
    # 1. Scan repository for current files
    scanner = RepositoryScanner()
    current_files = scanner.scan_repository(repo_path)
    
    # TODO: Query the database to get all existing files
    # Query for all CodeFile objects. We want all attributes to compare content.
    existing_db_files = session.query(CodeFile).all()
    
    # TODO: Create a dictionary mapping file paths to database objects for quick lookup
    db_files_map = {f.file_path: f for f in existing_db_files}
    
    # TODO: Create a set to track which file paths we've processed in this scan
    current_scan_paths = set()
    
    # 2 & 3. Process each file from the current scan (Update or Add)
    for file_data in current_files:
        current_path = file_data.file_path
        
        # TODO: Add this file path to the set of processed paths
        current_scan_paths.add(current_path)
        
        # TODO: Check if the file already exists in the database
        db_file = db_files_map.get(current_path)
        
        if db_file:
            # File exists (Potential Update)
            
            # TODO: If file exists, check if its content has changed
            if db_file.content != file_data.content:
                
                # TODO: If content has changed, update the database record and increment stats['updated']
                db_file.content = file_data.content
                db_file.language = file_data.language
                # Update the last_updated timestamp to reflect the change
                db_file.last_updated = datetime.now() 
                
                # SQLAlchemy tracks changes automatically, no explicit add/merge needed here
                stats['updated'] += 1
        
        else:
            # File doesn't exist (Add)
            
            # TODO: If file doesn't exist in the database, create a new record and increment stats['added']
            new_file = CodeFile(
                file_path=current_path,
                content=file_data.content,
                language=file_data.language,
                last_updated=file_data.last_updated
            )
            session.add(new_file)
            stats['added'] += 1
    
    # 4. Check for files that have been deleted from the repository
    # (files that exist in the database but weren't in the current scan)
    deleted_paths = db_files_map.keys() - current_scan_paths
    
    for deleted_path in deleted_paths:
        # Get the database object to delete
        file_to_delete = db_files_map[deleted_path]
        session.delete(file_to_delete)
        stats['deleted'] += 1
        
    # TODO: Commit all changes to the database
    session.commit()
    
    return stats

if __name__ == "__main__":
    # Set up the database with initial data
    # Assuming db_setup.py provides dummy files for synchronization testing
    from db_setup import setup_database 
    setup_database()
    
    # Get a database session
    session = get_session()
    
    try:
        # Synchronize repository files with the database
        stats = sync_repository_files('.', session)
        
        # Print statistics
        print("--- Synchronization Results ---")
        print(f"Files added: {stats['added']}")
        print(f"Files updated: {stats['updated']}")
        print(f"Files deleted: {stats['deleted']}")
        print(f"Total files processed: {stats['added'] + stats['updated'] + stats['deleted']}")
    except Exception as e:
        session.rollback()
        print(f"An error occurred during synchronization: {e}")
    finally:
        session.close()
```

### Key Logic Implemented

1.  **Preparation**: All existing database files are loaded into the `db_files_map` dictionary for **fast O(1) lookup** by `file_path`.
2.  **Add/Update Logic (Upsert)**:
      * It iterates over the **current repository files** (`current_files`).
      * If a file path exists in `db_files_map`, the function checks if the **`content`** is different. If it is, the database object is updated, and `stats['updated']` is incremented.
      * If the file path does **not** exist in `db_files_map`, a new `CodeFile` object is created and added to the session, and `stats['added']` is incremented.
3.  **Delete Logic**:
      * The **set difference** operation (`db_files_map.keys() - current_scan_paths`) efficiently identifies file paths present in the database but missing from the current scan.
      * These records are deleted from the session using `session.delete()`, and `stats['deleted']` is incremented.
4.  **Transaction**: `session.commit()` at the end applies all additions, updates, and deletions in a single transaction.
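
The add/update/delete classification described above does not depend on the database, so it can be sketched with plain dictionaries and sets. In this illustration `plan_sync` is a hypothetical helper, and both inputs are made-up `{file_path: content}` mappings standing in for the database rows and the scanner results.

```python
def plan_sync(db_contents, scanned_contents):
    """Hypothetical helper: compare two {file_path: content} mappings and classify each path."""
    db_paths = set(db_contents)
    scanned_paths = set(scanned_contents)
    return {
        # In the scan but not in the database
        'added': sorted(scanned_paths - db_paths),
        # In the database but missing from the scan
        'deleted': sorted(db_paths - scanned_paths),
        # Present on both sides, but the content differs
        'updated': sorted(
            path for path in db_paths & scanned_paths
            if db_contents[path] != scanned_contents[path]
        ),
    }

db_state = {'main.py': 'print(1)', 'old.py': 'pass', 'utils.py': 'x = 1'}
scan_state = {'main.py': 'print(2)', 'utils.py': 'x = 1', 'new.py': 'pass'}

plan = plan_sync(db_state, scan_state)
print(plan)
```

Separating the comparison from the session calls in this way makes the logic easy to test without a database; the real function then only has to apply the plan with `session.add()`, attribute updates, and `session.delete()`.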