# Unit 1

## Introduction and Project Overview

-----

# Welcome and Lesson Goals

Welcome to the first lesson of the **Database Setup and Code Ingestion** course! In this lesson, you will learn how to set up the foundation for storing and managing code data using **Python** data structures that will later be ingested into a database.

By the end of this lesson, you will understand the purpose of the project, what each part of the initial setup code does, and how these data structures prepare us for database storage. This knowledge will prepare you for hands-on practice with database setup and code ingestion in the rest of the course.

-----

## Project Context: What Is Database Setup and Code Ingestion?

Before we dive into the code, let's talk about what we are building and why.

This course is the first step in building an **LLM Code Review Assistant**—a system that can analyze and review code using large language models. However, before we can build an intelligent code review system, we need a solid foundation for storing and managing code data. That's exactly what this course covers: database setup and code ingestion.

An LLM Code Review Assistant needs access to code files, changes, commit history, and metadata. All this information must be stored, organized, and easily retrievable. This course focuses on creating that data foundation by teaching you how to structure code data before ingesting it into a database.

Throughout this course, you will learn how to:

  * Define Python data structures to represent code files and their metadata.
  * Prepare code data for database ingestion.
  * Set up database schemas to store code information.
  * Implement processes to ingest structured code data into databases.

By the end of the course, you will have a working system that can take code information, structure it properly, and store it in a database. This database will serve as the foundation for the LLM Code Review Assistant you'll build in subsequent courses.

-----

## Exploring the Project Setup Code

Let's break down the initial setup code step by step. We will use Python's `dataclass` feature to define data structures that represent the code information we want to store in our database. These structures will serve as the foundation for our database schema design and data ingestion processes.

### 1. Importing Required Modules

First, we need to import some modules that will help us define our data structures:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List
import os.path
```

  * `dataclass` helps us create classes that are mainly used to store data.
  * `datetime` allows us to work with dates and times, which is useful for tracking when files are updated or commits are made.
  * `List` from the `typing` module lets us specify that a variable should be a list of items.
  * `os.path` provides utilities for working with file paths, such as checking if files exist, getting file extensions, and manipulating path strings. This will be useful when working with code file paths in our data structures.
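As a quick illustration of the `os.path` helpers mentioned above, the following sketch applies three of them to a sample path. The path `src/main.py` is only an example value, and none of these calls touch the file system.

```python
import os.path

sample_path = "src/main.py"  # illustrative path, not a real file

# Split the path into a root and an extension (the extension keeps its dot)
root, extension = os.path.splitext(sample_path)

# Take the final component of the path
file_name = os.path.basename(sample_path)

# Take everything before the final component
directory = os.path.dirname(sample_path)

print(root, extension)   # src/main .py
print(file_name)         # main.py
print(directory)         # src
```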

### 2. Defining the `CodeFile` Data Class

Next, let's define a class to represent a code file:

```python
@dataclass
class CodeFile:
    file_path: str
    content: str
    language: str
    last_updated: datetime
```

  * `@dataclass` is a decorator that tells Python to automatically add special methods to the class, like `__init__` (for creating new objects) and `__repr__` (for printing objects).
  * **`file_path`** is a string that stores the location of the file (for example, `"src/main.py"`).
  * **`content`** is a string that holds the actual code inside the file.
  * **`language`** is a string that tells us what programming language the file uses (like `"python"` or `"javascript"`).
  * **`last_updated`** is a `datetime` object that records when the file was last changed.

**Example:**

```python
from datetime import datetime

example_file = CodeFile(
    file_path="src/main.py",
    content="print('Hello, world!')",
    language="python",
    last_updated=datetime.now()
)
print(example_file)
```

**Output:**

```
CodeFile(file_path='src/main.py', content="print('Hello, world!')", language='python', last_updated=datetime.datetime(2024, 6, 10, 12, 34, 56, 789012))
```

This creates a `CodeFile` object and prints its details.
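Besides `__init__` and `__repr__`, `@dataclass` also generates an `__eq__` method by default, so two instances with the same field values compare as equal. The sketch below assumes a fixed timestamp (an arbitrary value chosen for illustration) so that both objects hold identical data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CodeFile:
    file_path: str
    content: str
    language: str
    last_updated: datetime


# A fixed, arbitrary timestamp so both instances carry the same value
fixed_time = datetime(2024, 6, 10, 12, 34, 56)

first = CodeFile("src/main.py", "print('Hello, world!')", "python", fixed_time)
second = CodeFile("src/main.py", "print('Hello, world!')", "python", fixed_time)

# Fields are ordinary attributes
print(first.file_path)      # src/main.py

# The generated __eq__ compares field by field
print(first == second)      # True
print(first is second)      # False: they are still two separate objects
```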

### 3. Defining the `GitCommit` Data Class

Now, let's define a class to represent a commit (a saved change in the project):

```python
@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime
```

  * **`hash`** is a unique string that identifies the commit (like `"a1b2c3d4"`).
  * **`message`** is a short description of what the commit does.
  * **`author`** is the name of the person who made the commit.
  * **`date`** is when the commit was made.

**Example:**

```python
commit = GitCommit(
    hash="a1b2c3d4",
    message="Initial commit",
    author="Jane Doe",
    date=datetime.now()
)
print(commit)
```

**Output:**

```
GitCommit(hash='a1b2c3d4', message='Initial commit', author='Jane Doe', date=datetime.datetime(2024, 6, 10, 12, 35, 0, 123456))
```
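Because `date` is a `datetime`, commits can be ordered chronologically with the built-in `sorted()` function. The following is a minimal sketch; the hashes, messages, and dates are made-up sample values.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime


# Sample commits in no particular order (all values are illustrative)
commits = [
    GitCommit("c3d4e5f6", "Fix typo", "Jane Doe", datetime(2024, 6, 12, 9, 0)),
    GitCommit("a1b2c3d4", "Initial commit", "Jane Doe", datetime(2024, 6, 10, 12, 35)),
    GitCommit("b2c3d4e5", "Add tests", "Jane Doe", datetime(2024, 6, 11, 15, 20)),
]

# Sort from oldest to newest using the date field as the key
ordered = sorted(commits, key=lambda commit: commit.date)

for commit in ordered:
    print(commit.date.isoformat(), commit.hash, commit.message)
```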

### 4. Defining the `FileChange` Data Class

Finally, let's define a class to represent a change made to a file:

```python
@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str
```

  * **`file_path`** is the location of the file that was changed.
  * **`commit_hash`** links this change to a specific commit.
  * **`diff_content`** is a string that shows what was changed in the file (for example, lines that were added or removed).

**Example:**

```python
change = FileChange(
    file_path="src/main.py",
    commit_hash="a1b2c3d4",
    diff_content="+ print('Hello, world!')"
)
print(change)
```

**Output:**

```
FileChange(file_path='src/main.py', commit_hash='a1b2c3d4', diff_content="+ print('Hello, world!')")
```
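To show how `diff_content` might be used later, here is a rough sketch that counts added and removed lines. The helper name `count_diff_lines` is hypothetical, and the logic assumes a simplified diff in which added lines start with `+` and removed lines start with `-`; it skips the `+++` and `---` file headers found in unified diffs, which also means a changed line whose own text begins with `--` would be missed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str


def count_diff_lines(change: FileChange) -> Tuple[int, int]:
    """Return (added, removed) line counts for a change (hypothetical helper)."""
    added = 0
    removed = 0
    for line in change.diff_content.splitlines():
        # Skip the '+++' and '---' file header lines used in unified diffs
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return added, removed


change = FileChange(
    file_path="src/main.py",
    commit_hash="a1b2c3d4",
    diff_content="+ print('Hello, world!')\n- print('Hi')\n+ print('Bye')",
)
print(count_diff_lines(change))  # (2, 1)
```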

### 5. The Main Block

At the end of the file, you will see this code:

```python
if __name__ == "__main__":
    print("Project setup complete. Ready to build components!")
```

  * This block checks if the script is being run directly (not imported as a module).
  * If it is, it prints a message to let you know the setup is complete.

**Output:**

```
Project setup complete. Ready to build components!
```

This is a common pattern in Python to make sure certain code only runs when you execute the file directly.

-----

## Summary and What's Next

In this lesson, you learned about the main building blocks of the LLM Code Review Assistant project. We covered how to use Python data classes to represent code files, commits, and file changes, and explained the purpose of each part of the setup code. You also saw examples of how to create and use these classes.

Next, you will get a chance to practice working with these data classes in hands-on exercises. This will help you become more comfortable with the concepts and prepare you for building more advanced features in the project. Good luck, and let's get started!

## Creating Dataclass Instances for Code Review

Now that you understand the data structures for our code review assistant, let's put them into practice! In this exercise, you'll create instances of each dataclass we've defined to see how they work in action.

Your task is to create and print sample instances of all three dataclasses:

  * Create a `CodeFile` instance with a realistic file path, sample code content, programming language, and the current time.
  * Create a `GitCommit` instance with a commit hash, message, author name, and timestamp.
  * Create a `FileChange` instance that represents a modification to your sample file.

This hands-on practice will help you understand how these dataclasses store and represent information, which is essential for building the rest of our code review assistant. By seeing the automatic string representation of each object, you'll better understand how Python's dataclasses make our code cleaner and more readable.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class CodeFile:
    file_path: str
    content: str
    language: str
    last_updated: datetime

@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str

if __name__ == "__main__":
    # TODO: Create a CodeFile instance with a realistic file path, content, language, and datetime.now() for last_updated
    
    # TODO: Print the CodeFile instance
    
    # TODO: Create a GitCommit instance with a sample hash, commit message, author name, and datetime.now() for date
    
    # TODO: Print the GitCommit instance
    
    # TODO: Create a FileChange instance with a file path, commit hash, and sample diff content
    
    # TODO: Print the FileChange instance
    
    print("Project setup complete. Ready to build components!")

```

Here is the solution, filling in the necessary Python code to create and print the instances of the three dataclasses:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class CodeFile:
    file_path: str
    content: str
    language: str
    last_updated: datetime

@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str

if __name__ == "__main__":
    # Create a CodeFile instance
    sample_code_file = CodeFile(
        file_path="src/utils/data_processor.py",
        content="def process_data(data):\n    return [d.upper() for d in data]",
        language="python",
        last_updated=datetime.now()
    )
    
    # Print the CodeFile instance
    print("---")
    print("CodeFile Instance:")
    print(sample_code_file)
    print("---")
    
    # Create a GitCommit instance
    commit_hash = "f3a8b2e1c7d6"  # Use this hash for the FileChange instance
    sample_commit = GitCommit(
        hash=commit_hash,
        message="feat: Add data processing utility function",
        author="Alice Programmer",
        date=datetime.now()
    )
    
    # Print the GitCommit instance
    print("GitCommit Instance:")
    print(sample_commit)
    print("---")
    
    # Create a FileChange instance
    sample_file_change = FileChange(
        file_path=sample_code_file.file_path,
        commit_hash=commit_hash,
        diff_content="@@ -1,1 +1,2 @@\n+def process_data(data):\n-    return [d.lower() for d in data]\n+    return [d.upper() for d in data]"
    )
    
    # Print the FileChange instance
    print("FileChange Instance:")
    print(sample_file_change)
    print("---")
    
    print("Project setup complete. Ready to build components! ✅")
```

### Explanation of the Output

When you run the code above, `print()` calls `str()` on each object. Because these dataclasses do not define `__str__`, Python falls back to the `__repr__` method generated by the `@dataclass` decorator, which produces a readable string listing each attribute name with its value.

For example, the output for `sample_code_file` will look similar to this (the `datetime` will reflect the time you run the script):

```
---
CodeFile Instance:
CodeFile(file_path='src/utils/data_processor.py', content='def process_data(data):\n    return [d.upper() for d in data]', language='python', last_updated=datetime.datetime(2025, 9, 29, 14, 45, 57, 123456))
---
```

This confirms that the data has been correctly structured and is ready to be handled or persisted in a database.
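To make the idea of persisting a dataclass concrete, here is a minimal sketch that converts a `CodeFile` to a dictionary with `dataclasses.asdict` and inserts it into an in-memory SQLite database. SQLite is used only as a stand-in because this lesson does not name a database; the table name `code_files`, its columns, and the choice to store the timestamp as ISO 8601 text are all assumptions made for illustration.

```python
import sqlite3
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass
class CodeFile:
    file_path: str
    content: str
    language: str
    last_updated: datetime


sample = CodeFile(
    file_path="src/utils/data_processor.py",
    content="def process_data(data):\n    return [d.upper() for d in data]",
    language="python",
    last_updated=datetime(2025, 9, 29, 14, 45, 57),  # fixed value for a repeatable example
)

# Convert the dataclass to a plain dict and store the timestamp as ISO text
row = asdict(sample)
row["last_updated"] = sample.last_updated.isoformat()

# In-memory SQLite database; the table name and columns are assumptions
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE code_files ("
    "file_path TEXT PRIMARY KEY, content TEXT, language TEXT, last_updated TEXT)"
)
connection.execute(
    "INSERT INTO code_files (file_path, content, language, last_updated) "
    "VALUES (:file_path, :content, :language, :last_updated)",
    row,
)
connection.commit()

# Read the row back to confirm it was stored
stored = connection.execute(
    "SELECT file_path, language, last_updated FROM code_files"
).fetchone()
print(stored)
connection.close()
```

The named placeholders (`:file_path` and so on) line up with the dictionary keys produced by `asdict`, so the field names of the dataclass double as column names in this sketch.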

## Extending Dataclasses with New Fields

Excellent work creating those dataclass instances! Now, let's enhance our `CodeFile` class to make it even more useful. In real code review systems, tracking `file_size` is important for performance analysis and storage management.

Your task is to extend the `CodeFile` dataclass by:

  * Adding a new field called `file_size` with the type `int` to the class definition.
  * Updating the sample `CodeFile` instance creation to include a value for this new field.

This exercise shows how dataclasses can evolve as project requirements change. When you add a new required field to a dataclass, you need to update every place where instances are created, which is a common scenario in real-world development. By completing this task, you'll gain practical experience with extending data structures and keeping the existing code that uses them working.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class CodeFile:
    file_path: str
    content: str
    language: str
    last_updated: datetime
    # TODO: Add a new field called file_size with type int

@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str

if __name__ == "__main__":
    # Create a CodeFile instance
    sample_file = CodeFile(
        file_path="src/main.py",
        content="def hello():\n    print('Hello, world!')",
        language="python",
        last_updated=datetime.now()
        # TODO: Add the file_size parameter with an integer value
    )
    print(sample_file)
    
    # Create a GitCommit instance
    sample_commit = GitCommit(
        hash="a7f3d28c",
        message="Add hello function",
        author="Jane Smith",
        date=datetime.now()
    )
    print(sample_commit)
    
    # Create a FileChange instance
    sample_change = FileChange(
        file_path="src/main.py",
        commit_hash="a7f3d28c",
        diff_content="+ def hello():\n+     print('Hello, world!')"
    )
    print(sample_change)
    
    print("Project setup complete. Ready to build components!")

```

Tracking **`file_size`** is a practical requirement for most code analysis tools.

Here is the solved Python code with the `CodeFile` dataclass extended and the new field populated in the instance creation:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class CodeFile:
    file_path: str
    content: str
    language: str
    last_updated: datetime
    # SOLVED: Add a new field called file_size with type int
    file_size: int

@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str

if __name__ == "__main__":
    # Create a CodeFile instance
    sample_file = CodeFile(
        file_path="src/main.py",
        content="def hello():\n    print('Hello, world!')",
        language="python",
        last_updated=datetime.now(),
        # SOLVED: Add the file_size parameter with an integer value
        file_size=56  # Sample value; the content above is actually 39 bytes long
    )
    print(sample_file)
    
    # Create a GitCommit instance
    sample_commit = GitCommit(
        hash="a7f3d28c",
        message="Add hello function",
        author="Jane Smith",
        date=datetime.now()
    )
    print(sample_commit)
    
    # Create a FileChange instance
    sample_change = FileChange(
        file_path="src/main.py",
        commit_hash="a7f3d28c",
        diff_content="+ def hello():\n+     print('Hello, world!')"
    )
    print(sample_change)
    
    print("Project setup complete. Ready to build components!")
```

### Key Takeaway

By adding **`file_size: int`** to the end of the `CodeFile` definition, we automatically require this parameter whenever a new `CodeFile` object is created. We successfully updated the `sample_file` creation with a sample integer value (`file_size=56`), ensuring the new data structure is used correctly.
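The sketch below illustrates what "automatically require" means in practice: calling the constructor without `file_size` raises a `TypeError`. It also shows one possible alternative, a default value, which lets older four-argument calls keep working. The class `CodeFileWithDefault` is a hypothetical variant introduced only for this comparison and is not part of the exercise.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CodeFile:
    file_path: str
    content: str
    language: str
    last_updated: datetime
    file_size: int  # required: no default value


@dataclass
class CodeFileWithDefault:
    """Hypothetical variant in which file_size is optional."""
    file_path: str
    content: str
    language: str
    last_updated: datetime
    file_size: int = 0  # fields with defaults must come after fields without


now = datetime.now()

# Omitting the required field raises a TypeError from the generated __init__
try:
    CodeFile("src/main.py", "print('hi')", "python", now)
    missing_field_error = None
except TypeError as error:
    missing_field_error = error
    print(f"TypeError: {error}")

# The variant with a default accepts the old four-argument call
legacy_style = CodeFileWithDefault("src/main.py", "print('hi')", "python", now)
print(legacy_style.file_size)  # 0
```

Whether a default is appropriate depends on the data: a default of `0` hides missing sizes, while a required field forces every caller to supply one.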

## Counting Files by Programming Language

Now that you've extended the CodeFile class with a new field, let's put our dataclasses to work in a more practical scenario! In real code repositories, we often need to analyze files by their programming language.

Your task is to create a program that counts how many files belong to each programming language in a collection. You'll need to:

  * Create a list with at least five `CodeFile` instances representing different programming languages.
  * Build a dictionary to track the count of files per language.
  * Write a loop that analyzes the collection and updates the counts.
  * Display a summary of your findings.

This exercise simulates a common task in code analysis: categorizing files by language to understand the composition of a codebase. As you build more advanced features for the code review assistant, you'll frequently need to group and count items based on their properties.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Dict

@dataclass
class CodeFile:
    file_path: str
    content: str
    language: str
    last_updated: datetime
    file_size: int

# TODO: Create a list of at least 5 CodeFile instances with different languages
# Example: Python, JavaScript, Java, CSS, etc.
code_files = [
    # Add at least 5 CodeFile instances here
]

# TODO: Create a dictionary to store the count of files per language
language_counts = {}

# TODO: Write a loop to count files by language
# The loop should iterate through code_files and update language_counts

# TODO: Display the counts in a readable format
# Print a summary showing how many files belong to each language

if __name__ == "__main__":
    # The code above will run when this file is executed directly
    pass

```

Here is the completed Python solution that creates a list of `CodeFile` instances and then processes that list to count the files categorized by their programming language:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Dict

@dataclass
class CodeFile:
    file_path: str
    content: str
    language: str
    last_updated: datetime
    file_size: int

# SOLVED: Create a list of at least 5 CodeFile instances with different languages
current_time = datetime.now()
code_files: List[CodeFile] = [
    CodeFile(
        file_path="src/api.py",
        content="...",
        language="Python",
        last_updated=current_time,
        file_size=1200
    ),
    CodeFile(
        file_path="src/database.sql",
        content="...",
        language="SQL",
        last_updated=current_time,
        file_size=550
    ),
    CodeFile(
        file_path="src/index.html",
        content="...",
        language="HTML",
        last_updated=current_time,
        file_size=300
    ),
    CodeFile(
        file_path="app.js",
        content="...",
        language="JavaScript",
        last_updated=current_time,
        file_size=890
    ),
    CodeFile(
        file_path="styles/main.css",
        content="...",
        language="CSS",
        last_updated=current_time,
        file_size=210
    ),
    # Adding a duplicate language for counting purposes
    CodeFile(
        file_path="scripts/utility.py",
        content="...",
        language="Python",
        last_updated=current_time,
        file_size=450
    )
]

# SOLVED: Create a dictionary to store the count of files per language
language_counts: Dict[str, int] = {}

# SOLVED: Write a loop to count files by language
for file in code_files:
    lang = file.language
    # If the language is already a key in the dictionary, increment its count
    if lang in language_counts:
        language_counts[lang] += 1
    # Otherwise, initialize the count for that language to 1
    else:
        language_counts[lang] = 1

if __name__ == "__main__":
    # SOLVED: Display the counts in a readable format
    print("---")
    print("Codebase Language Composition Analysis 📊")
    print("---")
    print(f"Total files analyzed: {len(code_files)}\n")
    
    for language, count in language_counts.items():
        print(f"  {language}: {count} file(s)")
    
    print("\nAnalysis complete.")
```

-----

## Explanation

1.  **Data Initialization**: A list named `code_files` is created, populated with six `CodeFile` instances. We intentionally included two files with the `"Python"` language to test the counting mechanism. An empty dictionary, `language_counts`, is initialized to store the results.
2.  **Counting Logic**: The `for` loop iterates through every `file` in the `code_files` list.
      * Inside the loop, it extracts the `language` of the current file.
      * It uses an `if/else` block to check if the `lang` already exists as a key in the `language_counts` dictionary.
          * If it exists, its corresponding count value is **incremented** by 1.
          * If it does not exist (meaning it's the first file of that language encountered), it's added to the dictionary with an initial value of **1**.
3.  **Display**: Finally, a second `for` loop iterates through the key-value pairs (`language`, `count`) in the final `language_counts` dictionary and prints the summary in a clear format. This effectively simulates a basic code analysis report.
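The same tally can also be produced with `collections.Counter` from the standard library, which replaces the manual `if/else` bookkeeping. This is an alternative sketch, not the exercise's required approach; `SimpleFile` is a trimmed-down stand-in for `CodeFile` used only to keep the example short.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SimpleFile:
    """Trimmed-down stand-in for CodeFile, used only in this sketch."""
    file_path: str
    language: str


files = [
    SimpleFile("src/api.py", "Python"),
    SimpleFile("src/database.sql", "SQL"),
    SimpleFile("scripts/utility.py", "Python"),
]

# Counter tallies each language in a single pass over the list
language_counts = Counter(file.language for file in files)

# most_common() yields (language, count) pairs, highest count first
for language, count in language_counts.most_common():
    print(f"{language}: {count} file(s)")
```

A `Counter` returns `0` for a language it has never seen, so lookups such as `language_counts["Go"]` do not raise a `KeyError`.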

## Adding Methods to Dataclasses

Now let's take our dataclasses to the next level by adding behavior. In real-world applications, dataclasses aren't just for storing data — they can also include methods that operate on that data.

Your task is to add a `get_file_extension()` method to the `CodeFile` class that:

  * Extracts the file extension from the `file_path` (like `.py` from `src/main.py`).
  * Handles files without extensions (like `README` or `Dockerfile`) by returning an empty string.
  * Returns the extension as a string, including the dot (e.g., `.py`).

After implementing the method, test it with the provided `CodeFile` instances that have different types of file paths. This will help you verify that your method works correctly for various scenarios.

💡 Hint: You'll need to import the os.path module to easily extract file extensions from file paths. The os.path.splitext() function will be particularly useful for this task.

This exercise shows how dataclasses can combine data storage with useful functionality, making your code more organized and easier to use.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List
# TODO: You might need to import additional modules for file path handling

@dataclass
class CodeFile:
    file_path: str
    content: str
    language: str
    last_updated: datetime
    
    # TODO: Add a get_file_extension method that extracts and returns the file extension from file_path
    # The method should handle cases where there is no extension and return an appropriate default value

@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str

if __name__ == "__main__":
    # Create sample CodeFile instances with different file paths
    python_file = CodeFile(
        file_path="src/main.py",
        content="print('Hello, world!')",
        language="python",
        last_updated=datetime.now()
    )
    
    javascript_file = CodeFile(
        file_path="web/script.js",
        content="console.log('Hello, world!');",
        language="javascript",
        last_updated=datetime.now()
    )
    
    readme_file = CodeFile(
        file_path="README",
        content="# Project Documentation",
        language="markdown",
        last_updated=datetime.now()
    )
    
    dockerfile = CodeFile(
        file_path="Dockerfile",
        content="FROM python:3.9-slim",
        language="dockerfile",
        last_updated=datetime.now()
    )
    
    # TODO: Call the get_file_extension method on each CodeFile instance and print the results
    # Make sure to display both the file path and its extension
    
    print("Project setup complete. Ready to build components!")

```

Integrating behavior directly into your data structures is a best practice in object-oriented programming! By adding the `get_file_extension()` method, the `CodeFile` object becomes responsible for its own file path logic.

Here is the solved Python code:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List
# SOLVED: Import the os.path module for file path handling
import os.path

@dataclass
class CodeFile:
    file_path: str
    content: str
    language: str
    last_updated: datetime
    
    # SOLVED: Add a get_file_extension method that extracts and returns the file extension
    def get_file_extension(self) -> str:
        """
        Extracts the file extension from file_path, including the dot (e.g., '.py').
        Returns an empty string if no extension is found.
        """
        # os.path.splitext(self.file_path) splits the path into (root, extension)
        # We only need the second element, which is the extension
        _, extension = os.path.splitext(self.file_path)
        return extension

@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str

if __name__ == "__main__":
    # Create sample CodeFile instances with different file paths
    python_file = CodeFile(
        file_path="src/main.py",
        content="print('Hello, world!')",
        language="python",
        last_updated=datetime.now()
    )
    
    javascript_file = CodeFile(
        file_path="web/script.js",
        content="console.log('Hello, world!');",
        language="javascript",
        last_updated=datetime.now()
    )
    
    readme_file = CodeFile(
        file_path="README",
        content="# Project Documentation",
        language="markdown",
        last_updated=datetime.now()
    )
    
    dockerfile = CodeFile(
        file_path="Dockerfile",
        content="FROM python:3.9-slim",
        language="dockerfile",
        last_updated=datetime.now()
    )
    
    print("---")
    print("Testing get_file_extension() Method")
    print("---")
    
    # SOLVED: Call the get_file_extension method on each CodeFile instance and print the results
    
    # Test 1: File with a standard extension (.py)
    print(f"File Path: {python_file.file_path}")
    print(f"Extension: {python_file.get_file_extension()}")
    print("-" * 10)

    # Test 2: File with a different extension (.js)
    print(f"File Path: {javascript_file.file_path}")
    print(f"Extension: {javascript_file.get_file_extension()}")
    print("-" * 10)

    # Test 3: File without an extension (README)
    print(f"File Path: {readme_file.file_path}")
    print(f"Extension: '{readme_file.get_file_extension()}'") # Prints '' for visibility
    print("-" * 10)

    # Test 4: File without an extension (Dockerfile)
    print(f"File Path: {dockerfile.file_path}")
    print(f"Extension: '{dockerfile.get_file_extension()}'") # Prints '' for visibility
    print("-" * 10)
    
    print("Project setup complete. Ready to build components!")
```

### Method Implementation Detail

The core of the solution lies in the method:

```python
    def get_file_extension(self) -> str:
        _, extension = os.path.splitext(self.file_path)
        return extension
```

The `os.path.splitext()` function is well suited to this task. It splits a path at the last dot in its final component and returns a `(root, extension)` tuple:

  * For `"src/main.py"`, it returns `('src/main', '.py')`.
  * For `"README"`, it returns `('README', '')`.

By assigning the result to `_, extension`, we unpack the tuple and discard the file root (`_`), returning only the extension as a string (e.g., `".py"` or `""`).
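A few edge cases are worth checking before relying on this behavior. The sketch below runs `os.path.splitext()` over a handful of sample paths; the paths are illustrative and do not need to exist on disk.

```python
import os.path

# Paths chosen to illustrate edge cases; none of them need to exist on disk
paths = ["src/main.py", "README", "Dockerfile", ".gitignore", "archive.tar.gz"]

# Keep only the extension part of each (root, extension) tuple
extensions = {path: os.path.splitext(path)[1] for path in paths}

for path, extension in extensions.items():
    print(f"{path!r} -> {extension!r}")
```

Two results may be surprising: a leading dot is treated as part of the name, so `.gitignore` has an empty extension, and only the last suffix is returned, so `archive.tar.gz` yields `.gz` rather than `.tar.gz`.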

## Linking Commits to Their File Changes

Now that you've learned how to add methods to dataclasses, let's connect our data structures together! In a real code review system, one of the most common tasks is finding all file changes associated with a specific commit.

Your task is to implement the `link_commit_to_files` function that connects commits with their related file changes. The function should:

  * Take a `GitCommit` instance and a list of `FileChange` instances as parameters.
  * Return only the `FileChange` instances where the `commit_hash` matches the commit's `hash`.
  * Return an empty list if no matching changes are found.

You'll work with sample data that includes one commit and several file changes, some matching the commit and some belonging to different commits. This exercise shows how the different parts of our code review system relate to each other in practice.

By completing this task, you'll gain valuable experience in filtering data based on relationships between objects, a skill that's essential for building more complex features in our code review assistant.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class CodeFile:
    file_path: str
    content: str
    language: str
    last_updated: datetime

@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str

def link_commit_to_files(commit: GitCommit, changes: List[FileChange]) -> List[FileChange]:
    """
    Find all file changes that belong to a specific commit.
    
    Args:
        commit: A GitCommit instance
        changes: A list of FileChange instances
        
    Returns:
        A list of FileChange instances that match the commit's hash
    """
    # TODO: Create an empty list to store matching changes
    
    # TODO: Loop through each change in the changes list
    
    # TODO: Check if the change's commit_hash matches the commit's hash
    # and add matching changes to your list
    
    # TODO: Return the list of matching changes
    return []

if __name__ == "__main__":
    # Create a sample commit
    sample_commit = GitCommit(
        hash="a7f3d28c",
        message="Add login functionality",
        author="Jane Smith",
        date=datetime.now()
    )
    
    # Create sample file changes (some matching the commit, some not)
    file_changes = [
        FileChange(
            file_path="src/auth.py",
            commit_hash="a7f3d28c",  # Matches our commit
            diff_content="+ def login(username, password):\n+     return check_credentials(username, password)"
        ),
        FileChange(
            file_path="src/utils.py",
            commit_hash="b8e4f19d",  # Different commit
            diff_content="+ def format_date(date):\n+     return date.strftime('%Y-%m-%d')"
        ),
        FileChange(
            file_path="src/models.py",
            commit_hash="a7f3d28c",  # Matches our commit
            diff_content="+ class User:\n+     def __init__(self, username, email):\n+         self.username = username\n+         self.email = email"
        ),
        FileChange(
            file_path="tests/test_auth.py",
            commit_hash="a7f3d28c",  # Matches our commit
            diff_content="+ def test_login():\n+     assert login('admin', 'password') == True"
        ),
        FileChange(
            file_path="README.md",
            commit_hash="c9d2e35a",  # Different commit
            diff_content="+ ## Authentication\n+ The system now supports user authentication."
        )
    ]
    
    # Find changes for our commit
    matching_changes = link_commit_to_files(sample_commit, file_changes)
    
    # Display the results
    print(f"Commit: {sample_commit.hash} - {sample_commit.message}")
    print(f"Found {len(matching_changes)} matching file changes:")
    
    # TODO: Add code to display each matching change
    # If there are matching changes, loop through them and print details
    # If there are no matching changes, print a message saying so

```

Connecting objects through a common identifier, like a commit hash, is a fundamental database operation! The Python solution below uses a list comprehension (a concise way to loop and filter) to implement the required logic efficiently.
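When changes for many commits have to be looked up, an alternative to filtering the list once per commit is to group the changes by `commit_hash` up front. The sketch below uses `collections.defaultdict`; the variable name `changes_by_commit` and the shortened sample data are assumptions made for illustration and are not part of the exercise.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List


@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str


changes = [
    FileChange("src/auth.py", "a7f3d28c", "+ def login(): ..."),
    FileChange("src/utils.py", "b8e4f19d", "+ def format_date(): ..."),
    FileChange("src/models.py", "a7f3d28c", "+ class User: ..."),
]

# Build the index once: commit hash -> list of changes in that commit
changes_by_commit: DefaultDict[str, List[FileChange]] = defaultdict(list)
for change in changes:
    changes_by_commit[change.commit_hash].append(change)

# Looking up a commit is now a dictionary access; .get avoids creating new keys
for change in changes_by_commit.get("a7f3d28c", []):
    print(change.file_path)
```

For a single lookup, the filtering approach is simpler; the index only pays off when the same list is queried repeatedly.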

Here is the solved Python code:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class CodeFile:
    file_path: str
    content: str
    language: str
    last_updated: datetime

@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str

def link_commit_to_files(commit: GitCommit, changes: List[FileChange]) -> List[FileChange]:
    """
    Find all file changes that belong to a specific commit.
    
    Args:
        commit: A GitCommit instance
        changes: A list of FileChange instances
        
    Returns:
        A list of FileChange instances that match the commit's hash
    """
    # SOLVED: Use a list comprehension to filter the changes
    matching_changes = [
        change for change in changes 
        if change.commit_hash == commit.hash
    ]
    
    return matching_changes

if __name__ == "__main__":
    # Create a sample commit
    sample_commit = GitCommit(
        hash="a7f3d28c",
        message="Add login functionality",
        author="Jane Smith",
        date=datetime.now()
    )
    
    # Create sample file changes (some matching the commit, some not)
    file_changes = [
        FileChange(
            file_path="src/auth.py",
            commit_hash="a7f3d28c",  # Matches our commit
            diff_content="+ def login(username, password):\n+     return check_credentials(username, password)"
        ),
        FileChange(
            file_path="src/utils.py",
            commit_hash="b8e4f19d",  # Different commit
            diff_content="+ def format_date(date):\n+     return date.strftime('%Y-%m-%d')"
        ),
        FileChange(
            file_path="src/models.py",
            commit_hash="a7f3d28c",  # Matches our commit
            diff_content="+ class User:\n+     def __init__(self, username, email):\n+         self.username = username\n+         self.email = email"
        ),
        FileChange(
            file_path="tests/test_auth.py",
            commit_hash="a7f3d28c",  # Matches our commit
            diff_content="+ def test_login():\n+     assert login('admin', 'password') == True"
        ),
        FileChange(
            file_path="README.md",
            commit_hash="c9d2e35a",  # Different commit
            diff_content="+ ## Authentication\n+ The system now supports user authentication."
        )
    ]
    
    # Find changes for our commit
    matching_changes = link_commit_to_files(sample_commit, file_changes)
    
    # Display the results (plain text: Markdown markers would print literally in a console)
    print("---")
    print(f"Searching for changes associated with commit: {sample_commit.hash} - {sample_commit.message}")
    print("---")
    
    if matching_changes:
        print(f"Found {len(matching_changes)} matching file changes:")
        # SOLVED: Loop through matching changes and print details
        for i, change in enumerate(matching_changes, 1):
            print(f"  {i}. File Path: {change.file_path}")
            print(f"     Diff Snippet: {change.diff_content[:30]}...")
            
    else:
        print("No file changes found for this commit.")

```

-----

## Method Implementation Breakdown

The core filtering logic is handled by the following **list comprehension** within the `link_commit_to_files` function:

```python
    matching_changes = [
        change for change in changes 
        if change.commit_hash == commit.hash
    ]
```

1.  **`change for change in changes`**: This iterates over every `FileChange` object in the input list.
2.  **`if change.commit_hash == commit.hash`**: This is the filtering condition. It compares the `commit_hash` attribute of the current `FileChange` object with the `hash` attribute of the input `GitCommit` object.
3.  Only the `FileChange` objects that satisfy the condition are collected into the `matching_changes` list, which is then returned.
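
If comprehensions are new to you, the three steps above can also be written as an explicit loop. The following is a minimal sketch for comparison only: `link_commit_to_files_loop` is a hypothetical name, not part of the lesson's solution, and the sample data is shortened.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime


@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str


def link_commit_to_files_loop(commit: GitCommit, changes: List[FileChange]) -> List[FileChange]:
    """Loop-based equivalent of the list comprehension (hypothetical helper)."""
    matching_changes = []
    for change in changes:                     # step 1: visit every FileChange
        if change.commit_hash == commit.hash:  # step 2: apply the filter condition
            matching_changes.append(change)    # step 3: collect the matches
    return matching_changes


commit = GitCommit("a7f3d28c", "Add login functionality", "Jane Smith", datetime.now())
changes = [
    FileChange("src/auth.py", "a7f3d28c", "+ def login(): ..."),
    FileChange("src/utils.py", "b8e4f19d", "+ def format_date(): ..."),
]

loop_result = link_commit_to_files_loop(commit, changes)
comprehension_result = [c for c in changes if c.commit_hash == commit.hash]
print(loop_result == comprehension_result)  # prints True
```

Both forms return the same list in the same order. The comprehension is simply the more compact way to express "loop, test, collect".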

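The filter in `link_commit_to_files` mirrors a relational query in a database: it selects the child records (`FileChange`) whose `commit_hash` equals the key of the parent record (`GitCommit.hash`), which establishes the link between a commit and the files it changed.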
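
To make the database analogy concrete, here is a minimal sketch of the same lookup written as a SQL query, using Python's built-in `sqlite3` module with an in-memory database. The table and column names (`commits`, `file_changes`) are assumptions chosen for illustration; the schema used later in the course may differ.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commits (hash TEXT PRIMARY KEY, message TEXT)")
conn.execute(
    """
    CREATE TABLE file_changes (
        file_path TEXT,
        commit_hash TEXT REFERENCES commits(hash),
        diff_content TEXT
    )
    """
)

# Parent rows (commits) are inserted before the child rows that refer to them.
conn.executemany(
    "INSERT INTO commits VALUES (?, ?)",
    [
        ("a7f3d28c", "Add login functionality"),
        ("b8e4f19d", "Add date helper"),
    ],
)
conn.executemany(
    "INSERT INTO file_changes VALUES (?, ?, ?)",
    [
        ("src/auth.py", "a7f3d28c", "+ def login(username, password): ..."),
        ("src/utils.py", "b8e4f19d", "+ def format_date(date): ..."),
        ("src/models.py", "a7f3d28c", "+ class User: ..."),
    ],
)

# Same filter as the list comprehension: keep rows whose commit_hash matches.
rows = conn.execute(
    "SELECT file_path FROM file_changes WHERE commit_hash = ? ORDER BY file_path",
    ("a7f3d28c",),
).fetchall()
matching_paths = [row[0] for row in rows]
print(matching_paths)  # prints ['src/auth.py', 'src/models.py']
conn.close()
```

The `WHERE commit_hash = ?` clause plays the role of the `if change.commit_hash == commit.hash` condition, and the `?` placeholder passes the hash as a bound parameter rather than by string formatting.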