# Unit 3

## Git History Extraction with Python

## Introduction: Why Extract Git History?

Welcome back! In the previous lessons, you learned how to set up the **LLM Code Review Assistant** project and how to scan a codebase to collect information about code files. Now, we are ready to take the next step: extracting the history of changes made to the project using **Git**.

**Git history** is a record of all the changes that have been made to a project over time. This history is very valuable for understanding how a project has evolved, who made which changes, and why those changes were made. For code review and analysis, being able to look at past commits and file changes helps you spot patterns, understand the reasoning behind code, and even catch mistakes.

In this lesson, you will learn how to use **Python** to extract Git history from a project. This will give you the tools to analyze changes and prepare for more advanced code review tasks.

-----

## Quick Recall: Git Repositories and Commits

Before we dive in, let’s quickly remind ourselves what a **Git repository** and a **commit** are.

  * A **Git repository** is a folder that tracks changes to files using Git. It stores all the information about the project’s history, including every change made to the files.
  * A **commit** is a snapshot of the project at a certain point in time. Each commit has:
      * A unique **hash** (an ID)
      * A **message** describing the change
      * The **author’s name** and **email**
      * The **date** and **time** of the change

In the last lesson, you learned how to scan a codebase for files. Now, we will focus on reading the history of commits and the changes they contain.

-----

## What Information Can We Get from Git History?

When we extract Git history, we are mainly interested in two things:

1.  **Commit Details:** Information about each commit, such as the hash, message, author, and date.
2.  **File Changes:** Which files were changed in each commit, and what the changes were.

Here is an example of what a commit might look like:

```
Commit: 1a2b3c4d
Message: Add user authentication endpoints
Author: Alice Johnson <alice@example.com>
Date: 2024-06-01 10:15:00
```

And an example of a file change in that commit:

```
File: backend/src/api/auth.py
Change: (diff content showing what was added or removed)
```

This information helps you answer questions like:

  * Who made a certain change?
  * When was a feature added?
  * What exactly was changed in a file?
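
As a small illustration of the last question, the added and removed lines in a unified diff can be counted with plain string handling. This is only a sketch: `count_diff_lines` and the sample diff are hypothetical, made up for illustration, and are not part of the lesson's project. It also simplifies one edge case, since a removed line whose own text begins with `--` would be mistaken for a file header.

```python
def count_diff_lines(diff_text):
    """Return (added, removed) line counts for a unified diff string."""
    added = removed = 0
    for line in diff_text.splitlines():
        # Skip the file header lines, which also start with + or -
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return added, removed


# Hypothetical diff for illustration only
sample_diff = """@@ -1,2 +1,3 @@
 def login(user):
-    return None
+    token = issue_token(user)
+    return token
"""

print(count_diff_lines(sample_diff))  # (2, 1)
```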

-----

## Extracting Git History with Python

Let's walk through how to extract Git history using **Python**. We will use the **`gitpython`** library, which makes it easy to interact with Git repositories from Python code.

### Installing Required Dependencies

Before we can start working with Git repositories in Python, we need to install the `gitpython` library. Run this command in your terminal:

```bash
pip install gitpython
```

This will install the library that allows Python to interact with Git repositories.
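
Note that the package is installed as `gitpython` but imported as `git`. If you want to confirm the installation from Python before going further, a check along these lines should work. The helper name `gitpython_available` is hypothetical.

```python
import importlib.util


def gitpython_available():
    """Return True if the `git` module (provided by GitPython) can be found."""
    return importlib.util.find_spec("git") is not None


if gitpython_available():
    print("GitPython is installed.")
else:
    print("GitPython is missing; run: pip install gitpython")
```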

### Step 1: Import Required Libraries

First, we need to import the libraries we will use. **`gitpython`** is used to interact with Git, and **`dataclasses`** help us organize the data.

```python
from git import Repo
from datetime import datetime
from dataclasses import dataclass
```

  * `Repo` lets us work with a Git repository.
  * `datetime` is used for handling dates.
  * `dataclass` helps us define simple classes for storing data.
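
Commit dates usually carry a UTC offset, because each author commits in their own time zone. The sketch below uses only the standard library and a made-up timestamp to show how such a `datetime` can be normalized to UTC and formatted; it assumes the value you receive is timezone-aware.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical commit time: 10:15 local time, two hours ahead of UTC
committed = datetime(2024, 6, 1, 10, 15, 0, tzinfo=timezone(timedelta(hours=2)))

# Normalize to UTC so commits from different time zones compare consistently
committed_utc = committed.astimezone(timezone.utc)

print(committed.isoformat())                        # 2024-06-01T10:15:00+02:00
print(committed_utc.strftime("%Y-%m-%d %H:%M:%S"))  # 2024-06-01 08:15:00
```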

### Step 2: Define Data Structures

We will use two data classes: one for commits and one for file changes.

```python
@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str
```

  * `GitCommit` stores information about each commit.
  * `FileChange` stores information about each file change in a commit.
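
To see how the two classes fit together, here is a minimal sketch that builds one of each by hand, using the sample values from the example commit earlier in this lesson. The `commits_by_hash` dictionary is a hypothetical helper for illustration; it shows how `commit_hash` links a file change back to its commit.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime


@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str


# Sample values taken from the example commit shown earlier in this lesson
commit = GitCommit(
    hash="1a2b3c4d",
    message="Add user authentication endpoints",
    author="Alice Johnson <alice@example.com>",
    date=datetime(2024, 6, 1, 10, 15, 0),
)
change = FileChange(
    file_path="backend/src/api/auth.py",
    commit_hash=commit.hash,
    diff_content="(diff content)",
)

# commit_hash links a file change back to the commit that introduced it
commits_by_hash = {commit.hash: commit}
author = commits_by_hash[change.commit_hash].author
print(f"{change.file_path} was changed by {author}")
```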

### Step 3: Create the `GitHistoryExtractor` Class

Now, let’s create a class that will handle extracting the history.

```python
class GitHistoryExtractor:
    def __init__(self):
        self.commits = []
        self.file_changes = []
```

The `__init__` method sets up two lists: one for commits and one for file changes.
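
Creating the lists inside `__init__` means every extractor instance gets its own storage. The sketch below repeats the class from above and appends a placeholder hash by hand, standing in for a real scan, to show that two instances do not share results.

```python
class GitHistoryExtractor:
    def __init__(self):
        # Fresh lists per instance, so separate scans never share results
        self.commits = []
        self.file_changes = []


first = GitHistoryExtractor()
second = GitHistoryExtractor()

# Stand-in for a real scan: append a placeholder hash to the first extractor
first.commits.append("1a2b3c4d")

print(len(first.commits))   # 1
print(len(second.commits))  # 0
```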

### Step 4: Extract Commits and File Changes

Let's add a method to extract commits and their file changes.

```python
# Add this method inside the GitHistoryExtractor class (indented one level)
def extract_commits(self, repo_path, max_commits=50):
    """
    Extract commit history and file changes from a git repository.
    
    Args:
        repo_path (str): Path to the git repository directory
        max_commits (int): Maximum number of commits to process (default: 50)
        
    Returns:
        list: List of GitCommit objects containing commit information
    """
    print(f"Extracting git history: {repo_path}")
    
    # Initialize the repository object - this connects to the git repo
    repo = Repo(repo_path)

    # Iterate through commits starting from the most recent (HEAD)
    # max_count limits how many commits we process to avoid overwhelming data
    for commit in repo.iter_commits(max_count=max_commits):
        
        # Create a GitCommit object with all the essential commit information
        git_commit = GitCommit(
            hash=commit.hexsha,                    # Full SHA hash (unique identifier)
            message=commit.message.strip(),       # Commit message with leading/trailing whitespace removed
            author=f"{commit.author.name} <{commit.author.email}>",  # Author info
            date=commit.committed_datetime        # When the commit was made
        )
        
        # Add this commit to our collection
        self.commits.append(git_commit)
        
        # Extract file changes by comparing this commit with its parent
        # Check if commit has parents (first commit in repo has no parents)
        if commit.parents:
            # Get the immediate parent commit (most commits have one parent)
            parent = commit.parents[0]
            
            # Generate diff between parent and current commit
            # create_patch=True gives us the actual diff content (what changed)
            for diff in parent.diff(commit, create_patch=True):
                
                # diff.b_path is the file path after the change
                # It is None for deleted files, so this check skips deletions
                if diff.b_path:
                    file_change = FileChange(
                        file_path=diff.b_path,                    # Path to the changed file
                        commit_hash=commit.hexsha,                # Which commit this change belongs to
                        # Decode binary diff data to string, ignoring errors
                        diff_content=diff.diff.decode('utf-8', errors='ignore') 
                    )
                    
                    # Add this file change to our collection
                    self.file_changes.append(file_change)

    # Report what we found
    print(f"Found {len(self.commits)} commits, {len(self.file_changes)} changes")
    return self.commits
```

**Breakdown of `extract_commits`:**

  * **Repository Connection:** `Repo(repo_path)` connects to the Git repository.
  * **Commit Iteration:** `repo.iter_commits(max_count=max_commits)` iterates through commits from most recent backwards, limited by `max_commits`.
  * **Commit Data:** Uses properties like `commit.hexsha`, `commit.message.strip()`, and `commit.committed_datetime` to populate the `GitCommit` dataclass.
  * **File Change (Diff):**
      * It checks `if commit.parents` so that no diff is attempted for the root commit, which has no parent. The commit itself is still recorded.
      * It uses `parent.diff(commit, create_patch=True)` to generate the changes.
      * `diff.b_path` is used for the file path after the change.
      * `diff.diff.decode('utf-8', errors='ignore')` safely converts the binary diff content to a readable string.
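
The effect of `errors='ignore'` is easiest to see on a small byte string. This standard-library sketch uses made-up bytes, not real diff output, and compares it with `errors='replace'`, an alternative that leaves a visible marker where data was lost.

```python
# Bytes for "café" followed by 0xFF, which is never valid in UTF-8
raw = b"caf\xc3\xa9\xff"

# errors="ignore" silently drops the undecodable byte
ignored = raw.decode("utf-8", errors="ignore")

# errors="replace" substitutes U+FFFD for the undecodable byte
replaced = raw.decode("utf-8", errors="replace")

print(ignored)        # café
print(len(replaced))  # 5 (four letters plus one replacement character)
```

Dropping bytes keeps the output clean, but it also hides the fact that a diff contained binary or differently encoded content.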

-----

### Step 5: Using the Extractor

Let’s see how to use this class in a script.

```python
def main():
    extractor = GitHistoryExtractor()
    repo_path = "./sample-ecommerce-api"
    
    # You must ensure that './sample-ecommerce-api' is a valid git repository
    commits = extractor.extract_commits(repo_path, max_commits=10)
    
    print("\nRecent commits:")
    for i, commit in enumerate(commits[:3]):
        print(f"{i+1}. {commit.hash[:8]} - {commit.message[:50]}...")
        print(f"   Author: {commit.author}")
        print(f"   Date: {commit.date}")
        print()

# Run main() only when the file is executed as a script
if __name__ == "__main__":
    main()
```

**Example Output:**

```
Extracting git history: ./sample-ecommerce-api
Found 5 commits, 4 changes

Recent commits:
1. 9f8e7d6c - Add order processing functionality...
   Author: Carol Davis <carol@example.com>
   Date: 2024-06-01 12:00:00

2. 7a6b5c4d - Implement product CRUD operations...
   Author: Bob Smith <bob@example.com>
   Date: 2024-06-01 11:30:00

3. 5e4d3c2b - Add user authentication endpoints...
   Author: Alice Johnson <alice@example.com>
   Date: 2024-06-01 11:00:00
```

-----

## Summary And What’s Next

In this lesson, you learned how to extract Git history from a project using **Python**. You saw how to:

  * Use the **`gitpython`** library to access a repository.
  * Collect **commit details** and **file changes**.
  * Organize this information using **data classes**.

This prepares you for the practice exercises, where you will try out these steps yourself and get comfortable working with Git history in Python. Understanding Git history is a key skill for code review and project analysis, and you are now ready to put it into practice!

## Customizing Git Commit Display Format

Now that you've learned how to extract Git history using Python, let's customize how this information is displayed to users. In this exercise, you'll modify the output format of our commit display to make it more concise and readable.

Your task is to update the print statement in the `main` function that shows commit information. Specifically:

  * Change the commit hash display to show only the first 6 characters (instead of 8).
  * Change the commit message display to show only the first 30 characters (instead of 50).

This small change will help make our output more compact while still providing enough information to identify commits. It's also a good way to become familiar with how the extracted Git data flows through our application.

By completing this exercise, you'll take your first step in customizing how Git history is presented, which is an important skill for building effective code review tools.

```python
from git import Repo
from datetime import datetime
import os

# Minimal dataclasses for continuity with the outline
from dataclasses import dataclass

@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str

class GitHistoryExtractor:
    def __init__(self):
        self.commits = []
        self.file_changes = []
    
    def extract_commits(self, repo_path, max_commits=50):
        print(f"Extracting git history: {repo_path}")
        repo = Repo(repo_path)
        
        for commit in repo.iter_commits(max_count=max_commits):
            git_commit = GitCommit(
                hash=commit.hexsha,
                message=commit.message.strip(),
                author=f"{commit.author.name} <{commit.author.email}>",
                date=commit.committed_datetime
            )
            self.commits.append(git_commit)
            
            # Extract file changes
            if commit.parents:
                parent = commit.parents[0]
                for diff in parent.diff(commit, create_patch=True):
                    if diff.b_path:
                        file_change = FileChange(
                            file_path=diff.b_path,
                            commit_hash=commit.hexsha,
                            diff_content=diff.diff.decode('utf-8', errors='ignore')
                        )
                        self.file_changes.append(file_change)
        
        print(f"Found {len(self.commits)} commits, {len(self.file_changes)} changes")
        return self.commits


def main():
    extractor = GitHistoryExtractor()
    
    # Path to the sample git repository
    repo_path = "./sample-ecommerce-api"
    
    # Check if we're in a git repository
    if not os.path.exists(os.path.join(repo_path, ".git")):
        print("Not a git repository. Initializing a sample repo...")
        # For demo purposes, we'll just show the class structure
        print("GitHistoryExtractor initialized successfully!")
        return
    
    try:
        commits = extractor.extract_commits(repo_path, max_commits=10)
        
        # Display first few commits
        print("\nRecent commits:")
        for i, commit in enumerate(commits[:3]):
            # TODO: Change the commit hash display to show only the first 6 characters (instead of 8)
            # TODO: Change the commit message display to show only the first 30 characters (instead of 50)
            print(f"{i+1}. {commit.hash[:8]} - {commit.message[:50]}...")
            print(f"   Author: {commit.author}")
            print(f"   Date: {commit.date}")
            print()
            
    except Exception as e:
        print(f"Error extracting git history: {e}")


if __name__ == "__main__":
    main()

```

A simple, effective change to improve readability! Here is the updated `main` function with the commit hash truncated to 6 characters and the commit message to 30 characters.

## Updated `main` Function

The modification is applied directly to the `print` statement within the loop that displays the recent commits.

```python
from git import Repo
from datetime import datetime
import os

# Minimal dataclasses for continuity with the outline
from dataclasses import dataclass

@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str

class GitHistoryExtractor:
    def __init__(self):
        self.commits = []
        self.file_changes = []
    
    def extract_commits(self, repo_path, max_commits=50):
        print(f"Extracting git history: {repo_path}")
        # NOTE: In a real scenario, you'd add error handling for Repo initialization
        repo = Repo(repo_path)
        
        for commit in repo.iter_commits(max_count=max_commits):
            git_commit = GitCommit(
                hash=commit.hexsha,
                message=commit.message.strip(),
                author=f"{commit.author.name} <{commit.author.email}>",
                date=commit.committed_datetime
            )
            self.commits.append(git_commit)
            
            # Extract file changes
            if commit.parents:
                parent = commit.parents[0]
                for diff in parent.diff(commit, create_patch=True):
                    if diff.b_path:
                        file_change = FileChange(
                            file_path=diff.b_path,
                            commit_hash=commit.hexsha,
                            diff_content=diff.diff.decode('utf-8', errors='ignore')
                        )
                        self.file_changes.append(file_change)
        
        print(f"Found {len(self.commits)} commits, {len(self.file_changes)} changes")
        return self.commits


def main():
    extractor = GitHistoryExtractor()
    
    # Use a sample directory for the git repository
    repo_path = "./sample-ecommerce-api"
    
    # Check if we're in a git repository
    if not os.path.exists(os.path.join(repo_path, ".git")):
        print("Not a git repository. Initializing a sample repo...")
        # Since we cannot run git commands here, we'll stop, 
        # assuming the user will ensure a valid repo exists locally.
        print("GitHistoryExtractor initialized successfully. Please ensure a valid git repo exists at './sample-ecommerce-api' to see full output.")
        return
    
    try:
        commits = extractor.extract_commits(repo_path, max_commits=10)
        
        # Display first few commits
        print("\nRecent commits:")
        for i, commit in enumerate(commits[:3]):
            # FIX: Commit hash display changed from [:8] to [:6]
            # FIX: Commit message display changed from [:50] to [:30]
            print(f"{i+1}. {commit.hash[:6]} - {commit.message[:30]}...")
            print(f"   Author: {commit.author}")
            print(f"   Date: {commit.date}")
            print()
            
    except Exception as e:
        # Repo() raises InvalidGitRepositoryError or NoSuchPathError if the path is not a git repo
        print(f"Error extracting git history. Ensure '{repo_path}' is a valid repository. Error: {e}")


if __name__ == "__main__":
    main()
```

The key line that was updated is:

```python
print(f"{i+1}. {commit.hash[:6]} - {commit.message[:30]}...")
```
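
If the display lengths are likely to change again, the formatting can be pulled into a small function. The sketch below is one possible shape, not part of the exercise: `format_commit_line` and its parameters are hypothetical. It uses only the first line of the message and adds the ellipsis only when text was actually cut off.

```python
def format_commit_line(index, commit_hash, message, hash_len=6, msg_len=30):
    """Build one summary line such as '1. 9f8e7d - Add order processing...'."""
    # Only the first line of a multi-line commit message is shown
    first_line = message.splitlines()[0] if message else ""
    preview = first_line[:msg_len]
    # Append an ellipsis only when the message was truncated
    suffix = "..." if len(first_line) > msg_len else ""
    return f"{index}. {commit_hash[:hash_len]} - {preview}{suffix}"


print(format_commit_line(1, "9f8e7d6c1234", "Add order processing functionality"))
# 1. 9f8e7d - Add order processing functiona...
print(format_commit_line(2, "7a6b5c4d5678", "Fix typo"))
# 2. 7a6b5c - Fix typo
```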

## Filtering Git History by Author

Now that you've customized how Git commits are displayed, let's add a useful filtering feature to our extractor. When working with large projects, you often need to focus on changes made by specific team members.

In this exercise, you'll enhance the `extract_commits` method to accept an `author_filter` parameter. When provided, the method should include only commits from authors whose names contain the filter string (case-insensitive).

Your task involves:

  * Adding the `author_filter` parameter with a default value of `None`
  * Adding logic to check whether commit authors match the filter
  * Updating the `main()` function to demonstrate this filtering capability

This filtering feature is helpful when analyzing contributions from specific team members or tracking down who made certain changes to the codebase. By implementing it, you'll gain practical experience working with Git metadata and learn how to extract targeted information from a repository's history.

```python
from git import Repo
from datetime import datetime
import os

# Minimal dataclasses for continuity with the outline
from dataclasses import dataclass

@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str

class GitHistoryExtractor:
    def __init__(self):
        self.commits = []
        self.file_changes = []
    
    # TODO: Add an author_filter parameter with a default value of None
    def extract_commits(self, repo_path, max_commits=50):
        print(f"Extracting git history: {repo_path}")
        repo = Repo(repo_path)
        
        for commit in repo.iter_commits(max_count=max_commits):
            author_name = commit.author.name
            
            # TODO: Add a condition to check if the commit's author name contains the filter string (case-insensitive)
            # TODO: Skip commits that don't match the author filter
            
            git_commit = GitCommit(
                hash=commit.hexsha,
                message=commit.message.strip(),
                author=f"{author_name} <{commit.author.email}>",
                date=commit.committed_datetime
            )
            self.commits.append(git_commit)
            
            # Extract file changes
            if commit.parents:
                parent = commit.parents[0]
                for diff in parent.diff(commit, create_patch=True):
                    if diff.b_path:
                        file_change = FileChange(
                            file_path=diff.b_path,
                            commit_hash=commit.hexsha,
                            diff_content=diff.diff.decode('utf-8', errors='ignore')
                        )
                        self.file_changes.append(file_change)
        
        print(f"Found {len(self.commits)} commits, {len(self.file_changes)} changes")
        return self.commits


def main():
    extractor = GitHistoryExtractor()
    
    # Path to the sample git repository
    repo_path = "./sample-ecommerce-api"
    
    # Check if we're in a git repository
    if not os.path.exists(os.path.join(repo_path, ".git")):
        print("Not a git repository. Initializing a sample repo...")
        # For demo purposes, we'll just show the class structure
        print("GitHistoryExtractor initialized successfully!")
        return
    
    try:
        # Extract all commits first
        commits = extractor.extract_commits(repo_path, max_commits=10)
        
        # Display first few commits
        print("\nAll recent commits:")
        for i, commit in enumerate(commits[:3]):
            print(f"{i+1}. {commit.hash[:8]} - {commit.message[:50]}...")
            print(f"   Author: {commit.author}")
            print(f"   Date: {commit.date}")
            print()
        
        # TODO: Reset the extractor and create a new instance
        # TODO: Define an author name to filter by (e.g., "Alice")
        # TODO: Call extract_commits with the author_filter parameter
        
        # TODO: Display filtered commits with appropriate header
        # TODO: Add logic to handle the case when no commits match the filter
            
    except Exception as e:
        print(f"Error extracting git history: {e}")


if __name__ == "__main__":
    main()

```

That's a practical and necessary feature for any code analysis tool! Here is the updated code, incorporating the `author_filter` into the `extract_commits` method and demonstrating its use in `main()`.

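The key change is a case-insensitive substring check. In the code it is written as two nested `if` statements, which are equivalent to `if author_filter and author_filter.lower() not in author_name.lower(): continue`.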

```python
from git import Repo
from datetime import datetime
import os

# Minimal dataclasses for continuity with the outline
from dataclasses import dataclass

@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str

class GitHistoryExtractor:
    def __init__(self):
        self.commits = []
        self.file_changes = []
    
    # FIX: Added author_filter parameter with default None
    def extract_commits(self, repo_path, max_commits=50, author_filter=None):
        print(f"Extracting git history: {repo_path}")
        # Clear previous results before a new scan
        self.commits = []
        self.file_changes = []

        try:
            repo = Repo(repo_path)
        except Exception as e:
            print(f"Error initializing Git repository at {repo_path}: {e}")
            return []

        for commit in repo.iter_commits(max_count=max_commits):
            author_name = commit.author.name
            
            # FIX: Logic to check if the commit's author name matches the filter
            if author_filter:
                # Case-insensitive check: if filter is provided and not found in author name, skip
                if author_filter.lower() not in author_name.lower():
                    continue # Skip to the next commit
            
            git_commit = GitCommit(
                hash=commit.hexsha,
                message=commit.message.strip(),
                author=f"{author_name} <{commit.author.email}>",
                date=commit.committed_datetime
            )
            self.commits.append(git_commit)
            
            # Extract file changes
            if commit.parents:
                parent = commit.parents[0]
                for diff in parent.diff(commit, create_patch=True):
                    # Only process if the file exists after the change (not a deletion)
                    if diff.b_path:
                        file_change = FileChange(
                            file_path=diff.b_path,
                            commit_hash=commit.hexsha,
                            diff_content=diff.diff.decode('utf-8', errors='ignore')
                        )
                        self.file_changes.append(file_change)
        
        print(f"Found {len(self.commits)} commits, {len(self.file_changes)} changes")
        return self.commits


def display_commits(commits, header="Recent commits:"):
    print(f"\n--- {header} ({len(commits)} found) ---")
    if not commits:
        print("No commits found matching the criteria.")
        return
        
    for i, commit in enumerate(commits[:3]): # Display top 3
        # Use the customized display format (hash[:6], message[:30])
        print(f"{i+1}. {commit.hash[:6]} - {commit.message[:30]}...")
        print(f"   Author: {commit.author}")
        print(f"   Date: {commit.date.strftime('%Y-%m-%d %H:%M:%S')}")
        print()


def main():
    # Setup for a runnable demonstration
    repo_path = "./sample-ecommerce-api"
    if not os.path.exists(os.path.join(repo_path, ".git")):
        print(f"Git repository not found at {repo_path}. Cannot run demonstration.")
        return

    try:
        # 1. Unfiltered Scan
        extractor = GitHistoryExtractor()
        all_commits = extractor.extract_commits(repo_path, max_commits=10)
        display_commits(all_commits, "All recent commits")
        
        # 2. Filtered Scan
        # FIX: Reset the extractor by creating a new instance
        extractor = GitHistoryExtractor() 
        
        # FIX: Define an author name to filter by
        author_filter = "Alice" 
        
        # FIX: Call extract_commits with the author_filter parameter
        filtered_commits = extractor.extract_commits(repo_path, max_commits=10, author_filter=author_filter)
        
        # FIX: Display filtered commits
        display_commits(filtered_commits, f"Commits filtered by Author: '{author_filter}'")
            
    except Exception as e:
        print(f"Error extracting git history: {e}")


if __name__ == "__main__":
    # NOTE: For this to run successfully, the './sample-ecommerce-api' directory must exist
    # and be a valid git repository with commit history from different authors.
    main()
```
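
The matching rule itself can be isolated into a small function, which makes it easy to test without a repository. This is a sketch under that assumption: `matches_author` is a hypothetical helper, and treating a missing author name as a non-match is a design choice made here, not something the exercise specifies.

```python
def matches_author(author_name, author_filter=None):
    """Return True if author_name contains author_filter, ignoring case."""
    if not author_filter:
        return True  # No filter means every commit matches
    if not author_name:
        return False  # Nothing to match against
    return author_filter.lower() in author_name.lower()


print(matches_author("Alice Johnson", "alice"))  # True
print(matches_author("Bob Smith", "alice"))      # False
print(matches_author("Bob Smith"))               # True
```

Keep in mind that in the solution above, `max_count` limits the commits before the filter is applied, so filtering the 10 most recent commits may return fewer than 10 matches.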

## Identifying Hot Spots in Git History

You've done well with filtering commits by author! Now, let's take your analysis skills further by identifying which files change most frequently in a codebase, a key insight for code reviews.

In this exercise, you'll add a new method called `get_file_change_count` to the `GitHistoryExtractor` class. This method will analyze the file changes you've already collected (as `FileChange` objects in the `self.file_changes` list) and count how many times each file has been modified.

Your task is to:

  * Create a method that counts the occurrences of each file path in the list of `FileChange` objects (`self.file_changes`)
  * Return a dictionary with file paths as keys and change counts as values
  * Handle the case where no file changes have been extracted

Then, update the `main()` function to display these results in a readable format, showing which files are "hot spots" in the codebase.

This analysis is valuable for code reviews because files that change frequently often need more careful attention and may benefit from refactoring. By implementing this feature, you'll gain practical experience with data analysis on git history using the `FileChange` model.

```python
from git import Repo
from datetime import datetime
import os

# Minimal dataclasses for continuity with the outline
from dataclasses import dataclass

@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str

class GitHistoryExtractor:
    def __init__(self):
        self.commits = []
        self.file_changes = []
    
    def get_file_change_count(self):
        """
        Count how many times each file has been changed across all commits.
        
        Returns:
            dict: A dictionary where keys are file paths (str) and values are the number of changes (int)
        """
        # TODO: Implement this function

    def extract_commits(self, repo_path, max_commits=50):
        print(f"Extracting git history: {repo_path}")
        repo = Repo(repo_path)
        
        for commit in repo.iter_commits(max_count=max_commits):
            git_commit = GitCommit(
                hash=commit.hexsha,
                message=commit.message.strip(),
                author=f"{commit.author.name} <{commit.author.email}>",
                date=commit.committed_datetime
            )
            self.commits.append(git_commit)
            
            # Extract file changes
            if commit.parents:
                parent = commit.parents[0]
                for diff in parent.diff(commit, create_patch=True):
                    if diff.b_path:
                        file_change = FileChange(
                            file_path=diff.b_path,
                            commit_hash=commit.hexsha,
                            diff_content=diff.diff.decode('utf-8', errors='ignore')
                        )
                        self.file_changes.append(file_change)
        
        print(f"Found {len(self.commits)} commits, {len(self.file_changes)} changes")
        return self.commits


def main():
    extractor = GitHistoryExtractor()
    
    # Path to the sample git repository
    repo_path = "./sample-ecommerce-api"
    
    # Check if we're in a git repository
    if not os.path.exists(os.path.join(repo_path, ".git")):
        print("Not a git repository. Initializing a sample repo...")
        # For demo purposes, we'll just show the class structure
        print("GitHistoryExtractor initialized successfully!")
        return
    
    try:
        commits = extractor.extract_commits(repo_path, max_commits=10)
        
        # Display first few commits
        print("\nRecent commits:")
        for i, commit in enumerate(commits[:3]):
            print(f"{i+1}. {commit.hash[:8]} - {commit.message[:50]}...")
            print(f"   Author: {commit.author}")
            print(f"   Date: {commit.date}")
            print()
        
        # TODO: Call the get_file_change_count method and store the result
        
        # TODO: Add code to display the file change counts
        # TODO: Sort the files by change count (most frequent first)
        # TODO: Handle the case where no file changes were found
            
    except Exception as e:
        print(f"Error extracting git history: {e}")


if __name__ == "__main__":
    main()

```

Identifying frequently changing files is a practical way to focus code review effort and to spot areas that may benefit from refactoring.

Here is the completed code with the `get_file_change_count` method implemented and the `main` function updated to display the results. I've used `collections.Counter` for a concise implementation of the counting logic.
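
The counting and ranking step can be sketched in isolation before reading the full listing. The file paths below are made up for illustration; in the real extractor they would come from `self.file_changes`.

```python
from collections import Counter

# Hypothetical file paths, one entry per recorded file change
changed_paths = [
    "backend/src/api/auth.py",
    "backend/src/api/orders.py",
    "backend/src/api/auth.py",
    "README.md",
    "backend/src/api/auth.py",
]

change_counts = Counter(changed_paths)

# most_common(n) yields (path, count) pairs, highest count first
for path, count in change_counts.most_common(1):
    print(f"{count} changes: {path}")  # 3 changes: backend/src/api/auth.py
```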

```python
from git import Repo
from datetime import datetime
import os
from dataclasses import dataclass
from collections import Counter # Import Counter for efficient counting

@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str

class GitHistoryExtractor:
    def __init__(self):
        self.commits = []
        self.file_changes = []
    
    def get_file_change_count(self):
        """
        Count how many times each file has been changed across all commits.
        
        Returns:
            dict: A dictionary where keys are file paths (str) and values are the number of changes (int)
        """
        if not self.file_changes:
            return {}
        
        # Use a list comprehension to extract all file paths
        all_file_paths = [change.file_path for change in self.file_changes]
        
        # Use collections.Counter to efficiently count the occurrences of each path
        change_counts = Counter(all_file_paths)
        
        # Convert the Counter object to a standard dictionary (though not strictly necessary, 
        # it adheres to the requested return type of 'dict')
        return dict(change_counts)

    def extract_commits(self, repo_path, max_commits=50):
        print(f"Extracting git history: {repo_path}")
        repo = Repo(repo_path)
        
        for commit in repo.iter_commits(max_count=max_commits):
            git_commit = GitCommit(
                hash=commit.hexsha,
                message=commit.message.strip(),
                author=f"{commit.author.name} <{commit.author.email}>",
                date=commit.committed_datetime
            )
            self.commits.append(git_commit)
            
            # Extract file changes
            if commit.parents:
                parent = commit.parents[0]
                # No filtering by change type here: parent.diff(commit) yields
                # every change (added, modified, deleted, renamed).
                for diff in parent.diff(commit, create_patch=True):
                    # Prefer b_path (the path after the change) and fall back to
                    # a_path (the path before the change) if b_path is not set.
                    file_path = diff.b_path if diff.b_path else diff.a_path
                    
                    if file_path:
                        file_change = FileChange(
                            file_path=file_path,
                            commit_hash=commit.hexsha,
                            diff_content=diff.diff.decode('utf-8', errors='ignore')
                        )
                        self.file_changes.append(file_change)
        
        print(f"Found {len(self.commits)} commits, {len(self.file_changes)} changes")
        return self.commits


def main():
    extractor = GitHistoryExtractor()
    
    # Use current directory as the git repository
    # IMPORTANT: Change this to a path that contains a *valid git repository*
    # for the script to execute successfully and show real results.
    # For demonstration, I'll use the current directory ('.') if the sample isn't present.
    repo_path = "./sample-ecommerce-api"
    
    # Check if the intended sample path exists and is a git repo
    if not os.path.exists(repo_path) or not os.path.exists(os.path.join(repo_path, ".git")):
        # Fallback to current directory, assuming the user is running this *inside* a git repo
        repo_path = "."
        if not os.path.exists(os.path.join(repo_path, ".git")):
            print("Not in a git repository. Initializing a sample extractor...")
            print("GitHistoryExtractor initialized successfully! Please run this script in a directory that is part of a Git repository or specify a valid one.")
            return
        
    try:
        # Limit to 25 commits for a quicker run on a large repo
        commits = extractor.extract_commits(repo_path, max_commits=25) 
        
        # Display first few commits
        print("\n" + "="*50)
        print("Recent Commits (First 3)")
        print("="*50)
        for i, commit in enumerate(commits[:3]):
            print(f"{i+1}. {commit.hash[:8]} - {commit.message.splitlines()[0][:50]}...")
            print(f"   Author: {commit.author}")
            print(f"   Date: {commit.date.strftime('%Y-%m-%d %H:%M')}")
            print()
            
        # ===============================================
        # Identifying Hot Spots (Files that change most)
        # ===============================================
        
        # Call the get_file_change_count method and store the result
        file_change_counts = extractor.get_file_change_count()
        
        print("\n" + "="*50)
        print("Hot Spot Analysis: Files by Change Frequency")
        print("="*50)

        if not file_change_counts:
            print("No file changes were found in the analyzed commits.")
            return

        # Sort the files by change count (most frequent first)
        # items() returns (key, value) pairs, so we sort by the value (index 1) in reverse order
        sorted_changes = sorted(file_change_counts.items(), key=lambda item: item[1], reverse=True)
        
        # Display the top 10 most frequently changed files
        print(f"Displaying top {min(10, len(sorted_changes))} files that changed most:")
        print("-" * 30)
        
        for file_path, count in sorted_changes[:10]:
            # Use right justification for the count for better readability
            print(f"| {str(count).rjust(4)} changes | {file_path}")
            
        print("-" * 30)
            
    except Exception as e:
        print(f"Error extracting git history: {e}")
        # Add a note if the error is likely due to an invalid repo path
        if "is not a git command" in str(e) or "does not look like a git repository" in str(e):
             print("\nNote: This error often occurs if the specified 'repo_path' is not a valid Git repository.")


if __name__ == "__main__":
    # Ensure you have 'gitpython' installed: pip install GitPython
    # And run this script inside a directory with a .git folder or adjust 'repo_path'
    main()
```

### Explanation of Changes

#### Implementation of `get_file_change_count`

1.  **Import `Counter`**: I added `from collections import Counter` at the top of the file.
2.  **Handle Empty List**: The method now first checks `if not self.file_changes:` and returns an empty dictionary `{}` if no changes have been extracted, fulfilling the requirement to handle this case.
3.  **Extract Paths**: A list comprehension, `[change.file_path for change in self.file_changes]`, is used to quickly create a list containing only the file paths from the `FileChange` objects.
4.  **Count Occurrences**: `Counter(all_file_paths)` efficiently counts the occurrences of each unique file path, creating the desired mapping of `file_path: count`.

#### Update to `main()`

1.  **Call Method**: `file_change_counts = extractor.get_file_change_count()` calls the new method.
2.  **Handle No Changes**: An `if not file_change_counts:` check is added before processing the results, printing a message if no data exists.
3.  **Sort Results**: `sorted(file_change_counts.items(), key=lambda item: item[1], reverse=True)` sorts the dictionary items.
      * `file_change_counts.items()`: gets a list of `(file_path, count)` tuples.
      * `key=lambda item: item[1]`: tells Python to sort by the second element of the tuple (the count).
      * `reverse=True`: ensures the highest counts (hot spots) appear first.
4.  **Display Hot Spots**: The code now iterates over the top 10 sorted results and prints them in a formatted table, clearly marking the files that have changed most frequently.
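
As a side note, `Counter` can also handle the sorting step itself through its `most_common` method. The sketch below is a minimal illustration that runs on a hand-written list of paths instead of real `FileChange` objects; the file names are made up for the example.

```python
from collections import Counter

# Hypothetical file paths, standing in for the change.file_path values
paths = [
    "app/models.py",
    "app/views.py",
    "app/models.py",
    "README.md",
    "app/models.py",
    "app/views.py",
]

counts = Counter(paths)

# most_common(n) returns the n highest counts as (path, count) tuples,
# ordered from the highest count to the lowest
top_two = counts.most_common(2)

for file_path, count in top_two:
    print(f"| {count:>4} changes | {file_path}")
```

Whether to use `sorted(...)` on a plain dictionary or `most_common` on the `Counter` is mostly a matter of taste; `most_common` only works as long as the method keeps the `Counter` instead of converting it to a `dict`.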

## Filtering Git History by File Type

After identifying hot spots in the codebase, let's add another useful filtering capability to our Git history extractor. When reviewing code, developers often need to focus on specific file types — such as Python files for backend changes or JavaScript files for frontend work.

**Your Goal:** Enhance the `extract_commits` method to accept a `file_extensions` parameter (a list like `['.py', '.js']`). The method should include only commits that modify files with those extensions.

**Requirements:**

  * Add a `file_extensions=None` parameter to `extract_commits()`.
  * When `file_extensions` is provided, include only commits that modify at least one file with a matching extension.
  * When `file_extensions` is `None`, include all commits (preserve existing behavior).
  * A commit should be included if ANY of its file changes match the extension filter.

**Example Usage:**

```python
# Get all commits (existing behavior)
commits = extractor.extract_commits(repo_path, max_commits=10)

# Get only commits that modify Python files
commits = extractor.extract_commits(repo_path, max_commits=10, file_extensions=['.py'])
```

**Implementation Logic:**

1.  For each commit, collect its file changes in a temporary list first.
2.  Check if any file change matches the extension filter using `os.path.splitext()`.
3.  Only add the commit and its changes to the class attributes if it passes the filter.

Complete the TODOs in the starter code to implement this filtering functionality. The `main` function will test your implementation by comparing filtered vs unfiltered results.

```python
from git import Repo
from datetime import datetime
import os

# Minimal dataclasses for continuity with the outline
from dataclasses import dataclass

@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str

class GitHistoryExtractor:
    def __init__(self):
        self.commits = []
        self.file_changes = []
    
    # TODO: Add a file_extensions parameter (default to None) to filter commits by file type
    def extract_commits(self, repo_path, max_commits=50):
        print(f"Extracting git history: {repo_path}")
        repo = Repo(repo_path)
        
        for commit in repo.iter_commits(max_count=max_commits):
            # TODO: Create a list to store file changes for this commit
            
            # Extract file changes for this commit
            if commit.parents:
                parent = commit.parents[0]
                for diff in parent.diff(commit, create_patch=True):
                    if diff.b_path:
                        file_change = FileChange(
                            file_path=diff.b_path,
                            commit_hash=commit.hexsha,
                            diff_content=diff.diff.decode('utf-8', errors='ignore')
                        )
                        # TODO: Add the file change to the commit-specific list instead of self.file_changes
                        self.file_changes.append(file_change)
            
            # TODO: Add logic to check if any file changes match the extension filter
            # TODO: Set a flag to determine if this commit should be included
            
            # TODO: Only create and add the GitCommit object if it matches the filter
            git_commit = GitCommit(
                hash=commit.hexsha,
                message=commit.message.strip(),
                author=f"{commit.author.name} <{commit.author.email}>",
                date=commit.committed_datetime
            )
            self.commits.append(git_commit)
            # TODO: Add the commit's file changes to self.file_changes if the commit is included
        
        print(f"Found {len(self.commits)} commits, {len(self.file_changes)} changes")
        return self.commits


def main():
    extractor = GitHistoryExtractor()
    
    # Use current directory as the git repository
    repo_path = "./sample-ecommerce-api"
    
    # Check if we're in a git repository
    if not os.path.exists(os.path.join(repo_path, ".git")):
        print("Not a git repository. Initializing a sample repo...")
        # For demo purposes, we'll just show the class structure
        print("GitHistoryExtractor initialized successfully!")
        return
    
    try:
        # Extract all commits first
        commits = extractor.extract_commits(repo_path, max_commits=10)
        
        # Display first few commits
        print("\nAll recent commits:")
        for i, commit in enumerate(commits[:3]):
            print(f"{i+1}. {commit.hash[:8]} - {commit.message[:50]}...")
            print(f"   Author: {commit.author}")
            print(f"   Date: {commit.date}")
            print()
        
        # TODO: Reset the extractor by creating a new instance
        # TODO: Define a list of file extensions to filter by (e.g., ['.py'])
        # TODO: Call extract_commits with the file_extensions parameter
        
        # TODO: Display filtered commits with an appropriate header
        # TODO: Add logic to handle the case when no commits match the filter
            
    except Exception as e:
        print(f"Error extracting git history: {e}")


if __name__ == "__main__":
    main()
```

Filtering history by file type is a powerful way to streamline code review and development focus\! Here is the completed code, implementing the filtering logic within `extract_commits` and updating `main()` to demonstrate the feature.

```python
from git import Repo
from datetime import datetime
import os
from dataclasses import dataclass
from typing import List, Optional # Import for type hinting (good practice)

@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str

class GitHistoryExtractor:
    def __init__(self):
        # Reset lists on initialization
        self.commits: List[GitCommit] = []
        self.file_changes: List[FileChange] = []
    
    def extract_commits(self, repo_path: str, max_commits: int = 50, file_extensions: Optional[List[str]] = None):
        """
        Extracts commits from a repository, optionally filtering by file extension.
        A commit is included if ANY of its file changes match the extension filter.
        """
        # Clear existing data before a new extraction
        self.commits = []
        self.file_changes = []
        
        filter_enabled = file_extensions is not None
        if filter_enabled:
            # Normalize extensions to be lowercase for consistent checking
            normalized_extensions = [ext.lower() for ext in file_extensions]
            print(f"Extracting git history: {repo_path} (Filtering by: {', '.join(normalized_extensions)})")
        else:
            print(f"Extracting git history: {repo_path} (No file type filter)")

        repo = Repo(repo_path)
        
        for commit in repo.iter_commits(max_count=max_commits):
            # Create a list to store file changes for this specific commit
            current_commit_changes: List[FileChange] = []
            commit_matches_filter = not filter_enabled # Default to True if no filter is active
            
            # Extract file changes for this commit
            if commit.parents:
                parent = commit.parents[0]
                for diff in parent.diff(commit, create_patch=True):
                    # Use b_path for the 'new' file path (prioritize it)
                    file_path = diff.b_path if diff.b_path else diff.a_path
                    
                    if file_path:
                        file_change = FileChange(
                            file_path=file_path,
                            commit_hash=commit.hexsha,
                            diff_content=diff.diff.decode('utf-8', errors='ignore')
                        )
                        # Add the file change to the commit-specific list
                        current_commit_changes.append(file_change)
                        
                        # Check if this file change satisfies the filter
                        if filter_enabled:
                            # os.path.splitext returns a tuple (root, ext)
                            _, ext = os.path.splitext(file_path)
                            # Check if the file's extension is in the normalized filter list
                            if ext.lower() in normalized_extensions:
                                # A match is found, so this commit must be included
                                commit_matches_filter = True

            # Only process and store the commit if it matches the filter (or if no filter is set)
            if commit_matches_filter:
                git_commit = GitCommit(
                    hash=commit.hexsha,
                    message=commit.message.strip(),
                    author=f"{commit.author.name} <{commit.author.email}>",
                    date=commit.committed_datetime
                )
                self.commits.append(git_commit)
                # Add the commit's file changes to the class's main list
                self.file_changes.extend(current_commit_changes)
        
        # Neutral wording: this line also runs when no filter is active
        print(f"Found {len(self.commits)} commits, {len(self.file_changes)} changes")
        return self.commits


def main():
    # Attempt to use current directory as the git repository
    repo_path = "."
    
    # Check if we're in a git repository
    if not os.path.exists(os.path.join(repo_path, ".git")):
        print("Not a git repository. Please run this script in a directory that is part of a Git repository or specify a valid one.")
        return
    
    try:
        # --- PHASE 1: UNFILTERED EXTRACTION ---
        print("\n" + "="*50)
        print("PHASE 1: Extracting All Commits (max 10)")
        print("="*50)
        extractor_all = GitHistoryExtractor()
        all_commits = extractor_all.extract_commits(repo_path, max_commits=10)
        
        if all_commits:
            print(f"\nSuccessfully extracted {len(all_commits)} commits.")
            print(f"First commit: {all_commits[0].hash[:8]} - {all_commits[0].message.splitlines()[0][:50]}...")
        else:
            print("No commits found.")
            
        # --- PHASE 2: FILTERED EXTRACTION ---
        
        # Define a list of file extensions to filter by (e.g., Python and YAML)
        filter_extensions = ['.py', '.yaml', '.yml']
        
        print("\n" + "="*50)
        print(f"PHASE 2: Extracting Commits Modifying {', '.join(filter_extensions)} files (max 10)")
        print("="*50)

        # Reset the extractor by creating a new instance
        extractor_filtered = GitHistoryExtractor()
        # Call extract_commits with the file_extensions parameter
        filtered_commits = extractor_filtered.extract_commits(
            repo_path, 
            max_commits=10, 
            file_extensions=filter_extensions
        )
        
        # Display filtered commits with an appropriate header
        if filtered_commits:
            print(f"\nFound {len(filtered_commits)} commits matching the filter.")
            print("\nFiltered Recent Commits:")
            for i, commit in enumerate(filtered_commits[:3]):
                print(f"{i+1}. {commit.hash[:8]} - {commit.message.splitlines()[0][:50]}...")
                print(f"   Author: {commit.author}")
                print(f"   Date: {commit.date.strftime('%Y-%m-%d %H:%M')}")
                print()
        else:
            print("\nNo commits found that modified files with the specified extensions in the top 10 commits.")
            
    except Exception as e:
        print(f"Error extracting git history: {e}")


if __name__ == "__main__":
    main()
```

### Key Changes and Logic

1.  **Method Signature:** `def extract_commits(self, repo_path: str, max_commits: int = 50, file_extensions: Optional[List[str]] = None):` was updated to accept the new parameter.
2.  **Filter Check:** `filter_enabled = file_extensions is not None` determines if the filtering logic needs to be run.
3.  **Normalization:** Extensions are converted to **lowercase** (`normalized_extensions = [ext.lower() for ext in file_extensions]`) to ensure case-insensitive matching against file paths.
4.  **Commit-Specific List:** `current_commit_changes` temporarily stores all file changes for the current commit.
5.  **Matching Flag:** `commit_matches_filter` is initialized to `True` if no filter is active, or `False` if a filter is active.
6.  **Extension Check:**
      * `_, ext = os.path.splitext(file_path)` correctly splits the file path into the root and the extension.
      * If filtering is enabled and `ext.lower()` is found in `normalized_extensions`, we set `commit_matches_filter = True`. The loop does **not** break at this point: it keeps going so that `current_commit_changes` holds every file change of the commit. A single match is enough to include the commit, which satisfies the **ANY** requirement.
7.  **Final Inclusion:** The `GitCommit` object is only created and both `self.commits` and `self.file_changes` are only updated **if** `commit_matches_filter` is `True`.
8.  **Main Function Reset:** In `main()`, a **new** instance of `GitHistoryExtractor` (`extractor_filtered = GitHistoryExtractor()`) is created before the filtered call. Because this version of `extract_commits` already clears `self.commits` and `self.file_changes` at the start of each run, the new instance is not strictly required for correctness. It does keep the unfiltered results available in `extractor_all`, so the two result sets can be compared side by side.
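
To see the extension check in isolation, here is a minimal sketch. The helper name `matches_extension` is hypothetical and is not part of the lesson's class. Accepting extensions written without a leading dot (such as `'md'`) is also an assumption added here; the solution above expects the dot to be present.

```python
import os

def matches_extension(file_path, file_extensions):
    """Return True when file_path ends with one of file_extensions (case-insensitive)."""
    if file_extensions is None:
        return True  # no filter active: every path matches
    # Lowercase each extension and make sure it starts with a dot,
    # so that 'md', '.md' and '.MD' are all treated the same way
    normalized = {
        ext.lower() if ext.startswith(".") else "." + ext.lower()
        for ext in file_extensions
    }
    # os.path.splitext returns (root, ext); ext is '' when there is no extension
    _, ext = os.path.splitext(file_path)
    return ext.lower() in normalized

paths = ["src/app.py", "docs/README.MD", "Makefile", "archive.tar.gz"]
matching = [p for p in paths if matches_extension(p, [".py", "md"])]
print(matching)
```

Note that `os.path.splitext` only looks at the last dot, so `archive.tar.gz` has the extension `.gz`, and a file without a dot, such as `Makefile`, has an empty extension and never matches a filter.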

## Generating Repository Activity Insights at Scale

You've built several useful filtering and analysis features for our Git history extractor! Now, let's create a comprehensive summary method that brings all this information together in one place.

In this exercise, you'll add a new method called `generate_commit_summary` to the `GitHistoryExtractor` class. This method will analyze all the commits and file changes to produce a complete overview of repository activity.

Your summary should include:

  * The total number of commits
  * The count of unique authors
  * The most active author (who made the most commits)
  * The most frequently changed file
  * The date range of commits (earliest to latest)

This kind of summary is invaluable for code reviewers who need to quickly understand a repository's activity patterns without digging through individual commits. It combines counting, finding maximums, and working with dates, all skills you'll use frequently when analyzing code repositories.

After implementing the method, update the `main()` function to display this summary in a clear, readable format. Remember to handle edge cases, such as empty repositories, properly.

By completing this exercise, you'll have a powerful tool that provides instant insights into any Git repository's development patterns!

```python
from git import Repo
from datetime import datetime
import os

# Minimal dataclasses for continuity with the outline
from dataclasses import dataclass

@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str

class GitHistoryExtractor:
    def __init__(self):
        self.commits = []
        self.file_changes = []
    
    def extract_commits(self, repo_path, max_commits=50):
        print(f"Extracting git history: {repo_path}")
        repo = Repo(repo_path)
        
        for commit in repo.iter_commits(max_count=max_commits):
            git_commit = GitCommit(
                hash=commit.hexsha,
                message=commit.message.strip(),
                author=f"{commit.author.name} <{commit.author.email}>",
                date=commit.committed_datetime
            )
            self.commits.append(git_commit)
            
            # Extract file changes
            if commit.parents:
                parent = commit.parents[0]
                for diff in parent.diff(commit, create_patch=True):
                    if diff.b_path:
                        file_change = FileChange(
                            file_path=diff.b_path,
                            commit_hash=commit.hexsha,
                            diff_content=diff.diff.decode('utf-8', errors='ignore')
                        )
                        self.file_changes.append(file_change)
        
        print(f"Found {len(self.commits)} commits, {len(self.file_changes)} changes")
        return self.commits
    
    # TODO: Add a generate_commit_summary method that returns a dictionary with repository statistics
    # The method should include: total_commits, unique_authors, most_active_author, most_changed_file, and date_range


def main():
    extractor = GitHistoryExtractor()
    
    # Use current directory as the git repository
    repo_path = "./sample-ecommerce-api"
    
    # Check if we're in a git repository
    if not os.path.exists(os.path.join(repo_path, ".git")):
        print("Not a git repository. Initializing a sample repo...")
        # For demo purposes, we'll just show the class structure
        print("GitHistoryExtractor initialized successfully!")
        return
    
    try:
        commits = extractor.extract_commits(repo_path, max_commits=10)
        
        # Display first few commits
        print("\nRecent commits:")
        for i, commit in enumerate(commits[:3]):
            print(f"{i+1}. {commit.hash[:8]} - {commit.message[:50]}...")
            print(f"   Author: {commit.author}")
            print(f"   Date: {commit.date}")
            print()
        
        # TODO: Call the generate_commit_summary method and store the result
        
        # TODO: Add code to display the summary information with appropriate formatting
        # TODO: Include total commits, unique authors, most active author, most changed file, and date range
        # TODO: Handle edge cases where some information might be None
            
    except Exception as e:
        print(f"Error extracting git history: {e}")


if __name__ == "__main__":
    main()

```

That's a fantastic next step\! Summarizing activity into **key metrics** is what turns raw data into actionable insights.

I'll add `get_file_change_count` from the hot spot exercise back into the class, since the summary reuses it, and then implement the new `generate_commit_summary` method as requested.

```python
from git import Repo
from datetime import datetime
import os
from dataclasses import dataclass
from collections import Counter
from typing import Any, Dict, List  # For clearer type hints

@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

@dataclass
class FileChange:
    file_path: str
    commit_hash: str
    diff_content: str

class GitHistoryExtractor:
    def __init__(self):
        self.commits: List[GitCommit] = []
        self.file_changes: List[FileChange] = []
    
    def get_file_change_count(self) -> Dict[str, int]:
        """
        Count how many times each file has been changed across all commits.
        """
        if not self.file_changes:
            return {}
        
        all_file_paths = [change.file_path for change in self.file_changes]
        return dict(Counter(all_file_paths))

    def extract_commits(self, repo_path: str, max_commits: int = 50) -> List[GitCommit]:
        # Clear existing data before a new extraction
        self.commits = []
        self.file_changes = []
        
        print(f"Extracting git history: {repo_path}")
        try:
            repo = Repo(repo_path)
        except Exception as e:
            print(f"Error initializing Repo: {e}")
            return []
        
        for commit in repo.iter_commits(max_count=max_commits):
            git_commit = GitCommit(
                hash=commit.hexsha,
                message=commit.message.strip(),
                author=f"{commit.author.name} <{commit.author.email}>",
                date=commit.committed_datetime
            )
            self.commits.append(git_commit)
            
            # Extract file changes
            if commit.parents:
                parent = commit.parents[0]
                # Filtering for all diff types for simplicity in this exercise
                for diff in parent.diff(commit, create_patch=True):
                    file_path = diff.b_path if diff.b_path else diff.a_path
                    
                    if file_path:
                        file_change = FileChange(
                            file_path=file_path,
                            commit_hash=commit.hexsha,
                            diff_content=diff.diff.decode('utf-8', errors='ignore')
                        )
                        self.file_changes.append(file_change)
        
        print(f"Found {len(self.commits)} commits, {len(self.file_changes)} changes")
        return self.commits
    
    def generate_commit_summary(self) -> Dict[str, Any]:
        """
        Analyzes commits and file changes to produce a comprehensive summary of repository activity.
        
        Returns:
            dict: Summary statistics including commit counts, author details, file activity, and date range.
        """
        summary: Dict[str, Any] = {
            "total_commits": len(self.commits),
            "unique_authors": 0,
            "most_active_author": "N/A",
            "most_changed_file": "N/A",
            "most_changed_file_count": 0,
            "date_range": {"earliest": "N/A", "latest": "N/A"}
        }

        # Handle empty repository edge case
        if not self.commits:
            return summary
        
        # 1. Author Activity and Unique Authors
        all_authors = [commit.author for commit in self.commits]
        author_counts = Counter(all_authors)
        
        summary["unique_authors"] = len(author_counts)
        # most_common(1) returns a list of one tuple: [(author, count)]
        most_active_tuple = author_counts.most_common(1)
        if most_active_tuple:
            author, count = most_active_tuple[0]
            summary["most_active_author"] = f"{author} ({count} commits)"

        # 2. Most Frequently Changed File
        file_change_counts = self.get_file_change_count()
        if file_change_counts:
            # Find the file with the maximum count
            # key=file_change_counts.get tells max() to use the dictionary value as the comparison key
            most_changed_file_path = max(file_change_counts, key=file_change_counts.get)
            count = file_change_counts[most_changed_file_path]
            
            summary["most_changed_file"] = most_changed_file_path
            summary["most_changed_file_count"] = count

        # 3. Date Range
        all_dates = [commit.date for commit in self.commits]
        earliest_date = min(all_dates)
        latest_date = max(all_dates)
        
        # Format the dates for readability
        DATE_FORMAT = "%Y-%m-%d %H:%M:%S"
        summary["date_range"]["earliest"] = earliest_date.strftime(DATE_FORMAT)
        summary["date_range"]["latest"] = latest_date.strftime(DATE_FORMAT)

        return summary


def main():
    extractor = GitHistoryExtractor()
    
    # Use current directory as the git repository
    repo_path = "."
    
    # Check if we're in a git repository
    if not os.path.exists(os.path.join(repo_path, ".git")):
        print("Not a git repository. Please run this script in a directory that is part of a Git repository.")
        return
    
    try:
        # Extract the last 50 commits to get a decent summary window
        commits = extractor.extract_commits(repo_path, max_commits=50)
        
        if not commits:
            print("No commits found to analyze.")
            return

        print("\n" + "="*70)
        print("Commit Summary Analysis")
        print("="*70)

        # Call the generate_commit_summary method and store the result
        summary = extractor.generate_commit_summary()
        
        # Display the summary information with appropriate formatting
        print(f"| {'Total Commits':<30} | {summary['total_commits']:<36} |")
        print("-" * 70)
        print(f"| {'Unique Authors':<30} | {summary['unique_authors']:<36} |")
        print(f"| {'Most Active Author':<30} | {summary['most_active_author']:<36} |")
        print("-" * 70)
        
        # Handle the display of the most changed file
        file_info = f"{summary['most_changed_file']} ({summary['most_changed_file_count']} changes)"
        print(f"| {'Most Frequently Changed File':<30} | {file_info:<36} |")
        print("-" * 70)
        
        # Display Date Range
        print(f"| {'Earliest Commit Date':<30} | {summary['date_range']['earliest']:<36} |")
        print(f"| {'Latest Commit Date':<30} | {summary['date_range']['latest']:<36} |")
        print("=" * 70)
            
    except Exception as e:
        print(f"Error extracting git history: {e}")


if __name__ == "__main__":
    main()
```

### Explanation of `generate_commit_summary`

1.  **Initialization**: A `summary` dictionary is initialized with default "N/A" values to handle the edge case where no commits are found.
2.  **Empty Repository Check**: `if not self.commits: return summary` immediately handles the case of an empty repository.
3.  **Author Activity**:
      * `all_authors = [commit.author for commit in self.commits]` extracts all author names.
      * `author_counts = Counter(all_authors)` counts the frequency of each author.
      * `len(author_counts)` gives the **unique author count**.
      * `author_counts.most_common(1)` returns the **most active author** and their commit count.
4.  **Most Changed File**:
      * `file_change_counts = self.get_file_change_count()` reuses the logic from the previous exercise to get file activity.
      * `max(file_change_counts, key=file_change_counts.get)` efficiently finds the **key (file path)** that corresponds to the largest **value (count)** in the dictionary.
5.  **Date Range**:
      * `all_dates = [commit.date for commit in self.commits]` gets all commit timestamps.
      * `min(all_dates)` and `max(all_dates)` find the **earliest** and **latest** commits.
      * The dates are then formatted using `strftime` for clear display.

The `main()` function was updated to call this method and display the key metrics in a clean, tabular format. ✅
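
The counting and date logic can also be tried without a Git repository. The following is a minimal sketch that applies the same `Counter`, `min`, and `max` steps to a few hand-written commits; all hashes, names, and dates are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

@dataclass
class GitCommit:
    hash: str
    message: str
    author: str
    date: datetime

# Hand-written sample commits (hypothetical data, not from a real repository)
commits = [
    GitCommit("a1", "Add cart", "Ana <ana@example.com>", datetime(2024, 3, 5, 10, 0)),
    GitCommit("b2", "Fix tax", "Ben <ben@example.com>", datetime(2024, 3, 1, 9, 30)),
    GitCommit("c3", "Refactor", "Ana <ana@example.com>", datetime(2024, 3, 9, 16, 45)),
]

# Count commits per author; the number of keys is the unique author count
author_counts = Counter(c.author for c in commits)
most_active, commit_count = author_counts.most_common(1)[0]

# The commits do not need to be sorted for min() and max() to work
dates = [c.date for c in commits]

print(f"Total commits: {len(commits)}, unique authors: {len(author_counts)}")
print(f"Most active author: {most_active} ({commit_count} commits)")
print(f"Date range: {min(dates).date()} to {max(dates).date()}")
```

Working from in-memory data like this is one way to check the summary logic in a unit test before pointing the extractor at a real repository.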

## Generating Repository Activity Insights at Scale