# Focused Learning: Code Review Automation Pipeline
## Deep Dive into the Three Core Tasks

### Learning Objectives:
- Master the three-stage code review automation pipeline
- Implement each task with practical examples
- Understand the data flow and task dependencies
- Build an end-to-end code review system

### Paper References:
- **Section II.A**: Automation in Code Review (Page 2)
- **Section III.E**: Code Review Automation Tasks (Page 4)
- **Figure 1**: The cycle of the code review process
- **Table I**: Summary of code review automation tasks

## 1. Understanding the Code Review Cycle

According to the paper, modern code review follows a cycle with three key steps:

1. **Review Necessity Prediction** (Reviewer): Determine if a code change needs review
2. **Review Comment Generation** (Reviewer): Generate constructive feedback
3. **Code Refinement** (Committer): Improve code based on feedback

This cycle repeats until reviewer and committer reach agreement.
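
The control flow of this cycle can be sketched in a few lines. This is only an illustration: `needs_review`, `write_comment`, and `apply_feedback` are hypothetical callables standing in for the three tasks, and `max_rounds` is an assumed safety limit that the paper does not prescribe.

```python
from typing import Callable, List, Tuple

def review_cycle(
    code: str,
    needs_review: Callable[[str], bool],        # stand-in for Review Necessity Prediction
    write_comment: Callable[[str], str],        # stand-in for Review Comment Generation
    apply_feedback: Callable[[str, str], str],  # stand-in for Code Refinement
    max_rounds: int = 3,                        # assumed cap so the loop always terminates
) -> Tuple[str, List[str]]:
    """Repeat review -> comment -> refine until no further review is needed."""
    comments: List[str] = []
    for _ in range(max_rounds):
        if not needs_review(code):
            break  # reviewer and committer agree; the change can be merged
        comment = write_comment(code)
        comments.append(comment)
        code = apply_feedback(code, comment)
    return code, comments

# Toy stand-ins: flag debug prints, ask to remove them, then remove them.
final_code, history = review_cycle(
    "print('debug')\nreturn 1",
    needs_review=lambda c: "print(" in c,
    write_comment=lambda c: "Remove the debug print",
    apply_feedback=lambda c, _msg: "\n".join(
        line for line in c.split("\n") if "print(" not in line
    ),
)
```

With these toy stand-ins the loop runs one round: the debug line is removed and the second necessity check ends the cycle.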

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from typing import Dict, List, Tuple, Optional
from dataclasses import dataclass
from enum import Enum
import re
from IPython.display import display, HTML

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Visualize the code review cycle
def visualize_code_review_cycle():
    fig, ax = plt.subplots(figsize=(10, 8))
    
    # Define positions for cycle elements
    positions = {
        'PR': (0.5, 0.9),
        'RNP': (0.15, 0.5),
        'RCG': (0.5, 0.1),
        'CR': (0.85, 0.5),
        'Merge': (0.5, 0.5)
    }
    
    # Draw nodes
    node_colors = ['lightblue', 'lightgreen', 'lightcoral', 'lightyellow', 'lightgray']
    labels = ['Pull Request', 'Review\nNecessity?', 'Generate\nComment', 'Refine\nCode', 'Merge/Reject']
    
    for (key, pos), color, label in zip(positions.items(), node_colors, labels):
        circle = plt.Circle(pos, 0.12, facecolor=color, edgecolor='black', linewidth=2)
        ax.add_patch(circle)
        ax.text(pos[0], pos[1], label, ha='center', va='center', fontsize=10, fontweight='bold')
    
    # Draw arrows
    arrows = [
        ('PR', 'RNP', 'Submit'),
        ('RNP', 'RCG', 'Yes'),
        ('RNP', 'Merge', 'No'),
        ('RCG', 'CR', 'Comment'),
        ('CR', 'RNP', 'Updated'),
        ('Merge', 'PR', 'Next PR')
    ]
    
    for start, end, label in arrows:
        start_pos = positions[start]
        end_pos = positions[end]
        
        # Draw the arrow from the start node to the end node
        ax.annotate('', xy=end_pos, xytext=start_pos,
                    arrowprops=dict(arrowstyle='->', lw=2, color='darkblue'),
                    )
        
        # Add label
        mid_x = (start_pos[0] + end_pos[0]) / 2
        mid_y = (start_pos[1] + end_pos[1]) / 2
        ax.text(mid_x, mid_y, label, fontsize=9, ha='center', 
                bbox=dict(boxstyle="round,pad=0.3", facecolor='white', alpha=0.8))
    
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.set_aspect('equal')
    ax.axis('off')
    ax.set_title('Code Review Automation Cycle\n(Based on Figure 1 from the paper)', fontsize=14, fontweight='bold')
    
    plt.tight_layout()
    plt.show()

visualize_code_review_cycle()

## 2. Task 1: Review Necessity Prediction (RNP)

**Goal**: Determine if a code diff requires review (binary classification)

**Input**: Diff hunk (code changes)

**Output**: Yes/No decision

In [None]:
@dataclass
class DiffHunk:
    """Represents a code diff hunk"""
    file_path: str
    old_lines: List[str]
    new_lines: List[str]
    line_numbers: Tuple[int, int]  # (start, end)
    
    def to_unified_diff(self) -> str:
        """Convert to unified diff format"""
        diff_lines = []
        diff_lines.append(f"--- {self.file_path}")
        diff_lines.append(f"+++ {self.file_path}")
        diff_lines.append(f"@@ -{self.line_numbers[0]},{len(self.old_lines)} +{self.line_numbers[0]},{len(self.new_lines)} @@")
        
        for line in self.old_lines:
            diff_lines.append(f"- {line}")
        for line in self.new_lines:
            diff_lines.append(f"+ {line}")
        
        return "\n".join(diff_lines)

class ReviewNecessityPredictor:
    """Predicts if a code diff needs review"""
    
    def __init__(self):
        # Define patterns that typically need review
        self.review_patterns = [
            r'TODO|FIXME|XXX|HACK',  # Comments needing attention
            r'console\.log|print\(|debug',  # Debug statements
            r'password|secret|key|token',  # Security concerns
            r'\b(rm|delete|drop)\b',  # Destructive operations
            r'catch\s*\(.*\)\s*\{\s*\}',  # Empty catch blocks
            r'//.*@',  # Commented out code with annotations
        ]
        
        # Line-level patterns that typically don't need review
        self.safe_patterns = [
            r'^\s*//.*$',  # Comment-only changes
            r'^\s*$',  # Whitespace only
        ]
        
        # File extensions treated as documentation; matched against the file path
        self.doc_file_pattern = r'\.(md|txt|json)$'
    
    def predict(self, diff_hunk: DiffHunk) -> Tuple[bool, float, str]:
        """Predict if review is needed
        Returns: (needs_review, confidence, reason)
        """
        diff_text = diff_hunk.to_unified_diff()
        
        # Check for review patterns
        for pattern in self.review_patterns:
            if re.search(pattern, diff_text, re.IGNORECASE):
                return True, 0.9, f"Found pattern: {pattern}"
        
        # Documentation files are treated as safe regardless of line content
        if re.search(self.doc_file_pattern, diff_hunk.file_path):
            return False, 0.8, "Only safe changes detected"
        
        # Check if every added line is a comment or whitespace
        all_safe = True
        for line in diff_hunk.new_lines:
            is_safe = any(re.match(pattern, line) for pattern in self.safe_patterns)
            if not is_safe and line.strip():
                all_safe = False
                break
        
        if all_safe:
            return False, 0.8, "Only safe changes detected"
        
        # Check complexity heuristics
        lines_changed = len(diff_hunk.old_lines) + len(diff_hunk.new_lines)
        if lines_changed > 50:
            return True, 0.7, "Large change size"
        
        # Default: small changes might need review
        return True, 0.6, "Standard code change"

# Example usage
example_diffs = [
    DiffHunk(
        file_path="src/auth.py",
        old_lines=["def login(username, pwd):", "    return authenticate(username, pwd)"],
        new_lines=["def login(username, password):", "    # TODO: add rate limiting", "    return authenticate(username, password)"],
        line_numbers=(10, 12)
    ),
    DiffHunk(
        file_path="README.md",
        old_lines=["## Installation"],
        new_lines=["## Installation", "", "### Prerequisites"],
        line_numbers=(5, 6)
    ),
    DiffHunk(
        file_path="src/utils.js",
        old_lines=["function process(data) {", "    return data;", "}"],
        new_lines=["function process(data) {", "    console.log('Processing:', data);", "    return data;", "}"],
        line_numbers=(20, 23)
    )
]

predictor = ReviewNecessityPredictor()

print("Review Necessity Predictions:\n")
for i, diff in enumerate(example_diffs):
    needs_review, confidence, reason = predictor.predict(diff)
    print(f"Diff {i+1} ({diff.file_path}):")
    print(f"  Needs Review: {'Yes' if needs_review else 'No'}")
    print(f"  Confidence: {confidence:.2f}")
    print(f"  Reason: {reason}")
    print(f"  Diff Preview:\n{diff.to_unified_diff()[:200]}...\n")

## 3. Task 2: Review Comment Generation (RCG)

**Goal**: Generate constructive review comments for code

**Input**: Code snippet or diff hunk

**Output**: Natural language comment

The paper considers two perspectives:
- **Line-level**: Comments on specific lines (CRer dataset)
- **Method-level**: Holistic view of the code (Tufano dataset)

In [None]:
class ReviewCommentGenerator:
    """Generates code review comments"""
    
    def __init__(self):
        # Comment templates based on common patterns
        self.comment_templates = {
            'null_check': "Consider adding null/undefined check for '{variable}' before {action}",
            'error_handling': "Missing error handling for {operation}. Consider wrapping in try-catch",
            'naming': "Variable name '{name}' could be more descriptive. Consider '{suggestion}'",
            'complexity': "This {construct} has high complexity. Consider breaking it into smaller functions",
            'security': "Potential security issue: {issue}. Please validate/sanitize {data}",
            'performance': "This operation might be inefficient for large {data_structure}. Consider {optimization}",
            'documentation': "Please add documentation explaining {what}",
            'magic_number': "Magic number {number} should be extracted to a named constant",
            'duplication': "This code appears to duplicate logic from {location}. Consider extracting to a shared function"
        }
    
    def analyze_code(self, code: str) -> List[Dict[str, str]]:
        """Analyze code and identify issues"""
        issues = []
        lines = code.split('\n')
        
        for i, line in enumerate(lines):
            line_num = i + 1
            
            # Check for null checks
            if re.search(r'\.(\w+)\(', line) and 'if' not in line:
                match = re.search(r'(\w+)\.(\w+)\(', line)
                if match:
                    issues.append({
                        'line': line_num,
                        'type': 'null_check',
                        'params': {'variable': match.group(1), 'action': f"calling {match.group(2)}()"}
                    })
            
            # Check for error handling
            if re.search(r'(fetch|request|save|delete|api)', line, re.IGNORECASE) and 'try' not in code:
                issues.append({
                    'line': line_num,
                    'type': 'error_handling',
                    'params': {'operation': 'async operation'}
                })
            
            # Check for magic numbers (standalone integers with two or more digits)
            if re.search(r'\b\d{2,}\b', line) and not re.search(r'(\d+\.\d+|0x|import)', line):
                match = re.search(r'\b(\d{2,})\b', line)
                if match:
                    issues.append({
                        'line': line_num,
                        'type': 'magic_number',
                        'params': {'number': match.group(1)}
                    })
            
            # Check for poor naming
            if re.search(r'\b[a-z]\b\s*=', line):  # Single letter variables
                match = re.search(r'\b([a-z])\b\s*=', line)
                if match and match.group(1) not in ['i', 'j', 'k']:  # Allow loop counters
                    issues.append({
                        'line': line_num,
                        'type': 'naming',
                        'params': {'name': match.group(1), 'suggestion': 'a more descriptive name'}
                    })
        
        return issues
    
    def generate_comment(self, code: str, perspective: str = 'line') -> List[str]:
        """Generate review comments"""
        issues = self.analyze_code(code)
        comments = []
        
        if perspective == 'line':
            # Generate line-level comments
            for issue in issues:
                template = self.comment_templates[issue['type']]
                comment = template.format(**issue['params'])
                comments.append(f"Line {issue['line']}: {comment}")
        else:
            # Generate method-level comment
            if issues:
                summary = f"Found {len(issues)} potential improvements:\n"
                for i, issue in enumerate(issues[:3]):  # Top 3 issues
                    template = self.comment_templates[issue['type']]
                    comment = template.format(**issue['params'])
                    summary += f"{i+1}. {comment}\n"
                comments.append(summary.strip())
            else:
                comments.append("Code looks good! Consider adding unit tests if not already present.")
        
        return comments

# Example code snippets
code_examples = [
    # Example 1: Missing null check
    """function processUser(user) {
    user.save();
    return user.id;
}""",
    
    # Example 2: Magic numbers and poor naming
    """def calculate(x, y):
    z = x * 1000 + y * 50
    return z""",
    
    # Example 3: Missing error handling
    """async function fetchData(url) {
    const response = await fetch(url);
    const data = await response.json();
    return data;
}"""
]

generator = ReviewCommentGenerator()

print("Generated Review Comments:\n")
for i, code in enumerate(code_examples):
    print(f"\n{'='*60}")
    print(f"Code Example {i+1}:")
    print(f"{'='*60}")
    print(code)
    print(f"\n--- Line-level Comments ---")
    line_comments = generator.generate_comment(code, perspective='line')
    for comment in line_comments:
        print(f"• {comment}")
    print(f"\n--- Method-level Comment ---")
    method_comments = generator.generate_comment(code, perspective='method')
    for comment in method_comments:
        print(comment)

## 4. Task 3: Code Refinement (CR)

**Goal**: Automatically refine code based on review comments

**Input**: Source code + review comment

**Output**: Refined code

The Tufano dataset uses `<START>` and `<END>` markers to indicate focus areas.
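
A minimal sketch of how such markers could be handled, assuming each snippet contains at most one `<START>`...`<END>` pair. `split_on_markers` is a hypothetical helper for illustration, not part of the dataset tooling.

```python
import re
from typing import Optional, Tuple

# Non-greedy match so the span ends at the first <END>; DOTALL lets it cross lines.
MARKED_SPAN = re.compile(r"<START>(.*?)<END>", re.DOTALL)

def split_on_markers(source: str) -> Optional[Tuple[str, str, str]]:
    """Return (before, focus, after), or None when no marker pair is present."""
    match = MARKED_SPAN.search(source)
    if match is None:
        return None
    return source[:match.start()], match.group(1), source[match.end():]

snippet = "int total = <START>a+b<END>;"
before, focus, after = split_on_markers(snippet)

# Edit only the focus area, then stitch the snippet back together without markers.
rebuilt = before + focus.replace("+", " + ") + after
```

Splitting into three parts keeps the untouched context byte-for-byte identical, so a refinement step only ever sees and rewrites the focus area.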

In [None]:
class CodeRefiner:
    """Refines code based on review comments"""
    
    def __init__(self):
        # Refinement strategies based on comment patterns
        self.refinement_strategies = {
            'null_check': self._add_null_check,
            'error_handling': self._add_error_handling,
            'naming': self._improve_naming,
            'extract_constant': self._extract_constant,
            'type_hints': self._add_type_hints
        }
    
    def _identify_refinement_type(self, comment: str) -> str:
        """Identify what type of refinement is needed"""
        comment_lower = comment.lower()
        
        if 'null' in comment_lower or 'undefined' in comment_lower:
            return 'null_check'
        elif 'error' in comment_lower or 'exception' in comment_lower or 'try' in comment_lower:
            return 'error_handling'
        # Check constants before naming: "named constant" would otherwise match 'name'
        elif 'constant' in comment_lower or 'magic number' in comment_lower:
            return 'extract_constant'
        elif 'name' in comment_lower or 'descriptive' in comment_lower:
            return 'naming'
        elif 'type' in comment_lower or 'hint' in comment_lower:
            return 'type_hints'
        
        return 'unknown'
    
    def _add_null_check(self, code: str, comment: str) -> str:
        """Add null/undefined checks"""
        lines = code.split('\n')
        refined_lines = []
        
        for line in lines:
            # Check if line contains method call on object
            match = re.search(r'(\w+)\.(\w+)\(', line)
            if match and 'if' not in line:
                indent = re.match(r'\s*', line).group(0)  # leading whitespace of the line
                obj = match.group(1)
                # Add null check before the line
                if 'function' in code or 'def' in code:
                    # JavaScript/Python style
                    refined_lines.append(f"{indent}if ({obj}) {{")
                    refined_lines.append(f"{indent}    {line.strip()}")
                    refined_lines.append(f"{indent}}}")
                else:
                    refined_lines.append(line)
            else:
                refined_lines.append(line)
        
        return '\n'.join(refined_lines)
    
    def _add_error_handling(self, code: str, comment: str) -> str:
        """Add try-catch error handling"""
        if 'async' in code or 'await' in code:
            # JavaScript async function
            lines = code.split('\n')
            refined_lines = [lines[0]]  # Keep function declaration
            refined_lines.append("    try {")
            for line in lines[1:-1]:
                refined_lines.append(f"    {line}")
            refined_lines.append("    } catch (error) {")
            refined_lines.append("        console.error('Error:', error);")
            refined_lines.append("        throw error;")
            refined_lines.append("    }")
            refined_lines.append(lines[-1])  # Keep closing brace
            return '\n'.join(refined_lines)
        
        return code  # Return unchanged if pattern doesn't match
    
    def _improve_naming(self, code: str, comment: str) -> str:
        """Improve variable naming"""
        # Simple example: replace single-letter variables
        replacements = {
            r'\bx\b': 'value',
            r'\by\b': 'offset',
            r'\bz\b': 'result',
            r'\ba\b': 'first',
            r'\bb\b': 'second'
        }
        
        refined_code = code
        for pattern, replacement in replacements.items():
            refined_code = re.sub(pattern, replacement, refined_code)
        
        return refined_code
    
    def _extract_constant(self, code: str, comment: str) -> str:
        """Extract magic numbers to constants"""
        # Find magic numbers: standalone integers with two or more digits
        matches = re.findall(r'\b\d{2,}\b', code)
        
        if matches:
            # Add constants at the beginning
            constants = []
            refined_code = code
            
            # dict.fromkeys de-duplicates while keeping first-seen order
            for number in dict.fromkeys(matches):
                const_name = f"CONSTANT_{number}"
                if 'function' in code:
                    constants.append(f"const {const_name} = {number};")
                else:
                    constants.append(f"{const_name} = {number}")
                
                # Replace in code
                refined_code = re.sub(f'\\b{number}\\b', const_name, refined_code)
            
            # Prepend constants
            return '\n'.join(constants) + '\n\n' + refined_code
        
        return code
    
    def _add_type_hints(self, code: str, comment: str) -> str:
        """Add type hints (Python example)"""
        if 'def' in code:
            # Simple type hint addition for Python
            refined_code = re.sub(
                r'def (\w+)\((\w+), (\w+)\):',
                r'def \1(\2: int, \3: int) -> int:',
                code
            )
            return refined_code
        
        return code
    
    def refine(self, source_code: str, comment: str, use_markers: bool = False) -> str:
        """Refine code based on comment"""
        refinement_type = self._identify_refinement_type(comment)
        
        if use_markers and '<START>' in source_code:
            # Extract marked section
            start_idx = source_code.index('<START>') + 7
            end_idx = source_code.index('<END>')
            marked_code = source_code[start_idx:end_idx]
            
            # Refine marked section
            if refinement_type in self.refinement_strategies:
                refined_marked = self.refinement_strategies[refinement_type](marked_code, comment)
                refined_code = source_code[:start_idx-7] + refined_marked + source_code[end_idx+5:]
            else:
                refined_code = source_code
        else:
            # Refine entire code
            if refinement_type in self.refinement_strategies:
                refined_code = self.refinement_strategies[refinement_type](source_code, comment)
            else:
                refined_code = source_code
        
        return refined_code

# Examples of code refinement
refinement_examples = [
    {
        'code': """function processUser(user) {
    user.save();
    return user.id;
}""",
        'comment': "Consider adding null check for 'user' before calling save()"
    },
    {
        'code': """def calculate(x, y):
    z = x * 1000 + y * 50
    return z""",
        'comment': "Magic numbers 1000 and 50 should be extracted to named constants"
    },
    {
        'code': """async function fetchData(url) {
    const response = await fetch(url);
    const data = await response.json();
    return data;
}""",
        'comment': "Missing error handling for fetch operation. Consider wrapping in try-catch"
    }
]

refiner = CodeRefiner()

print("Code Refinement Examples:\n")
for i, example in enumerate(refinement_examples):
    print(f"\n{'='*60}")
    print(f"Example {i+1}")
    print(f"{'='*60}")
    print("\nOriginal Code:")
    print(example['code'])
    print(f"\nReview Comment: {example['comment']}")
    print("\nRefined Code:")
    refined = refiner.refine(example['code'], example['comment'])
    print(refined)

## 5. Building an End-to-End Pipeline

Now let's combine all three tasks into a complete code review automation system.

In [None]:
class CodeReviewPipeline:
    """Complete code review automation pipeline"""
    
    def __init__(self):
        self.rnp = ReviewNecessityPredictor()
        self.rcg = ReviewCommentGenerator()
        self.cr = CodeRefiner()
        self.review_history = []
    
    def process_pull_request(self, pr_diffs: List[DiffHunk]) -> Dict:
        """Process a complete pull request"""
        results = {
            'total_diffs': len(pr_diffs),
            'needs_review': 0,
            'auto_approved': 0,
            'reviews': []
        }
        
        for diff in pr_diffs:
            # Step 1: Check if review is needed
            needs_review, confidence, reason = self.rnp.predict(diff)
            
            if needs_review:
                results['needs_review'] += 1
                
                # Step 2: Generate review comment
                code = '\n'.join(diff.new_lines)
                comments = self.rcg.generate_comment(code, perspective='method')
                
                # Step 3: Suggest refinement
                refined_code = None
                if comments and not comments[0].startswith("Code looks good"):
                    refined_code = self.cr.refine(code, comments[0])
                
                review = {
                    'file': diff.file_path,
                    'lines': diff.line_numbers,
                    'needs_review': True,
                    'confidence': confidence,
                    'reason': reason,
                    'comments': comments,
                    'original_code': code,
                    'refined_code': refined_code
                }
            else:
                results['auto_approved'] += 1
                review = {
                    'file': diff.file_path,
                    'lines': diff.line_numbers,
                    'needs_review': False,
                    'confidence': confidence,
                    'reason': reason
                }
            
            results['reviews'].append(review)
            self.review_history.append(review)
        
        return results
    
    def generate_review_report(self, results: Dict) -> str:
        """Generate a human-readable review report"""
        report = []
        report.append("# Code Review Report")
        report.append(f"\nTotal files reviewed: {results['total_diffs']}")
        report.append(f"Files needing review: {results['needs_review']}")
        report.append(f"Files auto-approved: {results['auto_approved']}")
        report.append("\n## Detailed Reviews:\n")
        
        for review in results['reviews']:
            report.append(f"### {review['file']} (lines {review['lines'][0]}-{review['lines'][1]})")
            
            if review['needs_review']:
                report.append(f"**Status**: Needs Review (confidence: {review['confidence']:.2f})")
                report.append(f"**Reason**: {review['reason']}")
                
                if 'comments' in review and review['comments']:
                    report.append("\n**Review Comments:**")
                    for comment in review['comments']:
                        report.append(f"- {comment}")
                
                if review.get('refined_code'):
                    report.append("\n**Suggested Refinement:**")
                    report.append("```")
                    report.append(review['refined_code'])
                    report.append("```")
            else:
                report.append(f"**Status**: Auto-approved")
                report.append(f"**Reason**: {review['reason']}")
            
            report.append("\n---\n")
        
        return '\n'.join(report)

# Simulate a pull request with multiple diffs
pr_diffs = [
    DiffHunk(
        file_path="src/api/user.js",
        old_lines=[
            "async function updateUser(id, data) {",
            "    const user = await User.findById(id);",
            "    user.update(data);",
            "    return user;",
            "}"
        ],
        new_lines=[
            "async function updateUser(id, data) {",
            "    const user = await User.findById(id);",
            "    user.update(data);",
            "    user.save();",
            "    return user;",
            "}"
        ],
        line_numbers=(10, 15)
    ),
    DiffHunk(
        file_path="src/utils/calc.py",
        old_lines=[
            "def process(a, b):",
            "    return a + b"
        ],
        new_lines=[
            "def process(a, b):",
            "    c = a * 100 + b * 50",
            "    return c"
        ],
        line_numbers=(5, 7)
    ),
    DiffHunk(
        file_path="docs/README.md",
        old_lines=["# Project Title"],
        new_lines=["# Project Title", "", "## Description"],
        line_numbers=(1, 3)
    )
]

# Run the pipeline
pipeline = CodeReviewPipeline()
results = pipeline.process_pull_request(pr_diffs)

# Generate and display report
report = pipeline.generate_review_report(results)
print(report)

## 6. Performance Metrics and Evaluation

Let's visualize the performance metrics from the paper for each task.
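
For the classification task, F1 is the harmonic mean of precision and recall. As a sanity check, the sketch below recomputes F1 from two of the precision/recall pairs in this section's performance data (CodeReviewer and LLaMA-Reviewer). `f1_score` here is a small local helper, not a library call.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision/recall pairs (in percent) taken from this section's performance data.
code_reviewer_f1 = f1_score(78.60, 65.63)
llama_reviewer_f1 = f1_score(60.99, 83.50)

print(f"CodeReviewer F1:   {code_reviewer_f1:.2f}")
print(f"LLaMA-Reviewer F1: {llama_reviewer_f1:.2f}")
```

Both values round to the listed F1 figures (71.53 and 70.49), which shows how a large recall gain can offset a precision drop almost exactly.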

In [None]:
# Performance data from the paper
performance_data = {
    'Review Necessity Prediction': {
        'models': ['Transformer-b', 'Tufano et al.', 'CodeT5', 'CodeReviewer', 'LLaMA-Reviewer'],
        'precision': [74.50, 70.82, 70.36, 78.60, 60.99],
        'recall': [46.07, 57.20, 58.96, 65.63, 83.50],
        'f1': [56.93, 63.29, 64.16, 71.53, 70.49]
    },
    'Review Comment Generation': {
        'models': ['Transformer-b', 'Tufano et al.', 'CodeT5', 'CodeReviewer', 'LLaMA-Reviewer (LoRA)'],
        'bleu4_crer': [4.76, 4.39, 4.83, 5.32, 5.70],
        'bleu4_tuf': [None, 7.39, None, None, 5.04]
    },
    'Code Refinement': {
        'models': ['Tufano et al.', 'CodeT5', 'CodeReviewer', 'LLaMA-Reviewer (LoRA)'],
        'bleu4_crer': [77.03, 80.82, 82.61, 82.27],
        'bleu4_tuf': [78.33, None, None, 78.23]
    }
}

# Create comprehensive visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# 1. Review Necessity Prediction - Precision/Recall/F1
ax1 = axes[0, 0]
rnp_data = performance_data['Review Necessity Prediction']
x = np.arange(len(rnp_data['models']))
width = 0.25

bars1 = ax1.bar(x - width, rnp_data['precision'], width, label='Precision', color='lightblue')
bars2 = ax1.bar(x, rnp_data['recall'], width, label='Recall', color='lightgreen')
bars3 = ax1.bar(x + width, rnp_data['f1'], width, label='F1', color='lightcoral')

ax1.set_xlabel('Models')
ax1.set_ylabel('Score (%)')
ax1.set_title('Review Necessity Prediction Performance')
ax1.set_xticks(x)
ax1.set_xticklabels(rnp_data['models'], rotation=45, ha='right')
ax1.legend()
ax1.grid(axis='y', alpha=0.3)

# Highlight LLaMA-Reviewer's high recall
ax1.annotate('High Recall!', xy=(4, 83.5), xytext=(3.5, 90),
            arrowprops=dict(arrowstyle='->', color='red', lw=2),
            fontsize=12, color='red', fontweight='bold')

# 2. Review Comment Generation - BLEU scores
ax2 = axes[0, 1]
rcg_data = performance_data['Review Comment Generation']
models = rcg_data['models']
crer_scores = rcg_data['bleu4_crer']

bars = ax2.bar(models, crer_scores, color='skyblue')
ax2.set_xlabel('Models')
ax2.set_ylabel('BLEU-4 Score')
ax2.set_title('Review Comment Generation (CRer Dataset)')
plt.setp(ax2.get_xticklabels(), rotation=45, ha='right')  # rotate existing labels without resetting ticks
ax2.grid(axis='y', alpha=0.3)

# Add value labels
for bar, score in zip(bars, crer_scores):
    height = bar.get_height()
    ax2.text(bar.get_x() + bar.get_width()/2., height,
             f'{score:.2f}', ha='center', va='bottom')

# 3. Code Refinement - BLEU scores
ax3 = axes[1, 0]
cr_data = performance_data['Code Refinement']
models = cr_data['models']
crer_scores = cr_data['bleu4_crer']

bars = ax3.bar(models, crer_scores, color='lightgreen')
ax3.set_xlabel('Models')
ax3.set_ylabel('BLEU-4 Score')
ax3.set_title('Code Refinement (CRer Dataset)')
plt.setp(ax3.get_xticklabels(), rotation=45, ha='right')  # rotate existing labels without resetting ticks
ax3.grid(axis='y', alpha=0.3)

# Add value labels
for bar, score in zip(bars, crer_scores):
    height = bar.get_height()
    ax3.text(bar.get_x() + bar.get_width()/2., height,
             f'{score:.2f}', ha='center', va='bottom')

# 4. Task complexity analysis
ax4 = axes[1, 1]
ax4.axis('off')

task_analysis = """Task Complexity Analysis (from paper):

1. Review Necessity Prediction:
   - Type: Binary Classification
   - Challenge: Imbalanced dataset
   - LLaMA-Reviewer: Optimized for recall
   
2. Review Comment Generation:
   - Type: Natural Language Generation
   - Challenge: Most complex task
   - LLaMA-Reviewer: Best BLEU-4 on CRer (5.70)
   
3. Code Refinement:
   - Type: Code Generation/Translation
   - Challenge: Preserving semantics
   - LLaMA-Reviewer: Competitive (82.27)

Key Insight: LLaMA-Reviewer excels at
NL generation tasks due to its pre-training
on large text corpora."""

ax4.text(0.1, 0.9, task_analysis, transform=ax4.transAxes,
         fontsize=12, verticalalignment='top', fontfamily='monospace')

plt.tight_layout()
plt.show()

## 7. Integration with LangChain/LangGraph

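The cell below sketches how this pipeline could be wired into a LangChain/LangGraph workflow: it defines a state schema, one prompt template per task, and an iterative review loop. The loop still calls the rule-based components defined above; replacing them with LLM calls is the step needed before production use.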
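
If the rule-based predictor is replaced by an LLM that is asked a yes/no question, its free-text reply has to be mapped back to a boolean. A minimal sketch of that step, assuming the reply contains the word "yes" or "no" somewhere; `parse_yes_no` is a hypothetical helper and not part of LangChain.

```python
import re
from typing import Optional

def parse_yes_no(reply: str) -> Optional[bool]:
    """Map a free-text model reply to True/False, or None if neither word appears."""
    # Word boundaries avoid false hits inside words such as "know" or "not".
    match = re.search(r"\b(yes|no)\b", reply.strip().lower())
    if match is None:
        return None
    return match.group(1) == "yes"

decision = parse_yes_no("Yes, this change touches authentication logic.")
```

Returning `None` for unparseable replies lets the caller fall back to a conservative default, for example routing the change to a human reviewer.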

In [None]:
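from typing import TypedDict, Annotated, Sequence
import operator

# PromptTemplate is provided by the langchain-core package (pip install langchain-core)
from langchain_core.prompts import PromptTemplate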

# Define state for LangGraph
class CodeReviewState(TypedDict):
    """State for code review workflow"""
    diff_hunks: List[DiffHunk]
    review_decisions: List[bool]
    comments: List[str]
    refined_codes: List[str]
    current_index: int
    iteration_count: int

# LangChain components
def create_langchain_pipeline():
    """Create LangChain components for code review"""
    
    # Prompt templates for each task
    rnp_prompt = PromptTemplate(
        input_variables=["diff_hunk"],
        template="""You are a code reviewer. Analyze the following diff and determine if it needs review.

Diff:
{diff_hunk}

Does this need review? (yes/no):"""
    )
    
    rcg_prompt = PromptTemplate(
        input_variables=["code"],
        template="""You are an experienced code reviewer. Provide constructive feedback for this code:

Code:
{code}

Review comment:"""
    )
    
    cr_prompt = PromptTemplate(
        input_variables=["code", "comment"],
        template="""Refine the following code based on the review comment:

Original code:
{code}

Review comment:
{comment}

Refined code:"""
    )
    
    return {
        'rnp_prompt': rnp_prompt,
        'rcg_prompt': rcg_prompt,
        'cr_prompt': cr_prompt
    }

# Example workflow implementation
class LangChainCodeReviewWorkflow:
    """Code review workflow using LangChain patterns"""
    
    def __init__(self):
        self.prompts = create_langchain_pipeline()
        self.pipeline = CodeReviewPipeline()
    
    def run_review_cycle(self, pr_diffs: List[DiffHunk], max_iterations: int = 3) -> List[Dict]:
        """Run the review cycle with iteration limit"""
        
        state = {
            'diff_hunks': pr_diffs,
            'review_decisions': [],
            'comments': [],
            'refined_codes': [],
            'current_index': 0,
            'iteration_count': 0
        }
        
        results = []
        
        for diff in pr_diffs:
            iteration = 0
            current_code = '\n'.join(diff.new_lines)
            review_history = []
            
            while iteration < max_iterations:
                # Step 1: Check if review needed
                needs_review, confidence, reason = self.pipeline.rnp.predict(diff)
                
                if not needs_review:
                    break
                
                # Step 2: Generate comment
                comments = self.pipeline.rcg.generate_comment(current_code, 'method')
                
                if not comments or comments[0].startswith("Code looks good"):
                    break
                
                # Step 3: Refine code
                refined_code = self.pipeline.cr.refine(current_code, comments[0])
                
                review_history.append({
                    'iteration': iteration + 1,
                    'comment': comments[0],
                    'original': current_code,
                    'refined': refined_code
                })
                
                # Update for next iteration
                current_code = refined_code
                iteration += 1
            
            results.append({
                'file': diff.file_path,
                'iterations': len(review_history),
                'history': review_history,
                'final_code': current_code
            })
        
        return results

# Demonstrate the workflow
workflow = LangChainCodeReviewWorkflow()

# Create a problematic diff that needs multiple iterations
problematic_diff = DiffHunk(
    file_path="src/critical.js",
    old_lines=["function process(data) { return data; }"],
    new_lines=[
        "function process(x) {",
        "    y = x * 1000;",
        "    x.save();",
        "    return y;",
        "}"
    ],
    line_numbers=(1, 5)
)

# Run review cycle
print("Running iterative review cycle...\n")
cycle_results = workflow.run_review_cycle([problematic_diff], max_iterations=3)

# Display results
for result in cycle_results:
    print(f"File: {result['file']}")
    print(f"Total iterations: {result['iterations']}\n")
    
    for review in result['history']:
        print(f"--- Iteration {review['iteration']} ---")
        print(f"Comment: {review['comment']}")
        print(f"\nRefined code:")
        print(review['refined'])
        print()

## 8. Key Insights and Best Practices

### From the Paper's Implementation:

1. **Task Input/Output Formats** (Table I):
   - RNP: PL → L (Programming Language to Label)
   - RCG: PL → NL (Programming Language to Natural Language)  
   - CR: PL + NL → PL (Code + Comment to Refined Code)

2. **Dataset Characteristics**:
   - **CRer**: Multi-language, line-level, preserves formatting
   - **Tufano**: Java-only, method-level, cleaned formatting

3. **Performance Insights**:
   - LLaMA-Reviewer optimizes for **high recall** in RNP (83.5%)
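   - Best BLEU-4 for **comment generation** on the CRer dataset (5.70), attributed to its natural-language pre-training
   - Competitive in **code refinement** (82.27 BLEU-4 on CRer) even though the base model is not code-specialized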
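
The three input/output formats from Table I can be written down as type aliases. This is a sketch for orientation only: the alias names are made up here, and the toy lambdas exist just to show that the signatures chain together.

```python
from typing import Callable

Code = str      # PL: source code or a diff hunk
Comment = str   # NL: a natural-language review comment
Label = bool    # L: True when the change needs review

ReviewNecessityFn = Callable[[Code], Label]          # RNP: PL -> L
CommentGenerationFn = Callable[[Code], Comment]      # RCG: PL -> NL
CodeRefinementFn = Callable[[Code, Comment], Code]   # CR:  PL + NL -> PL

# Toy implementations (hypothetical), one per task.
rnp: ReviewNecessityFn = lambda code: "TODO" in code
rcg: CommentGenerationFn = lambda code: "Resolve the TODO before merging"
cr: CodeRefinementFn = lambda code, comment: code.replace("# TODO", "").rstrip()

sample = "return total  # TODO"
# The output of RCG feeds CR; RNP gates whether either of them runs.
refined = cr(sample, rcg(sample)) if rnp(sample) else sample
```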

### Implementation Best Practices:

1. **For Review Necessity Prediction**:
   - Balance precision vs recall based on use case
   - Consider cost of false positives vs false negatives
   - Use confidence scores for prioritization

2. **For Comment Generation**:
   - Choose perspective (line vs method) based on context
   - Provide specific, actionable feedback
   - Consider code context beyond the diff

3. **For Code Refinement**:
   - Preserve code semantics and style
   - Handle multiple refinement types
   - Support iterative improvement
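
The precision/recall trade-off for Review Necessity Prediction can be explored by sweeping a decision threshold over confidence scores. The sketch below uses made-up scores purely for illustration; they are not results from the paper, and `precision_recall_at` is a hypothetical helper.

```python
from typing import List, Tuple

def precision_recall_at(
    scored: List[Tuple[float, bool]], threshold: float
) -> Tuple[float, float]:
    """scored holds (confidence_that_review_is_needed, review_truly_needed) pairs."""
    tp = sum(1 for score, needed in scored if score >= threshold and needed)
    fp = sum(1 for score, needed in scored if score >= threshold and not needed)
    fn = sum(1 for score, needed in scored if score < threshold and needed)
    # Convention chosen here: report 1.0 when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

# Made-up example data: four changes truly need review, two do not.
toy = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.4, False), (0.3, True)]

for threshold in (0.5, 0.75):
    p, r = precision_recall_at(toy, threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, raising the threshold from 0.5 to 0.75 lifts precision from 0.75 to 1.00 while recall falls from 0.75 to 0.50. A team that cannot afford to miss risky changes would pick the lower threshold.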

### Future Enhancements:

1. **Multi-modal Understanding**: Combine code structure + semantics
2. **Context Awareness**: Consider entire file/project context
3. **Learning from Feedback**: Adapt based on developer responses
4. **IDE Integration**: Real-time review during development