# PRPipeline Tutorial

The **PRPipeline** reviews pull requests in three sequential stages: a security scan, a quality review, and summary generation.

## Overview

- **Type**: LangGraph Workflow (multi-step)
- **Use Case**: Automated PR review, CI/CD integration
- **Features**: Security scan, quality review, summary generation

## Workflow Steps

```
+----------------+     +-----------------+     +-----------------+     +-----+
| 1. Security    | --> | 2. Quality      | --> | 3. Generate     | --> | END |
|    Scan        |     |    Review       |     |    Summary      |     |     |
+----------------+     +-----------------+     +-----------------+     +-----+
```

## Setup

```python
from agent_workshop.agents.software_dev import PRPipeline
from agent_workshop import Config

config = Config()
pipeline = PRPipeline(config)

print(f"Provider: {pipeline.provider_name}")
print(f"Model: {pipeline.model_name}")
```

## Input/Output Format

**Input**:
```python
{
    "content": str,         # Required: Code diff or snippet to review
    "title": str,           # Optional: PR title
    "description": str,     # Optional: PR description
    "files_changed": list   # Optional: List of changed file paths
}
```

**Output**:
```python
{
    "approved": bool,           # False if any critical or high severity issue is found
    "recommendation": str,      # "approve", "request_changes", or "comment"
    "blocking_issues": int,     # Count of critical + high severity issues
    "summary": str,             # PR-ready review comment for GitHub
    "security_issues": list,    # Security findings
    "quality_issues": list,     # Quality findings
    "timestamp": str            # ISO timestamp
}
```
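
The input shape above can also be written as a `TypedDict`, so a type checker flags misspelled keys before the pipeline runs. This is a minimal sketch: `PRInput` and `make_pr_input` are hypothetical names introduced here, not part of `agent_workshop`, and rejecting empty content is this helper's own choice rather than documented library behavior.

```python
from typing import List, Optional, TypedDict


class _PRInputRequired(TypedDict):
    content: str  # Required: code diff or snippet to review


class PRInput(_PRInputRequired, total=False):
    title: str                # Optional: PR title
    description: str          # Optional: PR description
    files_changed: List[str]  # Optional: changed file paths


def make_pr_input(
    content: str,
    title: Optional[str] = None,
    description: Optional[str] = None,
    files_changed: Optional[List[str]] = None,
) -> PRInput:
    """Build a review input dict, omitting optional keys that were not given."""
    # Assumption: an empty or whitespace-only diff is treated as an error here.
    if not content or not content.strip():
        raise ValueError("'content' is required and must not be empty")

    pr_input: PRInput = {"content": content}
    if title is not None:
        pr_input["title"] = title
    if description is not None:
        pr_input["description"] = description
    if files_changed is not None:
        pr_input["files_changed"] = list(files_changed)
    return pr_input
```

The resulting dict is a plain `dict` at runtime, so it can be passed wherever the input format above is expected.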

## Understanding the Steps

### Step 1: Security Scan

Looks for:

- Hardcoded credentials (API keys, passwords, tokens)
- Injection vulnerabilities (SQL, command, XSS)
- Authentication/authorization issues
- Sensitive data exposure

### Step 2: Quality Review

Evaluates:

- Error handling and edge cases
- Code clarity and readability
- Resource management
- Performance concerns

### Step 3: Summary Generation
- Consolidates all findings
- Generates actionable GitHub PR comment
- Determines approval status

## Example: Review Code with Issues

```python
# Sample code with security and quality issues
problematic_code = '''
import os

class UserService:
    API_KEY = "sk-prod-abc123xyz789"  # Security: hardcoded credential

    def get_user(self, user_id):
        # Security: SQL injection vulnerability
        query = f"SELECT * FROM users WHERE id = {user_id}"
        result = self.db.execute(query)
        return result

    def delete_user(self, user_id):
        # Quality: No error handling
        # Quality: No authorization check
        self.db.execute(f"DELETE FROM users WHERE id = {user_id}")
        return True
'''

print("Code to review:")
print(problematic_code)
```

```python
# Prepare PR review input
pr_input = {
    "content": problematic_code,
    "title": "Add user service functionality",
    "description": "Implements basic CRUD operations for users"
}

print("PR Input:")
print(f"  Title: {pr_input['title']}")
print(f"  Description: {pr_input['description']}")
print(f"  Code length: {len(pr_input['content'])} chars")
```

```python
# Run the PR review (uncomment to execute)
# result = await pipeline.run(pr_input)
#
# print("PR Review Result:")
# print("=" * 50)
# print(f"Approved: {result['approved']}")
# print(f"Recommendation: {result['recommendation']}")
# print(f"Blocking Issues: {result['blocking_issues']}")
#
# print("\nSecurity Issues:")
# for issue in result.get('security_issues', []):
#     print(f"  [{issue['severity'].upper()}] {issue['message']}")
#
# print("\nQuality Issues:")
# for issue in result.get('quality_issues', []):
#     print(f"  [{issue['severity'].upper()}] {issue['message']}")
#
# print(f"\nSummary:\n{result['summary']}")
```

## Example: Review Clean Code

```python
# Sample clean code
clean_code = '''
import os
import logging
from typing import Optional

logger = logging.getLogger(__name__)

class UserService:
    """Service for managing user operations."""

    def __init__(self, db_connection, auth_service):
        self.db = db_connection
        self.auth = auth_service
        self.api_key = os.environ.get("API_KEY")

    def get_user(self, user_id: int, requester_id: int) -> Optional[dict]:
        """Fetch user by ID with authorization check."""
        if not self.auth.can_view_user(requester_id, user_id):
            logger.warning(f"Unauthorized access attempt: {requester_id} -> {user_id}")
            raise PermissionError("Not authorized")

        try:
            # Parameterized query prevents SQL injection
            result = self.db.execute(
                "SELECT * FROM users WHERE id = ?",
                (user_id,)
            )
            return result.fetchone()
        except Exception:
            logger.exception("Database query failed")
            raise
'''

print("Clean code example:")
print(clean_code)
```

## Custom Prompts

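Override the prompt for a step by passing `security_prompt` or `quality_prompt` to the constructor. The templates below use the placeholders `{title}`, `{description}`, `{content}`, and `{security_result}`: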

````python
# Custom pipeline focused on Python best practices
python_pipeline = PRPipeline(
    config=config,
    security_prompt="""Analyze this Python code for security issues.

PR Title: {title}
PR Description: {description}

Code:
```python
{content}
```

Focus on:
1. Hardcoded secrets (os.environ is good)
2. SQL injection (parameterized queries are good)
3. Pickle/eval usage (dangerous)
4. Path traversal vulnerabilities

Return JSON with issues array.""",
    quality_prompt="""Review Python code quality.

Security findings: {security_result}

Code:
```python
{content}
```

Focus on:
1. Type hints usage
2. Docstring presence
3. PEP 8 compliance
4. Error handling patterns

Return JSON with issues array."""
)

print("Custom Python-focused PR Pipeline created")
````
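
The placeholders in the custom prompts above look like Python `str.format` fields. If the pipeline fills them that way (an assumption, so check the library source before relying on it), any literal braces in a prompt, such as a JSON example, would need to be doubled. The sketch below demonstrates this with plain `str.format`; `render_prompt` and `DEFAULTS` are hypothetical names used only for illustration.

```python
# Hypothetical defaults for optional PR fields, so a missing title or
# description renders as an empty string instead of raising KeyError.
DEFAULTS = {"title": "", "description": "", "content": ""}

TEMPLATE = """Review this change.

PR Title: {title}

Code:
{content}

Return JSON shaped like {{"issues": []}}."""


def render_prompt(template: str, **values: str) -> str:
    """Fill {name} fields with str.format; '{{' and '}}' become literal braces."""
    return template.format(**{**DEFAULTS, **values})


if __name__ == "__main__":
    print(render_prompt(TEMPLATE, title="Fix login bug", content="print('hi')"))
```

Under the same assumption, a field that has no supplied value and no default (for example `{security_result}` here) raises `KeyError`, which is a quick way to catch a misspelled placeholder.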

## Issue Severity Levels

Issues are categorized by severity:

| Severity | Description | Blocks Approval | Examples |
|----------|-------------|-----------------|----------|
| **critical** | Security vulnerabilities | Yes | Hardcoded secrets, SQL injection |
| **high** | Major bugs, logic errors | Yes | Missing auth checks, data exposure |
| **medium** | Code quality concerns | No | Poor error handling, complexity |
| **low** | Style, minor improvements | No | Naming, documentation gaps |
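
The blocking rule in the table can be sketched as a small function over the issue lists. This is an illustration of the rule as documented here, not the pipeline's actual code: `count_blocking` and `approval_status` are hypothetical names, and issues are assumed to be dicts with a `severity` key, as in the result-printing example earlier. The sketch leaves `recommendation` out because this tutorial does not say how that value is chosen.

```python
from typing import Any, Dict, List

Issue = Dict[str, Any]

# Severities that block approval, per the table above.
BLOCKING_SEVERITIES = frozenset({"critical", "high"})


def count_blocking(issues: List[Issue]) -> int:
    """Count issues whose severity is critical or high (case-insensitive)."""
    return sum(
        1
        for issue in issues
        if str(issue.get("severity", "")).lower() in BLOCKING_SEVERITIES
    )


def approval_status(
    security_issues: List[Issue], quality_issues: List[Issue]
) -> Dict[str, Any]:
    """Derive blocking_issues and approved from both issue lists."""
    blocking = count_blocking(security_issues) + count_blocking(quality_issues)
    return {"blocking_issues": blocking, "approved": blocking == 0}
```

For example, one `critical` security issue plus one `low` quality issue gives `blocking_issues` of 1 and `approved` of `False`.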

## Issue Categories

### Security Categories
- `credentials` - Hardcoded secrets, API keys
- `injection` - SQL, command, XSS injection
- `auth` - Authentication/authorization issues
- `exposure` - Sensitive data exposure
- `config` - Insecure configuration

### Quality Categories
- `error_handling` - Missing or poor error handling
- `clarity` - Code readability issues
- `resources` - Resource management problems
- `performance` - Performance concerns
- `organization` - Code structure issues
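
Grouping findings by category makes a long review easier to scan. The sketch below assumes each issue dict carries a `category` key holding one of the values listed above. That key name is an assumption, since this tutorial only shows `severity` and `message`, and `group_by_category` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, List

Issue = Dict[str, Any]


def group_by_category(issues: List[Issue]) -> Dict[str, List[Issue]]:
    """Bucket issues by their 'category' value, preserving input order."""
    grouped: Dict[str, List[Issue]] = defaultdict(list)
    for issue in issues:
        # Issues without a category land in a fallback bucket.
        grouped[issue.get("category", "uncategorized")].append(issue)
    return dict(grouped)
```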

## CI/CD Integration

Use PRPipeline in your CI/CD workflow:

```python
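from agent_workshop import Config
from agent_workshop.agents.software_dev import PRPipeline


async def review_pr(diff_content: str, pr_title: str) -> bool:
    """Review a PR and return whether it should be approved."""
    pipeline = PRPipeline(Config())

    result = await pipeline.run({
        "content": diff_content,
        "title": pr_title
    })

    # Post the summary as a PR comment.
    # post_pr_comment is a placeholder: supply a function that calls your Git host's API.
    post_pr_comment(result['summary'])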
    
    # Return approval status
    return result['approved']
```
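
CI systems typically treat a nonzero exit status as a failed job, so the `approved` flag can be mapped to an exit code. The following is a sketch under stated assumptions: `exit_code_for` and `run_gate` are hypothetical helpers, `run_review` stands in for any coroutine function with the same call shape as `pipeline.run`, and the demo uses a stub instead of a real model call.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Review = Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]


def exit_code_for(result: Dict[str, Any]) -> int:
    """Return 0 when the review approved the PR, 1 otherwise."""
    # A missing 'approved' key counts as not approved (fail closed).
    return 0 if result.get("approved") else 1


async def run_gate(run_review: Review, pr_input: Dict[str, Any]) -> int:
    """Run the review, print its summary, and return a process exit code."""
    result = await run_review(pr_input)
    print(result.get("summary", ""))
    return exit_code_for(result)


if __name__ == "__main__":
    # Stub review for demonstration; in CI you would pass pipeline.run instead.
    async def fake_review(pr_input: Dict[str, Any]) -> Dict[str, Any]:
        return {"approved": False, "summary": "1 blocking issue found"}

    code = asyncio.run(run_gate(fake_review, {"content": "diff ..."}))
    print(f"exit code: {code}")
```

In a real job the script would end with `sys.exit(code)` so the CI runner sees the status.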

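## Inspecting the Workflow Graph

List the graph's nodes to see the pipeline's steps (this uses the `pipeline` object from Setup):

```python
# View the workflow graph
graph = pipeline.build_graph()

print("PR Pipeline Steps:")
for node in graph.nodes:
    if not node.startswith("__"):
        print(f"  - {node}")
```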

## Next Steps

- **[05_release_pipeline.ipynb](./05_release_pipeline.ipynb)** - Hybrid workflow with git operations
- **[03_code_reviewer.ipynb](./03_code_reviewer.ipynb)** - Simple code review agent
- **[02_validation_pipeline.ipynb](./02_validation_pipeline.ipynb)** - Basic multi-step workflow