# 📝 Manuscript Evaluator: 8-Agent Paper Review System

**SNU AI Psychology Workshop - February 2026**

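This notebook evaluates academic manuscripts against Nature-level editorial criteria using an 8-agent pipeline. Four scoring agents (Hook, Narrative, Rigor, Impact) produce the total out of 100; the other four flag fatal flaws, score the title separately out of 10, cross-check in-text citations against the reference list, and draft an improvement plan.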

## Workflow

```
Input Manuscript → Hook Agent (30pts) → Narrative Agent (30pts) → Rigor Agent (20pts)
                → Impact Agent (20pts) → Fatal Flaw Detector → Title Evaluator
                → Reference Checker → Improvement Planner → Final Report (/100)
```
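The diagram implies a fixed point budget: only the four scoring agents feed the /100 total, while the title score is reported on its own 0-10 scale beside it. A minimal sketch of that budget, where `POINT_BUDGET` and `budgeted_total` are hypothetical names chosen here for illustration and are not defined in the notebook:

```python
# Hypothetical restatement of the scoring budget; these names are not in the notebook.
# The title score (0-10) is reported next to the total but never added to it.
POINT_BUDGET = {"hook": 30, "narrative": 30, "rigor": 20, "impact": 20}

def budgeted_total(scores: dict) -> int:
    """Sum only the four budgeted dimensions; any other key (e.g. 'title') is ignored."""
    return sum(scores[name] for name in POINT_BUDGET)

assert sum(POINT_BUDGET.values()) == 100
print(budgeted_total({"hook": 25, "narrative": 22, "rigor": 15, "impact": 16, "title": 8}))  # 78
```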

---

## 1. Setup & Installation

In [None]:
# Install required packages
!pip install -q google-generativeai

print("✓ Installation complete")

In [None]:
# Import libraries
import os
import json
import time
import re
from dataclasses import dataclass, asdict
from typing import Optional, Dict, Any, List
from getpass import getpass

import google.generativeai as genai

print("✓ Libraries imported")

## 2. API Configuration

Get your API key from [Google AI Studio](https://aistudio.google.com/)

In [None]:
# Enter your Gemini API Key
GEMINI_API_KEY = getpass("Enter your Gemini API Key: ")

genai.configure(api_key=GEMINI_API_KEY)

# Model settings
MODEL_NAME = 'gemini-2.5-flash'
DELAY_BETWEEN_CALLS = 3  # seconds (to avoid rate limits)

print("✓ API configured")
print(f"✓ Model: {MODEL_NAME}")
print(f"✓ Delay between calls: {DELAY_BETWEEN_CALLS}s")
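Typing the key with `getpass` on every run can be avoided by reading it from an environment variable first. A minimal sketch, assuming a variable named `GEMINI_API_KEY` (a naming choice made here, not something the notebook or the SDK requires); `resolve_api_key` is hypothetical, and its return value would be passed to `genai.configure(api_key=...)` as in the cell above:

```python
import os
from getpass import getpass
from typing import Callable, Mapping

def resolve_api_key(env: Mapping[str, str] = os.environ,
                    prompt: Callable[[str], str] = getpass,
                    var_name: str = "GEMINI_API_KEY") -> str:
    """Return the API key from the environment, falling back to an interactive prompt."""
    key = env.get(var_name, "").strip()
    if not key:
        # Nothing set in the environment: ask without echoing the key
        key = prompt("Enter your Gemini API Key: ").strip()
    if not key:
        raise ValueError("No API key provided")
    return key
```

In the configuration cell, `GEMINI_API_KEY = resolve_api_key()` could then replace the `getpass` line.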

## 3. Load Your Manuscript

Choose one method:
- **Option A**: Upload a file (.md, .txt)
- **Option B**: Paste text directly

In [None]:
# Option A: Upload file
from google.colab import files

print("Upload your manuscript file (.md or .txt):")
uploaded = files.upload()

if uploaded:
    filename = list(uploaded.keys())[0]
    manuscript_text = uploaded[filename].decode('utf-8')
    print(f"\n✓ Loaded: {filename}")
    print(f"✓ Length: {len(manuscript_text):,} characters")
    print(f"✓ Words: ~{len(manuscript_text.split()):,}")

In [None]:
# Option B: Paste text directly (run this cell instead of upload)
# Uncomment and paste your manuscript between the triple quotes

# manuscript_text = """
# # Your Paper Title
#
# ## Abstract
# Your abstract here...
#
# ## Introduction
# Your introduction here...
# """
#
# print(f"✓ Manuscript loaded")
# print(f"✓ Length: {len(manuscript_text):,} characters")
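Outside Colab, `google.colab.files` is unavailable, so Option A does not run there. A minimal sketch of reading a local `.md` or `.txt` file with `pathlib` instead; the function name `load_manuscript` and the extension check are illustrative assumptions, not part of the notebook:

```python
from pathlib import Path

def load_manuscript(path: str) -> str:
    """Read a .md or .txt manuscript from disk as text."""
    p = Path(path)
    if p.suffix.lower() not in {".md", ".txt"}:
        raise ValueError(f"Expected a .md or .txt file, got {p.suffix or 'no extension'}")
    # 'utf-8-sig' decodes UTF-8 and also strips a leading byte-order mark if present
    text = p.read_text(encoding="utf-8-sig")
    if not text.strip():
        raise ValueError(f"{path} is empty")
    return text
```

Usage would be `manuscript_text = load_manuscript("my_paper.md")`, with the filename being a placeholder.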

## 4. Data Structures & Helper Functions

In [None]:
@dataclass
class HookResult:
    one_sentence_summary: str
    opening_pattern: str
    gap_type: str
    opening_commentary: str
    gap_commentary: str
    score: int

@dataclass
class NarrativeResult:
    logical_flow: str
    narrative_control: str
    flow_commentary: str
    narrative_commentary: str
    score: int

@dataclass
class RigorResult:
    defensibility: str
    claim_discipline: str
    defensibility_commentary: str
    claims_commentary: str
    score: int

@dataclass
class ImpactResult:
    conclusion_type: str
    commentary: str
    score: int

@dataclass
class FatalFlaws:
    so_what: bool
    logic_leap: bool
    zombie_sloppy: bool
    details: str

@dataclass
class TitleResult:
    clarity: str
    keyword_presence: str
    length_assessment: str
    engagement: str
    commentary: str
    score: int

@dataclass
class ReferenceCheckResult:
    total_references: int
    cited_in_text: List[int]
    uncited_references: List[int]
    missing_references: List[int]
    hallucination_detected: bool
    issues: str

print("✓ Data structures defined")
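Each agent builds its result with a call such as `HookResult(**parse_json_response(...))`, which raises `TypeError` if the model adds an unexpected key and keeps a score as a string if it arrives as `"24"`. A minimal sketch of a more forgiving constructor, assuming you would rather drop extra keys and coerce `int` fields than fail the whole run; `from_model_json` and `ExampleResult` are hypothetical names used only for illustration:

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")

def from_model_json(cls: Type[T], data: Dict[str, Any]) -> T:
    """Build a dataclass from parsed model JSON, dropping unknown keys and coercing int fields."""
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            raise KeyError(f"Model response is missing '{f.name}'")
        value = data[f.name]
        # f.type is the real class here because annotations are not stringified
        # (the module does not use `from __future__ import annotations`)
        if f.type is int:
            value = int(value)
        kwargs[f.name] = value
    return cls(**kwargs)

# Illustration only: same fields as ImpactResult
@dataclass
class ExampleResult:
    conclusion_type: str
    commentary: str
    score: int

result = from_model_json(ExampleResult, {
    "conclusion_type": "FULL_CIRCLE",
    "commentary": "Returns to the opening question.",
    "score": "17",          # string score, coerced to int
    "confidence": "high",   # extra key, silently dropped
})
print(result.score + 1)  # 18
```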

In [None]:
def safe_api_call(model, prompt, max_retries=3):
    """Call the model, retrying with a linearly growing wait on rate-limit (429) errors"""
    for attempt in range(max_retries):
        try:
            response = model.generate_content(
                prompt,
                generation_config={'temperature': 0.2}
            )
            return response
        except Exception as e:
            # str(e) does not include the class name, so check the type name as well
            if type(e).__name__ == "ResourceExhausted" or "429" in str(e):
                wait_time = (attempt + 1) * 10
                print(f"   ⚠️ Rate limit. Waiting {wait_time}s... ({attempt+1}/{max_retries})")
                time.sleep(wait_time)
            else:
                raise  # re-raise with the original traceback
    raise Exception("Max retries exceeded")

def parse_json_response(response_text):
    """Extract JSON from response"""
    json_start = response_text.find('{')
    json_end = response_text.rfind('}') + 1
    if json_start == -1 or json_end == 0:
        raise ValueError("No JSON found in response")
    return json.loads(response_text[json_start:json_end])

def extract_title(manuscript_text: str) -> str:
    """Extract title from manuscript"""
    lines = manuscript_text.split('\n')
    for line in lines:
        line = line.strip()
        if line.startswith('# '):
            return line[2:].strip()
    return ""

def extract_references(manuscript_text: str) -> tuple:
    """Extract References section"""
    ref_match = re.search(r'##\s*References?\s*\n', manuscript_text, re.IGNORECASE)
    if not ref_match:
        return [], ""
    ref_text = manuscript_text[ref_match.end():]
    references = [line.strip() for line in ref_text.split('\n') if re.match(r'^\[\d+\]', line.strip())]
    return references, ref_text

def extract_citations(manuscript_text: str) -> List[int]:
    """Extract numeric citations such as [3], [1, 4], [2-5] or [1, 4-6] from the body text"""
    ref_match = re.search(r'##\s*References?\s*\n', manuscript_text, re.IGNORECASE)
    text = manuscript_text[:ref_match.start()] if ref_match else manuscript_text
    
    citations = set()
    # Bracketed lists of numbers and ranges (hyphen or en dash)
    for match in re.finditer(r'\[(\d+(?:\s*[,\-–]\s*\d+)*)\]', text):
        # Handle each comma-separated part separately so mixed lists like [1, 4-6] work
        for part in match.group(1).split(','):
            bounds = re.split(r'\s*[-–]\s*', part.strip())
            if len(bounds) > 1:
                # Range: include both endpoints
                citations.update(range(int(bounds[0]), int(bounds[-1]) + 1))
            else:
                citations.add(int(bounds[0]))
    return sorted(citations)

print("✓ Helper functions defined")
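`safe_api_call` waits 10 s, 20 s, and 30 s after successive rate-limit errors. A common alternative is exponential backoff with random jitter, which spreads retries out when several notebooks share one quota. A sketch under that assumption; `call_with_backoff`, `is_retryable`, and the delay parameters are illustrative choices, and `sleep` is injectable so the logic can be tested without waiting:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_backoff(fn: Callable[[], T],
                      is_retryable: Callable[[Exception], bool],
                      max_retries: int = 4,
                      base_delay: float = 5.0,
                      max_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on retryable errors wait base_delay * 2**attempt (capped, jittered) and retry."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:
            # Give up immediately on non-retryable errors or after the last attempt
            if not is_retryable(e) or attempt == max_retries - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Scale by a random factor in [0.5, 1.0] so concurrent clients do not retry in lockstep
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable: the loop always returns or raises")
```

In this notebook it could wrap the model call as `call_with_backoff(lambda: model.generate_content(prompt), lambda e: "429" in str(e))`.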

## 5. Define All 8 Agents

In [None]:
def run_hook_agent(manuscript_text: str) -> HookResult:
    """Agent 1: Evaluate Hook (Abstract & Introduction) - 30 points"""
    model = genai.GenerativeModel(MODEL_NAME)
    
    prompt = f"""You are a Nature editor evaluating the HOOK of a manuscript.

MANUSCRIPT (First 5000 chars):
{manuscript_text[:5000]}

EVALUATE:
1. ONE-SENTENCE SUMMARY: Main take-home message
2. OPENING PATTERN: HIGH_IMPACT | BROAD_TO_NARROW | WEAK_NARROW
3. GAP DEFINITION: CONCEPTUAL | METHODOLOGICAL | INCREMENTAL | UNCLEAR
4. SCORE (0-30): 27-30 Exceptional, 24-26 Strong, 20-23 Adequate, <20 Weak

Return JSON:
{{
    "one_sentence_summary": "...",
    "opening_pattern": "...",
    "gap_type": "...",
    "opening_commentary": "2-3 sentences",
    "gap_commentary": "2-3 sentences",
    "score": 0-30
}}
"""
    response = safe_api_call(model, prompt)
    return HookResult(**parse_json_response(response.text))


def run_narrative_agent(manuscript_text: str) -> NarrativeResult:
    """Agent 2: Evaluate Narrative Quality - 30 points"""
    model = genai.GenerativeModel(MODEL_NAME)
    
    prompt = f"""You are a Nature editor evaluating NARRATIVE quality.

MANUSCRIPT (up to 8000 chars):
{manuscript_text[5000:13000] or manuscript_text[:8000]}

EVALUATE:
1. LOGICAL FLOW: SEAMLESS | FUNCTIONAL | CHOPPY
2. NARRATIVE CONTROL: DIRECTOR | PASSIVE
3. SCORE (0-30)

Return JSON:
{{
    "logical_flow": "...",
    "narrative_control": "...",
    "flow_commentary": "2-3 sentences",
    "narrative_commentary": "2-3 sentences",
    "score": 0-30
}}
"""
    response = safe_api_call(model, prompt)
    return NarrativeResult(**parse_json_response(response.text))


def run_rigor_agent(manuscript_text: str) -> RigorResult:
    """Agent 3: Evaluate Rigor - 20 points"""
    model = genai.GenerativeModel(MODEL_NAME)
    
    prompt = f"""You are a Nature editor evaluating RIGOR.

MANUSCRIPT (up to 10000 chars):
{manuscript_text[8000:18000] or manuscript_text[:10000]}

EVALUATE:
1. DEFENSIBILITY: BULLETPROOF | VULNERABLE
2. CLAIM DISCIPLINE: PRECISE | OVERCLAIMING | VAGUE
3. SCORE (0-20)

Return JSON:
{{
    "defensibility": "...",
    "claim_discipline": "...",
    "defensibility_commentary": "2-3 sentences",
    "claims_commentary": "2-3 sentences",
    "score": 0-20
}}
"""
    response = safe_api_call(model, prompt)
    return RigorResult(**parse_json_response(response.text))


def run_impact_agent(manuscript_text: str) -> ImpactResult:
    """Agent 4: Evaluate Impact - 20 points"""
    model = genai.GenerativeModel(MODEL_NAME)
    
    prompt = f"""You are a Nature editor evaluating IMPACT.

MANUSCRIPT (Last 10000 chars):
{manuscript_text[-10000:]}

EVALUATE:
1. CONCLUSION TYPE: FULL_CIRCLE | SUMMARY_ONLY
2. SCORE (0-20)

Return JSON:
{{
    "conclusion_type": "...",
    "commentary": "2-3 sentences",
    "score": 0-20
}}
"""
    response = safe_api_call(model, prompt)
    return ImpactResult(**parse_json_response(response.text))


def run_fatal_flaw_detector(manuscript_text: str, hook: HookResult, narrative: NarrativeResult) -> FatalFlaws:
    """Agent 5: Detect Fatal Flaws"""
    model = genai.GenerativeModel(MODEL_NAME)
    
    prompt = f"""You are a Nature editor checking for FATAL FLAWS.

MANUSCRIPT (15000 chars):
{manuscript_text[:15000]}

Previous: Gap={hook.gap_type}, Flow={narrative.logical_flow}

CHECK:
1. "SO WHAT?" - Irrelevant study?
2. "LOGIC LEAP" - Unexplained jumps?
3. "ZOMBIE/SLOPPY" - Typos, AI-like writing?

Return JSON:
{{
    "so_what": true|false,
    "logic_leap": true|false,
    "zombie_sloppy": true|false,
    "details": "Explain or 'None detected'"
}}
"""
    response = safe_api_call(model, prompt)
    return FatalFlaws(**parse_json_response(response.text))


def run_title_evaluator(manuscript_text: str) -> TitleResult:
    """Agent 6: Evaluate Title"""
    model = genai.GenerativeModel(MODEL_NAME)
    title = extract_title(manuscript_text)
    
    if not title:
        return TitleResult("MISSING", "N/A", "N/A", "N/A", "No title found", 0)
    
    prompt = f"""Evaluate this manuscript TITLE:

TITLE: {title}
CONTEXT: {manuscript_text[:3000]}

EVALUATE:
1. CLARITY: CRYSTAL_CLEAR | ADEQUATE | VAGUE
2. KEYWORDS: STRONG | PARTIAL | WEAK
3. LENGTH: OPTIMAL(10-15) | ACCEPTABLE(7-9,16-20) | TOO_SHORT | TOO_LONG
4. ENGAGEMENT: HIGH | MEDIUM | LOW
5. SCORE (0-10)

Return JSON:
{{
    "clarity": "...",
    "keyword_presence": "...",
    "length_assessment": "...",
    "engagement": "...",
    "commentary": "2-3 sentences",
    "score": 0-10
}}
"""
    response = safe_api_call(model, prompt)
    return TitleResult(**parse_json_response(response.text))


def run_reference_checker(manuscript_text: str) -> ReferenceCheckResult:
    """Agent 7: Check References"""
    cited = extract_citations(manuscript_text)
    refs, _ = extract_references(manuscript_text)
    ref_nums = [int(re.match(r'^\[(\d+)\]', r).group(1)) for r in refs if re.match(r'^\[(\d+)\]', r)]
    
    uncited = [n for n in ref_nums if n not in cited]
    missing = [n for n in cited if n not in ref_nums]
    
    issues = []
    if missing:
        issues.append(f"⚠️ HALLUCINATION: Missing refs: {missing}")
    if uncited:
        issues.append(f"📝 Uncited refs: {uncited}")
    if not issues:
        issues.append("✓ All citations match")
    
    return ReferenceCheckResult(
        total_references=len(ref_nums),
        cited_in_text=cited,
        uncited_references=uncited,
        missing_references=missing,
        hallucination_detected=len(missing) > 0,
        issues="\n".join(issues)
    )


def run_improvement_planner(manuscript_text, hook, narrative, rigor, impact, flaws, title) -> str:
    """Agent 8: Generate Improvement Plan"""
    model = genai.GenerativeModel(MODEL_NAME)
    total = hook.score + narrative.score + rigor.score + impact.score
    
    prompt = f"""Create an IMPROVEMENT PLAN.

SCORES:
- Total: {total}/100
- Hook: {hook.score}/30 ({hook.gap_type})
- Narrative: {narrative.score}/30 ({narrative.logical_flow})
- Rigor: {rigor.score}/20 ({rigor.defensibility})
- Impact: {impact.score}/20 ({impact.conclusion_type})
- Title (not in total): {title.score}/10 ({title.clarity})
- Flaws: {flaws.details}

MANUSCRIPT: {manuscript_text[:8000]}

Generate markdown with:
1. Current Score Summary
2. Priority fixes (if score < 90)
3. Specific improvements for weak areas
4. Expected outcome after fixes
"""
    response = safe_api_call(model, prompt)
    return response.text.strip()

print("✓ All 8 agents defined")
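The prompts ask for fixed labels (for example `SEAMLESS | FUNCTIONAL | CHOPPY`) and bounded scores, but nothing in the agents enforces either, so an out-of-range score such as 32/30 would flow straight into the total. A minimal sketch of a post-hoc check, assuming you prefer clamping scores and flagging unknown labels over re-querying the model; `check_result` and `ALLOWED_LABELS` are hypothetical and cover only two fields as an example:

```python
from typing import Dict, List, Set, Tuple

# Allowed values copied from the agent prompts; extend for the other label fields as needed
ALLOWED_LABELS: Dict[str, Set[str]] = {
    "logical_flow": {"SEAMLESS", "FUNCTIONAL", "CHOPPY"},
    "gap_type": {"CONCEPTUAL", "METHODOLOGICAL", "INCREMENTAL", "UNCLEAR"},
}

def check_result(data: Dict[str, object], max_score: int) -> Tuple[Dict[str, object], List[str]]:
    """Clamp 'score' into [0, max_score] and report labels outside the allowed sets."""
    checked = dict(data)  # work on a copy so the parsed response is left untouched
    warnings: List[str] = []
    score = int(checked.get("score", 0))  # a missing score is treated as 0
    clamped = max(0, min(max_score, score))
    if clamped != score:
        warnings.append(f"score {score} clamped to {clamped}")
    checked["score"] = clamped
    for field, allowed in ALLOWED_LABELS.items():
        if field in checked and str(checked[field]).upper() not in allowed:
            warnings.append(f"unexpected {field}: {checked[field]!r}")
    return checked, warnings
```

For example, `data, warnings = check_result(parse_json_response(response.text), max_score=30)` could run before `NarrativeResult(**data)`.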

## 6. Run All Agents

In [None]:
# Run all 8 agents
print("="*60)
print("MANUSCRIPT EVALUATION - 8 AGENT PIPELINE")
print("="*60)
print()

# Agent 1: Hook
print("[1/8] Hook Agent...")
hook_result = run_hook_agent(manuscript_text)
print(f"      ✓ Score: {hook_result.score}/30 | {hook_result.gap_type}")
time.sleep(DELAY_BETWEEN_CALLS)

# Agent 2: Narrative
print("[2/8] Narrative Agent...")
narrative_result = run_narrative_agent(manuscript_text)
print(f"      ✓ Score: {narrative_result.score}/30 | {narrative_result.logical_flow}")
time.sleep(DELAY_BETWEEN_CALLS)

# Agent 3: Rigor
print("[3/8] Rigor Agent...")
rigor_result = run_rigor_agent(manuscript_text)
print(f"      ✓ Score: {rigor_result.score}/20 | {rigor_result.defensibility}")
time.sleep(DELAY_BETWEEN_CALLS)

# Agent 4: Impact
print("[4/8] Impact Agent...")
impact_result = run_impact_agent(manuscript_text)
print(f"      ✓ Score: {impact_result.score}/20 | {impact_result.conclusion_type}")
time.sleep(DELAY_BETWEEN_CALLS)

# Agent 5: Fatal Flaws
print("[5/8] Fatal Flaw Detector...")
fatal_flaws = run_fatal_flaw_detector(manuscript_text, hook_result, narrative_result)
has_flaw = fatal_flaws.so_what or fatal_flaws.logic_leap or fatal_flaws.zombie_sloppy
print(f"      {'⚠️ FLAWS DETECTED' if has_flaw else '✓ No fatal flaws'}")
time.sleep(DELAY_BETWEEN_CALLS)

# Agent 6: Title
print("[6/8] Title Evaluator...")
title_result = run_title_evaluator(manuscript_text)
print(f"      ✓ Score: {title_result.score}/10 | {title_result.clarity}")
time.sleep(DELAY_BETWEEN_CALLS)

# Agent 7: References
print("[7/8] Reference Checker...")
ref_check = run_reference_checker(manuscript_text)
print(f"      {'⚠️ Issues found' if ref_check.hallucination_detected else '✓ References OK'}")
# No sleep needed here: the reference checker runs locally and makes no API call

# Agent 8: Improvement Plan
print("[8/8] Improvement Planner...")
improvement_plan = run_improvement_planner(
    manuscript_text, hook_result, narrative_result,
    rigor_result, impact_result, fatal_flaws, title_result
)
print("      ✓ Plan generated")

print()
print("="*60)
print("✅ ALL AGENTS COMPLETE")
print("="*60)
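The run cell repeats the same call, print, and sleep pattern for each agent. A minimal sketch of the same sequence written as a loop, assuming each stage is described by a label, a function that receives the results gathered so far, and a flag saying whether it calls the API; `run_stages` is hypothetical, and the stubs in the test stand in for the real agents:

```python
import time
from typing import Any, Callable, Dict, List, Tuple

# (label, function of the results so far, whether it calls the API)
Stage = Tuple[str, Callable[[Dict[str, Any]], Any], bool]

def run_stages(stages: List[Stage], delay: float,
               sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Run stages in order, sleeping `delay` seconds before each API stage after the first one."""
    results: Dict[str, Any] = {}
    api_calls = 0
    for i, (label, step, uses_api) in enumerate(stages, start=1):
        if uses_api and api_calls > 0:
            sleep(delay)  # space out consecutive API requests
        print(f"[{i}/{len(stages)}] {label}...")
        results[label] = step(results)
        api_calls += int(uses_api)
    return results
```

A dependent stage reads earlier results from the dictionary, e.g. `("Fatal Flaw Detector", lambda r: run_fatal_flaw_detector(manuscript_text, r["Hook Agent"], r["Narrative Agent"]), True)`.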

## 7. Final Report

In [None]:
# Calculate total score
total_score = hook_result.score + narrative_result.score + rigor_result.score + impact_result.score

# Tier classification
if total_score >= 90:
    tier = "🏆 TOP 5% - Nature/Science Level"
elif total_score >= 80:
    tier = "🥇 TOP 10% - Cell/PNAS Level"
elif total_score >= 70:
    tier = "🥈 SOLID - Top Specialty Journal"
elif total_score >= 60:
    tier = "🥉 ACCEPTABLE - Revision Needed"
else:
    tier = "⭕ WEAK - Major Revision Required"

# Display final report
print("\n" + "="*70)
print("                    FINAL EVALUATION REPORT")
print("="*70)

print(f"\n{'Dimension':<35} {'Score':>8}")
print("-"*52)
print(f"{'Hook (Opening & Gap)':<35} {hook_result.score:>3} / 30")
print(f"{'Narrative (Flow & Voice)':<35} {narrative_result.score:>3} / 30")
print(f"{'Rigor (Methods & Claims)':<35} {rigor_result.score:>3} / 20")
print(f"{'Impact (Discussion)':<35} {impact_result.score:>3} / 20")
print("-"*52)
print(f"{'TOTAL SCORE':<35} {total_score:>3} / 100")
print()
print(f"TIER: {tier}")
print()

# Quality checks
print("QUALITY CHECKS:")
print("-"*52)
title = extract_title(manuscript_text)
if title:
    print(f"Title: \"{title[:50]}{'...' if len(title) > 50 else ''}\"")
print(f"Title Score: {title_result.score}/10 ({title_result.clarity})")
print(f"References: {'⚠️ ISSUES' if ref_check.hallucination_detected else '✓ OK'}")
print()

# Fatal flaws warning
if fatal_flaws.so_what or fatal_flaws.logic_leap or fatal_flaws.zombie_sloppy:
    print("🚨 FATAL FLAW DETECTED - AUTO REJECT")
    print(f"   {fatal_flaws.details[:200]}{'...' if len(fatal_flaws.details) > 200 else ''}")
    print()

print("="*70)
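The if/elif chain maps the total onto five tiers with lower bounds of 90, 80, 70, and 60. The same mapping can be kept as data, which makes the thresholds easy to inspect or adjust. A sketch using the standard-library `bisect` module, with `TIER_THRESHOLDS`, `TIER_LABELS`, and `classify_tier` as hypothetical names and the emoji omitted from the labels:

```python
from bisect import bisect_right

# Lower bounds of every tier above the bottom one, in ascending order
TIER_THRESHOLDS = [60, 70, 80, 90]
TIER_LABELS = [
    "WEAK - Major Revision Required",   # below 60
    "ACCEPTABLE - Revision Needed",     # 60-69
    "SOLID - Top Specialty Journal",    # 70-79
    "TOP 10% - Cell/PNAS Level",        # 80-89
    "TOP 5% - Nature/Science Level",    # 90 and above
]

def classify_tier(score: int) -> str:
    """Return the label of the highest tier whose lower bound is <= score."""
    # bisect_right counts how many thresholds are <= score, which is the tier index
    return TIER_LABELS[bisect_right(TIER_THRESHOLDS, score)]
```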

## 8. Detailed Feedback

In [None]:
print("\n📝 ONE-SENTENCE SUMMARY:")
print(f"   {hook_result.one_sentence_summary}")

print("\n🎯 HOOK ANALYSIS:")
print(f"   Opening: {hook_result.opening_pattern}")
print(f"   {hook_result.opening_commentary}")
print(f"   Gap: {hook_result.gap_type}")
print(f"   {hook_result.gap_commentary}")

print("\n📖 NARRATIVE ANALYSIS:")
print(f"   Flow: {narrative_result.logical_flow}")
print(f"   {narrative_result.flow_commentary}")
print(f"   Control: {narrative_result.narrative_control}")
print(f"   {narrative_result.narrative_commentary}")

print("\n🔬 RIGOR ANALYSIS:")
print(f"   Defensibility: {rigor_result.defensibility}")
print(f"   {rigor_result.defensibility_commentary}")
print(f"   Claims: {rigor_result.claim_discipline}")
print(f"   {rigor_result.claims_commentary}")

print("\n🎯 IMPACT ANALYSIS:")
print(f"   Conclusion: {impact_result.conclusion_type}")
print(f"   {impact_result.commentary}")
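The commentary fields are printed as single long lines, so two or three sentences can run well past the notebook width. A small sketch using the standard-library `textwrap` module to keep the three-space indent on every wrapped line; `wrap_commentary` and the width of 80 are arbitrary choices made here:

```python
import textwrap

def wrap_commentary(text: str, indent: str = "   ", width: int = 80) -> str:
    """Wrap text to `width` columns, prefixing every line (including the first) with `indent`."""
    return textwrap.fill(text, width=width, initial_indent=indent, subsequent_indent=indent)

print(wrap_commentary("The introduction moves from a broad problem to a specific "
                      "methodological gap, but the final paragraph restates the gap twice."))
```

In the cell above, `print(f"   {hook_result.opening_commentary}")` would become `print(wrap_commentary(hook_result.opening_commentary))`.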

## 9. Improvement Plan

In [None]:
from IPython.display import Markdown, display

print("="*70)
print("                    IMPROVEMENT PLAN")
print("="*70)
print()

display(Markdown(improvement_plan))
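`IPython.display` renders the plan as formatted Markdown inside Jupyter or Colab. If the same code runs as a plain Python script, a guarded fallback that prints the raw text keeps it working. A minimal sketch, with `show_markdown` as a hypothetical name:

```python
def show_markdown(text: str) -> str:
    """Render Markdown in a notebook kernel if possible; otherwise print it. Returns the mode used."""
    try:
        from IPython import get_ipython
        from IPython.display import Markdown, display
    except ImportError:
        # IPython not installed at all
        print(text)
        return "print"
    if get_ipython() is None:
        # IPython is installed but no interactive shell is running (e.g. a plain script)
        print(text)
        return "print"
    display(Markdown(text))
    return "markdown"
```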

## 10. Download Results

In [None]:
# Save results to JSON
results = {
    "total_score": total_score,
    "tier": tier,
    "hook": asdict(hook_result),
    "narrative": asdict(narrative_result),
    "rigor": asdict(rigor_result),
    "impact": asdict(impact_result),
    "fatal_flaws": asdict(fatal_flaws),
    "title": asdict(title_result),
    "reference_check": asdict(ref_check),
    "improvement_plan": improvement_plan
}

# Save JSON
with open('evaluation_results.json', 'w', encoding='utf-8') as f:
    json.dump(results, f, indent=2, ensure_ascii=False)

# Save markdown report
report_md = f"""# Manuscript Evaluation Report

## Score: {total_score}/100
**Tier**: {tier}

| Dimension | Score |
|-----------|-------|
| Hook | {hook_result.score}/30 |
| Narrative | {narrative_result.score}/30 |
| Rigor | {rigor_result.score}/20 |
| Impact | {impact_result.score}/20 |

## One-Sentence Summary
{hook_result.one_sentence_summary}

## Improvement Plan
{improvement_plan}

---
*Generated by 8-Agent Manuscript Evaluator*
"""

with open('evaluation_report.md', 'w', encoding='utf-8') as f:
    f.write(report_md)

# Download files
from google.colab import files
files.download('evaluation_results.json')
files.download('evaluation_report.md')

print("\n✅ Files ready for download!")
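`files.download` is called once per file. If you would rather hand over a single archive, the two outputs can first be bundled with the standard-library `zipfile` module. A sketch, where `bundle_results` and the archive name are illustrative and the returned path would be passed to `files.download` in the same way:

```python
import zipfile
from pathlib import Path
from typing import Iterable

def bundle_results(paths: Iterable[str], archive: str = "evaluation_bundle.zip") -> str:
    """Write the given files into a compressed zip archive and return the archive path."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            # Store each file under its base name, without directory components
            zf.write(path, arcname=Path(path).name)
    return archive
```

For example: `files.download(bundle_results(['evaluation_results.json', 'evaluation_report.md']))`.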

---

## About This Tool

This 8-agent pipeline evaluates manuscripts based on Nature editor criteria:

| Agent | Focus | Points |
|-------|-------|--------|
| Hook | Opening & Gap | 30 |
| Narrative | Flow & Voice | 30 |
| Rigor | Methods & Claims | 20 |
| Impact | Discussion | 20 |
| Fatal Flaw | Auto-reject check | - |
| Title | Title quality | 10 (reported separately, not in the /100 total) |
| Reference | Citation check | - |
| Improvement | Action plan | - |

**Model**: Gemini 2.5 Flash

---
*SNU AI Psychology Workshop - February 2026*