# CrossRef Local - Demo for the LLM Era

[![nbviewer](https://img.shields.io/badge/view-nbviewer-orange)](https://nbviewer.jupyter.org/github/ywatanabe1989/crossref-local/blob/develop/examples/quickstart.ipynb)
[![GitHub](https://img.shields.io/badge/source-GitHub-blue)](https://github.com/ywatanabe1989/crossref-local)

> **Note**: This notebook requires a local CrossRef database (1.5 TB).
> For installation, see [README](../README.md).

Key features for AI research assistants:

1. **ABSTRACTS** - Full abstract text for LLM context (not available in many APIs)
2. **IMPACT FACTOR** - Journal quality assessment
3. **CITATIONS** - Paper importance metrics
4. **SPEED** - Queries over 167M records return in tens to hundreds of milliseconds, with no rate limits

In [None]:
import sys
# Make the in-repo package importable when running from the examples/ directory
sys.path.insert(0, '../src')

from crossref_local import search, get, count, info

## Database Overview

In [1]:
db = info()
print(f"Works:     {db['works']:,}")
print(f"FTS:       {db['fts_indexed']:,}")
print(f"Citations: {db['citations']:,}")

Works:     167,008,748
FTS:       167,008,748
Citations: 1,788,599,072


## 1. Abstracts - Full Text for LLM Context

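Unlike many bibliographic APIs, the CrossRef data includes abstracts for records where the publisher has deposited one. An abstract gives an LLM a compact summary of a paper's content, which a title alone does not.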

In [2]:
results = search("hippocampal memory consolidation", limit=3)

for work in results.works:
    print(f"📄 {work.title}")
    print(f"   {work.journal} ({work.year})")
    if work.abstract:
        print(f"   📝 {work.abstract[:300]}...")
    print()

📄 DREADD-inactivation of dorsal CA1 pyramidal neurons...
   Hippocampus (2022)
   📝 The hippocampus is critical for the consolidation of information from short-term memory into long-term episodic memory and spatial navigation...

📄 Memory consolidation during sleep: from data to theory
   Current Opinion in Neurobiology (2020)
   📝 Sleep is essential for memory consolidation. Recent studies have revealed how neural activity patterns...



## 2. Impact Factor - Assess Journal Quality

In [3]:
from crossref_local.impact_factor import ImpactFactorCalculator

with ImpactFactorCalculator() as calc:
    for journal in ["Nature", "Science", "Cell", "PLOS ONE"]:
        result = calc.calculate_impact_factor(journal, target_year=2023)
        if result:
            print(f"{journal}: IF = {result['impact_factor']:.2f}")

Nature: IF = 54.07
Science: IF = 46.17
Cell: IF = 54.01
PLOS ONE: IF = 3.37
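
The notebook does not show how `calculate_impact_factor` arrives at these numbers. As a point of reference, the conventional two-year impact factor for a year Y is the number of citations received in Y by items published in Y-1 and Y-2, divided by the number of citable items published in those two years. The sketch below computes that ratio from made-up counts; the name `two_year_impact_factor` and its inputs are hypothetical and not part of `crossref_local`, and the library's actual calculation may differ.

```python
from typing import Optional


def two_year_impact_factor(citations_in_target_year: int,
                           citable_items: int) -> Optional[float]:
    """Conventional two-year impact factor (illustrative, not the library's code).

    citations_in_target_year: citations received in year Y by items
        published in years Y-1 and Y-2.
    citable_items: number of citable items published in Y-1 and Y-2.
    """
    if citable_items <= 0:
        # No citable items: the ratio is undefined.
        return None
    return citations_in_target_year / citable_items


# Made-up counts, for illustration only.
example_if = two_year_impact_factor(citations_in_target_year=1200, citable_items=400)
print(f"IF = {example_if:.2f}")  # prints "IF = 3.00"
```

Returning `None` for an empty denominator mirrors the `if result:` guard in the cell above, which skips journals without a usable result.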


## 3. Speed - 167M Records, No Rate Limits

In [4]:
import time

queries = ["machine learning", "CRISPR", "climate change", "neural network"]

for q in queries:
    start = time.perf_counter()
    n = count(q)
    elapsed = (time.perf_counter() - start) * 1000
    print(f"{q:<20} → {n:>10,} matches in {elapsed:.0f}ms")

machine learning     →    477,922 matches in 104ms
CRISPR               →     63,989 matches in 35ms
climate change       →    843,759 matches in 192ms
neural network       →    579,367 matches in 138ms
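
Each figure above comes from a single run, so it includes one-off effects such as cold caches. A steadier estimate is the median over several repeats. The helper below is a minimal sketch of that idea; `time_call_ms` is a hypothetical name, and the workload is a stand-in so the block runs without the database. In the notebook, the callable would be something like `lambda: count("CRISPR")`.

```python
import statistics
import time
from typing import Callable, List


def time_call_ms(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time of fn() in milliseconds."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


# Stand-in workload (assumption); replace with a real query when the database is available.
elapsed = time_call_ms(lambda: sum(range(100_000)))
print(f"median over 5 runs: {elapsed:.2f}ms")
```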


## 4. Get Work by DOI

In [None]:
work = get("10.1038/nature12373")

if work:
    print(f"Title:   {work.title}")
    print(f"Authors: {', '.join(work.authors)}")
    print(f"Year:    {work.year}")
    print(f"Journal: {work.journal}")
    print(f"\nCitation: {work.citation()}")

## Use Case: LLM Research Assistant

Build context for an LLM by retrieving relevant papers with abstracts.

In [None]:
def build_research_context(topic: str, n_papers: int = 5) -> str:
    """Build LLM context from relevant papers."""
    results = search(topic, limit=n_papers)

    # Collect fragments and join once instead of repeated string concatenation.
    parts = [f"## Research Context: {topic}\n\n"]

    for i, work in enumerate(results.works, 1):
        parts.append(f"### Paper {i}: {work.title}\n")
        # List at most three authors, then abbreviate.
        authors = ', '.join(work.authors[:3])
        if len(work.authors) > 3:
            authors += " et al."
        parts.append(f"- Authors: {authors}\n")
        parts.append(f"- Journal: {work.journal} ({work.year})\n")
        if work.abstract:
            parts.append(f"- Abstract: {work.abstract}\n")
        parts.append(f"- DOI: {work.doi}\n\n")

    return "".join(parts)

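# Example: print at most the first 2000 characters
context = build_research_context("transformer attention mechanism", n_papers=3)
# Append the ellipsis only when the context was actually truncated
print(context[:2000] + ("..." if len(context) > 2000 else ""))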
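
Abstracts vary in length, so slicing the finished string (as in `context[:2000]`) can cut a paper off mid-entry. One alternative, sketched here under assumptions, is to cap each abstract and stop adding papers once a character budget is reached. `StubWork` is a hypothetical stand-in carrying the same attribute names the notebook uses (`title`, `authors`, `journal`, `year`, `abstract`, `doi`); the sample records and DOIs are made up, and `format_work` and `build_context_with_budget` are illustrative names, not part of `crossref_local`.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class StubWork:
    """Hypothetical stand-in for a search result (attribute names from the notebook)."""
    title: str
    doi: str
    journal: Optional[str] = None
    year: Optional[int] = None
    abstract: Optional[str] = None
    authors: List[str] = field(default_factory=list)


def format_work(i: int, work: StubWork, max_abstract_chars: int = 500) -> str:
    """Render one paper as a markdown block, capping the abstract length."""
    authors = ", ".join(work.authors[:3])
    if len(work.authors) > 3:
        authors += " et al."
    lines = [
        f"### Paper {i}: {work.title}",
        f"- Authors: {authors}",
        f"- Journal: {work.journal} ({work.year})",
    ]
    if work.abstract:
        abstract = work.abstract
        if len(abstract) > max_abstract_chars:
            abstract = abstract[:max_abstract_chars].rstrip() + "..."
        lines.append(f"- Abstract: {abstract}")
    lines.append(f"- DOI: {work.doi}")
    return "\n".join(lines) + "\n\n"


def build_context_with_budget(topic: str, works: Iterable[StubWork],
                              max_chars: int = 2000) -> str:
    """Add whole paper blocks until the next one would exceed max_chars."""
    parts = [f"## Research Context: {topic}\n\n"]
    used = len(parts[0])
    for i, work in enumerate(works, 1):
        block = format_work(i, work)
        if used + len(block) > max_chars:
            break  # stop at a paper boundary rather than mid-entry
        parts.append(block)
        used += len(block)
    return "".join(parts)


# Made-up records, for illustration only.
sample_works = [
    StubWork(title="Example paper A", doi="10.0000/example.a",
             journal="Example Journal", year=2020, abstract="x" * 1000,
             authors=["A One", "B Two", "C Three", "D Four"]),
    StubWork(title="Example paper B", doi="10.0000/example.b",
             journal="Example Journal", year=2021, abstract="y" * 1000,
             authors=["E Five"]),
]
budgeted = build_context_with_budget("example topic", sample_works, max_chars=800)
print(budgeted)
```

With the real library, `results.works` from `search(...)` would take the place of `sample_works`, assuming its items expose the same attributes. A character budget is only a rough proxy for a model's token limit.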

---

## TODO: Graphing Support

Future features:
- Citation network visualization
- Impact factor trends over time
- Author collaboration networks
- Topic clustering