# Algorithm Variants in rapid_textrank

This notebook explores the TextRank algorithm variants available in `rapid_textrank`:

| Variant | Best For | Key Feature |
|---------|----------|-------------|
| **BaseTextRank** | General text | Standard TextRank implementation |
| **PositionRank** | Academic papers, news | Favors words appearing early |
| **BiasedTextRank** | Topic-focused extraction | Biases toward specified focus terms |
| **TopicalPageRank** | Domain-specific extraction | Biases toward topic-weighted terms via personalized PageRank |
| **MultipartiteRank** | Multi-topic documents | k-partite graph with intra-topic edges removed; boosts first-occurring variants |

In [1]:
# Install if needed
%pip install -q rapid_textrank

Note: you may need to restart the kernel to use updated packages.


In [2]:
from rapid_textrank import BaseTextRank, PositionRank, BiasedTextRank, TopicalPageRank, MultipartiteRank

## 1. BaseTextRank

The standard TextRank algorithm, based on [Mihalcea & Tarau (2004)](https://aclanthology.org/W04-3252/).

**How it works:**
1. Builds a co-occurrence graph from content words
2. Runs PageRank to score word importance
3. Extracts phrases by grouping high-scoring words
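
The first two steps above can be sketched in plain Python. This is a toy illustration, not `rapid_textrank`'s implementation: the helper names (`build_cooccurrence_graph`, `pagerank`), the window of 2, the tiny stopword list, and the unweighted edges are all assumptions made for the example, and step 3 (phrase grouping) is left out.

```python
from collections import defaultdict

# Hypothetical helpers; a toy sketch, not rapid_textrank's internals.
STOPWORDS = {"is", "a", "of", "the", "and", "to", "in"}  # assumed tiny list


def build_cooccurrence_graph(words, window=2):
    """Link content words that appear within `window` tokens of each other."""
    content = [w.lower() for w in words if w.lower() not in STOPWORDS]
    graph = defaultdict(set)
    for i, word in enumerate(content):
        for other in content[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration on an undirected, unweighted graph."""
    nodes = list(graph)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_score = {}
        for n in nodes:
            # Each neighbor shares its score equally among its own neighbors.
            incoming = sum(score[m] / len(graph[m]) for m in graph[n])
            new_score[n] = (1 - damping) / len(nodes) + damping * incoming
        score = new_score
    return score


words = "natural language processing is a field of language research".split()
graph = build_cooccurrence_graph(words)
scores = pagerank(graph)
top_word = max(scores, key=scores.get)

for word, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word:<12} {value:.3f}")
```

In this toy graph the best-connected word ends up with the highest score, which is the intuition behind step 2.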

**Best for:** General-purpose keyword extraction where word position doesn't matter.

In [3]:
text = """
Natural language processing (NLP) is a field of artificial intelligence
that focuses on the interaction between computers and humans through
natural language. The ultimate goal of NLP is to enable computers to
understand, interpret, and generate human language in a valuable way.

Machine learning approaches have transformed NLP in recent years.
Deep learning models, particularly transformers, have achieved
state-of-the-art results on many NLP tasks including translation,
summarization, and question answering.
"""

base = BaseTextRank(top_n=10, language="en")
result = base.extract_keywords(text)

print("BaseTextRank Results:")
print("=" * 50)
for p in result.phrases:
    print(f"{p.rank:>2}. {p.text:<35} {p.score:.4f}")

BaseTextRank Results:
 1. generate human language             0.1374
 2. NLP tasks                           0.1308
 3. NLP                                 0.0970
 4. Natural language                    0.0745
 5. enable computers                    0.0642
 6. humans                              0.0475
 7. understand interpret                0.0453
 8. artificial intelligence             0.0443
 9. computers                           0.0396
10. models                              0.0357


## 2. PositionRank

Based on [Florescu & Caragea (2017)](https://aclanthology.org/P17-1102/), PositionRank weights words by their position in the document.

**Key insight:** In many documents (papers, news articles, reports), important terms appear early—in titles, abstracts, or introductory paragraphs.

**How it differs from BaseTextRank:**
- Words appearing early get higher initial importance
- Position weight decays as you move through the document
- The PageRank algorithm then refines these position-biased scores
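
The position bias can be illustrated with the weighting described in the PositionRank paper, where a word's weight is the sum of the inverse of its 1-based positions, normalized to sum to 1. The helper name `position_weights` is hypothetical, and `rapid_textrank` may compute its weights differently; this is only a sketch of the idea.

```python
from collections import defaultdict


def position_weights(words):
    """Sum 1/position over every occurrence of a word, then normalize."""
    raw = defaultdict(float)
    for position, word in enumerate(words, start=1):
        raw[word.lower()] += 1.0 / position
    total = sum(raw.values())
    return {word: weight / total for word, weight in raw.items()}


words = "quantum error correction reduces overhead in quantum computers".split()
weights = position_weights(words)

# "quantum" occurs at positions 1 and 7, so its raw weight is 1/1 + 1/7.
# "computers" occurs once at position 8, so its raw weight is 1/8.
for word, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{word:<12} {weight:.3f}")
```

Weights like these would then act as the restart distribution for PageRank, which is how early words start with higher importance before the graph refines the scores.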

**Best for:** Academic papers, news articles, structured documents with front-loaded information.

In [4]:
# Academic abstract where key terms appear in the title/first sentence
abstract = """
Quantum Error Correction in Near-Term Quantum Computers

We present a novel approach to quantum error correction that significantly
reduces the overhead required for fault-tolerant quantum computation.
Our method leverages machine learning to predict and correct errors
in real-time. Experimental results on superconducting qubits demonstrate
a 50% reduction in logical error rates. These advances bring us closer
to practical quantum computing applications.
"""

pos = PositionRank(top_n=10, language="en")
result = pos.extract_keywords(abstract)

print("PositionRank Results:")
print("=" * 50)
for p in result.phrases:
    print(f"{p.rank:>2}. {p.text:<35} {p.score:.4f}")

PositionRank Results:
 1. Quantum Error Correction            0.4726
 2. Term Quantum Computers              0.4123
 3. fault tolerant quantum computation  0.0690
 4. logical error rates                 0.0425
 5. practical quantum                   0.0377
 6. correct errors                      0.0362
 7. method leverages machine            0.0289
 8. real time                           0.0158
 9. qubits demonstrate                  0.0142
10. overhead                            0.0128


### BaseTextRank vs PositionRank: Side-by-Side

Let's compare both algorithms on the same text to see how position weighting affects results:

In [5]:
# Same abstract, both algorithms
base = BaseTextRank(top_n=5, language="en")
pos = PositionRank(top_n=5, language="en")

base_result = base.extract_keywords(abstract)
pos_result = pos.extract_keywords(abstract)

print(f"{'BaseTextRank':<35} {'PositionRank':<35}")
print("=" * 70)

# zip stops at the shorter list, so this is safe even if fewer than 5 phrases come back
for base_phrase, pos_phrase in zip(base_result.phrases, pos_result.phrases):
    print(f"{base_phrase.text:<35} {pos_phrase.text:<35}")

BaseTextRank                        PositionRank                       
fault tolerant quantum computation  Quantum Error Correction           
quantum error correction            Term Quantum Computers             
logical error rates                 fault tolerant quantum computation 
practical quantum                   logical error rates                
correct errors                      practical quantum                  


In [6]:
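# Which phrases are unique to each algorithm? Note that the set comparison is
# case-sensitive, so 'quantum error correction' and 'Quantum Error Correction'
# are counted as different phrases.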
base_texts = {p.text for p in base_result.phrases}
pos_texts = {p.text for p in pos_result.phrases}

only_base = base_texts - pos_texts
only_pos = pos_texts - base_texts
both = base_texts & pos_texts

print(f"In both: {both}")
print(f"Only in BaseTextRank: {only_base}")
print(f"Only in PositionRank: {only_pos}")

In both: {'fault tolerant quantum computation', 'practical quantum', 'logical error rates'}
Only in BaseTextRank: {'quantum error correction', 'correct errors'}
Only in PositionRank: {'Quantum Error Correction', 'Term Quantum Computers'}


## 3. BiasedTextRank

Based on [Kazemi et al. (2020)](https://aclanthology.org/2020.coling-main.144/), BiasedTextRank steers extraction toward specified focus terms.

**Key parameters:**
- `focus_terms`: List of terms to bias toward
- `bias_weight`: How strongly to favor focus terms (higher = stronger bias)

**How it works:**
- Focus terms get an initial boost in the PageRank algorithm
- Words connected to focus terms inherit some of this bias
- The result emphasizes the topic you care about
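
One way to picture the boost is as a personalized PageRank in which focus terms receive `bias_weight` times the restart probability of other words. The sketch below assumes that reading; `personalized_pagerank`, `bias_vector`, the toy graph, and the normalization are illustrative and not taken from `rapid_textrank`.

```python
def personalized_pagerank(graph, restart, damping=0.85, iterations=100):
    """PageRank where restarts follow the `restart` distribution."""
    score = dict(restart)
    for _ in range(iterations):
        score = {
            node: (1 - damping) * restart[node]
            + damping * sum(score[nb] / len(graph[nb]) for nb in neighbors)
            for node, neighbors in graph.items()
        }
    return score


def bias_vector(nodes, focus_terms, bias_weight):
    """Give focus terms `bias_weight`, everything else 1.0, then normalize."""
    raw = {n: bias_weight if n in focus_terms else 1.0 for n in nodes}
    total = sum(raw.values())
    return {n: w / total for n, w in raw.items()}


# Toy undirected co-occurrence graph (adjacency lists), assumed for the example.
graph = {
    "security": ["user", "encryption"],
    "encryption": ["security", "data"],
    "data": ["encryption", "user", "caching"],
    "user": ["security", "data", "performance"],
    "performance": ["user", "caching"],
    "caching": ["data", "performance"],
}

uniform = personalized_pagerank(graph, bias_vector(graph, set(), 1.0))
biased = personalized_pagerank(graph, bias_vector(graph, {"security"}, 5.0))

for node in graph:
    print(f"{node:<12} uniform={uniform[node]:.3f} biased={biased[node]:.3f}")
```

Comparing the two columns shows how raising the restart probability of one node shifts score toward it while the graph structure still shapes the final ranking.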

**Best for:** Topic-specific extraction, document filtering, aspect-based analysis.

In [7]:
# Text covering multiple topics
tech_article = """
Modern web applications must balance user experience with security.
Performance optimizations are crucial for mobile users on slow networks.
Privacy regulations like GDPR require careful data handling and consent.
Security vulnerabilities can expose sensitive user information.
Caching strategies improve response times but complicate data freshness.
Authentication systems must prevent unauthorized access while remaining
user-friendly. Encryption protects data both in transit and at rest.
"""

# Focus on security/privacy topics
biased = BiasedTextRank(
    focus_terms=["security", "privacy"],
    bias_weight=5.0,
    top_n=10,
    language="en"
)

result = biased.extract_keywords(tech_article)

print("BiasedTextRank (focus: security, privacy):")
print("=" * 50)
for p in result.phrases:
    print(f"{p.rank:>2}. {p.text:<35} {p.score:.4f}")

BiasedTextRank (focus: security, privacy):
 1. balance user experience             0.1326
 2. sensitive user                      0.1018
 3. mobile users                        0.1018
 4. complicate data freshness           0.0947
 5. Encryption protects data            0.0912
 6. careful data                        0.0872
 7. strategies improve response times   0.0832
 8. user                                0.0751
 9. GDPR require                        0.0649
10. Security vulnerabilities            0.0638


### Effect of `bias_weight`

The `bias_weight` parameter controls how strongly results favor the focus terms:

In [8]:
# Compare different bias weights
weights = [1.0, 2.0, 5.0, 10.0]

print("Bias Weight Comparison (focus: security, privacy)")
print("=" * 80)

for weight in weights:
    biased = BiasedTextRank(
        focus_terms=["security", "privacy"],
        bias_weight=weight,
        top_n=3,
        language="en"
    )
    result = biased.extract_keywords(tech_article)
    
    print(f"\nbias_weight={weight}:")
    for p in result.phrases:
        print(f"  {p.text}: {p.score:.4f}")

Bias Weight Comparison (focus: security, privacy)

bias_weight=1.0:
  balance user experience: 0.1158
  complicate data freshness: 0.1123
  Encryption protects data: 0.1077

bias_weight=2.0:
  balance user experience: 0.1209
  complicate data freshness: 0.1070
  Encryption protects data: 0.1027

bias_weight=5.0:
  balance user experience: 0.1326
  sensitive user: 0.1018
  mobile users: 0.1018

bias_weight=10.0:
  balance user experience: 0.1453
  sensitive user: 0.1127
  mobile users: 0.1082


In [9]:
# Compare biased vs unbiased on the same text
base = BaseTextRank(top_n=5, language="en")
biased_security = BiasedTextRank(
    focus_terms=["security", "privacy"],
    bias_weight=5.0,
    top_n=5,
    language="en"
)
biased_perf = BiasedTextRank(
    focus_terms=["performance", "speed", "cache"],
    bias_weight=5.0,
    top_n=5,
    language="en"
)

base_result = base.extract_keywords(tech_article)
security_result = biased_security.extract_keywords(tech_article)
perf_result = biased_perf.extract_keywords(tech_article)

# Width 35 keeps the columns aligned even for the longest phrase
print(f"{'Unbiased':<35} {'Security Focus':<35} Performance Focus")
print("=" * 105)
for b, s, p in zip(base_result.phrases, security_result.phrases, perf_result.phrases):
    print(f"{b.text:<35} {s.text:<35} {p.text}")

Unbiased                            Security Focus                      Performance Focus
balance user experience             balance user experience             balance user experience
complicate data freshness           sensitive user                      mobile users
Encryption protects data            mobile users                        complicate data freshness
strategies improve response times   complicate data freshness           Encryption protects data
careful data                        Encryption protects data            strategies improve response times


### Dynamic Focus Terms

You can also pass `focus_terms` per call, which is useful when processing multiple documents with different topics:

In [10]:
# Create extractor with default focus
extractor = BiasedTextRank(
    focus_terms=["default"],  # Placeholder
    bias_weight=5.0,
    top_n=5,
    language="en"
)

# Override focus_terms per call
topics = [
    ["security", "encryption"],
    ["performance", "caching"],
    ["user", "experience"]
]

for focus in topics:
    result = extractor.extract_keywords(tech_article, focus_terms=focus)
    print(f"\nFocus: {focus}")
    for p in result.phrases[:3]:
        print(f"  - {p.text}")


Focus: ['security', 'encryption']
  - balance user experience
  - Encryption protects data
  - complicate data freshness

Focus: ['performance', 'caching']
  - balance user experience
  - mobile users
  - complicate data freshness

Focus: ['user', 'experience']
  - balance user experience
  - mobile users
  - sensitive user


## 4. TopicalPageRank

Based on [Sterckx et al. (2015)](https://aclanthology.org/), TopicalPageRank uses **personalized PageRank** to steer extraction toward topic-relevant terms.

**Key parameters:**
- `topic_weights`: Dict mapping lemmas to importance weights (e.g. from LDA topics)
- `min_weight`: Baseline weight for words not in the topic vocabulary (default `0.0`)

**How it differs from BiasedTextRank:**
- BiasedTextRank boosts *specific focus terms* you provide (e.g. "security", "privacy")
- TopicalPageRank uses a *distribution of weights* over many terms, typically derived from topic models like LDA
- The weights act as the PageRank teleport distribution — when the random surfer restarts, it jumps to nodes proportionally to their topic weight
- Only relative proportions matter; weights are normalized internally
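
The normalization and the role of `min_weight` can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `teleport_distribution` is a hypothetical helper, and the uniform fallback for an all-zero vector is a choice made only for this example.

```python
def teleport_distribution(lemmas, topic_weights, min_weight=0.0):
    """Map each lemma to its topic weight (or `min_weight`), then normalize."""
    raw = {lemma: topic_weights.get(lemma, min_weight) for lemma in lemmas}
    total = sum(raw.values())
    if total == 0:
        # Assumed fallback: no topic word present, so restart uniformly.
        return {lemma: 1.0 / len(raw) for lemma in raw}
    return {lemma: weight / total for lemma, weight in raw.items()}


lemmas = ["security", "privacy", "user", "cache"]
weights = {"security": 0.9, "privacy": 0.3}

strict = teleport_distribution(lemmas, weights)  # min_weight=0.0
scaled = teleport_distribution(lemmas, {"security": 9, "privacy": 3})
soft = teleport_distribution(lemmas, weights, min_weight=0.3)

print(strict)  # out-of-vocabulary lemmas get zero restart probability
print(scaled)  # same proportions as `strict`, despite 10x larger inputs
print(soft)    # every lemma gets some restart probability
```

`strict` and `scaled` agree (about 0.75 for `security` and 0.25 for `privacy`), which is what "only relative proportions matter" means, while a positive `min_weight` spreads part of the restart probability over the remaining words.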

**Best for:** Domain-specific extraction where you have topic model output or domain vocabularies with graded importance.

In [11]:
# Text covering multiple topics — same one used for BiasedTextRank above
# TopicalPageRank lets us weight many terms at once with graded importance

topic_weights = {
    "security": 0.9,
    "privacy": 0.8,
    "encryption": 0.7,
    "authentication": 0.6,
    "data": 0.4,
    "access": 0.3,
}

tpr = TopicalPageRank(
    topic_weights=topic_weights,
    min_weight=0.0,   # OOV words get zero teleport probability
    top_n=10,
    language="en"
)

result = tpr.extract_keywords(tech_article)

print("TopicalPageRank (security/privacy topic weights):")
print("=" * 50)
for p in result.phrases:
    print(f"{p.rank:>2}. {p.text:<35} {p.score:.4f}")

TopicalPageRank (security/privacy topic weights):
 1. Encryption protects data            0.1347
 2. balance user experience             0.1105
 3. complicate data freshness           0.1069
 4. mobile users                        0.0922
 5. sensitive user                      0.0918
 6. careful data                        0.0911
 7. Security vulnerabilities            0.0759
 8. user                                0.0750
 9. Privacy regulations                 0.0694
10. Authentication systems              0.0606


### Effect of `min_weight`

The `min_weight` parameter controls how much "attention" out-of-vocabulary words receive during the random walk's teleport step:

- `min_weight=0.0` — Only topic-relevant words get teleport probability (strong focus)
- `min_weight > 0` — All words get at least this baseline, softening the topic bias

In [12]:
# Compare different min_weight values
min_weights = [0.0, 0.01, 0.1, 0.5]

print("min_weight Comparison (same topic_weights)")
print("=" * 80)

for mw in min_weights:
    tpr = TopicalPageRank(
        topic_weights=topic_weights,
        min_weight=mw,
        top_n=3,
        language="en"
    )
    result = tpr.extract_keywords(tech_article)

    print(f"\nmin_weight={mw}:")
    for p in result.phrases:
        print(f"  {p.text}: {p.score:.4f}")

min_weight Comparison (same topic_weights)

min_weight=0.0:
  Encryption protects data: 0.1347
  balance user experience: 0.1105
  complicate data freshness: 0.1069

min_weight=0.01:
  Encryption protects data: 0.1321
  balance user experience: 0.1120
  complicate data freshness: 0.1066

min_weight=0.1:
  balance user experience: 0.1198
  Encryption protects data: 0.1189
  complicate data freshness: 0.1055

min_weight=0.5:
  balance user experience: 0.1278
  Encryption protects data: 0.1054
  complicate data freshness: 0.1044


### TopicalPageRank vs BiasedTextRank: Side-by-Side

Both variants let you steer extraction toward a topic, but they work differently:
- **BiasedTextRank** takes a flat list of focus terms with a single `bias_weight` multiplier
- **TopicalPageRank** takes a *weighted vocabulary* and uses personalized PageRank teleportation

In [13]:
# Compare BiasedTextRank and TopicalPageRank on the same security focus
biased = BiasedTextRank(
    focus_terms=["security", "privacy", "encryption", "authentication"],
    bias_weight=5.0,
    top_n=5,
    language="en"
)
tpr = TopicalPageRank(
    topic_weights=topic_weights,
    min_weight=0.0,
    top_n=5,
    language="en"
)

biased_result = biased.extract_keywords(tech_article)
tpr_result = tpr.extract_keywords(tech_article)

print(f"{'BiasedTextRank':<35} {'TopicalPageRank':<35}")
print("=" * 70)
for i in range(5):
    b = biased_result.phrases[i].text if i < len(biased_result.phrases) else ""
    t = tpr_result.phrases[i].text if i < len(tpr_result.phrases) else ""
    print(f"{b:<35} {t:<35}")

BiasedTextRank                      TopicalPageRank                    
balance user experience             Encryption protects data           
Encryption protects data            balance user experience            
complicate data freshness           complicate data freshness          
careful data                        mobile users                       
sensitive user                      sensitive user                     


## 5. MultipartiteRank

Based on [Boudin (2018)](https://aclanthology.org/N18-2105/), MultipartiteRank extends TopicRank by keeping individual candidates as graph nodes instead of collapsing topics.

**Key parameters:**
- `similarity_threshold`: Jaccard similarity threshold for clustering candidates into topics (default `0.26`)
- `alpha`: Position boost strength — higher values give more weight to first-occurring variants (default `1.1`, `0` disables)
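
To get a feel for `similarity_threshold`, here is a minimal pairwise Jaccard check over the word sets of candidate phrases. It assumes the threshold is applied as "similarity at or above the threshold means same topic" and ignores stemming; the real clustering is hierarchical rather than pairwise, so treat `jaccard` and the example phrases as illustrative only.

```python
def jaccard(a, b):
    """Jaccard similarity between the word sets of two candidate phrases."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    return len(set_a & set_b) / len(set_a | set_b)


threshold = 0.26  # the default quoted above

pairs = [
    ("machine learning", "machine learning models"),  # 2 shared / 3 total
    ("deep learning", "machine learning models"),     # 1 shared / 4 total
    ("neural networks", "question answering"),        # 0 shared / 4 total
]
for a, b in pairs:
    sim = jaccard(a, b)
    verdict = "same topic" if sim >= threshold else "different topics"
    print(f"{a!r} vs {b!r}: {sim:.2f} -> {verdict}")
```

Under these assumptions, a threshold of `0.26` means two phrases need to share a bit more than a quarter of their combined vocabulary to be grouped.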

**How it works:**
1. Candidates are clustered into topics using hierarchical agglomerative clustering (HAC), the same approach as TopicRank
2. A k-partite graph is built: edges connect candidates from **different** topics only
3. Edge weights are inversely proportional to the positional gap between candidates
4. An alpha adjustment **boosts incoming edges** to the first-occurring variant in each topic
5. PageRank is run on this modified graph
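
Steps 2 to 4 can be sketched as below, loosely following the weight adjustment described in the paper. The candidate data, the helper names, and the `1 / (1 + position)` form of the position factor are assumptions for illustration; clustering (step 1) is taken as given and PageRank (step 5) is omitted. This is not `rapid_textrank`'s implementation.

```python
import math
from itertools import combinations

# Hypothetical candidates: name -> (topic id, sorted token positions).
# Positions are assumed to be distinct across candidates.
candidates = {
    "neural network": (0, [0, 12]),
    "neural networks": (0, [5]),
    "training data": (1, [3]),
    "gpu": (2, [8]),
}


def build_multipartite_graph(candidates):
    """Directed edge weights between candidates of different topics only."""
    weights = {}
    for a, b in combinations(candidates, 2):
        (topic_a, pos_a), (topic_b, pos_b) = candidates[a], candidates[b]
        if topic_a == topic_b:
            continue  # step 2: no intra-topic edges
        # step 3: closer occurrences contribute more weight
        w = sum(1.0 / abs(pa - pb) for pa in pos_a for pb in pos_b)
        weights[(a, b)] = weights[(b, a)] = w
    return weights


def boost_first_variants(weights, candidates, alpha=1.1):
    """Step 4: boost incoming edges of each topic's first-occurring variant."""
    boosted = dict(weights)
    topics = {}
    for name, (topic, _) in candidates.items():
        topics.setdefault(topic, []).append(name)
    for variants in topics.values():
        if len(variants) < 2:
            continue  # single-candidate topics have nothing to redirect
        first = min(variants, key=lambda v: candidates[v][1][0])
        position_factor = math.exp(1.0 / (1 + candidates[first][1][0]))
        for source in candidates:
            if (source, first) not in weights:
                continue
            # Weight that `source` shares with the other variants of the topic
            extra = sum(weights.get((v, source), 0.0) for v in variants if v != first)
            boosted[(source, first)] += alpha * position_factor * extra
    return boosted


graph = build_multipartite_graph(candidates)
boosted = boost_first_variants(graph, candidates)
```

Setting `alpha=0` in this sketch leaves every edge unchanged, which matches the description of `alpha` as a switch for the position boost.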

**How it differs from TopicRank:**
- **TopicRank** collapses each topic into a single node and ranks topics as a whole
- **MultipartiteRank** keeps every candidate as its own node but removes intra-topic edges, preserving fine-grained distinctions while preventing intra-topic competition

**Best for:** Multi-topic documents where you want topic-aware ranking with positional preference for earlier mentions.

In [14]:
# MultipartiteRank on the NLP text from section 1 (the `text` variable)
mpr = MultipartiteRank(
    similarity_threshold=0.26,
    alpha=1.1,
    top_n=10,
    language="en"
)

result = mpr.extract_keywords(text)

print("MultipartiteRank Results:")
print("=" * 50)
for p in result.phrases:
    print(f"{p.rank:>2}. {p.text:<35} {p.score:.4f}")

MultipartiteRank Results:
 1. NLP                                 0.1416
 2. Natural language                    0.1216
 3. computers                           0.1012
 4. field                               0.0447
 5. artificial intelligence             0.0443
 6. interaction                         0.0415
 7. humans                              0.0411
 8. focuses                             0.0393
 9. ultimate                            0.0311
10. translation                         0.0277


### Effect of `alpha`

The `alpha` parameter controls the position boost for first-occurring variants in each topic cluster. Setting `alpha=0` disables the boost entirely:

In [15]:
# Compare different alpha values
alphas = [0.0, 0.5, 1.1, 2.0]

print("Alpha Comparison (similarity_threshold=0.26)")
print("=" * 80)

for alpha in alphas:
    mpr = MultipartiteRank(
        similarity_threshold=0.26,
        alpha=alpha,
        top_n=3,
        language="en"
    )
    result = mpr.extract_keywords(text)

    print(f"\nalpha={alpha}:")
    for p in result.phrases:
        print(f"  {p.text}: {p.score:.4f}")

Alpha Comparison (similarity_threshold=0.26)

alpha=0.0:
  translation: 0.0477
  summarization: 0.0470
  natural language: 0.0463

alpha=0.5:
  NLP: 0.0942
  Natural language: 0.0842
  computers: 0.0633

alpha=1.1:
  NLP: 0.1416
  Natural language: 0.1216
  computers: 0.1012

alpha=2.0:
  NLP: 0.1823
  computers: 0.1589
  Natural language: 0.1526


### MultipartiteRank vs BaseTextRank: Side-by-Side

Let's compare MultipartiteRank with BaseTextRank to see how topic-aware graph construction affects results:

In [16]:
# Compare BaseTextRank and MultipartiteRank on the same text
base = BaseTextRank(top_n=5, language="en")
mpr = MultipartiteRank(similarity_threshold=0.26, alpha=1.1, top_n=5, language="en")

base_result = base.extract_keywords(text)
mpr_result = mpr.extract_keywords(text)

print(f"{'BaseTextRank':<35} {'MultipartiteRank':<35}")
print("=" * 70)
for i in range(5):
    b = base_result.phrases[i].text if i < len(base_result.phrases) else ""
    m = mpr_result.phrases[i].text if i < len(mpr_result.phrases) else ""
    print(f"{b:<35} {m:<35}")

BaseTextRank                        MultipartiteRank                   
generate human language             NLP                                
NLP tasks                           Natural language                   
NLP                                 computers                          
Natural language                    field                              
enable computers                    artificial intelligence            


## JSON Batch API

For processing large volumes of pre-tokenized documents, use the JSON interface:

- `extract_from_json()` - Single document
- `extract_batch_from_json()` - Multiple documents

This is particularly useful when:
- You've already tokenized with spaCy or another NLP library
- You're processing many documents in batch
- You need fine-grained control over tokenization
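
Writing token records by hand is error-prone, so a small builder can help. The sketch below is a hypothetical helper (`build_doc` is not part of `rapid_textrank`) that fills in the offsets and indices; the field names mirror the token schema used in this notebook's JSON examples, and it assumes a single sentence with tokens separated by one space.

```python
import json


def build_doc(tokens, config):
    """Build the JSON payload from (text, lemma, pos, is_stopword) tuples.

    Assumes a single sentence whose tokens are separated by one space.
    """
    records, offset = [], 0
    for idx, (text, lemma, pos, is_stopword) in enumerate(tokens):
        records.append({
            "text": text, "lemma": lemma, "pos": pos,
            "start": offset, "end": offset + len(text),
            "sentence_idx": 0, "token_idx": idx,
            "is_stopword": is_stopword,
        })
        offset += len(text) + 1  # skip the separating space
    return {"tokens": records, "config": config}


doc = build_doc(
    [
        ("Machine", "machine", "NOUN", False),
        ("learning", "learning", "NOUN", False),
        ("is", "be", "AUX", True),
    ],
    config={"top_n": 5},
)
payload = json.dumps(doc)  # string that could be passed to extract_from_json()
```

With a real tokenizer you would take the lemma, part-of-speech tag, character offsets, and sentence index from the tokenizer's own output instead of recomputing them.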

In [17]:
import json
from rapid_textrank import extract_from_json

# Single document with pre-tokenized input
doc = {
    "tokens": [
        {"text": "Machine", "lemma": "machine", "pos": "NOUN",
         "start": 0, "end": 7, "sentence_idx": 0, "token_idx": 0, "is_stopword": False},
        {"text": "learning", "lemma": "learning", "pos": "NOUN",
         "start": 8, "end": 16, "sentence_idx": 0, "token_idx": 1, "is_stopword": False},
        {"text": "is", "lemma": "be", "pos": "AUX",
         "start": 17, "end": 19, "sentence_idx": 0, "token_idx": 2, "is_stopword": True},
        {"text": "transforming", "lemma": "transform", "pos": "VERB",
         "start": 20, "end": 32, "sentence_idx": 0, "token_idx": 3, "is_stopword": False},
        {"text": "industries", "lemma": "industry", "pos": "NOUN",
         "start": 33, "end": 43, "sentence_idx": 0, "token_idx": 4, "is_stopword": False},
    ],
    "config": {
        "top_n": 5,
        "window_size": 3,
        "damping": 0.85
    }
}

result_json = extract_from_json(json.dumps(doc))
result = json.loads(result_json)

print("Single document result:")
for phrase in result["phrases"]:
    print(f"  {phrase['text']}: {phrase['score']:.4f}")


Single document result:
  Machine learning: 0.5000
  industries: 0.2048


In [18]:
from rapid_textrank import extract_batch_from_json

# Batch processing multiple documents
docs = [
    {
        "tokens": [
            {"text": "Deep", "lemma": "deep", "pos": "ADJ",
             "start": 0, "end": 4, "sentence_idx": 0, "token_idx": 0, "is_stopword": False},
            {"text": "learning", "lemma": "learning", "pos": "NOUN",
             "start": 5, "end": 13, "sentence_idx": 0, "token_idx": 1, "is_stopword": False},
            {"text": "models", "lemma": "model", "pos": "NOUN",
             "start": 14, "end": 20, "sentence_idx": 0, "token_idx": 2, "is_stopword": False},
        ],
        "config": {"top_n": 3}
    },
    {
        "tokens": [
            {"text": "Neural", "lemma": "neural", "pos": "ADJ",
             "start": 0, "end": 6, "sentence_idx": 0, "token_idx": 0, "is_stopword": False},
            {"text": "networks", "lemma": "network", "pos": "NOUN",
             "start": 7, "end": 15, "sentence_idx": 0, "token_idx": 1, "is_stopword": False},
            {"text": "process", "lemma": "process", "pos": "VERB",
             "start": 16, "end": 23, "sentence_idx": 0, "token_idx": 2, "is_stopword": False},
            {"text": "data", "lemma": "data", "pos": "NOUN",
             "start": 24, "end": 28, "sentence_idx": 0, "token_idx": 3, "is_stopword": False},
        ],
        "config": {"top_n": 3}
    }
]

results_json = extract_batch_from_json(json.dumps(docs))
results = json.loads(results_json)

print("Batch results:")
for i, result in enumerate(results):
    print(f"\nDocument {i}:")
    for phrase in result["phrases"]:
        print(f"  - {phrase['text']} ({phrase['score']:.4f})")

Batch results:

Document 0:
  - Deep learning models (1.0000)

Document 1:
  - Neural networks (0.5000)
  - data (0.2048)


## Decision Guide: Which Variant to Use?

```
START
  │
  ▼
Do you have specific topics to focus on?
├── YES: Do you have graded topic weights (e.g. from LDA)?
│   ├── YES → TopicalPageRank
│   └── NO  → BiasedTextRank
└── NO: Is key info at the beginning of the doc?
    ├── YES → PositionRank
    └── NO: Multiple topics with repeated candidates?
        ├── YES → MultipartiteRank
        └── NO  → BaseTextRank
```
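
The decision guide above can also be written as a small function. `choose_variant` and its parameter names are hypothetical; it simply encodes the questions in the guide and returns the class name as a string.

```python
def choose_variant(has_focus_topics, has_graded_weights=False,
                   key_info_first=False, repeated_candidates=False):
    """Return the variant name suggested by the decision guide."""
    if has_focus_topics:
        return "TopicalPageRank" if has_graded_weights else "BiasedTextRank"
    if key_info_first:
        return "PositionRank"
    if repeated_candidates:
        return "MultipartiteRank"
    return "BaseTextRank"


print(choose_variant(has_focus_topics=True, has_graded_weights=True))
print(choose_variant(has_focus_topics=False, key_info_first=True))
print(choose_variant(has_focus_topics=False))
```

Treat the result as a starting point: the side-by-side comparisons earlier in the notebook are a better guide for any particular corpus.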

### Recommendations by Document Type

| Document Type | Recommended Variant | Why |
|---------------|---------------------|-----|
| Blog posts, articles | BaseTextRank | General content, no position bias needed |
| Academic papers | PositionRank | Key terms in title/abstract |
| News articles | PositionRank | Lead paragraphs contain key info |
| Product reviews | BiasedTextRank | Focus on features you care about |
| Support tickets | BiasedTextRank | Focus on problem categories |
| Legal documents | BaseTextRank | Important terms throughout |
| Domain corpora with LDA | TopicalPageRank | Graded topic weights from topic models |
| Taxonomy-guided extraction | TopicalPageRank | Weight terms by domain vocabulary importance |
| Multi-topic documents | MultipartiteRank | Topic-aware with positional preference |
| Documents with repeated terminology | MultipartiteRank | Deduplicates via topic clustering, boosts first mention |

## Next Steps

- **[03_explain_algorithm.ipynb](03_explain_algorithm.ipynb)** - Visual explanation of how TextRank works internally
- **[04_benchmarks.ipynb](04_benchmarks.ipynb)** - Performance comparison with pytextrank