# RST Trap Finder - Basic Usage Tutorial

This notebook demonstrates the basic functionality of the RST Trap Finder toolkit for analyzing word association graphs to identify trap words.

## Setup

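First, import the RST Trap Finder modules along with pandas, NumPy, and Matplotlib. The `sys.path.append('../src')` line lets the notebook import the package from the repository's `src` directory without installing it.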

In [None]:
import sys
sys.path.append('../src')

from rst_trap_finder import TRAP_LETTERS
from rst_trap_finder.io import load_csv
from rst_trap_finder.scores import (
    one_step_rst_prob, escape_hardness, biased_pagerank,
    k_step_rst_prob, minimax_topm, composite
)
from rst_trap_finder.strategy import recommend_next
from rst_trap_finder.data_processing import DataProcessor

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

print(f"Trap letters: {TRAP_LETTERS}")

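Several cells below test trap status with `word[0] in TRAP_LETTERS`. The helper below is a minimal sketch of that check with a guard for empty strings. It is not part of the package. It assumes `TRAP_LETTERS` is a collection of single characters, and the `frozenset("rst")` stand-in is hypothetical.

```python
from typing import Collection


def starts_with_trap(word: str, trap_letters: Collection[str]) -> bool:
    """Return True if `word` is non-empty and its first character is a trap letter.

    The comparison is case-sensitive, mirroring `word[0] in TRAP_LETTERS`.
    """
    return bool(word) and word[0] in trap_letters


# Hypothetical stand-in for TRAP_LETTERS; the package constant may differ.
letters = frozenset("rst")
print(starts_with_trap("stone", letters))  # True
print(starts_with_trap("blue", letters))   # False
print(starts_with_trap("", letters))       # False
```

Returning a strict `bool` matters once the flags land in a pandas column: `word and ...` evaluates to `''` for an empty string, which gives the column `object` dtype and can make masks such as `~df['is_trap']` misbehave.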
## Loading and Exploring Data

Let's load the sample word association graph and explore its structure.

In [None]:
# Load the sample graph
graph = load_csv('../data/edges.sample.csv')

print(f"Graph loaded with {len(graph)} nodes")
print("\nFirst 5 nodes and their connections:")
for i, (node, connections) in enumerate(graph.items()):
    if i >= 5:
        break
    print(f"{node}: {dict(connections)}")

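The loop above treats `graph` as a mapping from each source word to a mapping of target words and edge weights. The sketch below builds a structure of that shape from CSV text. The column names `src`, `dst`, and `weight` are assumptions, not the documented format of `edges.sample.csv`, and `load_csv` may do more (for example, normalization or filtering).

```python
import csv
import io
from collections import defaultdict
from typing import Dict


def edges_to_graph(text: str) -> Dict[str, Dict[str, float]]:
    """Build {source: {target: weight}} from CSV text with assumed src,dst,weight columns."""
    graph: Dict[str, Dict[str, float]] = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        src = row["src"].strip().lower()
        dst = row["dst"].strip().lower()
        # Sum the weights if the same edge appears more than once.
        graph[src][dst] = graph[src].get(dst, 0.0) + float(row["weight"])
    return dict(graph)


sample = """src,dst,weight
color,red,3
color,blue,2
blue,sky,4
color,red,1
"""
g = edges_to_graph(sample)
print(g["color"])  # {'red': 4.0, 'blue': 2.0}
```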
In [None]:
# Get comprehensive data summary
summary = DataProcessor.get_data_summary(graph)

print("Graph Summary:")
for key, value in summary.items():
    print(f"{key}: {value}")

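The fields returned by `DataProcessor.get_data_summary` are not listed in this notebook. As a rough illustration of the statistics such a summary can contain, the sketch below computes a few by hand. The field names are hypothetical. If `graph` is keyed by source word, `len(graph)` in the loading cell counts only words with outgoing edges, so the sketch also counts words that appear only as targets.

```python
from typing import Collection, Dict

Graph = Dict[str, Dict[str, float]]


def basic_summary(graph: Graph, trap_letters: Collection[str]) -> Dict[str, float]:
    """A few hand-computed graph statistics (field names are hypothetical)."""
    sources = set(graph)
    targets = {t for nbrs in graph.values() for t in nbrs}
    nodes = sources | targets  # words that only appear as targets count too
    n_edges = sum(len(nbrs) for nbrs in graph.values())
    n_trap = sum(1 for w in nodes if w and w[0] in trap_letters)
    return {
        "num_nodes": len(nodes),
        "num_source_nodes": len(sources),
        "num_edges": n_edges,
        "avg_out_degree": n_edges / len(sources) if sources else 0.0,
        "trap_node_fraction": n_trap / len(nodes) if nodes else 0.0,
    }


toy = {"color": {"red": 4.0, "blue": 2.0}, "blue": {"sky": 1.0}}
print(basic_summary(toy, frozenset("rst")))
```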
## Basic Scoring Functions

Let's compute the basic trap scores for all words in our graph.

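The package's scoring formulas are not spelled out in this notebook. As a mental model only, the sketch below assumes the one-step score is the fraction of a word's outgoing association weight that lands on trap-letter words, and adds a hypothetical "top-m" variant that looks only at the `m` strongest associations. Neither function is the package's implementation of `one_step_rst_prob` or `minimax_topm`.

```python
from typing import Collection, Dict

Graph = Dict[str, Dict[str, float]]


def is_trap(word: str, trap_letters: Collection[str]) -> bool:
    return bool(word) and word[0] in trap_letters


def one_step_share(word: str, graph: Graph, trap_letters: Collection[str]) -> float:
    """Share of the outgoing edge weight from `word` that points at trap-letter words."""
    nbrs = graph.get(word, {})
    total = sum(nbrs.values())
    if total <= 0:
        return 0.0
    return sum(w for t, w in nbrs.items() if is_trap(t, trap_letters)) / total


def topm_trap_fraction(word: str, graph: Graph, trap_letters: Collection[str], m: int = 3) -> float:
    """Fraction of the m highest-weight neighbours of `word` that are trap-letter words."""
    top = sorted(graph.get(word, {}).items(), key=lambda kv: kv[1], reverse=True)[:m]
    if not top:
        return 0.0
    return sum(is_trap(t, trap_letters) for t, _ in top) / len(top)


toy = {"color": {"red": 4.0, "blue": 2.0, "silver": 1.0, "green": 1.0}}
letters = frozenset("rst")
print(round(one_step_share("color", toy, letters), 3))      # 0.625
print(round(topm_trap_fraction("color", toy, letters), 3))  # 0.667
```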
In [None]:
# Compute PageRank with bias toward trap letters
pagerank = biased_pagerank(graph, TRAP_LETTERS, alpha=1.5)

# Compute scores for all nodes
results = []

for word in graph:
    scores = {
        'word': word,
        'one_step': one_step_rst_prob(word, graph, TRAP_LETTERS),
        'escape_hardness': escape_hardness(word, graph, TRAP_LETTERS),
        'pagerank': pagerank.get(word, 0.0),
        'k2_step': k_step_rst_prob(word, graph, TRAP_LETTERS, k=2),
        'minimax': minimax_topm(word, graph, TRAP_LETTERS),
        # bool() keeps the flag a real boolean even for an empty word
        'is_trap': bool(word) and word[0] in TRAP_LETTERS
    }
    
    # Compute composite score
    scores['composite'] = composite(word, graph, TRAP_LETTERS, pagerank)
    
    results.append(scores)

# Convert to DataFrame for easier analysis
df = pd.DataFrame(results)
print(f"Computed scores for {len(df)} words")
df.head()

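The `k2_step` column comes from `k_step_rst_prob(..., k=2)`. One plausible reading, which this notebook does not confirm, is the probability that a random walk following edge weights from the word reaches a trap-letter word within `k` steps. The sketch below implements that reading; the function name is hypothetical, walks stop at the first trap word, and probability mass that reaches a word with no outgoing edges is dropped.

```python
from typing import Collection, Dict

Graph = Dict[str, Dict[str, float]]


def k_step_hit_prob(word: str, graph: Graph, trap_letters: Collection[str], k: int = 2) -> float:
    """P(weighted random walk from `word` hits a trap-letter word within k steps)."""

    def is_trap(w: str) -> bool:
        return bool(w) and w[0] in trap_letters

    frontier = {word: 1.0}  # probability mass that has not hit a trap word yet
    hit = 0.0
    for _ in range(k):
        nxt: Dict[str, float] = {}
        for node, mass in frontier.items():
            nbrs = graph.get(node, {})
            total = sum(nbrs.values())
            if total <= 0:
                continue  # dead end: this mass is dropped
            for tgt, w in nbrs.items():
                p = mass * w / total
                if is_trap(tgt):
                    hit += p  # absorbed at the first trap word
                else:
                    nxt[tgt] = nxt.get(tgt, 0.0) + p
        frontier = nxt
    return hit


toy = {"color": {"red": 4.0, "blue": 2.0}, "blue": {"sky": 1.0, "ocean": 1.0}}
print(round(k_step_hit_prob("color", toy, frozenset("rst"), k=1), 3))  # 0.667
print(round(k_step_hit_prob("color", toy, frozenset("rst"), k=2), 3))  # 0.833
```

With `k=1` this reduces to the weighted share of trap-letter neighbours; each extra step can only add probability.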
## Top Trap Words

Let's find and analyze the most effective trap words.

In [None]:
# Sort by composite score and show top 10
top_words = df.nlargest(10, 'composite')

print("Top 10 Trap Words by Composite Score:")
print(top_words[['word', 'composite', 'one_step', 'escape_hardness', 'is_trap']].round(3))

In [None]:
# Compare trap vs non-trap words
trap_words = df[df['is_trap']]
non_trap_words = df[~df['is_trap']]

print("Score Comparison:")
print(f"Trap words (n={len(trap_words)}):")
print(f"  Mean composite score: {trap_words['composite'].mean():.3f}")
print(f"  Mean one-step prob: {trap_words['one_step'].mean():.3f}")

print(f"\nNon-trap words (n={len(non_trap_words)}):")
print(f"  Mean composite score: {non_trap_words['composite'].mean():.3f}")
print(f"  Mean one-step prob: {non_trap_words['one_step'].mean():.3f}")

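The per-group means in the cell above can also come from a single `groupby`, which reports the group sizes as well. The toy frame below mimics the columns of `df` with placeholder numbers; they are not results from the sample graph.

```python
import pandas as pd

# Placeholder scores for illustration only.
scores = pd.DataFrame({
    "word": ["rock", "stone", "blue", "color"],
    "composite": [0.8, 0.6, 0.3, 0.1],
    "one_step": [0.5, 0.7, 0.2, 0.4],
    "is_trap": [True, True, False, False],
})

# One row per trap status; columns are (score, statistic) pairs.
summary = scores.groupby("is_trap")[["composite", "one_step"]].agg(["mean", "count"])
print(summary.round(3))
```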
## Visualization

Let's create some visualizations to better understand the score distributions.

In [None]:
# Score distribution plots
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Composite score distribution
axes[0, 0].hist(df['composite'], bins=20, alpha=0.7, color='blue')
axes[0, 0].set_title('Composite Score Distribution')
axes[0, 0].set_xlabel('Composite Score')
axes[0, 0].set_ylabel('Frequency')

# One-step probability distribution by trap status
axes[0, 1].hist(trap_words['one_step'], bins=15, alpha=0.7, color='red', label='Trap words')
axes[0, 1].hist(non_trap_words['one_step'], bins=15, alpha=0.7, color='blue', label='Non-trap words')
axes[0, 1].set_title('One-Step RST Probability')
axes[0, 1].set_xlabel('One-Step Probability')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].legend()

# Scatter plot: One-step vs Composite
axes[1, 0].scatter(df['one_step'], df['composite'], 
                   c=['red' if trap else 'blue' for trap in df['is_trap']], alpha=0.6)
axes[1, 0].set_title('One-Step vs Composite Score')
axes[1, 0].set_xlabel('One-Step Probability')
axes[1, 0].set_ylabel('Composite Score')

# Top words bar chart
top_10 = df.nlargest(10, 'composite')
colors = ['red' if trap else 'blue' for trap in top_10['is_trap']]
axes[1, 1].bar(range(len(top_10)), top_10['composite'], color=colors)
axes[1, 1].set_title('Top 10 Words by Composite Score')
axes[1, 1].set_xlabel('Rank')
axes[1, 1].set_ylabel('Composite Score')
axes[1, 1].set_xticks(range(len(top_10)))
axes[1, 1].set_xticklabels(top_10['word'], rotation=45, ha='right')

plt.tight_layout()
plt.show()

## Strategy Recommendation

Let's see how to get strategic recommendations for the next word to play.

In [None]:
# Get recommendation from "color"
current_word = "color"
lambdas = (0.35, 0.2, 0.25, 0.1, 0.1)  # Default weights

try:
    recommendation = recommend_next(current_word, graph, TRAP_LETTERS, pagerank, lambdas)
    
    print(f"Starting from: {current_word}")
    print(f"Best recommendation: {recommendation['best']['word']}")
    print(f"Best score: {recommendation['best']['composite']:.3f}")
    
    print("\nTop 5 candidates:")
    for i, candidate in enumerate(recommendation['candidates'][:5]):
        print(f"{i+1}. {candidate['word']} (score: {candidate['composite']:.3f})")
        
except ValueError as e:
    print(f"Error: {e}")

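`recommend_next` takes a five-element `lambdas` tuple, which suggests the composite is a weighted combination of the five component scores computed earlier. That is an assumption: the notebook states neither the order of the weights nor whether the components are normalized. The sketch below shows one common scheme, min-max normalization followed by a weighted sum, on a toy frame; `weighted_composite` is a hypothetical helper.

```python
from typing import Sequence

import pandas as pd


def weighted_composite(df: pd.DataFrame, columns: Sequence[str], weights: Sequence[float]) -> pd.Series:
    """Min-max normalize each column to [0, 1], then combine with the given weights."""
    if len(columns) != len(weights):
        raise ValueError("need exactly one weight per column")
    out = pd.Series(0.0, index=df.index)
    for col, w in zip(columns, weights):
        x = df[col].astype(float)
        span = x.max() - x.min()
        # A constant column carries no ranking information, so it contributes 0.
        norm = (x - x.min()) / span if span > 0 else x * 0.0
        out += w * norm
    return out


toy = pd.DataFrame({"a": [0.0, 1.0, 2.0], "b": [0.0, 0.0, 4.0]})
print(weighted_composite(toy, ["a", "b"], [0.75, 0.25]).tolist())  # [0.0, 0.375, 1.0]
```

Normalizing first keeps a component with a large raw range (PageRank values are typically tiny, probabilities are not) from dominating the sum regardless of its weight.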
## Exploring Different Starting Points

Let's analyze how recommendations change from different starting words.

In [None]:
# Test multiple starting words
test_words = ['start', 'animal', 'blue', 'rock']
recommendations = {}

for word in test_words:
    if word not in graph:
        print(f"Skipping '{word}': not in graph")
        continue
    try:
        rec = recommend_next(word, graph, TRAP_LETTERS, pagerank, lambdas)
        recommendations[word] = rec['best']
    except ValueError:
        recommendations[word] = None

print("Recommendations from different starting words:")
for start_word, best_rec in recommendations.items():
    if best_rec:
        print(f"{start_word} -> {best_rec['word']} (score: {best_rec['composite']:.3f})")
    else:
        print(f"{start_word} -> No valid recommendations")

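How `recommend_next` chooses among candidates is not documented here. A minimal sketch, assuming it ranks the current word's associations by a precomputed per-word score, is shown below. `rank_neighbors` is hypothetical; its return value only mirrors the `'best'` / `'candidates'` / `'word'` / `'composite'` keys the cells above read, and the package may rank differently (for example, with lookahead).

```python
from typing import Any, Dict, List

Graph = Dict[str, Dict[str, float]]


def rank_neighbors(
    current: str, graph: Graph, scores: Dict[str, float], top_n: int = 5
) -> Dict[str, Any]:
    """Rank the associations of `current` by a precomputed per-word score (hypothetical)."""
    nbrs = graph.get(current)
    if not nbrs:
        raise ValueError(f"{current!r} has no outgoing associations in the graph")
    candidates: List[Dict[str, Any]] = [
        {"word": w, "composite": scores.get(w, 0.0)} for w in nbrs
    ]
    # Highest score first; words missing from `scores` count as 0.
    candidates.sort(key=lambda c: c["composite"], reverse=True)
    return {"best": candidates[0], "candidates": candidates[:top_n]}


toy = {"color": {"red": 2.0, "blue": 1.0, "green": 1.0}}
toy_scores = {"red": 0.2, "blue": 0.9, "green": 0.5}
rec = rank_neighbors("color", toy, toy_scores)
print(rec["best"])  # {'word': 'blue', 'composite': 0.9}
```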
## Parameter Sensitivity Analysis

Let's see how the PageRank bias parameter `alpha` affects the mean composite scores and the separation between trap and non-trap words.

In [None]:
# Test different PageRank alpha values
alpha_values = [1.0, 1.5, 2.0, 3.0]
alpha_results = []

for alpha in alpha_values:
    pr = biased_pagerank(graph, TRAP_LETTERS, alpha=alpha)
    
    # Compute average scores for trap vs non-trap words
    trap_scores = [composite(word, graph, TRAP_LETTERS, pr) 
                   for word in graph if word and word[0] in TRAP_LETTERS]
    non_trap_scores = [composite(word, graph, TRAP_LETTERS, pr) 
                       for word in graph if not (word and word[0] in TRAP_LETTERS)]
    
    alpha_results.append({
        'alpha': alpha,
        'trap_mean': np.mean(trap_scores),
        'non_trap_mean': np.mean(non_trap_scores),
        'separation': np.mean(trap_scores) - np.mean(non_trap_scores)
    })

alpha_df = pd.DataFrame(alpha_results)
print("Effect of PageRank alpha parameter:")
print(alpha_df.round(3))

In [None]:
# Plot alpha sensitivity
plt.figure(figsize=(10, 6))

plt.subplot(1, 2, 1)
plt.plot(alpha_df['alpha'], alpha_df['trap_mean'], 'ro-', label='Trap words')
plt.plot(alpha_df['alpha'], alpha_df['non_trap_mean'], 'bo-', label='Non-trap words')
plt.xlabel('Alpha Parameter')
plt.ylabel('Mean Composite Score')
plt.title('Score by Alpha Parameter')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.plot(alpha_df['alpha'], alpha_df['separation'], 'go-')
plt.xlabel('Alpha Parameter')
plt.ylabel('Score Separation')
plt.title('Trap vs Non-Trap Separation')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

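This notebook does not document what `alpha` does inside `biased_pagerank`. One plausible reading, sketched below, is a weighted personalized PageRank whose teleport distribution gives trap-letter words `alpha` times the weight of other words. `biased_pagerank_sketch` and its `damping`, `iters`, and `tol` parameters are hypothetical, not documented options of the package.

```python
from typing import Collection, Dict

Graph = Dict[str, Dict[str, float]]


def biased_pagerank_sketch(
    graph: Graph,
    trap_letters: Collection[str],
    alpha: float = 1.5,
    damping: float = 0.85,
    iters: int = 100,
    tol: float = 1e-10,
) -> Dict[str, float]:
    """Weighted personalized PageRank with extra teleport weight on trap-letter words."""
    nodes = sorted(set(graph) | {t for nbrs in graph.values() for t in nbrs})
    # Teleport distribution: trap-letter words get weight `alpha`, all others 1.
    raw = {n: (alpha if n and n[0] in trap_letters else 1.0) for n in nodes}
    z = sum(raw.values())
    teleport = {n: v / z for n, v in raw.items()}

    rank = dict(teleport)
    for _ in range(iters):
        new = {n: (1 - damping) * teleport[n] for n in nodes}
        dangling = 0.0  # mass sitting on words with no outgoing edges
        for n in nodes:
            nbrs = graph.get(n, {})
            total = sum(nbrs.values())
            if total <= 0:
                dangling += rank[n]
                continue
            for t, w in nbrs.items():
                new[t] += damping * rank[n] * w / total
        # Send dangling mass back through the teleport distribution so ranks sum to 1.
        for n in nodes:
            new[n] += damping * dangling * teleport[n]
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


toy = {"color": {"red": 1.0, "blue": 1.0}, "blue": {"sky": 1.0}}
for a in (1.0, 3.0):
    pr = biased_pagerank_sketch(toy, frozenset("rst"), alpha=a)
    print(a, {w: round(s, 3) for w, s in pr.items()})
```

Under this reading, raising `alpha` moves stationary mass toward trap-letter words, which is one way the separation in the table above could grow with `alpha`.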
## Individual Word Analysis

Let's do a detailed analysis of a specific word.

In [None]:
# Analyze a specific word in detail
word_to_analyze = "start"

if word_to_analyze in graph:
    print(f"Detailed Analysis of '{word_to_analyze}':")
    print(f"One-step RST probability: {one_step_rst_prob(word_to_analyze, graph, TRAP_LETTERS):.3f}")
    print(f"Escape hardness: {escape_hardness(word_to_analyze, graph, TRAP_LETTERS):.3f}")
    print(f"PageRank score: {pagerank.get(word_to_analyze, 0.0):.6f}")
    print(f"2-step RST probability: {k_step_rst_prob(word_to_analyze, graph, TRAP_LETTERS, k=2):.3f}")
    print(f"Minimax score: {minimax_topm(word_to_analyze, graph, TRAP_LETTERS):.3f}")
    print(f"Composite score: {composite(word_to_analyze, graph, TRAP_LETTERS, pagerank):.3f}")
    
    print(f"\nOutgoing connections from '{word_to_analyze}':")
    connections = graph[word_to_analyze]
    sorted_connections = sorted(connections.items(), key=lambda x: x[1], reverse=True)
    
    for target, weight in sorted_connections:
        is_trap = target and target[0] in TRAP_LETTERS
        trap_indicator = "(TRAP)" if is_trap else ""
        print(f"  -> {target}: {weight:.1f} {trap_indicator}")
else:
    print(f"Word '{word_to_analyze}' not found in graph")

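The printout above lists raw edge weights. Converting them to shares of the word's total outgoing weight makes them easier to compare with the probability-style scores. The sketch below does that with pandas; `outgoing_table` is a hypothetical helper, and its trap-share total matches `one_step_rst_prob` only if that score is defined as a weighted share of outgoing edges, which this notebook does not confirm.

```python
from typing import Collection, Dict

import pandas as pd


def outgoing_table(word: str, graph: Dict[str, Dict[str, float]], trap_letters: Collection[str]) -> pd.DataFrame:
    """Outgoing edges of `word`, strongest first, with each edge's share of the total weight."""
    nbrs = graph.get(word, {})
    table = pd.DataFrame(list(nbrs.items()), columns=["target", "weight"])
    total = table["weight"].sum()
    table["share"] = table["weight"] / total if total > 0 else 0.0
    table["is_trap"] = [bool(t) and t[0] in trap_letters for t in table["target"]]
    return table.sort_values("weight", ascending=False, ignore_index=True)


toy = {"start": {"finish": 1.0, "race": 3.0}}
tbl = outgoing_table("start", toy, frozenset("rst"))
print(tbl)
print("trap share:", tbl.loc[tbl["is_trap"], "share"].sum())  # trap share: 0.75
```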
## Conclusion

This notebook demonstrated the basic usage of RST Trap Finder:

1. **Data Loading**: How to load and explore word association graphs
2. **Scoring**: Computing various trap effectiveness metrics
3. **Analysis**: Comparing trap vs non-trap words
4. **Strategy**: Getting recommendations for next moves
5. **Parameter Tuning**: Understanding how parameters affect results

### Key Insights from Sample Data:
- Words starting with R/S/T don't automatically score highest (the composite metric considers multiple factors)
- The PageRank bias parameter `alpha` shifts the separation between trap and non-trap words; the sensitivity table shows by how much for a given graph
- Strategic recommendations depend on the specific graph structure and weights

### Next Steps:
- Explore the advanced analysis notebook for machine learning features
- Try the visualization notebook for interactive plots
- Use your own word association data to find domain-specific trap words

For more advanced features, see:
- `advanced_analysis.ipynb`: ML models and parameter optimization
- `visualization_guide.ipynb`: Interactive visualizations
- `performance_optimization.ipynb`: Large-scale processing