
# Graph RAG Showcase — Chicago Bulls Finals

This notebook demonstrates **why graph-based retrieval matters** using your Neo4j graph built from the 1997 NBA Finals (Bulls–Jazz).

**What you'll see:**
1. *Clutch role-player makes assisted by Jordan* (precision via relationships).
2. *Sequence logic:* defensive stop → go-ahead basket in final minute.
3. *Lead-change moments* discovered via event chains and score margin.
4. *Assist chains in clutch time* (who enabled whom).
5. *Narrative path for Game 6 (1997):* last 30 seconds as a traversable chain.


In [7]:

# If needed:
# %pip install pandas neo4j python-dotenv matplotlib

import os
from pathlib import Path
import pandas as pd
from neo4j import GraphDatabase

# Locate repo root and import config.py (assumes this notebook lives in notebooks/)
import sys
repo_root = Path("..").resolve()
sys.path.append(str(repo_root))
import config

NEO4J_URI = config.NEO4J_URI
NEO4J_USER = config.NEO4J_USER
NEO4J_PASSWORD = config.NEO4J_PASSWORD

driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
driver.verify_connectivity()
print("Connected to Neo4j ✅")


Connected to Neo4j ✅


In [8]:

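def cypher_df(query: str, params: dict | None = None) -> pd.DataFrame:
    """Run a Cypher query and return its rows as a DataFrame (empty if there are no rows)."""
    with driver.session() as session:
        # .data() materializes every record as a dict before the session closes
        data = session.run(query, params or {}).data()
    return pd.DataFrame(data)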


### Debug

In [9]:
cypher_df("MATCH ()-[r]->() RETURN type(r) AS type, count(*) AS cnt ORDER BY cnt DESC")

|   | type | cnt |
|---|---|---|
| 0 | PERFORMED | 3008 |
| 1 | IN_GAME | 2608 |
| 2 | NEXT | 2602 |


In [10]:
cypher_df("MATCH ()-[r:PERFORMED]->() RETURN count(r) AS performed_count")


|   | performed_count |
|---|---|
| 0 | 3008 |


## 1) Clutch role-player makes assisted by Jordan (Finals)

In [11]:

query = '''
MATCH (e:Event {is_clutch: true, event_type: 1})-[:IN_GAME]->(g:Game)
// Jordan must be attached to the event in a secondary role, not as the shooter
MATCH (assister:Player {name: "Michael Jordan"})-[r:PERFORMED]->(e)
WHERE r.role IN ["PLAYER2_ID","PLAYER3_ID"]
// The shooter (PLAYER1_ID) must be someone other than Jordan
MATCH (scorer:Player)-[:PERFORMED {role:"PLAYER1_ID"}]->(e)
WHERE scorer.name <> "Michael Jordan"
RETURN g.game_id AS game,
       scorer.name AS scorer,
       assister.name AS assister,
       e.period AS period,
       e.seconds_left_period AS sec_left,
       e.score AS score,
       e.score_margin AS margin,
       e.is_clutch,
       coalesce(e.home_desc, e.visit_desc) AS desc
ORDER BY game, sec_left
'''
df1 = cypher_df(query)
df1


|   | game | scorer | assister | period | sec_left | score | margin | e.is_clutch | desc |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 49600088 | Steve Kerr | Michael Jordan | 4 | 5 | 86 - 88 | 2.0 | True | Kerr 14' Jump Shot (9 PTS) (Jordan 4 AST) |



**Why this matters:** vanilla RAG cannot *reliably* answer "find clutch shots **assisted by Jordan** and **scored by someone else**" without fragile text heuristics. The graph uses **explicit relationships** (`PERFORMED` roles + `IN_GAME`) to retrieve the correct events deterministically.
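To make the contrast concrete, here is a minimal sketch in plain Python. The in-memory `events` list, its `participants` dict, and both helper functions are hypothetical stand-ins for the graph, not part of the notebook's schema. It compares a regex over the description text with a lookup on explicit roles.

```python
import re

# Hypothetical, simplified event records. `participants` mirrors the role-typed
# PERFORMED edges used in the query above (PLAYER1_ID = shooter, PLAYER2_ID = assister).
events = [
    {
        "desc": "Kerr 14' Jump Shot (9 PTS) (Jordan 4 AST)",
        "participants": {"PLAYER1_ID": "Steve Kerr", "PLAYER2_ID": "Michael Jordan"},
    },
    {
        "desc": "Jordan 19' Jump Shot (31 PTS)",
        "participants": {"PLAYER1_ID": "Michael Jordan"},
    },
    {
        # Invented record for illustration: the description text is missing,
        # but the roles are still recorded.
        "desc": "",
        "participants": {"PLAYER1_ID": "Teammate X", "PLAYER2_ID": "Michael Jordan"},
    },
]

def assisted_by_text(events, last_name):
    """Text heuristic: look for '(<last_name> N AST)' in the description."""
    pattern = re.compile(rf"\({re.escape(last_name)} \d+ AST\)")
    return [e for e in events if pattern.search(e["desc"])]

def assisted_by_role(events, full_name):
    """Structural lookup: the player holds the assister role and is not the shooter."""
    return [
        e for e in events
        if e["participants"].get("PLAYER2_ID") == full_name
        and e["participants"].get("PLAYER1_ID") != full_name
    ]

text_hits = assisted_by_text(events, "Jordan")
role_hits = assisted_by_role(events, "Michael Jordan")
print(len(text_hits), len(role_hits))
```

The text heuristic depends on the description being present and formatted as expected, so it misses the record with the blank description; the role lookup does not.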


## 2) Sequence logic: Defensive rebound → go-ahead make in final minute

In [12]:

query = '''
// Rebound (event_type=4) immediately followed by a made shot (event_type=1) in the clutch window
MATCH (r:Event {event_type: 4})-[:NEXT]->(m:Event {event_type: 1, is_clutch: true})
MATCH (r)-[:IN_GAME]->(g:Game)
RETURN g.game_id AS game,
       r.period  AS period,
       r.seconds_left_period AS sec_left_before,
       coalesce(r.home_desc, r.visit_desc) AS rebound_desc,
       m.seconds_left_period AS sec_left_shot,
       coalesce(m.home_desc, m.visit_desc) AS make_desc,
       m.score AS score_after,
       m.score_margin AS margin_after
ORDER BY game, sec_left_shot
LIMIT 20
'''
df2 = cypher_df(query)
df2


|   | game | period | sec_left_before | rebound_desc | sec_left_shot | make_desc | score_after | margin_after |
|---|---|---|---|---|---|---|---|---|
| 0 | 49600087 | 4 | 42 |  | 25 |  | 88 - 85 | -3.0 |



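**Why this matters:** *sequence* questions are where graphs shine. We traverse `(:Event)-[:NEXT]->(:Event)` to reason about **causality-like chains** (stop → score) that are brittle in flat text search. As written, the query matches any rebound (`event_type: 4`) followed directly by a clutch made shot; it does not itself check that the rebound was defensive or that the basket took the lead.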
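The same "A directly followed by B" pattern can be sketched in plain Python, with list adjacency standing in for the `NEXT` relationship. The event list and the `rebound_then_make` helper below are hypothetical; only the event-type codes (4 = rebound, 1 = made shot, 9 = timeout) are taken from the notebook's queries and outputs.

```python
# Event-type codes as they appear in the notebook's queries and outputs.
MADE_SHOT, REBOUND, TIMEOUT = 1, 4, 9

# Hypothetical events in play order; each dict stands in for an :Event node.
events = [
    {"id": "e1", "event_type": REBOUND, "is_clutch": True},
    {"id": "e2", "event_type": MADE_SHOT, "is_clutch": True},
    {"id": "e3", "event_type": TIMEOUT, "is_clutch": True},
    {"id": "e4", "event_type": REBOUND, "is_clutch": False},
    {"id": "e5", "event_type": MADE_SHOT, "is_clutch": False},
]

def rebound_then_make(events):
    """Return (rebound_id, make_id) pairs where a clutch made shot directly follows a rebound."""
    pairs = []
    # Consecutive list items play the role of (prev)-[:NEXT]->(nxt).
    for prev, nxt in zip(events, events[1:]):
        if (prev["event_type"] == REBOUND
                and nxt["event_type"] == MADE_SHOT
                and nxt["is_clutch"]):
            pairs.append((prev["id"], nxt["id"]))
    return pairs

matches = rebound_then_make(events)
print(matches)
```

In the graph the adjacency is stored once as `NEXT` edges, so the pattern is matched without first loading and sorting every event in client code.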


## 3) Lead-change moments via score margin swing

In [13]:

# Find made shots that flip the sign of score_margin compared to the previous event
query = '''
MATCH (prev:Event)-[:NEXT]->(e:Event {event_type: 1})
WHERE prev.game_id = e.game_id
  AND prev.score_margin IS NOT NULL AND e.score_margin IS NOT NULL
  AND prev.period = e.period
  AND sign(prev.score_margin) <> sign(e.score_margin)
MATCH (e)-[:IN_GAME]->(g:Game)
OPTIONAL MATCH (scorer:Player)-[:PERFORMED {role:"PLAYER1_ID"}]->(e)
RETURN g.game_id AS game,
       e.period AS period,
       e.seconds_left_period AS sec_left,
       scorer.name AS scorer,
       prev.score AS score_before,
       e.score AS score_after,
       prev.score_margin AS margin_before,
       e.score_margin AS margin_after,
       coalesce(e.home_desc, e.visit_desc) AS desc
ORDER BY game, sec_left
LIMIT 25
'''
df3 = cypher_df(query)
df3



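**Why this matters:** we combine **properties** (score margins) with **structure** (`NEXT`) to detect *lead changes* directly, with no NLP required. Because `sign()` returns 0 for a tied score, the query also flags baskets that create or break a tie, not only outright lead changes.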
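A small sketch of the sign-flip test in plain Python, next to a stricter variant that ignores ties. The helper names and the margin sequence are hypothetical; `sign` is written to mirror Cypher's `sign()`, which returns -1, 0, or 1.

```python
def sign(x):
    """Return -1, 0, or 1, mirroring Cypher's sign()."""
    return (x > 0) - (x < 0)

def is_sign_flip(prev_margin, margin):
    """Same test as the query: any change of sign, including moves into or out of a tie."""
    return sign(prev_margin) != sign(margin)

def is_strict_lead_change(prev_margin, margin):
    """Stricter variant: the lead passes from one team to the other (ties excluded)."""
    return sign(prev_margin) * sign(margin) == -1

# Hypothetical score margins, one per consecutive event.
margins = [-2.0, 0.0, 2.0, -1.0, 1.0]
pairs = list(zip(margins, margins[1:]))

flips = [p for p in pairs if is_sign_flip(*p)]
strict = [p for p in pairs if is_strict_lead_change(*p)]
print(flips)
print(strict)
```

Which variant is appropriate depends on whether a basket that ties the game should count as a "lead-change moment" for your purposes.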


## 4) Assist chains in clutch time (who enabled whom)

In [14]:

query = '''
MATCH (e:Event {is_clutch: true, event_type: 1})-[:IN_GAME]->(g:Game)
MATCH (scorer:Player)-[:PERFORMED {role:"PLAYER1_ID"}]->(e)
OPTIONAL MATCH (assister:Player)-[:PERFORMED {role:"PLAYER2_ID"}]->(e)
RETURN g.game_id AS game,
       e.period AS period,
       e.seconds_left_period AS sec_left,
       scorer.name AS scorer,
       assister.name AS assister,
       e.score AS score,
       e.score_margin AS margin,
       coalesce(e.home_desc, e.visit_desc) AS desc
ORDER BY game, sec_left
'''
df4 = cypher_df(query)
df4


|   | game | period | sec_left | scorer | assister | score | margin | desc |
|---|---|---|---|---|---|---|---|---|
| 0 | 49600083 | 4 | 0 | Michael Jordan |  | 82 - 84 | 2.0 | Jordan 19' Jump Shot (31 PTS) |
| 1 | 49600087 | 4 | 6 | Luc Longley | Toni Kukoc | 90 - 87 | -3.0 |  |
| 2 | 49600087 | 4 | 15 | Greg Ostertag | John Stockton | 88 - 87 | -1.0 | Ostertag Layup (13 PTS) (Stockton 5 AST) |
| 3 | 49600087 | 4 | 25 | Michael Jordan | Scottie Pippen | 88 - 85 | -3.0 |  |
| 4 | 49600088 | 4 | 5 | Steve Kerr | Michael Jordan | 86 - 88 | 2.0 | Kerr 14' Jump Shot (9 PTS) (Jordan 4 AST) |



**Why this matters:** explicit **role-typed edges** let us ask *who enabled whom* at decisive moments—a natural fit for **knowledge graphs** and Graph RAG.
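Once the rows are back in Python, the "who enabled whom" view is a matter of counting directed pairs. The sketch below copies the `(scorer, assister)` values from the `df4` output above into a plain list; the variable names are illustrative.

```python
from collections import Counter

# (scorer, assister) pairs copied from the df4 output above.
# None marks a make with no recorded assister.
rows = [
    ("Michael Jordan", None),
    ("Luc Longley", "Toni Kukoc"),
    ("Greg Ostertag", "John Stockton"),
    ("Michael Jordan", "Scottie Pippen"),
    ("Steve Kerr", "Michael Jordan"),
]

# Directed "assister -> scorer" edges, skipping unassisted makes.
assist_edges = Counter((a, s) for s, a in rows if a is not None)
# How many clutch makes each player set up.
assists_given = Counter(a for _, a in rows if a is not None)
# Makes with no assister on record.
unassisted = [s for s, a in rows if a is None]

for (assister, scorer), n in assist_edges.items():
    print(f"{assister} -> {scorer}: {n}")
print("unassisted:", unassisted)
```

To run this on the live DataFrame instead, the pairs could be read from `df4[["scorer", "assister"]]`; missing assisters may then arrive as `None` or `NaN`, so a `pd.notna` check would be the safer filter.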


## 5) Narrative path — final 30s of Game 6 (1997)

In [15]:

# Game ID of interest, taken from the results above (the Kerr jump shot assisted by Jordan)
game_id = '49600088'

q_path = '''
MATCH (start:Event {is_clutch: true})-[:IN_GAME]->(g:Game {game_id: $gid})
WITH g, start
ORDER BY start.seconds_left_period DESC
LIMIT 1
CALL {
  WITH start
  MATCH p = (start)-[:NEXT*0..20]->(e:Event)
  WHERE e.is_clutch = true OR e.seconds_left_period <= start.seconds_left_period
  RETURN p
  ORDER BY length(p) DESC
  LIMIT 1
}
UNWIND nodes(p) AS ev
RETURN ev.event_id AS event_id,
       ev.period AS period,
       ev.seconds_left_period AS sec_left,
       ev.event_type AS type,
       coalesce(ev.home_desc, ev.visit_desc) AS desc,
       ev.score AS score,
       ev.score_margin AS margin
ORDER BY period DESC, sec_left
'''
df5 = cypher_df(q_path, {"gid": game_id}) if game_id else pd.DataFrame()
df5




|   | event_id | period | sec_left | type | desc | score | margin |
|---|---|---|---|---|---|---|---|
| 0 | 49600088_460 | 4 | 0 | 1 | Kukoc Layup (9 PTS) (Pippen 2 AST) | 86 - 90 | 4.0 |
| 1 | 49600088_461 | 4 | 0 | 13 |  | 86 - 90 | 4.0 |
| 2 | 49600088_459 | 4 | 4 | 5 | Pippen STEAL (2 STL) |  | 2.0 |
| 3 | 49600088_456 | 4 | 5 | 1 | Kerr 14' Jump Shot (9 PTS) (Jordan 4 AST) | 86 - 88 | 2.0 |
| 4 | 49600088_457 | 4 | 5 | 9 |  |  | 2.0 |
| 5 | 49600088_458 | 4 | 5 | 7 | BULLS Violation: Delay of game Violation |  | 2.0 |
| 6 | 49600088_453 | 4 | 28 | 2 |  |  | 3.0 |
| 7 | 49600088_454 | 4 | 28 | 4 | Rodman REBOUND (Off:3 Def:8) |  | 3.0 |
| 8 | 49600088_455 | 4 | 28 | 9 | BULLS Timeout: Regular (Full 5 Short 0) |  | 3.0 |



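This gives you the **event chain** for the decisive window. The window starts at the clutch event with the most time remaining (28 seconds here) rather than at a fixed 30-second mark. The rows are sorted by `sec_left` ascending, so the table reads from the final buzzer backward, and events that share a clock value are not guaranteed to appear in play order. You can render the chain as a mini timeline in your post (screenshot from the notebook).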
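To lay the chain out in play order for a timeline, one option is to sort by the numeric suffix of `event_id`. This is a sketch under the assumption that the suffix tracks play order, which holds for the rows shown above but is not guaranteed by anything in the notebook. The `sequence_number` helper is hypothetical.

```python
# (event_id, sec_left, event_type) triples copied from the df5 output above.
rows = [
    ("49600088_460", 0, 1),
    ("49600088_461", 0, 13),
    ("49600088_459", 4, 5),
    ("49600088_456", 5, 1),
    ("49600088_457", 5, 9),
    ("49600088_458", 5, 7),
    ("49600088_453", 28, 2),
    ("49600088_454", 28, 4),
    ("49600088_455", 28, 9),
]

def sequence_number(event_id: str) -> int:
    """Parse the numeric suffix of an id such as '49600088_460' (assumed to track play order)."""
    return int(event_id.rsplit("_", 1)[1])

# Sort by the parsed suffix so the earliest event comes first.
timeline = sorted(rows, key=lambda r: sequence_number(r[0]))
for event_id, sec_left, event_type in timeline:
    print(f"{sec_left:>2}s  type={event_type:<2}  {event_id}")
```

On the DataFrame itself, the equivalent would be along the lines of `df5.assign(seq=df5["event_id"].str.rsplit("_").str[-1].astype(int)).sort_values("seq")`.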



---

### Talking points you can reuse in your LinkedIn post

- **Flat RAG retrieves paragraphs. Graph RAG retrieves *relationships* and *sequences*.**  
- Sequence reasoning (`NEXT`) and role-typed participation (`PERFORMED {role: ...}`) let us answer **coaching-level questions** (who enabled whom, what led to what) that are brittle in pure text.  
- The result is **precise retrieval** for the *facts* plus **narrative-ready context**—a perfect spine for a hybrid RAG system if you want to extend it later.


In [None]:

driver.close()
print("Done. ✅")
