# Connecting the Dots: Interactive Knowledge Web
## Mini-Geo Journey 1: Theories of Space and Place

**Author:** Xihan Yao  
**Course:** Graduate Geography  
**Assignment:** First Mini-Geo Journey - Space/Place Theory Engagement  
**Date:** September 2025

---

### About This Notebook

Welcome to my **interactive Mini-Geo Journey** exploring how theories of space and place shape my intellectual development and research trajectory toward understanding built-environment-personality relationships.

This notebook demonstrates my theoretical engagement by:
- 📚 **Reading Analysis**: Parsing and visualizing my comprehensive bibliography across three research domains
- 🕸️ **Knowledge Networks**: Creating interactive visualizations showing connections between different papers and theories
- 🎯 **Research Roadmap**: Mapping my intellectual journey from foundational spatial theory to contemporary GeoAI applications
- 🔍 **Interactive Exploration**: Allowing detailed examination of individual papers and their relationships

**Research Question**: How do foundational theories of space and place inform contemporary built-environment-personality research through GeoAI methodologies?

In [2]:
# Import required libraries
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import networkx as nx
from bs4 import BeautifulSoup
import os
from pathlib import Path
import re
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Try to import interactive widgets
try:
    import ipywidgets as widgets
    from IPython.display import display, HTML
    WIDGETS_AVAILABLE = True
except ImportError:
    WIDGETS_AVAILABLE = False
    print("IPython widgets not available. Some interactive features may be limited.")

print("✅ Libraries imported successfully!")
try:
    import plotly
    print(f"📊 Plotly version: {plotly.__version__}")
except AttributeError:
    print("📊 Plotly: Available")
print(f"🕸️ NetworkX version: {nx.__version__}")
print(f"🔧 Interactive widgets: {'Available' if WIDGETS_AVAILABLE else 'Limited'}")

✅ Libraries imported successfully!
📊 Plotly version: 6.3.0
🕸️ NetworkX version: 3.5
🔧 Interactive widgets: Available


In [3]:
# Load and Parse HTML Bibliography Data
def parse_html_bibliography(file_path):
    """Parse HTML bibliography file and extract paper information"""
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            content = f.read()
        
        soup = BeautifulSoup(content, 'html.parser')
        papers = []
        
        # Find all bibliography entries
        entries = soup.find_all('div', class_='csl-entry')
        
        for i, entry in enumerate(entries):
            text = entry.get_text(strip=True)
            links = entry.find_all('a')
            
            # Extract basic info with regex patterns
            # Author pattern (before year or title)
            author_match = re.match(r'^([^.]+?\.)', text)
            author = author_match.group(1) if author_match else "Unknown"
            
            # Year pattern
            year_match = re.search(r'\((\d{4})\)', text)
            year = int(year_match.group(1)) if year_match else None
            
            # Title pattern (usually in quotes or after author)
            title_match = re.search(r'["""]([^"""]+)["""]', text)
            if not title_match:
                # Alternative: text after period and before journal/venue
                title_match = re.search(r'\. ([^.]+?)\. ', text)
            title = title_match.group(1) if title_match else text[:100] + "..."
            
            # Journal/venue pattern
            venue_match = re.search(r'<i>([^<]+)</i>', str(entry))
            venue = venue_match.group(1) if venue_match else "Unknown Venue"
            
            # DOI/URL
            url = links[0]['href'] if links else None
            
            papers.append({
                'id': i,
                'title': title.strip(),
                'author': author.strip(),
                'year': year,
                'venue': venue.strip(),
                'url': url,
                'full_citation': text,
                'domain': Path(file_path).stem  # GeoAI, GeoPersonality, or IssuesGeo
            })
        
        return papers
    
    except Exception as e:
        print(f"Error parsing {file_path}: {e}")
        return []

# Load all bibliography files
files_dir = Path("files/papers/")
bibliography_files = {
    'GeoAI': files_dir / 'GeoAI.html',
    'GeoPersonality': files_dir / 'GeoPersonality.html', 
    'IssuesGeo': files_dir / 'IssuesGeo.html'
}

all_papers = []
for domain, file_path in bibliography_files.items():
    if file_path.exists():
        papers = parse_html_bibliography(file_path)
        all_papers.extend(papers)
        print(f"📚 Loaded {len(papers)} papers from {domain}")
    else:
        print(f"⚠️ File not found: {file_path}")

# Convert to DataFrame for analysis
df_papers = pd.DataFrame(all_papers)
print(f"\n✅ Total papers loaded: {len(df_papers)}")
print(f"📊 Domains: {df_papers['domain'].value_counts().to_dict()}")

# Display sample data
print("\n📋 Sample Papers:")
df_papers.head(3)[['title', 'author', 'year', 'domain']].head()

📚 Loaded 6 papers from GeoAI
📚 Loaded 29 papers from GeoPersonality
📚 Loaded 7 papers from IssuesGeo

✅ Total papers loaded: 42
📊 Domains: {'GeoPersonality': 29, 'IssuesGeo': 7, 'GeoAI': 6}

📋 Sample Papers:


Unnamed: 0,title,author,year,domain
0,"Brown, C. F., Kazmierski, M. R., Pasquarella, ...","Brown, C.",2025.0,GeoAI
1,G,"Goodchild, M.",2004.0,GeoAI
2,(2021),"Goodchild, M.",2021.0,GeoAI


In [4]:
# Create Knowledge Network Analysis
def create_knowledge_network(df):
    """Create a network graph showing relationships between papers and concepts"""
    G = nx.Graph()
    
    # Define research domains and their characteristics
    domain_info = {
        'GeoAI': {
            'color': '#007cba',
            'description': 'AI applications in geography and spatial analysis',
            'keywords': ['AI', 'machine learning', 'spatial', 'GeoAI', 'remote sensing']
        },
        'GeoPersonality': {
            'color': '#4CAF50', 
            'description': 'Geographic variation in personality and behavior',
            'keywords': ['personality', 'psychology', 'behavior', 'regional', 'cultural']
        },
        'IssuesGeo': {
            'color': '#FF9800',
            'description': 'Critical issues and challenges in geography',
            'keywords': ['critical', 'social', 'political', 'justice', 'methodology']
        }
    }
    
    # Add nodes for each paper
    for _, paper in df.iterrows():
        domain = paper['domain']
        G.add_node(
            paper['id'],
            title=paper['title'],
            author=paper['author'],
            year=paper['year'],
            venue=paper['venue'],
            domain=domain,
            color=domain_info[domain]['color'],
            size=20,
            citation=paper['full_citation'],
            url=paper['url']
        )
    
    # Add domain nodes
    for domain, info in domain_info.items():
        domain_papers = df[df['domain'] == domain]
        G.add_node(
            f"domain_{domain}",
            title=f"{domain} Research Domain",
            description=info['description'],
            type='domain',
            color=info['color'],
            size=40,
            paper_count=len(domain_papers)
        )
        
        # Connect domain node to all papers in that domain
        for _, paper in domain_papers.iterrows():
            G.add_edge(f"domain_{domain}", paper['id'])
    
    # Add cross-domain connections based on shared concepts
    # This is a simplified approach - in reality, you might use more sophisticated NLP
    for i, paper1 in df.iterrows():
        for j, paper2 in df.iterrows():
            if i >= j or paper1['domain'] == paper2['domain']:
                continue
                
            # Simple keyword matching for connections
            title1_words = set(paper1['title'].lower().split())
            title2_words = set(paper2['title'].lower().split())
            
            # Check for common academic keywords
            academic_keywords = {'urban', 'spatial', 'geographic', 'environment', 'analysis', 'data', 'model'}
            common_academic = title1_words.intersection(title2_words).intersection(academic_keywords)
            
            if len(common_academic) > 0:
                G.add_edge(paper1['id'], paper2['id'], weight=len(common_academic))
    
    return G, domain_info

# Create the network
network_graph, domain_info = create_knowledge_network(df_papers)

print(f"🕸️ Knowledge Network Created:")
print(f"   • Nodes: {network_graph.number_of_nodes()}")
print(f"   • Edges: {network_graph.number_of_edges()}")
print(f"   • Domains: {len(domain_info)}")

# Calculate network metrics
degree_centrality = nx.degree_centrality(network_graph)
betweenness_centrality = nx.betweenness_centrality(network_graph)

print(f"\n📊 Network Analysis:")
print(f"   • Average degree: {np.mean(list(degree_centrality.values())):.3f}")
print(f"   • Most connected nodes: {sorted(degree_centrality.items(), key=lambda x: x[1], reverse=True)[:3]}")

🕸️ Knowledge Network Created:
   • Nodes: 32
   • Edges: 42
   • Domains: 3

📊 Network Analysis:
   • Average degree: 0.085
   • Most connected nodes: [('domain_GeoPersonality', 0.9354838709677419), ('domain_IssuesGeo', 0.22580645161290322), ('domain_GeoAI', 0.1935483870967742)]


In [5]:
# Create Interactive Network Visualization
def create_interactive_network(G, df_papers):
    """Create an interactive network visualization using Plotly"""
    
    # Calculate layout positions
    pos = nx.spring_layout(G, k=3, iterations=50, seed=42)
    
    # Prepare node traces
    node_traces = {}
    
    for domain in ['GeoAI', 'GeoPersonality', 'IssuesGeo']:
        domain_papers = df_papers[df_papers['domain'] == domain]
        node_traces[domain] = {
            'x': [],
            'y': [],
            'text': [],
            'customdata': [],
            'hovertemplate': []
        }
        
        for _, paper in domain_papers.iterrows():
            if paper['id'] in pos:
                x, y = pos[paper['id']]
                node_traces[domain]['x'].append(x)
                node_traces[domain]['y'].append(y)
                
                # Create hover text
                hover_text = f"<b>{paper['title']}</b><br>"
                hover_text += f"Author: {paper['author']}<br>"
                hover_text += f"Year: {paper['year']}<br>"
                hover_text += f"Venue: {paper['venue']}<br>"
                hover_text += f"Domain: {paper['domain']}"
                
                node_traces[domain]['text'].append(paper['title'][:50] + "...")
                node_traces[domain]['customdata'].append({
                    'title': paper['title'],
                    'author': paper['author'],
                    'year': paper['year'],
                    'venue': paper['venue'],
                    'citation': paper['full_citation'],
                    'url': paper['url']
                })
                node_traces[domain]['hovertemplate'].append(hover_text)
    
    # Create edge traces
    edge_x = []
    edge_y = []
    
    for edge in G.edges():
        if edge[0] in pos and edge[1] in pos:
            x0, y0 = pos[edge[0]]
            x1, y1 = pos[edge[1]]
            edge_x.extend([x0, x1, None])
            edge_y.extend([y0, y1, None])
    
    # Create the plotly figure
    fig = go.Figure()
    
    # Add edges
    fig.add_trace(go.Scatter(
        x=edge_x, y=edge_y,
        line=dict(width=0.5, color='rgba(125,125,125,0.3)'),
        hoverinfo='none',
        mode='lines',
        name='Connections'
    ))
    
    # Add nodes for each domain
    colors = ['#007cba', '#4CAF50', '#FF9800']
    domains = ['GeoAI', 'GeoPersonality', 'IssuesGeo']
    
    for i, domain in enumerate(domains):
        if node_traces[domain]['x']:  # Only add if there are papers
            fig.add_trace(go.Scatter(
                x=node_traces[domain]['x'],
                y=node_traces[domain]['y'],
                mode='markers+text',
                marker=dict(
                    size=15,
                    color=colors[i],
                    opacity=0.8,
                    line=dict(width=2, color='white')
                ),
                text=node_traces[domain]['text'],
                textposition="middle center",
                textfont=dict(size=8, color='white'),
                customdata=node_traces[domain]['customdata'],
                hovertemplate='<b>%{customdata.title}</b><br>' +
                             'Author: %{customdata.author}<br>' +
                             'Year: %{customdata.year}<br>' +
                             'Venue: %{customdata.venue}<br>' +
                             '<extra></extra>',
                name=f'{domain} ({len(node_traces[domain]["x"])} papers)',
                showlegend=True
            ))
    
    # Update layout
    fig.update_layout(
        title=dict(
            text="Interactive Knowledge Network: My Reading Journey Across Three Domains",
            x=0.5,
            font=dict(size=20, color='#2c3e50')
        ),
        showlegend=True,
        hovermode='closest',
        margin=dict(b=20,l=5,r=5,t=40),
        annotations=[ dict(
            text="Click and drag to explore • Hover for details • Each color represents a research domain",
            showarrow=False,
            xref="paper", yref="paper",
            x=0.005, y=-0.002,
            xanchor='left', yanchor='bottom',
            font=dict(color='#7f8c8d', size=12)
        )],
        xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        plot_bgcolor='rgba(248,249,250,0.8)',
        height=700
    )
    
    return fig

# Create and display the interactive network
print("🎨 Creating interactive network visualization...")
fig_network = create_interactive_network(network_graph, df_papers)
fig_network.show()

print("✅ Interactive network created! Hover over nodes to see paper details.")

🎨 Creating interactive network visualization...


✅ Interactive network created! Hover over nodes to see paper details.


In [6]:
# Create Reading Timeline and Domain Analysis
def create_timeline_analysis(df):
    """Create temporal analysis of reading patterns"""
    
    # Filter out papers without years
    df_temporal = df.dropna(subset=['year'])
    
    # Create timeline visualization
    fig_timeline = px.scatter(
        df_temporal, 
        x='year', 
        y='domain',
        color='domain',
        size=[20]*len(df_temporal),  # Constant size
        hover_data=['title', 'author', 'venue'],
        color_discrete_map={
            'GeoAI': '#007cba',
            'GeoPersonality': '#4CAF50', 
            'IssuesGeo': '#FF9800'
        },
        title="Reading Timeline: Evolution of My Academic Interests"
    )
    
    fig_timeline.update_traces(
        marker=dict(
            opacity=0.8,
            line=dict(width=2, color='white')
        )
    )
    
    fig_timeline.update_layout(
        height=400,
        xaxis_title="Publication Year",
        yaxis_title="Research Domain",
        plot_bgcolor='rgba(248,249,250,0.8)'
    )
    
    return fig_timeline

def create_domain_summary(df):
    """Create summary statistics for each domain"""
    
    summary_data = []
    for domain in df['domain'].unique():
        domain_papers = df[df['domain'] == domain]
        
        summary_data.append({
            'Domain': domain,
            'Paper Count': len(domain_papers),
            'Year Range': f"{domain_papers['year'].min():.0f} - {domain_papers['year'].max():.0f}" if domain_papers['year'].notna().any() else "Various",
            'Key Authors': ', '.join(domain_papers['author'].str.split(',').str[0].value_counts().head(3).index.tolist()),
            'Recent Papers': len(domain_papers[domain_papers['year'] >= 2020]) if domain_papers['year'].notna().any() else 0
        })
    
    return pd.DataFrame(summary_data)

# Create timeline visualization
print("📅 Creating reading timeline...")
fig_timeline = create_timeline_analysis(df_papers)
fig_timeline.show()

# Create domain summary
print("📊 Creating domain analysis...")
domain_summary = create_domain_summary(df_papers)
print("\n🔍 Reading Summary by Domain:")
display(domain_summary)

📅 Creating reading timeline...


📊 Creating domain analysis...

🔍 Reading Summary by Domain:


Unnamed: 0,Domain,Paper Count,Year Range,Key Authors,Recent Papers
0,GeoAI,6,2004 - 2025,"Goodchild, Brown, Hu",3
1,GeoPersonality,29,2008 - 2025,"Götz, Rentfrow, Obschonka",16
2,IssuesGeo,7,2011 - 2025,"Cox, Fluri, GeostatsPy_bootstrap.",3


In [7]:
# Create Clickable Paper Explorer
def create_paper_explorer(df):
    """Create an interactive paper explorer with clickable details"""
    
    if not WIDGETS_AVAILABLE:
        print("📝 Interactive widgets not available. Displaying static paper list instead.")
        return df[['title', 'author', 'year', 'domain']].sort_values('year', ascending=False)
    
    # Create widgets for paper exploration
    domain_dropdown = widgets.Dropdown(
        options=['All'] + list(df['domain'].unique()),
        value='All',
        description='Domain:'
    )
    
    year_slider = widgets.IntRangeSlider(
        value=[df['year'].min(), df['year'].max()],
        min=int(df['year'].min()),
        max=int(df['year'].max()),
        step=1,
        description='Years:',
        disabled=False,
        continuous_update=False,
        orientation='horizontal',
        readout=True,
        readout_format='d'
    )
    
    search_box = widgets.Text(
        value='',
        placeholder='Search titles or authors...',
        description='Search:',
        disabled=False
    )
    
    output = widgets.Output()
    
    def update_display(*args):
        with output:
            output.clear_output()
            
            # Filter data based on widgets
            filtered_df = df.copy()
            
            if domain_dropdown.value != 'All':
                filtered_df = filtered_df[filtered_df['domain'] == domain_dropdown.value]
            
            if pd.notna(year_slider.value[0]) and pd.notna(year_slider.value[1]):
                filtered_df = filtered_df[
                    (filtered_df['year'] >= year_slider.value[0]) & 
                    (filtered_df['year'] <= year_slider.value[1])
                ]
            
            if search_box.value:
                search_term = search_box.value.lower()
                filtered_df = filtered_df[
                    filtered_df['title'].str.lower().str.contains(search_term, na=False) |
                    filtered_df['author'].str.lower().str.contains(search_term, na=False)
                ]
            
            # Display results
            print(f"📚 Found {len(filtered_df)} papers matching your criteria:\n")
            
            for _, paper in filtered_df.iterrows():
                print(f"🔸 **{paper['title']}**")
                print(f"   👤 {paper['author']} ({paper['year']})")
                print(f"   📖 {paper['venue']}")
                print(f"   🏷️ Domain: {paper['domain']}")
                if paper['url']:
                    print(f"   🔗 [View Paper]({paper['url']})")
                print()
    
    # Set up widget interactions
    domain_dropdown.observe(update_display, names='value')
    year_slider.observe(update_display, names='value')
    search_box.observe(update_display, names='value')
    
    # Initial display
    update_display()
    
    return widgets.VBox([
        widgets.HTML('<h3>🔍 Interactive Paper Explorer</h3>'),
        widgets.HBox([domain_dropdown, search_box]),
        year_slider,
        output
    ])

print("🔍 Creating interactive paper explorer...")
explorer = create_paper_explorer(df_papers)

if WIDGETS_AVAILABLE:
    display(explorer)
else:
    display(explorer)

🔍 Creating interactive paper explorer...


VBox(children=(HTML(value='<h3>🔍 Interactive Paper Explorer</h3>'), HBox(children=(Dropdown(description='Domai…

## 🎯 Theoretical Framework Development

### From Phenomenology to Digital Methods

The interactive visualizations above demonstrate my theoretical journey from classical phenomenological approaches to contemporary digital methodologies:

1. **🏛️ Foundational Spatial Theory** (Critical Issues in Geography)
   - Understanding core geographical concepts and disciplinary challenges
   - Engaging with questions of space, scale, and methodology in geographic research
   - Critical perspectives on the discipline's development and future directions

2. **🧠 Psychology-Geography Integration** (Geography & Personality Research)  
   - Bridging spatial theory with psychological frameworks
   - Investigating person-environment relationships across scales
   - Understanding how place shapes personality and behavior patterns

3. **🤖 Contemporary Applications** (GeoAI & Spatial Intelligence)
   - Connecting theory to machine learning and artificial intelligence methods
   - Applying computational approaches to spatial analysis
   - Developing new methodologies for large-scale geographic research

### 📈 Research Evolution Phases

**Phase 1: Theoretical Grounding** 🌱
- Establishing understanding of core geographic concepts
- Exploring critical perspectives on spatial analysis and methodology
- Understanding disciplinary foundations and ongoing debates

**Phase 2: Interdisciplinary Integration** 🔗
- Connecting spatial theory to environmental psychology
- Investigating regional variation in personality and behavior
- Exploring how built environments influence human outcomes

**Phase 3: Methodological Innovation** ⚡
- Applying machine learning to spatial behavior patterns
- Using GeoAI for large-scale environmental psychology research
- Developing new approaches to built-environment-personality studies

### 🎓 Mini-Geo Journey Reflection

This interactive exploration accomplishes several academic goals:

- **📚 Theoretical Engagement**: Demonstrates systematic reading across multiple geographic traditions
- **🕸️ Intellectual Connections**: Shows how different theoretical and methodological approaches inform my research
- **🔬 Methodological Bridge**: Connects classical geographic theory to contemporary computational methods  
- **🗺️ Research Roadmap**: Illustrates clear progression toward my built-environment-personality study

The network visualizations reveal how my reading spans traditional geographic theory, contemporary psychology-geography interfaces, and cutting-edge computational methods - creating a foundation for innovative research at the intersection of these domains.