# arXiv Paper Curator - Week 1: Infrastructure Setup

Build a production-grade RAG system using Docker, PostgreSQL, OpenSearch, FastAPI, Airflow, and Ollama.

## Technology Stack
| Component | Purpose | Port |
|-----------|---------|------|
| **FastAPI** | REST API | 8000 |
| **PostgreSQL** | Paper metadata storage | 5432 |
| **OpenSearch** | Hybrid search engine | 9200/5601 |
| **Apache Airflow** | Workflow automation | 8080 |
| **Ollama** | Local LLM inference | 11434 |

## Learning Materials

**Core Technologies:**
- **Docker**: [Tutorial Video](https://www.youtube.com/watch?v=pg19Z8LL06w) | [Docker Compose](https://www.youtube.com/watch?v=SXwC9fSwct8)
- **FastAPI**: [YouTube Series](https://www.youtube.com/playlist?list=PLK8U0kF0E_D6l19LhOGWhVZ3sQ6ujJKq_) | [Documentation](https://fastapi.tiangolo.com/tutorial/)
- **PostgreSQL**: [Beginners Guide](https://www.youtube.com/watch?v=SpfIwlAYaKk) | [FastAPI + PostgreSQL](https://www.youtube.com/watch?v=398DuQbQJq0)
- **OpenSearch**: [Getting Started](https://docs.opensearch.org/latest/getting-started/)
- **Apache Airflow**: [Tutorial Video](https://www.youtube.com/watch?v=Y_vQyMljDsE)

**Development Tools:**
- **VS Code Setup**: [Video Guide](https://www.youtube.com/watch?v=mpk4Q5feWaw)
- **Git Basics**: [Tutorial](https://www.youtube.com/watch?v=zTjRZNkhiEU)
- **UV Package Manager**: [Setup Video](https://www.youtube.com/watch?v=AMdG7IjgSPM)

## Prerequisites

**Required Software:**
- Python 3.12+ ([Download](https://www.python.org/downloads/))
- UV Package Manager ([Install Guide](https://docs.astral.sh/uv/getting-started/installation/))
- Docker Desktop ([Download](https://docs.docker.com/get-docker/))
- Git ([Download](https://git-scm.com/downloads))

**System Requirements:**
- 8GB+ RAM (16GB recommended)
- 20GB+ free disk space

## Setup Instructions

**Before running cells:**
1. Extract or clone the project to your system
2. Open a terminal in the project root (the directory that contains `compose.yml`)
3. Install dependencies: `uv sync`
4. Start Jupyter: `uv run jupyter notebook`
5. Verify that the notebook kernel uses the project environment (`.venv`)

In [1]:
# Environment Check
import sys
from pathlib import Path

python_version = sys.version_info
print(f"Python Version: {python_version.major}.{python_version.minor}.{python_version.micro}")
print(f"Environment: {sys.executable}")

if python_version >= (3, 12):
    print("✓ Python version compatible")
else:
    print("✗ Need Python 3.12+")
    exit()

Python Version: 3.12.11
Environment: /Users/yuvatejachunduru/Desktop/Projects/.venv/bin/python
✓ Python version compatible


In [2]:
# Find Project Root
current_dir = Path.cwd()

if current_dir.name == "week1" and current_dir.parent.name == "notebooks":
    project_root = current_dir.parent.parent
elif (current_dir / "compose.yml").exists():
    project_root = current_dir
else:
    project_root = None

if project_root and (project_root / "compose.yml").exists():
    print(f"✓ Project root: {project_root}")
else:
    print("✗ Missing compose.yml - check directory")
    exit()

✓ Project root: /Users/yuvatejachunduru/Desktop/Projects/arxiv-paper-curator-week1.0


In [3]:
# Check Docker
import subprocess

try:
    result = subprocess.run(["docker", "--version"], capture_output=True, text=True, timeout=5)
    if result.returncode == 0:
        print(f"✓ Docker: {result.stdout.strip()}")
    else:
        print("✗ Docker: Not working")
        exit()
except (OSError, subprocess.TimeoutExpired):
    print("✗ Docker: Not found")
    exit()

✓ Docker: Docker version 28.3.2, build 578ccf6



In [4]:
# Check Docker Compose
try:
    result = subprocess.run(["docker", "compose", "version"], capture_output=True, text=True, timeout=5)
    if result.returncode == 0:
        print(f"✓ Docker Compose: {result.stdout.split()[3]}")
    else:
        print("✗ Docker Compose: Not working")
        exit()
except (OSError, subprocess.TimeoutExpired):
    print("✗ Docker Compose: Not found")
    exit()

✓ Docker Compose: v2.38.2-desktop.1


In [5]:
# Check UV Package Manager
try:
    result = subprocess.run(["uv", "--version"], capture_output=True, text=True, timeout=5)
    if result.returncode == 0:
        print(f"✓ UV: {result.stdout.strip()}")
        print("\n✓ All required software ready!")
    else:
        print("✗ UV: Not working")
        exit()
except (OSError, subprocess.TimeoutExpired):
    print("✗ UV: Not found")
    exit()

✓ UV: uv 0.8.8 (Homebrew 2025-08-09)

✓ All required software ready!


## Start Services

**Command to run (in terminal):**
```bash
cd [project-root]
docker compose up -d
```

**What this does:** Downloads the images on first run, then starts all services in the background (`-d` means detached).

In [6]:
# Check Docker Running
try:
    result = subprocess.run(["docker", "info"], capture_output=True, timeout=5)
    if result.returncode == 0:
        print("✓ Docker is running")
    else:
        print("✗ Docker not running - start Docker Desktop")
        exit()
except (OSError, subprocess.TimeoutExpired):
    print("✗ Docker daemon not accessible")
    exit()

✓ Docker is running


In [7]:
# Check Current Containers
import json

try:
    result = subprocess.run(
        ["docker", "compose", "ps", "--format", "json"],
        cwd=str(project_root),
        capture_output=True,
        text=True,
        timeout=10
    )
    
    if result.returncode == 0 and result.stdout.strip():
        print("Current containers:")
        for line in result.stdout.strip().split('\n'):
            if line.strip():
                try:
                    container = json.loads(line)
                    service = container.get('Service', 'unknown')
                    state = container.get('State', 'unknown')
                    print(f"  • {service}: {state}")
                except json.JSONDecodeError:
                    # Skip lines that are not valid JSON
                    pass
    else:
        print("No containers running")
        
except (OSError, subprocess.TimeoutExpired) as e:
    print(f"Could not check containers: {e}")

Current containers:
  • airflow: running
  • api: running
  • opensearch-dashboards: running
  • ollama: running
  • opensearch: running
  • postgres: running


## Service Health Verification

All services start automatically. Check their health status:

In [8]:
# Service Health Check
EXPECTED_SERVICES = {
    'api': 'FastAPI REST API server',
    'postgres': 'PostgreSQL database',
    'opensearch': 'OpenSearch search engine', 
    'opensearch-dashboards': 'OpenSearch web dashboard',
    'ollama': 'Local LLM inference server',
    'airflow': 'Workflow automation (optional - may be off)'
}

try:
    result = subprocess.run(
        ["docker", "compose", "ps", "--format", "json"],
        cwd=str(project_root),
        capture_output=True,
        text=True,
        timeout=15
    )
    
    if result.returncode == 0:
        print("SERVICE STATUS")
        print("=" * 70)
        print(f"{'Service':<20} {'State':<15} {'Status':<15} {'Notes'}")
        print("-" * 70)
    else:
        print("Could not get service status")
        exit()
        
except Exception as e:
    print(f"Error checking services: {e}")
    exit()

# Parse Service Status
found_services = set()
service_states = {}

if result.stdout.strip():
    for line in result.stdout.strip().split('\n'):
        if line.strip():
            try:
                container = json.loads(line)
                service = container.get('Service', 'unknown')
                state = container.get('State', 'unknown')
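                # Compose reports an empty string when no health check is defined
                health = container.get('Health') or 'no check'
                
                found_services.add(service)
                service_states[service] = {'state': state, 'health': health}
                
                if state == 'running' and health in ['healthy', 'no check']:
                    indicator = "✓"
                    notes = "Ready"
                elif state == 'running' and health == 'starting':
                    indicator = "⚠"
                    notes = "Starting up..."
                elif state == 'running' and health == 'unhealthy':
                    indicator = "✗"
                    notes = "Health check failing"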
                elif state == 'exited':
                    indicator = "✗"
                    notes = "Failed to start"
                else:
                    indicator = "?"
                    notes = f"Status: {state}"
                
                print(f"{indicator} {service:<18} {state:<14} {health:<14} {notes}")
                
            except json.JSONDecodeError:
                pass

SERVICE STATUS
Service              State           Status          Notes
----------------------------------------------------------------------
✓ airflow            running        healthy        Ready
✓ api                running        healthy        Ready
✓ opensearch-dashboards running        healthy        Ready
✓ ollama             running        healthy        Ready
✓ opensearch         running        healthy        Ready
✓ postgres           running        healthy        Ready


In [9]:
# Check Missing Services
missing_services = set(EXPECTED_SERVICES.keys()) - found_services

if missing_services:
    print("\nMISSING SERVICES:")
    print("-" * 70)
    for service in missing_services:
        description = EXPECTED_SERVICES[service]
        if service == 'airflow':
            print(f"⚠ {service:<18} not running    {'(Optional)':<14} {description}")
        else:
            print(f"✗ {service:<18} not running    {'Required':<14} {description}")

failed_services = [s for s, info in service_states.items() 
                  if info['state'] in ['exited', 'restarting'] or info['health'] == 'unhealthy']

if failed_services:
    print("\nTROUBLESHOOTING:")
    for service in failed_services:
        print(f"   docker compose logs {service}")
elif missing_services - {'airflow'}:
    # Airflow is optional, so only prompt when a required service is missing
    print("\nACTION NEEDED:")
    print("Start missing services: docker compose up -d")

### 1. FastAPI - REST API Service

**Interactive Exploration:**

You can explore and test the FastAPI service in several ways:
- **API Documentation**: http://localhost:8000/docs (Interactive Swagger UI)
- **Alternative Docs**: http://localhost:8000/redoc (ReDoc interface)
- **Source Code**: Located in `src/routers/` directory

Let's test the API endpoints and explore the documentation:

In [10]:
# Test FastAPI Health
import requests

try:
    response = requests.get("http://localhost:8000/health", timeout=5)
    if response.status_code == 200:
        data = response.json()
        print("✓ FastAPI is responding")
        print(f"Status: {data.get('status', 'unknown')}")
    else:
        print(f"⚠ API returned status: {response.status_code}")
except requests.exceptions.ConnectionError:
    print("✗ API not responding - wait 1-2 minutes")
except Exception as e:
    print(f"✗ API test error: {e}")

✓ FastAPI is responding
Status: ok
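
Beyond the health endpoint, FastAPI serves a machine-readable description of the API at `/openapi.json` by default (the Swagger UI at `/docs` is built from it). The sketch below lists the routes found in such a document. It assumes the project keeps FastAPI's default OpenAPI URL; `list_routes` and `fetch_openapi` are hypothetical helper names, and only the standard library is used.

```python
import json
from urllib.request import urlopen


def list_routes(openapi_spec: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI document, sorted by path."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    routes = []
    for path, operations in openapi_spec.get("paths", {}).items():
        for method in operations:
            # Path items can also hold non-method keys such as "parameters".
            if method.lower() in http_methods:
                routes.append((method.upper(), path))
    return sorted(routes, key=lambda route: (route[1], route[0]))


def fetch_openapi(base_url: str = "http://localhost:8000") -> dict:
    """Download the OpenAPI document (assumes the default /openapi.json URL)."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as resp:
        return json.load(resp)
```

Once the API container is up, `for method, path in list_routes(fetch_openapi()): print(method, path)` prints one line per endpoint.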


In [11]:
# PRODUCTION INSIGHTS
print("\n" + "="*60)
print("  PRODUCTION INSIGHT (Online Sessions Only)")
print("="*60)
print("❓ How are they scaled?")
print("❓ What are the bottlenecks?")
print("❓ How are they monitored and managed?")
print("❓ How are they integrated with other systems?")
print("❓ What are the best practices for using these systems?")
print("❓ How are these systems used and deployed in production?")
print("❓ How are they tested in terms of load and performance?")
print("→ Learn these production secrets in our online walkthrough sessions!")
print("="*60)


  PRODUCTION INSIGHT (Online Sessions Only)
❓ How are they scaled?
❓ What are the bottlenecks?
❓ How are they monitored and managed?
❓ How are they integrated with other systems?
❓ What are the best practices for using these systems?
❓ How are these systems used and deployed in production?
❓ How are they tested in terms of load and performance?
→ Learn these production secrets in our online walkthrough sessions!


### 2. Apache Airflow - Workflow Automation

**Interactive Exploration:**

Apache Airflow manages data pipelines and automated workflows. You can explore it through:
- **Web Dashboard**: http://localhost:8080 
- **Login**: Username: `admin`, Password: Found in container (see test below)
- **Source Code**: Located in `airflow/dags/` directory

**Simple Password Location:**
Airflow 3.0 writes the generated admin password to a fixed path inside the container:
```
/opt/airflow/simple_auth_manager_passwords.json.generated
```

The test below automatically reads this file for you!

Let's test Airflow and get the password:

In [12]:
# Get Airflow Password
import json
from pathlib import Path

password_file = project_root / "airflow" / "simple_auth_manager_passwords.json.generated"

try:
    if password_file.exists():
        with open(password_file, 'r') as f:
            data = json.load(f)
            password = data.get("admin")
        print(f"✓ Airflow password: {password}")
    else:
        print(f"⚠ Password file not found")
        password = None
except Exception as e:
    print(f"✗ Could not read password: {e}")
    password = None

✓ Airflow password: 2gv664ArnytBFsm8


In [13]:
# Test Airflow Health
try:
    response = requests.get("http://localhost:8080/api/v2/monitor/health", timeout=5)
    if response.status_code == 200:
        print("✓ Airflow is healthy")
        
        if password:
            print(f"\nAirflow Login:")
            print(f"URL: http://localhost:8080")
            print(f"Username: admin")
            print(f"Password: {password}")
    else:
        print(f"⚠ Airflow returned: {response.status_code}")
        
except requests.exceptions.ConnectionError:
    print("✗ Airflow not responding - wait 2-3 minutes")
except Exception as e:
    print(f"✗ Airflow test error: {e}")

✓ Airflow is healthy

Airflow Login:
URL: http://localhost:8080
Username: admin
Password: 2gv664ArnytBFsm8


### 3. OpenSearch - Hybrid Search Engine

**Interactive Exploration:**

OpenSearch provides full-text search and analytics capabilities:
- **API Endpoint**: http://localhost:9200 
- **Dashboards UI**: http://localhost:5601 (Web interface)
- **Source Code**: Located in `src/services/opensearch/` directory

**Important for Students:** 
- ✅ Use http://localhost:5601 for web interface
- ✅ Use Dev Tools in Dashboards for API queries

Let's test OpenSearch and explore its capabilities:

In [14]:
# Test 1: Check OpenSearch Dashboards Web Interface
# This is the proper way for students to interact with OpenSearch

dashboards_url = "http://localhost:5601"

try:
    # Test if Dashboards is accessible
    response = requests.get(f"{dashboards_url}/api/status", timeout=10, allow_redirects=True)
    if response.status_code == 200:
        print("✓ OpenSearch Dashboards is accessible!")
        print("✓ Web interface is ready for exploration")
        
        print("\n Web Interface Access:")
        print("=" * 40)
        print(f"Main Dashboard: {dashboards_url}")
        print(f"Dev Tools: {dashboards_url}/app/dev_tools")
        print("=" * 40)
        
        print("\n Student Learning Activities:")
        print("1. Explore the Dashboard:")
        print("   • Visit http://localhost:5601")
        print("   • Navigate through the interface")
        print("   • Check out the 'Discover' tab")
        
        print("\n2. Use Dev Tools for API Queries:")
        print("   • Go to Dev Tools")
        print("   • Try: GET /_cluster/health")
        print("   • Try: GET /_cat/indices?v")
        print("   • Try: GET /_cluster/stats")
        print("   • Check the learning material for more information")
        
    else:
        print(f"⚠ Dashboards returned status: {response.status_code}")
        print("Interface may still be starting up")
        
except requests.exceptions.ConnectionError:
    print("✗ OpenSearch Dashboards not accessible yet")
    print("Wait 2-3 minutes for full startup")
    
except requests.exceptions.Timeout:
    print("⚠ Dashboards request timed out")
    print("This is normal during startup - try again in a few minutes")
    
except Exception as e:
    print(f"✗ Error accessing Dashboards: {e}")
    print("Check container status: docker compose ps")

✓ OpenSearch Dashboards is accessible!
✓ Web interface is ready for exploration

 Web Interface Access:
Main Dashboard: http://localhost:5601
Dev Tools: http://localhost:5601/app/dev_tools

 Student Learning Activities:
1. Explore the Dashboard:
   • Visit http://localhost:5601
   • Navigate through the interface
   • Check out the 'Discover' tab

2. Use Dev Tools for API Queries:
   • Go to Dev Tools
   • Try: GET /_cluster/health
   • Try: GET /_cat/indices?v
   • Try: GET /_cluster/stats
   • Check the learning material for more information
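
The `GET /_cluster/health` query suggested for Dev Tools can also be sent from Python to the API endpoint on port 9200. The sketch below assumes the local cluster accepts plain HTTP without authentication, which may not match your `compose.yml`; the helper names are hypothetical. A `yellow` status is common on a single-node setup, because replica shards have no second node to be placed on.

```python
import json
from urllib.request import urlopen


def summarize_cluster_health(health: dict) -> str:
    """Format the main fields of a /_cluster/health response on one line."""
    status = health.get("status", "unknown")
    marker = {"green": "✓", "yellow": "⚠", "red": "✗"}.get(status, "?")
    return (
        f"{marker} cluster={health.get('cluster_name', 'unknown')} "
        f"status={status} "
        f"nodes={health.get('number_of_nodes', '?')} "
        f"active_shards={health.get('active_shards', '?')}"
    )


def fetch_cluster_health(base_url: str = "http://localhost:9200") -> dict:
    """Call the cluster health API (assumes HTTP without authentication)."""
    with urlopen(f"{base_url}/_cluster/health", timeout=5) as resp:
        return json.load(resp)
```

With OpenSearch running, `print(summarize_cluster_health(fetch_cluster_health()))` gives a one-line status.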


In [15]:
# PRODUCTION DEPLOYMENT INSIGHT
print("\n" + "="*60)
print("🎯 PRODUCTION INSIGHT (Online Sessions Only)")
print("="*60)
print("❓ Why do companies use OpenSearch?")
print("❓ What is achievable with OpenSearch?")
print("❓ How does OpenSearch handle billions of documents?")
print("❓ How do companies search through billions of documents?")
print("❓ How do e-commerce giants search millions of products instantly?")
print("→ Learn these production secrets in our online walkthrough sessions!")
print("="*60)


🎯 PRODUCTION INSIGHT (Online Sessions Only)
❓ Why do companies use OpenSearch?
❓ What is achievable with OpenSearch?
❓ How does OpenSearch handle billions of documents?
❓ How do companies search through billions of documents?
❓ How do e-commerce giants search millions of products instantly?
→ Learn these production secrets in our online walkthrough sessions!


### 4. Ollama - Local LLM Inference Engine

**Interactive Exploration:**

Ollama runs large language models locally on your machine:
- **API Endpoint**: http://localhost:11434
- **Command Line**: Available inside the container
- **Privacy**: All AI processing happens locally (no external APIs)

Let's test Ollama and see what models are available:

In [16]:
# Test 1: Check Ollama Service Status
# Let's see if Ollama is running and what models are available

import requests
import json

ollama_url = "http://localhost:11434/api/tags"

try:
    response = requests.get(ollama_url, timeout=5)
    if response.status_code == 200:
        models_data = response.json()
        models = models_data.get('models', [])
        
        print("✓ Ollama is running!")
        print(f"Available models: {len(models)}")
        
        if models:
            print("\nInstalled Models:")
            for model in models:
                name = model.get('name', 'unknown')
                size = model.get('size', 0)
                size_gb = round(size / (1024**3), 1)
                print(f"  • {name} ({size_gb} GB)")
        else:
            print("\n  No models installed yet")
            print("   This is normal - models are large files (3-7 GB each)")
            print("   In Week 4, we'll install a model like llama3.2")
            
        print("\n  Try This Later (Week 4):")
        print("1. docker exec -it rag-ollama ollama pull llama3.2")
        print("2. docker exec -it rag-ollama ollama list")
        print("3. docker exec -it rag-ollama ollama run llama3.2")
        
    else:
        print(f"⚠ Ollama returned status: {response.status_code}")
        
except requests.exceptions.ConnectionError:
    print("✗ Ollama is not responding yet")
    print("Ollama service might still be starting")
    
except requests.exceptions.Timeout:
    print("✗ Ollama request timed out")
    print("Service might still be initializing")
    
except Exception as e:
    print(f"✗ Unexpected error testing Ollama: {e}")
    print("Try again in a few minutes")

✓ Ollama is running!
Available models: 1

Installed Models:
  • llama3.2:1b (1.2 GB)

  Try This Later (Week 4):
1. docker exec -it rag-ollama ollama pull llama3.2
2. docker exec -it rag-ollama ollama list
3. docker exec -it rag-ollama ollama run llama3.2


In [17]:
# Test 2: Check Ollama Version and Health
# Let's verify Ollama is properly configured

import requests
import json

ollama_version_url = "http://localhost:11434/api/version"

try:
    response = requests.get(ollama_version_url, timeout=5)
    if response.status_code == 200:
        version_data = response.json()
        version = version_data.get('version', 'unknown')
        
        print("✓ Ollama API is healthy!")
        print(f"Version: {version}")
        
        print("\n  What is Ollama?")
        print("• Runs AI models completely on your local machine")
        print("• No data sent to external services (privacy-first)")
        print("• No API fees or rate limits")
        print("• Supports models like Llama, Mistral, Phi, etc.")
        
        print("\n  Coming in Week 4:")
        print("• Install and run a local language model")
        print("• Generate answers to research questions")
        print("• Summarize academic papers")
        print("• All processing stays on your computer!")
        
    else:
        print(f"⚠ Ollama version check returned: {response.status_code}")
        
except requests.exceptions.ConnectionError:
    print("✗ Could not check Ollama version")
    print("Service might still be starting up")
    
except requests.exceptions.Timeout:
    print("✗ Ollama request timed out")
    print("Service might still be initializing")
    
except Exception as e:
    print(f"✗ Unexpected error checking version: {e}")
    print("Try again in a few minutes")

✓ Ollama API is healthy!
Version: 0.11.2

  What is Ollama?
• Runs AI models completely on your local machine
• No data sent to external services (privacy-first)
• No API fees or rate limits
• Supports models like Llama, Mistral, Phi, etc.

  Coming in Week 4:
• Install and run a local language model
• Generate answers to research questions
• Summarize academic papers
• All processing stays on your computer!


In [18]:
# PRODUCTION DEPLOYMENT INSIGHT
print("\n" + "="*60)
print("🎯 PRODUCTION INSIGHT (Online Sessions Only)")
print("="*60)
print("❓ What are the real issues with LLMs in production?")
print("❓ What is the difference between a fine-tuned LLM and RAG?")
print("❓ How do companies serve LLMs without burning through cash?")
print("→ Learn these production secrets in our online walkthrough sessions!")
print("="*60)


🎯 PRODUCTION INSIGHT (Online Sessions Only)
❓ What are the real issues with LLMs in production?
❓ What is the difference between a fine-tuned LLM and RAG?
❓ How do companies serve LLMs without burning through cash?
→ Learn these production secrets in our online walkthrough sessions!


In [19]:
# HANDS-ON: Pull and Test Llama 3.2 (Small Model)

import requests
import subprocess
import time

print("DOWNLOADING LLAMA 3.2:1B MODEL")
print("=" * 50)
print("This is a small 1.3GB model - perfect for testing!")
print("Download will take 2-5 minutes depending on your internet speed...")

try:
    result = subprocess.run(
        ["docker", "exec", "rag-ollama", "ollama", "pull", "llama3.2:1b"],
        cwd=str(project_root),
        capture_output=True,
        text=True,
        timeout=600
    )
    
    if result.returncode == 0:
        print("Llama 3.2:1b model downloaded successfully!")
    else:
        print(f"Download issue: {result.stderr}")
        
except subprocess.TimeoutExpired:
    print("Download timed out - this is normal for slow connections")
    print("The download continues in the background")
except Exception as e:
    print(f"Error downloading model: {e}")
    print("Make sure Ollama container is running: docker compose ps")

DOWNLOADING LLAMA 3.2:1B MODEL
This is a small 1.3GB model - perfect for testing!
Download will take 2-5 minutes depending on your internet speed...
Llama 3.2:1b model downloaded successfully!


In [20]:
# Test Llama 3.2:1b API

def test_ollama_model(model_name, prompt, max_wait_time=60):
    """Test an Ollama model with a prompt."""
    print(f"Testing {model_name} with prompt: '{prompt}'")
    print("-" * 60)
    
    url = "http://localhost:11434/api/generate"
    data = {
        "model": model_name,
        "prompt": prompt,
        "stream": False
    }
    
    try:
        print("Generating response (this may take 10-30 seconds)...")
        start_time = time.time()
        
        response = requests.post(url, json=data, timeout=max_wait_time)
        
        if response.status_code == 200:
            result = response.json()
            response_text = result.get('response', '').strip()
            
            elapsed_time = time.time() - start_time
            print(f"Response generated in {elapsed_time:.1f} seconds")
            print("\nRESPONSE:")
            print("=" * 40)
            print(response_text)
            print("=" * 40)
            
            if 'model' in result:
                print(f"\nModel: {result['model']}")
            if 'total_duration' in result:
                duration_ms = result['total_duration'] / 1000000
                print(f"Generation time: {duration_ms:.0f}ms")
                
            return True
            
        else:
            print(f"API error: {response.status_code}")
            print(f"Response: {response.text}")
            return False
            
    except requests.exceptions.ConnectionError:
        print("Could not connect to Ollama API")
        print("Make sure Ollama is running: docker compose ps")
        return False
    except requests.exceptions.Timeout:
        print("Request timed out")
        print("Model might be loading for the first time (this is normal)")
        return False
    except Exception as e:
        print(f"Unexpected error: {e}")
        return False

test_prompt = "What is machine learning in one sentence?"
success = test_ollama_model("llama3.2:1b", test_prompt)

if success:
    print("\nSUCCESS! Your local AI model is working!")
    print("\nTry more prompts:")
    print('• test_ollama_model("llama3.2:1b", "Explain neural networks simply")')
    print('• test_ollama_model("llama3.2:1b", "Write a Python function to sort a list")')
else:
    print("\nTroubleshooting:")
    print("1. Make sure model downloaded: docker exec rag-ollama ollama list")
    print("2. Check Ollama logs: docker compose logs ollama")
    print("3. Try again - first run takes longer to load model into memory")

Testing llama3.2:1b with prompt: 'What is machine learning in one sentence?'
------------------------------------------------------------
Generating response (this may take 10-30 seconds)...
Response generated in 7.5 seconds

RESPONSE:
Machine learning is a subfield of artificial intelligence that enables computers to learn from data, make predictions or decisions without being explicitly programmed, by analyzing patterns and relationships in the data.

Model: llama3.2:1b
Generation time: 7436ms

SUCCESS! Your local AI model is working!

Try more prompts:
• test_ollama_model("llama3.2:1b", "Explain neural networks simply")
• test_ollama_model("llama3.2:1b", "Write a Python function to sort a list")


### 5. PostgreSQL - Database Storage

**Interactive Exploration:**

PostgreSQL stores all structured data for our application:
- **Connection**: localhost:5432
- **Database**: rag_db
- **Username/Password**: rag_user / rag_password
- **GUI Tool Recommendation**: DBeaver (free database client)

Let's test the database connection and explore the schema:

In [21]:
# Test 1: Check PostgreSQL Connection (Basic)
# Let's verify PostgreSQL is accepting connections

def test_postgres_connection():
    """Test PostgreSQL connection using simple socket check."""
    import socket
    
    try:
        # Test if PostgreSQL port is open
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(3)
        result = sock.connect_ex(('localhost', 5432))
        sock.close()
        
        if result == 0:
            print("✓ PostgreSQL is accepting connections on port 5432!")
            return True
        else:
            print("✗ PostgreSQL port is not accessible")
            return False
            
    except Exception as e:
        print(f"✗ Could not test PostgreSQL: {e}")
        return False

postgres_available = test_postgres_connection()

if postgres_available:
    print("\n  Database Connection Details:")
    print("• Host: localhost")
    print("• Port: 5432") 
    print("• Database: rag_db")
    print("• Username: rag_user")
    print("• Password: rag_password")
    
    print("\n  Recommended GUI Tools:")
    print("• DBeaver (Free): https://dbeaver.io/download/")
    print("• pgAdmin: https://www.pgadmin.org/download/")

✓ PostgreSQL is accepting connections on port 5432!

  Database Connection Details:
• Host: localhost
• Port: 5432
• Database: rag_db
• Username: rag_user
• Password: rag_password

  Recommended GUI Tools:
• DBeaver (Free): https://dbeaver.io/download/
• pgAdmin: https://www.pgadmin.org/download/
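
Many clients (SQLAlchemy, `psql`, and GUI tools among them) accept these details as a single connection URL. The sketch below builds one from the values listed above. PostgreSQL speaks its own wire protocol, so the scheme is `postgresql://`, not `http://`. `build_postgres_url` is a hypothetical helper, and percent-encoding is applied in case a password contains reserved characters.

```python
from urllib.parse import quote


def build_postgres_url(
    user: str,
    password: str,
    host: str = "localhost",
    port: int = 5432,
    database: str = "rag_db",
) -> str:
    """Build a postgresql:// URL, percent-encoding user, password and database."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


print(build_postgres_url("rag_user", "rag_password"))
```

The same string can be passed to `psql` on the command line or pasted into a GUI client that accepts connection URLs.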


In [22]:
# Test PostgreSQL Connection
# Defaults so that later cells can check whether a connection exists
conn = None
cursor = None

try:
    import psycopg2
    
    conn = psycopg2.connect(
        host="localhost",
        port=5432,
        database="rag_db", 
        user="rag_user",
        password="rag_password"
    )
    
    print("✓ PostgreSQL connected")
    cursor = conn.cursor()
    
except ImportError:
    # Do not call exit() here: in Jupyter it shuts down the kernel and clears all variables
    print("⚠ psycopg2 not installed - basic connection only")
except Exception as e:
    print(f"✗ Database connection failed: {e}")

⚠ psycopg2 not installed - basic connection only


In [1]:
# Check Database Tables
# Needs the `cursor` opened in the previous cell, which requires psycopg2.
if globals().get("cursor") is None:
    print("⚠ No database connection - install psycopg2 (e.g. `uv add psycopg2-binary`) and re-run the previous cell")
else:
    cursor.execute("""
        SELECT table_name 
        FROM information_schema.tables 
        WHERE table_schema = 'public'
        ORDER BY table_name;
    """)

    all_tables = cursor.fetchall()

    app_tables = []
    airflow_tables = []

    for (table_name,) in all_tables:
        if table_name in ['papers', 'users', 'embeddings']:
            app_tables.append(table_name)
        else:
            airflow_tables.append(table_name)

    print(f"Found {len(all_tables)} total tables")
    print(f"Application tables: {len(app_tables)}")
    print(f"Airflow tables: {len(airflow_tables)}")

    for table in app_tables:
        print(f"  • {table}")

    if not app_tables:
        print("  No application tables yet (expected in Week 1)")

    cursor.close()
    conn.close()

⚠ No database connection - install psycopg2 (e.g. `uv add psycopg2-binary`) and re-run the previous cell

In [2]:
# PRODUCTION DEPLOYMENT INSIGHT
print("\n" + "="*60)
print("🎯 PRODUCTION INSIGHT (Online Sessions Only)")
print("="*60)
print("❓ How do companies handle millions of transactions with PostgreSQL?")
print("❓ What's the secret to zero-downtime database migrations?")
print("→ Learn these production secrets in our online walkthrough sessions!")
print("="*60)


🎯 PRODUCTION INSIGHT (Online Sessions Only)
❓ How do companies handle millions of transactions with PostgreSQL?
❓ What's the secret to zero-downtime database migrations?
→ Learn these production secrets in our online walkthrough sessions!


### Service Health Summary and Next Steps

Based on the interactive tests above:

**If all services show ✓**: 
- 🎉 Congratulations! Your infrastructure is ready
- All services are healthy and responding correctly
- You can explore each service using the links and instructions provided

**If some services show ✗**:
- Don't worry! Services take time to start
- Wait 2-3 minutes and re-run the test cells
- OpenSearch and Airflow take the longest (up to 5 minutes)
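
Instead of re-running cells by hand, a small polling loop can wait for a service to come up. The sketch below is a minimal example with a hypothetical helper name, `wait_until`. The five-minute default mirrors the upper bound mentioned above, and the `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_until(
    probe: Callable[[], bool],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call `probe` until it returns True or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        try:
            if probe():
                return True
        except Exception:
            # A refused connection during startup counts as "not ready yet".
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

For example, `wait_until(lambda: requests.get("http://localhost:8000/health", timeout=5).status_code == 200)` returns `True` once the API answers, or `False` after five minutes.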

**Service Access Points:**
- **FastAPI Documentation**: http://localhost:8000/docs - Interactive API testing
- **Airflow Dashboard**: http://localhost:8080 (username `admin`, password from the generated file shown in section 2) - Workflow management
- **OpenSearch Dashboards**: http://localhost:5601 - Web interface and analytics
- **OpenSearch API**: http://localhost:9200 - Direct API access
- **Ollama API**: http://localhost:11434 - Local LLM inference
- **PostgreSQL**: `localhost:5432` (not an HTTP endpoint) - Connect with DBeaver or a similar client

**Hands-On Learning Activities:**

1. **FastAPI**: Test endpoints in the interactive documentation
2. **Airflow**: Login and trigger a DAG manually  
3. **OpenSearch**: Try queries in the Dev Tools
4. **Ollama**: Prepare for Week 4 model installation
5. **PostgreSQL**: Install DBeaver and explore the database structure

## Troubleshooting

**Common Issues:**
- **Connection refused** → Service still starting (wait 2-3 minutes)
- **Port in use** → Stop conflicting application or change ports
- **Container restarting** → Check logs: `docker compose logs [service-name]`
- **Out of memory** → Increase Docker Desktop memory allocation
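
To tell "port in use" apart from "service still starting", it can help to check the ports before running `docker compose up -d`: a port that is already open at that point belongs to another application. The sketch below uses the same socket technique as the PostgreSQL check earlier. The port numbers come from the Technology Stack table, and the helper names are hypothetical.

```python
import socket

# Ports from the Technology Stack table at the top of this notebook.
SERVICE_PORTS = {
    "FastAPI": 8000,
    "PostgreSQL": 5432,
    "OpenSearch": 9200,
    "OpenSearch Dashboards": 5601,
    "Airflow": 8080,
    "Ollama": 11434,
}


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def report_ports(ports: dict[str, int] = SERVICE_PORTS) -> dict[str, bool]:
    """Print and return the open/closed state of each service port."""
    states = {}
    for name, port in ports.items():
        states[name] = port_in_use(port)
        print(f"{name:<22} {port:<6} {'open' if states[name] else 'closed'}")
    return states
```

Run `report_ports()` with the stack stopped to spot conflicts, and again after startup to see which services are listening.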

**Restart everything:** `docker compose down && docker compose up -d` (add `-v` to `down` to also delete the data volumes and start from a clean state)

## Week 1 Complete

**Service Access Points:**
- **API**: http://localhost:8000/docs
- **Airflow**: http://localhost:8080 (username `admin`, password from the generated file, see section 2)
- **OpenSearch**: http://localhost:5601
- **PostgreSQL**: localhost:5432 (rag_user/rag_password)

**Success Criteria:**
- [ ] All services healthy in status check
- [ ] API documentation accessible
- [ ] Airflow dashboard loads
- [ ] OpenSearch interface works

**Next:** Keep services running or restart with `docker compose up -d`

## Project Commands

**Makefile shortcuts:**
```bash
make start    # Start all services  
make status   # Check service status
make logs     # View logs
make health   # Check service health
make stop     # Stop all services
make help     # View all commands
```

**Next:** Read the main `README.md` for complete project documentation.