# Alpaca PySpark Connector Demonstration

This notebook demonstrates how to use the Alpaca Historical Bars PySpark Connector to fetch and analyze stock market data using distributed computation.

## Prerequisites

1. **Alpaca API Credentials**: You need an Alpaca account and API credentials
   - Sign up at [Alpaca](https://alpaca.markets/)
   - Generate API keys from your dashboard
   - Set environment variables `ALPACA_API_KEY` and `ALPACA_SECRET_KEY`

2. **Required Libraries**: This notebook requires PySpark and requests, plus matplotlib, seaborn, and pandas for visualization
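
Before starting Spark, it can help to confirm that both credential variables are actually visible to the Python process. The check below is a minimal sketch: the helper name `missing_credentials` is hypothetical and not part of the connector; only the two variable names come from the prerequisites above.

```python
import os

# Variable names taken from the prerequisites list above
REQUIRED_VARS = ("ALPACA_API_KEY", "ALPACA_SECRET_KEY")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; not part of the connector.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("Alpaca credentials found in the environment")
```

Running this first gives a clearer error than a failed API call later in the notebook.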

In [None]:
# Install required packages (uncomment if running in a fresh environment)
# !pip install pyspark requests matplotlib seaborn pandas

import os
import warnings
from datetime import datetime, timedelta

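# Suppress Python warnings for cleaner output (Spark's own log level is set after the session is created)
warnings.filterwarnings('ignore')
# Must be set before the Spark session starts. spark-sql already ships with PySpark,
# so this --packages entry is likely redundant; keep the version in line with your PySpark install.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-sql_2.12:3.5.0 pyspark-shell'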

## Setup Spark Session and Connector

First, we'll initialize a Spark session and create our Alpaca connector.

In [None]:
from pyspark.sql import SparkSession
from alpaca_pyspark.alpaca_connector import create_connector

# Create Spark session with adaptive query execution (AQE) and partition coalescing enabled
spark = SparkSession.builder \
    .appName("AlpacaHistoricalBarsDemo") \
    .config("spark.sql.adaptive.enabled", "true") \
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true") \
    .getOrCreate()

# Set log level to reduce verbose output
spark.sparkContext.setLogLevel("WARN")

print(f"Spark version: {spark.version}")
# defaultParallelism is Spark's default partition count; in local mode it usually equals the core count
print(f"Default parallelism: {spark.sparkContext.defaultParallelism}")

In [None]:
# Create the Alpaca connector
# Note: Make sure ALPACA_API_KEY and ALPACA_SECRET_KEY environment variables are set
# For demonstration purposes, you can also pass them directly:
# connector = create_connector(spark, api_key="your_key", api_secret="your_secret")

connector = create_connector(
    spark,
    # Optional: customize configuration
    page_size=10000,  # Maximum records per API call
    date_split_days=30,  # Split date ranges into chunks of this many days
    max_retries=3,  # Number of retry attempts for failed requests
    timeout=30  # Request timeout in seconds
)

# Test connection
if connector.validate_connection():
    print("✅ Successfully connected to Alpaca API")
else:
    print("❌ Failed to connect to Alpaca API")
    print("Please check your API credentials and network connection")
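
The `date_split_days` option suggests that a long date range is broken into smaller windows that can be fetched independently. The sketch below shows one way such a split could work. It is an assumption for illustration, not the connector's actual implementation, and `split_date_range` is a hypothetical name.

```python
from datetime import date, timedelta


def split_date_range(start, end, chunk_days=30):
    """Split the inclusive range [start, end] into consecutive windows.

    Each window covers at most chunk_days calendar days.
    Hypothetical helper; the connector may split ranges differently.
    """
    if chunk_days < 1:
        raise ValueError("chunk_days must be at least 1")
    chunks = []
    cursor = start
    while cursor <= end:
        # A window of chunk_days days ends chunk_days - 1 days after it starts
        chunk_end = min(cursor + timedelta(days=chunk_days - 1), end)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end + timedelta(days=1)
    return chunks


for chunk_start, chunk_end in split_date_range(date(2024, 1, 1), date(2024, 3, 15)):
    print(chunk_start, chunk_end)
```

Because the windows do not overlap, each one could in principle be assigned to its own Spark task without producing duplicate bars.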

## Summary and Next Steps

This notebook demonstrated how to start a Spark session, create the Alpaca PySpark Connector, and validate the connection to the Alpaca API.

### 📚 Resources:
- [Alpaca API Documentation](https://docs.alpaca.markets/)
- [PySpark SQL Guide](https://spark.apache.org/docs/latest/sql-programming-guide.html)
- [Connector Source Code](../alpaca_pyspark/alpaca_connector.py)
- [Test Suite](../tests/test_alpaca_connector.py)

In [None]:
# Clean up Spark session
print("🧹 Cleaning up Spark session...")
spark.stop()
print("✅ Spark session stopped successfully")