# Alpaca PySpark Connector Demonstration

This notebook demonstrates how to use the Alpaca Historical Bars PySpark Connector to fetch and analyze stock market data using distributed computation.

## Prerequisites

1. **Alpaca API Credentials**: You need an Alpaca account and API credentials
   - Sign up at [Alpaca](https://alpaca.markets/)
   - Generate API keys from your dashboard
   - Set environment variables `ALPACA_API_KEY` and `ALPACA_SECRET_KEY`

2. **Required Libraries**: This notebook requires PySpark, requests, python-dotenv, and visualization libraries (matplotlib, seaborn, pandas)
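
Before running the cells below, it can help to confirm that both credentials are visible to the Python process. The following is a minimal sketch, not part of the connector: the helper name `missing_credentials` is hypothetical, and only the two variable names come from this notebook.

```python
import os

# Variable names taken from the prerequisites above
REQUIRED_VARS = ("ALPACA_API_KEY", "ALPACA_SECRET_KEY")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("Both Alpaca credentials are set")
```

If the credentials live in a `.env` file, run this check after `load_dotenv()` so the file's values have been loaded first.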

In [4]:
# Install required packages (uncomment if running in a fresh environment)
# !pip install pyspark requests python-dotenv matplotlib seaborn pandas

import os
import warnings
from dotenv import load_dotenv
from datetime import datetime, timedelta

# Load environment variables (including the Alpaca credentials) from a .env file, if one exists
load_dotenv()

# Suppress Python warnings for cleaner output
warnings.filterwarnings('ignore')
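# NOTE: this coordinate targets Spark 3.5.0 built for Scala 2.12. Spark SQL already ships with
# PySpark, so adjust or remove this line if your installed version differs (the session output
# in this notebook reports Spark 4.0.0). It must be set before the Spark session is created.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-sql_2.12:3.5.0 pyspark-shell'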

## Setup Spark Session and Connector

First, we'll initialize a Spark session and create our Alpaca connector.

In [1]:
from pyspark.sql import SparkSession
from alpaca_pyspark.alpaca_connector import create_connector

# Create a Spark session with adaptive query execution and partition coalescing enabled
spark = SparkSession.builder \
    .appName("AlpacaHistoricalBarsDemo") \
    .config("spark.sql.adaptive.enabled", "true") \
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true") \
    .getOrCreate()

# Set log level to reduce verbose output
spark.sparkContext.setLogLevel("WARN")

print(f"Spark version: {spark.version}")
print(f"Number of cores available: {spark.sparkContext.defaultParallelism}")

Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
25/08/11 22:09:45 WARN Utils: Your hostname, Tristans-MacBook-Pro-222.local, resolves to a loopback address: 127.0.0.1; using 192.168.68.59 instead (on interface en0)
25/08/11 22:09:45 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/08/11 22:09:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


Spark version: 4.0.0
Number of cores available: 10


In [6]:
# Create the Alpaca connector
# Note: Make sure ALPACA_API_KEY and ALPACA_SECRET_KEY environment variables are set
# For demonstration purposes, you can also pass them directly:
# connector = create_connector(spark, api_key="your_key", api_secret="your_secret")

connector = create_connector(
    spark,
    # Optional: customize configuration
    page_size=10000,  # Maximum records per API call
    date_split_days=30,  # Split date ranges into chunks of this many days
    max_retries=3,  # Number of retry attempts for failed requests
    timeout=30  # Request timeout in seconds
)

# Test connection
if connector.validate_connection():
    print("✅ Successfully connected to Alpaca API")
else:
    print("❌ Failed to connect to Alpaca API")
    print("Please check your API credentials and network connection")

INFO:alpaca_pyspark.alpaca_connector:Connection to Alpaca API successful


✅ Successfully connected to Alpaca API
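
To make the `date_split_days` option more concrete, here is a rough sketch of how a date range could be cut into chunks of at most 30 days, each of which a separate Spark task might then fetch. This illustrates the general technique only and is an assumption, not the connector's actual implementation; `split_date_range` is a hypothetical helper, and treating the end date as exclusive is a choice made for this example.

```python
from datetime import date, timedelta


def split_date_range(start, end, chunk_days=30):
    """Split [start, end) into consecutive chunks of at most chunk_days days.

    Hypothetical helper for illustration; the end date is exclusive.
    """
    if chunk_days <= 0:
        raise ValueError("chunk_days must be positive")
    chunks = []
    cursor = start
    while cursor < end:
        # The last chunk is shortened so it never runs past the end date
        chunk_end = min(cursor + timedelta(days=chunk_days), end)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end
    return chunks


for chunk_start, chunk_end in split_date_range(date(2024, 1, 1), date(2024, 3, 15)):
    print(chunk_start, "->", chunk_end)
```

Smaller chunks yield more independent units of work at the cost of more API requests, which is presumably the trade-off `date_split_days` is meant to tune.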


## Summary and Next Steps

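This notebook demonstrated the basic setup for the Alpaca PySpark Connector: creating a Spark session, configuring the connector, and validating the connection to the Alpaca API.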

### 📚 Resources:
- [Alpaca API Documentation](https://docs.alpaca.markets/)
- [PySpark SQL Guide](https://spark.apache.org/docs/latest/sql-programming-guide.html)
- [Connector Source Code](../alpaca_pyspark/alpaca_connector.py)
- [Test Suite](../tests/test_alpaca_connector.py)

In [7]:
# Clean up Spark session
print("🧹 Cleaning up Spark session...")
spark.stop()
print("✅ Spark session stopped successfully")

🧹 Cleaning up Spark session...
✅ Spark session stopped successfully
