# Serving Layer: The Valuation Engine

## Architecture: Hybrid Stream-Batch Join
This pipeline implements a **Lambda Architecture** pattern (Speed + Batch) to derive real-time financial valuations.

1.  **Speed Layer (Stream):** Contains real-time market data (Price, Beta, Volatility) updated frequently.
2.  **Batch Layer (Static):** Contains quarterly financial reports (Debt, Tax Expense, NOPAT) stored in S3.
3.  **Serving Layer (Join):** Merges the two timelines. Financial reports are sparse (quarterly) while prices are dense (daily or minute-level), so we use a **Forward-Fill** strategy: every real-time price point is attributed to the latest available financial report. In this notebook that means keeping only the most recent report per symbol and joining it to all price rows.
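
A minimal plain-Python sketch of the forward-fill lookup described in item 3, for a single symbol. The report dates and debt figures are hypothetical, and `latest_report_index` and `forward_filled_debt` are illustrative names rather than part of the pipeline, which performs the equivalent step in Spark.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical quarterly report dates for one symbol, sorted ascending.
report_dates = [date(2024, 3, 31), date(2024, 6, 30), date(2024, 9, 30)]
# Hypothetical total-debt figures aligned with report_dates.
total_debt = [100.0, 120.0, 90.0]


def latest_report_index(price_date):
    """Index of the most recent report dated on or before price_date, else None."""
    i = bisect_right(report_dates, price_date)
    return i - 1 if i > 0 else None


def forward_filled_debt(price_date):
    """Attribute the latest available report to a single price observation."""
    idx = latest_report_index(price_date)
    return None if idx is None else total_debt[idx]


print(forward_filled_debt(date(2024, 8, 15)))  # 120.0 (uses the 2024-06-30 report)
print(forward_filled_debt(date(2024, 1, 2)))   # None (no report exists yet)
```

A price point dated before the first report has nothing to fill from, so it comes back empty. That is the same outcome as a symbol with no financial report at all.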

## Key Metrics
*   **WACC (Weighted Average Cost of Capital):** Calculated dynamically using real-time Market Cap weights.
*   **Enterprise Value:** Market Cap + Net Debt.
*   **Implied PE:** Price / (NOPAT / Shares Outstanding), using NOPAT per share as a simplified EPS proxy.
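
As a worked example of the three metrics above, here is a small plain-Python sketch for one symbol. All input numbers are hypothetical and the `valuation` function is an illustrative helper, not part of the pipeline; the WACC formula used is the standard `We * Ke + Wd * Kd * (1 - T)` with weights taken from Market Cap and Total Debt.

```python
def valuation(price, shares, total_debt, net_debt,
              cost_of_equity, cost_of_debt, tax_rate, nopat):
    """Compute WACC, Enterprise Value and Implied PE for a single symbol."""
    market_cap = price * shares
    total_capital = market_cap + total_debt      # V = E + D
    weight_equity = market_cap / total_capital   # We
    weight_debt = total_debt / total_capital     # Wd
    wacc = (weight_equity * cost_of_equity
            + weight_debt * cost_of_debt * (1 - tax_rate))
    enterprise_value = market_cap + net_debt
    eps_proxy = nopat / shares                   # simplified EPS
    return {
        "market_cap": market_cap,
        "wacc": wacc,
        "enterprise_value": enterprise_value,
        "pe_ratio_implied": price / eps_proxy,
    }


# Hypothetical inputs chosen so the arithmetic is easy to follow.
result = valuation(price=50.0, shares=100, total_debt=1000.0, net_debt=800.0,
                   cost_of_equity=0.10, cost_of_debt=0.05, tax_rate=0.20,
                   nopat=400.0)

print(round(result["wacc"], 4))      # 0.09
print(result["enterprise_value"])    # 5800.0
print(result["pe_ratio_implied"])    # 12.5
```

With a Market Cap of 5000 and Total Debt of 1000, equity carries 5/6 of the weight, so the WACC of 9% sits close to the 10% cost of equity. This sketch omits the zero-denominator guards that the notebook adds with `nullif`.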

In [0]:
import pyspark.sql.functions as F
from pyspark.sql.window import Window

# --- CONFIGURATION ---

# 1. INPUT SOURCES
speed_path = "/Volumes/workspace/default/storage/gold/ticker_data_v10" # Streaming Delta (Volumes)
batch_path = "gold_valid_audit" # Static Delta (Managed Table / S3)

# 2. OUTPUT DESTINATIONS
serving_path = "/Volumes/workspace/default/storage/serving/valuation_dashboard_v5"
checkpoint_path = "/Volumes/workspace/default/storage/checkpoints/job_serving_dashboard_v5"

In [0]:
# --- 1. READ SPEED LAYER (STREAMING) ---
# Read the high-velocity market data from the Volume and rename column "s" to "symbol".
# Note: this notebook reads a static snapshot with spark.read; a production stream would use spark.readStream.
df_speed = spark.read.format("delta").load(speed_path).withColumnRenamed("s", "symbol")


In [0]:
# --- 2. READ BATCH LAYER (STATIC S3) ---
# Access the registered managed table pointing to S3 data.
df_batch_raw = (
    spark.read
    .format("delta")
    .table(batch_path)
)
df_batch_raw.show(10)

In [0]:
# --- 3. PREPARE BATCH (FORWARD FILL LOGIC) ---
# We select ONLY the single latest financial report for each symbol.
# This assumes the latest report is valid for all subsequent market data until a new report arrives.
window_spec = Window.partitionBy("symbol").orderBy(F.col("date").desc())

df_batch_latest = df_batch_raw.withColumn("rank", F.row_number().over(window_spec)) \
                              .filter("rank = 1") \
                              .drop("rank") \
                              .withColumnRenamed("date", "report_date")

# --- 4. CALCULATE STATIC FINANCIAL RATIOS ---
# These inputs (Debt, Tax Rate) are constant for the quarter.
# nullif(..., 0) turns a zero denominator into NULL, so the ratio becomes NULL
# rather than a divide-by-zero error (F.nullif requires PySpark 3.5+).
df_batch_prepared = df_batch_latest.withColumn(
    "cost_of_debt", 
    F.col("interest_expense") / F.nullif(F.col("total_debt"), F.lit(0))
).withColumn(
    "effective_tax_rate",
    # Approximate pre-tax income as NOPAT + tax expense.
    F.col("tax_expense") / F.nullif(F.col("nopat") + F.col("tax_expense"), F.lit(0))
).select(
    "symbol", "report_date", "shares_outstanding", 
    "net_debt", "total_debt", "cost_of_debt", "effective_tax_rate",
    "calculated_fcf", "nopat" # Retained for Dashboard display
)

# --- 5. STREAM-STATIC JOIN ---
# Strategy: Broadcast Left Join
# - Left side (Speed) is large and distributed.
# - Right side (Batch) is small (1 row per symbol).
# - Result: Even if a symbol has no financial report (Batch), we still show its price (Speed).
df_serving = df_speed.join(F.broadcast(df_batch_prepared), on="symbol", how="left")

In [0]:
# --- 6. VALUATION LOGIC (THE WACC ENGINE) ---

# A. Market Cap = Price * Shares
market_cap = F.col("close_price") * F.col("shares_outstanding")

# B. Total Capital (V) = Equity + Total Debt; Enterprise Value = Equity + Net Debt
# Note: Total Debt is used for WACC weighting; Net Debt is used for EV calculation.
total_capital = market_cap + F.col("total_debt")
enterprise_value = market_cap + F.col("net_debt")

# C. Weights (Equity vs Debt)
weight_equity = market_cap / F.nullif(total_capital, F.lit(0))
weight_debt = F.col("total_debt") / F.nullif(total_capital, F.lit(0))

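# D. WACC Formula
# WACC = (We * Ke) + (Wd * Kd * (1 - T))
# cost_of_debt is NULL when total_debt is 0 (nullif in step 4), which would make
# the whole WACC NULL. For debt-free companies, use 0 so WACC reduces to We * Ke.
cost_of_debt_safe = F.when(F.col("total_debt") == 0, F.lit(0.0)).otherwise(F.col("cost_of_debt"))
wacc_calc = (
    (weight_equity * F.col("cost_of_equity")) +
    (weight_debt * cost_of_debt_safe * (1 - F.col("effective_tax_rate")))
)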

# E. PE Ratio (Implied)
# EPS Proxy = NOPAT / Shares (Simplified)
eps_proxy = F.col("nopat") / F.nullif(F.col("shares_outstanding"), F.lit(0))
pe_ratio = F.col("close_price") / F.nullif(eps_proxy, F.lit(0))

# --- 7. APPLY TRANSFORMATION ---
df_final = df_serving.withColumn("market_cap", market_cap) \
                     .withColumn("enterprise_value", enterprise_value) \
                     .withColumn("wacc", wacc_calc) \
                     .withColumn("pe_ratio_implied", pe_ratio) \
                     .withColumn("valuation_timestamp", F.current_timestamp()) \
                     .withColumn("valuation_date", F.current_date())

In [0]:
# --- 8. SELECT FINAL DASHBOARD COLUMNS ---
output_schema = [
    # Identity
    "symbol", "valuation_timestamp", "report_date", "valuation_date",
    # Speed Metrics
    "close_price", "beta", "volatility", "momentum", "cost_of_equity",
    # Batch Metrics
    "calculated_fcf", "cost_of_debt", "effective_tax_rate",
    # Synthesis (Valuation)
    "market_cap", "enterprise_value", "wacc", "pe_ratio_implied"
]

df_dashboard = df_final.select(*output_schema)

# --- 9. WRITE TO SERVING LAYER ---
print(f"Starting Serving Layer Write to: {serving_path}")

# For batch backfill/initial load:
df_dashboard.write \
    .format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .save(serving_path)
    
print("Success: Serving Layer updated.")