# 01 — Download & Exploratory Data Analysis (EDA)

**Goal:**
- Download daily price data for sector ETFs.
- Compute daily log returns.
- Visualize price trends, volatility, and correlations.
- Export summary statistics for later modeling steps.

**Tickers:** `XLK, XLF, XLI, XLV`
**Time period:** 2013–2023


In [5]:
%load_ext autoreload
%autoreload 2

import os
import numpy as np
import pandas as pd

from quantcopula.data import get_prices
from quantcopula.plotting import plot_prices, plot_returns, plot_correlation_heatmap

TICKERS = ["XLK", "XLF", "XLI", "XLV"]
START_DATE = "2013-01-01"
END_DATE = "2023-12-31"

os.makedirs("data/processed", exist_ok=True)
os.makedirs("figures", exist_ok=True)



In [None]:
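# Load daily prices for all tickers (cached on disk after the first download).
prices = get_prices(
    TICKERS,
    START_DATE,
    END_DATE,
    cache_path="data/raw/prices.parquet",
    auto_adjust=False,
)
# Keep only dates on which every ticker has a price.
prices = prices.dropna(how="any")
prices.tail()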



## Data Snapshot
We expect a `DatetimeIndex` and one column per ticker holding the `Adj Close` price.
Missing values are dropped to align trading calendars across ETFs.
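
The layout described above can be checked before any returns are computed. The sketch below is a minimal example on a small synthetic frame; `demo_prices` and `check_price_frame` are hypothetical names introduced only for illustration and are not part of `quantcopula`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real `prices` frame (made-up values, not market data).
idx = pd.bdate_range("2023-01-02", periods=6)
demo_prices = pd.DataFrame(
    {
        "XLK": [130.0, 131.2, np.nan, 132.5, 133.1, 134.0],
        "XLF": [35.0, 35.2, 35.1, 35.4, np.nan, 35.9],
    },
    index=idx,
)


def check_price_frame(df, tickers):
    """Drop incomplete rows, then verify the expected layout (hypothetical helper)."""
    clean = df.dropna(how="any")
    assert isinstance(clean.index, pd.DatetimeIndex), "index must be a DatetimeIndex"
    assert clean.index.is_monotonic_increasing, "dates must be sorted"
    assert list(clean.columns) == list(tickers), "one column per ticker expected"
    assert (clean > 0).all().all(), "prices must be positive for log returns"
    return clean


clean_prices = check_price_frame(demo_prices, ["XLK", "XLF"])
print(len(demo_prices), "->", len(clean_prices), "rows after dropping gaps")
```

Dropping with `how="any"` removes a date as soon as one ticker is missing, which keeps the return series aligned at the cost of a few observations.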


In [None]:
# Daily log returns r_t = ln(P_t / P_{t-1}); the first row is NaN and is dropped.
log_returns = np.log(prices / prices.shift(1)).dropna(how="any")
log_returns.tail()
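
As a quick check of the log-return computation, the sketch below uses a short made-up price series (not market data) to show that the ratio form used in this notebook matches the difference of log prices, and that log returns sum to the log of the total price ratio. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical prices for one asset; values chosen only for illustration.
demo = pd.Series([100.0, 102.0, 99.0, 105.0], name="XLK")

ratio_form = np.log(demo / demo.shift(1)).dropna()  # form used in this notebook
diff_form = np.log(demo).diff().dropna()            # equivalent formulation

# Log returns are additive over time: their sum equals ln(P_last / P_first).
total_log_return = ratio_form.sum()
expected = np.log(demo.iloc[-1] / demo.iloc[0])

print(np.allclose(ratio_form, diff_form), np.isclose(total_log_return, expected))
```

This additivity is one reason log returns are convenient for multi-period analysis; simple returns compound multiplicatively instead.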


## Visualizations
We save figures to `figures/` so they can be embedded in the README or reports.


In [None]:
plot_prices(prices, out_path="figures/prices.png", title="Sector ETF Prices (2013–2023)")
plot_returns(log_returns, out_path="figures/log_returns.png")
plot_correlation_heatmap(log_returns, out_path="figures/corr_heatmap.png")



In [None]:
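# One row per ticker: count, mean, std, min, quartiles, max.
summary = log_returns.describe().T
summary["skew"] = log_returns.skew()
# pandas returns excess kurtosis (a normal distribution scores 0).
summary["kurtosis"] = log_returns.kurtosis()
summary.round(4).to_csv("data/processed/summary_stats.csv")
summary.round(4)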


## Key Takeaways

- **Volatility:** Daily standard deviations differ across sectors (e.g., Technology is typically higher).
- **Tail risk:** Positive excess kurtosis (the value pandas' `kurtosis()` reports, where a normal distribution scores 0) indicates **fat tails**, supporting the use of **t-distribution margins**.
- **Dependence:** The correlation heatmap shows how strongly the sectors co-move, which is important for copula modeling.
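
To make the tail-risk point concrete, the sketch below compares the excess kurtosis of synthetic Gaussian and Student-t samples. The data are simulated, not ETF returns, and the choice of 5 degrees of freedom is an assumption made only for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 50_000

# Synthetic samples (not market data): a Gaussian and a Student-t with 5 degrees of freedom.
samples = pd.DataFrame(
    {
        "normal": rng.standard_normal(n),
        "student_t5": rng.standard_t(df=5, size=n),
    }
)

# pandas reports excess kurtosis (Fisher definition): close to 0 for a Gaussian,
# clearly positive for a fat-tailed distribution such as the Student-t.
excess_kurt = samples.kurtosis()
print(excess_kurt.round(2))
```

Sample kurtosis is a noisy estimator for heavy-tailed data, so the Student-t figure will vary noticeably between seeds; the sign and rough magnitude are what matter here.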


In [None]:
START_ALT = "2020-01-01"
END_ALT = END_DATE

prices_alt = get_prices(
    TICKERS,
    START_ALT,
    END_ALT,
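    # Separate cache file (hypothetical name): if get_prices does not check dates,
    # reusing the 2013–2023 cache could silently return the full period.
    cache_path="data/raw/prices_2020.parquet",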
    auto_adjust=False,
).dropna(how="any")

log_returns_alt = np.log(prices_alt / prices_alt.shift(1)).dropna(how="any")

# Compare daily volatility over the full sample and the 2020–2023 sub-period.
pd.DataFrame({
    "Std (2013–2023)": log_returns.std(),
    "Std (2020–2023)": log_returns_alt.std(),
}).round(4)


**Next:** `02_fit_margins_t_and_PIT.ipynb`
Fit Student-t marginals per asset and transform returns to uniforms via the probability integral transform (PIT).
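
As a preview of that step, the sketch below fits a location-scale Student-t to one synthetic return series with `scipy.stats.t.fit` and applies the fitted CDF to obtain uniforms. It is a minimal illustration under stated assumptions (simulated data expressed in percent, a single asset, maximum-likelihood fit) and not necessarily how the next notebook implements it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic daily returns in percent (not market data): Student-t draws with 4 degrees of freedom.
returns_pct = rng.standard_t(df=4, size=2000)

# Fit a location-scale Student-t by maximum likelihood.
df_hat, loc_hat, scale_hat = stats.t.fit(returns_pct)

# Probability integral transform: apply the fitted CDF to each observation.
u = stats.t.cdf(returns_pct, df_hat, loc=loc_hat, scale=scale_hat)

# If the marginal fit is adequate, u should look roughly Uniform(0, 1).
ks = stats.kstest(u, "uniform")
print(f"df={df_hat:.2f}, KS p-value={ks.pvalue:.3f}")
```

The uniforms `u` are what a copula model consumes, so a poor marginal fit at this stage would distort the estimated dependence structure.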
