# 📥 01 – Data Acquisition

## Overview

This notebook pulls **Emerging Markets (EM) equity** and **macroeconomic time series data** from Bloomberg using the BQL API. 

### Data Sources:
- **EM ETFs**: 6 US-listed single-country ETFs covering major emerging markets in different regions
- **Macro Factors**: 6 macroeconomic and market indicators that typically drive EM performance
- **Time Period**: Last 3 years of daily data
- **Data Quality**: Missing values are forward-filled with the last available observation (BQL `fill='prev'`)
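Forward-filling carries the most recent observed value into later gaps. The minimal pandas sketch below uses made-up prices. It shows one consequence worth knowing: a gap at the very start of a series stays empty, because there is no earlier value to carry forward. It does not confirm whether BQL's `fill='prev'` behaves identically at the start of the requested window.

```python
import pandas as pd

# Made-up prices with a leading gap and an interior two-day gap (illustrative only)
dates = pd.date_range("2024-01-01", periods=5, freq="D")
px = pd.Series([float("nan"), 10.0, float("nan"), float("nan"), 11.0], index=dates)

# Forward-fill: each missing value takes the most recent earlier observation
filled = px.ffill()
print(filled.tolist())  # → [nan, 10.0, 10.0, 10.0, 11.0]
```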

### Purpose:
The collected data will be used for factor modeling analysis to understand how macroeconomic variables influence emerging market equity performance.

## 📦 Import Required Libraries

We need the following libraries:
- **pandas**: For data manipulation and analysis
- **bql**: Bloomberg Query Language API for data extraction
- **os**: For directory operations and file management
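`bql` is not a general-purpose PyPI package. It is usually only importable inside Bloomberg's BQuant environment, but treat that as an assumption to verify for your setup. The stdlib check sketched below lets the notebook report a missing dependency clearly instead of failing on a bare `ModuleNotFoundError`. The helper name `require_module` is hypothetical.

```python
import importlib.util


def require_module(name: str) -> bool:
    """Return True if top-level module `name` can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


# Report availability of the notebook's dependencies
for mod in ("os", "pandas", "bql"):
    status = "available" if require_module(mod) else "NOT available"
    print(f"{mod}: {status}")
```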

In [None]:
# Core data manipulation library
import pandas as pd

# Bloomberg Query Language API
import bql

# Operating system interface for file operations
import os

## 🔗 Bloomberg API Setup

Initialize the Bloomberg Query Language service and set up the date range for data extraction.

In [None]:
# Initialize Bloomberg Query Language service
bq = bql.Service()

# Set date range: Last 3 years to current date
date_range = bq.func.range('-3Y', '0D')
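`bq.func.range('-3Y', '0D')` defines a relative window that ends today. To sanity-check the observation counts printed later, you can approximate that window offline with pandas. This is an approximation under stated assumptions. It treats `-3Y` as three calendar years before the end date and counts weekdays (Monday to Friday) while ignoring exchange holidays, so real trading-day counts will be somewhat lower. The helper `approx_window` is hypothetical.

```python
import pandas as pd


def approx_window(years: int = 3, end=None):
    """Approximate a '-{years}Y' to '0D' window as (start, end) timestamps."""
    # Assumption: '0D' means today; '-3Y' means three calendar years earlier
    end = pd.Timestamp.today().normalize() if end is None else pd.Timestamp(end)
    start = end - pd.DateOffset(years=years)
    return start, end


# Fixed end date so the example is reproducible
start, end = approx_window(end="2024-06-28")
n_weekdays = len(pd.bdate_range(start, end))  # weekdays only, holidays not removed
print(start.date(), end.date(), n_weekdays)
```

Three years works out to roughly 780 weekdays, so a series with far fewer rows is worth investigating.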

## 🌏 Emerging Markets ETF Data

Collect daily closing prices (`PX_LAST`) for six US-listed single-country EM ETFs spanning different geographic regions:

| Ticker | Country | Region/Focus |
|--------|---------|--------------|
| EWZ | Brazil | Latin America's largest economy |
| INDA | India | South Asian technology and services hub |
| FXI | China | World's second-largest economy (large-cap stocks) |
| EZA | South Africa | Representative African market |
| EWW | Mexico | Closely integrated with the US via NAFTA/USMCA |
| EIDO | Indonesia | Southeast Asian growth market |

In [None]:
# Define EM ETF universe with descriptive labels
em_assets = {
    'Brazil_EWZ': 'EWZ US Equity',      # iShares MSCI Brazil ETF
    'India_INDA': 'INDA US Equity',     # iShares MSCI India ETF  
    'China_FXI': 'FXI US Equity',       # iShares China Large-Cap ETF
    'SouthAfrica_EZA': 'EZA US Equity', # iShares MSCI South Africa ETF
    'Mexico_EWW': 'EWW US Equity',      # iShares MSCI Mexico ETF
    'Indonesia_EIDO': 'EIDO US Equity'  # iShares MSCI Indonesia ETF
}

# Initialize storage for EM equity data
em_data = {}

# Extract price data for each EM ETF
for label, ticker in em_assets.items():
    # Define data request: last price with forward fill for missing values
    data_item = bq.data.px_last(dates=date_range, fill='prev')
    request = bql.Request(ticker, data_item)
    
    # Execute Bloomberg query
    response = bq.execute(request)
    df = response[0].df()
    
    # Clean and standardize column names
    px_col = [col for col in df.columns if 'PX_LAST' in col.upper()][0]
    df = df[['DATE', px_col]].copy()
    df.columns = ['date', label]
    df = df.set_index('date')
    
    # Store cleaned data
    em_data[label] = df
    print(f"✓ Downloaded {label}: {len(df)} observations")

# Combine all EM ETF data into single DataFrame
em_df = pd.concat(em_data.values(), axis=1)
print(f"\n📊 EM ETF Dataset: {em_df.shape[0]} rows × {em_df.shape[1]} columns")
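You can pull the cleaning step inside the loop out into a function and test it offline against a mock frame. The mock's layout is an assumption modelled on what the loop expects: an `ID` index plus `DATE`, `CURRENCY` and a `PX_LAST(...)`-style column. It is not a verbatim copy of a BQL response. `clean_px_frame` is a hypothetical helper name.

```python
import pandas as pd


def clean_px_frame(df: pd.DataFrame, label: str) -> pd.DataFrame:
    """Keep the date and PX_LAST columns, rename them, and index by date."""
    matches = [col for col in df.columns if "PX_LAST" in col.upper()]
    if not matches:
        raise KeyError(f"No PX_LAST column found for {label}: {list(df.columns)}")
    out = df[["DATE", matches[0]]].copy()
    out.columns = ["date", label]
    out["date"] = pd.to_datetime(out["date"])
    return out.set_index("date")


# Mock frame shaped like the loop expects (assumed layout, made-up values)
mock = pd.DataFrame(
    {
        "DATE": ["2024-01-02", "2024-01-03"],
        "CURRENCY": ["USD", "USD"],
        "px_last(dates=range(-3Y,0D),fill=prev)": [28.5, 28.9],
    },
    index=pd.Index(["EWZ US Equity"] * 2, name="ID"),
)
print(clean_px_frame(mock, "Brazil_EWZ"))
```

Raising a `KeyError` with the available column names makes a renamed or missing column easier to diagnose than the `IndexError` the inline `[0]` would raise.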

## 📈 Macroeconomic Factors

Collect key macroeconomic indicators that typically drive EM performance:

| Factor | Ticker | Description | EM Impact |
|--------|--------|-------------|-----------|
| USD Index | DXY | Dollar strength vs. major currencies | **Negative**: Strong USD hurts EM |
| Oil (Brent) | CO1 | Global energy prices | **Mixed**: Positive for exporters, negative for importers |
| US 10Y Yield | USGG10YR | Risk-free rate benchmark | **Negative**: Higher yields reduce EM appeal |
| Fed Funds | FDTR | US monetary policy rate | **Negative**: Tighter US policy hurts EM flows |
| VIX | VIX | Implied volatility of S&P 500 options ("fear gauge") | **Negative**: High volatility signals risk-off sentiment |
| Copper | LMCADY | Industrial metals proxy | **Positive**: Commodity demand indicator |
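The "EM Impact" column states expected signs. After the data is downloaded, one rough way to compare them with history is to check the sign of the correlation between daily EM returns and daily factor moves. Two choices here are assumptions, not the notebook's method. First, rate-like series (the 10Y yield and the policy rate) are differenced rather than converted to percent changes, because a percent change in a yield is hard to interpret. Second, the "Mixed" oil entry is skipped. `sign_check` and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd

# Expected signs from the table above; Oil is "Mixed", so it is left out
EXPECTED_SIGN = {"USD_Index": -1, "US_10Y_Yield": -1, "Fed_Funds": -1, "VIX": -1, "Copper": 1}
# Assumption: rate-like series are differenced (percentage-point moves), not pct_change'd
RATE_LIKE = {"US_10Y_Yield", "Fed_Funds"}


def sign_check(df: pd.DataFrame, em_col: str,
               expected=EXPECTED_SIGN, rate_like=RATE_LIKE) -> pd.DataFrame:
    """Compare the sign of corr(daily EM return, daily factor move) with the expected sign."""
    em_ret = df[em_col].pct_change()
    rows = []
    for factor, sign in expected.items():
        move = df[factor].diff() if factor in rate_like else df[factor].pct_change()
        corr = em_ret.corr(move)  # pairwise; rows with NaN are ignored
        rows.append({"factor": factor, "expected": sign, "corr": corr,
                     "agrees": bool(np.sign(corr) == sign)})
    return pd.DataFrame(rows).set_index("factor")


# Synthetic demo data (made up): the EM return loads on each factor with the expected sign
rng = np.random.default_rng(0)
n = 250
z = pd.DataFrame(rng.standard_normal((n, len(EXPECTED_SIGN))), columns=list(EXPECTED_SIGN))
em_ret = 0.01 * sum(EXPECTED_SIGN[c] * z[c] for c in z.columns) + 0.002 * rng.standard_normal(n)

demo = pd.DataFrame({"Brazil_EWZ": 50 * (1 + em_ret).cumprod()})
for c in z.columns:
    if c in RATE_LIKE:
        demo[c] = 4.0 + (0.05 * z[c]).cumsum()      # yield-style level in percent
    else:
        demo[c] = 100 * (1 + 0.01 * z[c]).cumprod()  # price/index-style level

print(sign_check(demo, "Brazil_EWZ"))
```

With real data, the Fed Funds target changes only on policy decision dates, so its daily correlation depends on a handful of days and should be read with caution.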

In [None]:
# Define macroeconomic factors universe
macro_assets = {
    'USD_Index': 'DXY Curncy',          # US Dollar Index
    'Oil_Brent': 'CO1 Comdty',          # Brent Crude Oil Front Month
    'US_10Y_Yield': 'USGG10YR Index',   # US Generic Govt 10Y Yield
    'Fed_Funds': 'FDTR Index',          # Federal Funds Target Rate
    'VIX': 'VIX Index',                 # CBOE Volatility Index
    'Copper': 'LMCADY Comdty'           # LME Copper Grade A Cash
}

# Initialize storage for macro data
macro_data = {}

# Extract price data for each macro factor
for label, ticker in macro_assets.items():
    # Define data request: last price with forward fill
    data_item = bq.data.px_last(dates=date_range, fill='prev')
    request = bql.Request(ticker, data_item)
    
    # Execute Bloomberg query
    response = bq.execute(request)
    df = response[0].df()
    
    # Clean and standardize column names
    px_col = [col for col in df.columns if 'PX_LAST' in col.upper()][0]
    df = df[['DATE', px_col]].copy()
    df.columns = ['date', label]
    df = df.set_index('date')
    
    # Store cleaned data
    macro_data[label] = df
    print(f"✓ Downloaded {label}: {len(df)} observations")

# Combine all macro data into single DataFrame
macro_df = pd.concat(macro_data.values(), axis=1)
print(f"\n📊 Macro Dataset: {macro_df.shape[0]} rows × {macro_df.shape[1]} columns")
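The EM and macro cells repeat the same download loop. Below is a minimal sketch of a shared helper. The query is passed in as a callable so the helper can be tested without a Bloomberg connection. `download_series` and `fake_fetch` are hypothetical names. Inside BQuant, `fetch` would wrap the `bq.execute` call and the column cleaning shown above, and return the cleaned price column as a Series.

```python
from typing import Callable, Dict

import pandas as pd


def download_series(assets: Dict[str, str],
                    fetch: Callable[[str], pd.Series]) -> pd.DataFrame:
    """Fetch one date-indexed series per ticker and align them column-wise."""
    frames = {}
    for label, ticker in assets.items():
        s = fetch(ticker)
        frames[label] = s
        print(f"✓ Downloaded {label}: {len(s)} observations")
    # Dict keys become column names; rows are aligned on the date index
    return pd.concat(frames, axis=1)


# Offline stand-in for a BQL query (made-up values)
def fake_fetch(ticker: str) -> pd.Series:
    idx = pd.date_range("2024-01-01", periods=3, freq="D", name="date")
    return pd.Series([1.0, 2.0, 3.0], index=idx)


demo = download_series({"USD_Index": "DXY Curncy", "VIX": "VIX Index"}, fake_fetch)
print(demo)
```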

## 🔄 Data Combination & Export

Inner-join the EM equity and macro datasets on date, drop any rows that still have missing values, and save the result to CSV for the subsequent notebooks.
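With `how='inner'`, only dates present in both frames survive. The toy example below uses made-up values to show that a date missing from one side disappears from the result, while an outer join keeps it so you can inspect it. Whether this matters in practice depends on how closely the BQL date axes of the two datasets line up, which this sketch does not establish.

```python
import pandas as pd

em = pd.DataFrame({"Brazil_EWZ": [28.0, 28.4, 28.1]},
                  index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]))
macro = pd.DataFrame({"VIX": [12.1, 12.5]},
                     index=pd.to_datetime(["2024-01-02", "2024-01-04"]))

inner = pd.merge(em, macro, left_index=True, right_index=True, how="inner")
outer = pd.merge(em, macro, left_index=True, right_index=True, how="outer")

print(len(inner), len(outer))           # → 2 3
print(outer[outer.isna().any(axis=1)])  # the row the inner join silently drops
```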

In [None]:
# Merge EM equity and macro datasets on date index
combined_df = pd.merge(em_df, macro_df, left_index=True, right_index=True, how='inner')

# Sort chronologically and drop any date that still has a missing value
rows_before = len(combined_df)
combined_df = combined_df.sort_index().dropna()

print("📊 Combined Dataset Summary:")
print(f"   • Time Period: {combined_df.index.min()} to {combined_df.index.max()}")
print(f"   • Observations: {len(combined_df)} daily records")
print(f"   • Variables: {combined_df.shape[1]} total ({len(em_assets)} EM + {len(macro_assets)} Macro)")
print(f"   • Rows dropped for missing values: {rows_before - len(combined_df)}")

# Create data directory if it doesn't exist
os.makedirs('../data', exist_ok=True)

# Export to CSV for use in subsequent notebooks
output_path = '../data/combined_em_macro_data.csv'
combined_df.to_csv(output_path)
print(f"💾 Data saved to: {output_path}")

# Display first few rows as verification
print("\n📋 Data Preview:")
combined_df.head()
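Subsequent notebooks need to read the CSV back with the date index restored. Below is a minimal round-trip sketch. It uses a temporary file and made-up values instead of the real `../data/combined_em_macro_data.csv`.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Brazil_EWZ": [28.0, 28.4], "VIX": [12.1, 12.5]},
                  index=pd.DatetimeIndex(["2024-01-02", "2024-01-03"], name="date"))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "combined_em_macro_data.csv")
    df.to_csv(path)
    # index_col=0 restores the date index; parse_dates=True converts it to datetimes
    loaded = pd.read_csv(path, index_col=0, parse_dates=True)

print(type(loaded.index).__name__, loaded.index.name)
```

Without `parse_dates=True`, the index comes back as plain strings, and date-based slicing and resampling in later notebooks will not behave as expected.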