<img src="https://theaiengineer.dev/tae_logo_gw_flatter.png" width=35% align=right>

# Python Primer for Machine & Deep Learning
## Pandas Basics

**&copy; Dr. Yves J. Hilpisch**

AI-Powered by GPT-5

pandas gives you labeled, column‑oriented containers (Series/DataFrame) and a rich vocabulary for cleaning, reshaping, aggregating, joining, and plotting.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8')  # style name available in Matplotlib 3.6 and later

### Create and inspect

In [None]:
idx = pd.date_range('2025-01-01', periods=5, freq='D')
df = pd.DataFrame({'price': [100, 101.5, 99.8, 102.2, 103.0],
                   'volume': [10, 12, 9, 15, 11]}, index=idx)
df.head(2), df.dtypes
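
Beyond `head` and `dtypes`, a few more attributes help when inspecting a new frame. The following is a minimal sketch that rebuilds the same five-row frame so it runs on its own; the variable names `shape`, `summary`, and `has_dates` are illustrative only.

```python
import pandas as pd

idx = pd.date_range('2025-01-01', periods=5, freq='D')
df = pd.DataFrame({'price': [100, 101.5, 99.8, 102.2, 103.0],
                   'volume': [10, 12, 9, 15, 11]}, index=idx)

shape = df.shape                                   # (rows, columns)
summary = df.describe()                            # count, mean, std, min, quartiles, max per column
has_dates = isinstance(df.index, pd.DatetimeIndex) # date_range yields a DatetimeIndex

print(shape)
print(summary.loc[['mean', 'min', 'max']])
```

A `DatetimeIndex` is what makes the date-string slicing in the next section possible.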

### Select and filter

In [None]:
df.loc['2025-01-02':'2025-01-04', ['price','volume']]

In [None]:
df.iloc[0:3, 0]

In [None]:
df[df['price'] > 101]
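
One detail in the cells above is easy to miss: `.loc` slices by label and includes both endpoints, while `.iloc` slices by position and excludes the stop. The sketch below rebuilds the same small frame to show this, and also shows one way to combine two filter conditions. The names `by_label`, `by_pos`, `mask`, and `filtered` are illustrative.

```python
import pandas as pd

idx = pd.date_range('2025-01-01', periods=5, freq='D')
df = pd.DataFrame({'price': [100, 101.5, 99.8, 102.2, 103.0],
                   'volume': [10, 12, 9, 15, 11]}, index=idx)

by_label = df.loc['2025-01-02':'2025-01-04']  # label slice: both endpoints included
by_pos = df.iloc[1:4]                         # positional slice: stop position excluded

# Combine boolean conditions with & (and) or | (or); parenthesize each condition
mask = (df['price'] > 101) & (df['volume'] >= 12)
filtered = df.loc[mask, ['price', 'volume']]

print(by_label.equals(by_pos))
print(filtered)
```

Python's `and` and `or` do not work element-wise on Series, which is why the bitwise operators are used here.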

### Vectorize and transform

In [None]:
df = df.assign(ret=df['price'].pct_change())
df['roll3'] = df['price'].rolling(3, center=True, min_periods=1).mean()
df
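
To make the two transforms above less opaque, the sketch below reproduces `pct_change` by hand and compares a centered rolling window with the default trailing one. It uses the same five prices; `ret_manual`, `centered`, and `trailing` are illustrative names.

```python
import numpy as np
import pandas as pd

price = pd.Series([100, 101.5, 99.8, 102.2, 103.0],
                  index=pd.date_range('2025-01-01', periods=5, freq='D'))

ret = price.pct_change()                 # first value is NaN: no previous price
ret_manual = price / price.shift(1) - 1  # the same computation written out

# center=True averages the previous, current, and next observation;
# the default window is trailing (current and the two previous observations)
centered = price.rolling(3, center=True, min_periods=1).mean()
trailing = price.rolling(3, min_periods=1).mean()

print(pd.DataFrame({'ret': ret, 'centered': centered, 'trailing': trailing}))
```

Because a centered window uses the next observation, it suits smoothing for display more than building features for forecasting, where a trailing window avoids look-ahead.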

### Groupby and aggregate

In [None]:
df = df.copy()
df['category'] = ['A', 'B', 'A', 'B', 'A']
df.groupby('category').agg(price_mean=('price','mean'), vol_sum=('volume','sum'))
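
The named aggregation above collapses each group to one row. When the group statistic is needed next to the original rows, `transform` returns a result aligned with the input. A minimal sketch on the same data follows; the column name `price_dev` is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'price': [100, 101.5, 99.8, 102.2, 103.0],
                   'volume': [10, 12, 9, 15, 11],
                   'category': ['A', 'B', 'A', 'B', 'A']})

# One row per category: output_name=(input_column, aggregation)
summary = df.groupby('category').agg(price_mean=('price', 'mean'),
                                     vol_sum=('volume', 'sum'))

# transform keeps the original shape: each row receives its group's mean
group_mean = df.groupby('category')['price'].transform('mean')
df['price_dev'] = df['price'] - group_mean  # deviation from the category mean

print(summary)
print(df)
```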

### Time series and plotting

In [None]:
idx = pd.date_range('2025-01-01', periods=120, freq='D')
price = 100 + np.cumsum(np.random.default_rng(0).normal(0,1.0,len(idx)))
vol = np.random.default_rng(1).integers(8,20,len(idx))
ts = pd.DataFrame({'price':price,'volume':vol}, index=idx)
ax = ts['price'].plot(figsize=(7,4), lw=1.6, color='#1f77b4', label='price')
ts['price'].rolling(14, min_periods=1).mean().plot(ax=ax, lw=2.0, color='#ff7f0e', label='14D mean')
ax.legend()
ax.grid(alpha=0.3)
plt.show()

### Merging and joining

In [None]:
quotes = pd.DataFrame({
  'sym':['AAPL','MSFT','AAPL','MSFT'],
  'ts': pd.to_datetime(['2025-01-01','2025-01-01','2025-01-02','2025-01-02']),
  'price':[100.0, 300.0, 101.5, 305.0]
})
trades = pd.DataFrame({
  'sym':['AAPL','MSFT','AAPL'],
  'ts': pd.to_datetime(['2025-01-01','2025-01-02','2025-01-03']),
  'qty':[10,5,12]
})
tx = pd.merge(trades, quotes, on=['sym','ts'], how='left')
tx.assign(value=tx['qty']*tx['price'])
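
A left join keeps every trade, so a trade without a quote at the same `sym` and `ts` ends up with `NaN` in `price`. One way to make this visible is the `indicator` argument of `pd.merge`, sketched below on the same data. The `validate` check and the name `unmatched` are additions for illustration.

```python
import pandas as pd

quotes = pd.DataFrame({
    'sym': ['AAPL', 'MSFT', 'AAPL', 'MSFT'],
    'ts': pd.to_datetime(['2025-01-01', '2025-01-01', '2025-01-02', '2025-01-02']),
    'price': [100.0, 300.0, 101.5, 305.0],
})
trades = pd.DataFrame({
    'sym': ['AAPL', 'MSFT', 'AAPL'],
    'ts': pd.to_datetime(['2025-01-01', '2025-01-02', '2025-01-03']),
    'qty': [10, 5, 12],
})

# indicator=True adds a _merge column ('both', 'left_only', 'right_only');
# validate='many_to_one' raises if a (sym, ts) key is duplicated in quotes
tx = pd.merge(trades, quotes, on=['sym', 'ts'], how='left',
              indicator=True, validate='many_to_one')

unmatched = tx[tx['_merge'] == 'left_only']  # trades with no quote on that date
print(tx)
print(unmatched[['sym', 'ts', 'qty']])
```

Here the AAPL trade on 2025-01-03 has no matching quote, so its `price` (and any value derived from it) is `NaN`.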

### Reshaping: pivot and melt

In [None]:
long = pd.DataFrame({
  'date': pd.to_datetime(['2025-01-01','2025-01-01','2025-01-02','2025-01-02']),
  'metric':['price','volume','price','volume'],
  'value':[100.0,10,101.5,12]
})
wide = long.pivot(index='date', columns='metric', values='value')
wide

In [None]:
tidy = wide.reset_index().melt(id_vars='date', var_name='metric', value_name='value')
tidy.sort_values(['date','metric']).head()
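
`pivot` requires each (index, column) pair to appear only once. If the long data contains duplicates, `pivot` raises an error, and `pivot_table` with an aggregation function is a common alternative. The sketch below uses made-up data with one duplicated price entry to show the difference; the flag `pivot_failed` is illustrative.

```python
import pandas as pd

# Hypothetical long-format data: two 'price' rows for 2025-01-01
long_dup = pd.DataFrame({
    'date': pd.to_datetime(['2025-01-01', '2025-01-01', '2025-01-01', '2025-01-02']),
    'metric': ['price', 'price', 'volume', 'price'],
    'value': [100.0, 102.0, 10.0, 101.5],
})

try:
    long_dup.pivot(index='date', columns='metric', values='value')
    pivot_failed = False
except ValueError:
    pivot_failed = True  # duplicate (date, metric) entries cannot be reshaped

# pivot_table resolves duplicates by aggregating them (here: the mean)
wide_agg = long_dup.pivot_table(index='date', columns='metric',
                                values='value', aggfunc='mean')
print(pivot_failed)
print(wide_agg)
```

Combinations missing from the long data, such as volume on 2025-01-02 here, appear as `NaN` in the wide result.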

### Distributions: histogram and boxplot

In [None]:
ret = ts['price'].pct_change().dropna()
ax = ret.plot(kind='hist', bins=30, figsize=(7,4), color='#1f77b4', edgecolor='white', alpha=0.85)
ax.set(title='Histogram of Daily Returns', xlabel='return', ylabel='frequency')
plt.show()

In [None]:
ts2 = ts.assign(ret=ts['price'].pct_change(), weekday=ts.index.day_name())
ax = ts2.boxplot(column='ret', by='weekday', grid=False, figsize=(8,4))
ax.set(title='Returns by Weekday', xlabel='weekday', ylabel='return')
plt.suptitle('')  # remove the automatic "Boxplot grouped by ..." super-title
plt.show()

## Exercises
1. Add a z‑score column for price (subtract the column mean, then divide by the column standard deviation) and plot it.
2. Merge a small trades DataFrame with the quotes above and compute trade values; explain the NaNs.
3. Resample the time series to weekly means and overlay a 4‑week rolling mean.
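
Resampling is not shown in the cells above, so here is a possible starting point for exercise 3, leaving the plotting part open. It rebuilds the simulated price series from the time series section and assumes the `'W'` frequency alias, which groups days into weeks ending on Sunday. The names `weekly` and `weekly_roll` are illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2025-01-01', periods=120, freq='D')
price = 100 + np.cumsum(np.random.default_rng(0).normal(0, 1.0, len(idx)))
ts = pd.DataFrame({'price': price}, index=idx)

weekly = ts['price'].resample('W').mean()              # one mean per week, labeled by the Sunday
weekly_roll = weekly.rolling(4, min_periods=1).mean()  # 4-week rolling mean of the weekly means

print(weekly.head())
print(weekly_roll.head())
```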
