# Toy (but realistic) Mendelian Randomization in Google Colab

This notebook simulates **summary-statistic** data for genetic instruments and runs common two-sample MR estimators:

- Wald ratio (per SNP)
- **IVW** (inverse-variance weighted)
- **MR-Egger**
- Weighted median
- Sensitivity: heterogeneity (Q), leave-one-out

It also includes plots (scatter, forest, funnel).

> No external data is needed, so the analysis is self-contained and reproducible for a class. The packages installed in the setup cell already ship with Colab.

In [None]:
#@title Setup
!pip -q install numpy pandas scipy statsmodels matplotlib

import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm
import matplotlib.pyplot as plt

np.random.seed(123)

## 1) Simulate instruments

We simulate SNP effects on an exposure (**βX, SE_X**) and an outcome (**βY, SE_Y**).

- True causal effect: **θ**
- Directional pleiotropy: optional (Egger intercept ≠ 0)
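
The simulation cell that follows can be read as the model below. The symbols $\alpha_j$ (pleiotropic effect of SNP $j$), $\mu_\alpha$ (its mean, `pleiotropy_mean` in the code) and $\varepsilon_j$ (sampling noise in the outcome association) are labels introduced here for illustration:

$$
\beta_{Y_j} = \theta\,\beta_{X_j} + \alpha_j + \varepsilon_j,
\qquad \alpha_j \sim \mathcal{N}(\mu_\alpha,\ 0.03^2),
\qquad \varepsilon_j \sim \mathcal{N}(0,\ \mathrm{SE}_{Y_j}^2)
$$

Directional pleiotropy corresponds to $\mu_\alpha \neq 0$. With $\mu_\alpha = 0$ the pleiotropic effects are balanced around zero rather than absent.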

In [None]:
#@title Simulate MR summary stats
K = 30                 # number of instruments
theta_true = 0.25      # true causal effect

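# instrument strength: SNP-exposure effects and their standard errors
# (beta_x serves as both the true and the observed effect; no sampling noise is added to it)
beta_x = np.random.normal(0.06, 0.02, size=K)          # exposure effects
se_x   = np.random.uniform(0.01, 0.02, size=K)         # exposure standard errors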

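# pleiotropy: mean 0 gives balanced pleiotropy; a non-zero mean (e.g., 0.02) gives directional pleiotropy
# (for no pleiotropy at all, also set the SD of 0.03 in the np.random.normal call below to 0)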
pleiotropy_mean = 0.00
pleiotropy = np.random.normal(pleiotropy_mean, 0.03, size=K)

# outcome effects
se_y = np.random.uniform(0.015, 0.03, size=K)
beta_y = theta_true * beta_x + pleiotropy + np.random.normal(0, se_y, size=K)

snps = [f"rs{100000+i}" for i in range(K)]
dat = pd.DataFrame({
    "SNP": snps,
    "beta_x": beta_x, "se_x": se_x,
    "beta_y": beta_y, "se_y": se_y,
})
dat.head()
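
A common companion check on instrument strength is the approximate per-SNP F-statistic, (βX / SE_X)². The sketch below is a minimal illustration on made-up numbers; the helper name `approx_f_stat` and the toy arrays are assumptions for this example, and the threshold of 10 is only the usual rule of thumb.

```python
import numpy as np

def approx_f_stat(beta_x, se_x):
    """Approximate per-SNP F-statistic: squared z-score of the SNP-exposure effect."""
    beta_x = np.asarray(beta_x, dtype=float)
    se_x = np.asarray(se_x, dtype=float)
    return (beta_x / se_x) ** 2

# Hypothetical toy values, not taken from the simulation above.
bx_toy = np.array([0.06, 0.02, 0.09])
se_toy = np.array([0.015, 0.015, 0.015])

f_toy = approx_f_stat(bx_toy, se_toy)
weak = f_toy < 10   # flags instruments below the rule-of-thumb threshold
print(np.round(f_toy, 2), weak)
```

In the notebook the same helper could be applied as `approx_f_stat(dat["beta_x"], dat["se_x"])`.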

## 2) Wald ratios (per SNP)
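
The cell below computes, for each SNP $j$, the ratio estimate and a delta-method standard error. Written out in the notation of section 1, and assuming the exposure and outcome estimates are uncorrelated (as in a two-sample design), this is:

$$
\hat\theta_j = \frac{\beta_{Y_j}}{\beta_{X_j}},
\qquad
\mathrm{SE}(\hat\theta_j) \approx
\sqrt{\frac{\mathrm{SE}_{Y_j}^2}{\beta_{X_j}^2}
+ \frac{\beta_{Y_j}^2\,\mathrm{SE}_{X_j}^2}{\beta_{X_j}^4}}
$$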

In [None]:
#@title Wald ratios
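dat["ratio"] = dat["beta_y"] / dat["beta_x"]
# Delta-method SE: the first term is outcome uncertainty, the second adds exposure uncertainty
dat["se_ratio"] = np.sqrt(
    dat["se_y"]**2 / dat["beta_x"]**2
    + dat["beta_y"]**2 * dat["se_x"]**2 / dat["beta_x"]**4
)
dat["w_ratio"] = 1 / dat["se_ratio"]**2   # inverse-variance weight of each ratio
dat[["SNP","ratio","se_ratio"]].head()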

## 3) IVW (fixed-effects)

We use the common formulation: a weighted regression of βY on βX with the intercept fixed at 0 and weights 1/SE_Y².

In [None]:
#@title IVW estimate
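w = (1 / dat["se_y"]**2).to_numpy()   # inverse-variance weights from the outcome SEs
x = dat["beta_x"].to_numpy()
y = dat["beta_y"].to_numpy()

theta_ivw = np.sum(w * x * y) / np.sum(w * x * x)   # weighted slope through the origin
se_ivw = np.sqrt(1 / np.sum(w * x * x))             # fixed-effects SE (no overdispersion term)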
z_ivw = theta_ivw / se_ivw
p_ivw = 2 * stats.norm.sf(abs(z_ivw))

print(f"IVW theta = {theta_ivw:.3f} (SE {se_ivw:.3f}), P = {p_ivw:.3g}")
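
To see why the closed form counts as a regression through the origin, the sketch below compares it with an explicit weighted least-squares fit on a few made-up values. It is a minimal check under stated assumptions: the toy arrays are hypothetical, and plain NumPy (`np.linalg.lstsq` on rows scaled by the square root of the weights) stands in for a regression library.

```python
import numpy as np

# Hypothetical toy values, not taken from the simulation above.
bx_toy = np.array([0.05, 0.08, 0.06, 0.07])
by_toy = np.array([0.012, 0.022, 0.013, 0.020])
se_y_toy = np.array([0.010, 0.012, 0.015, 0.011])
w_toy = 1.0 / se_y_toy**2

# Closed-form fixed-effects IVW estimate.
theta_closed = np.sum(w_toy * bx_toy * by_toy) / np.sum(w_toy * bx_toy**2)

# Weighted least squares with no intercept: scale each row by sqrt(weight),
# then solve the ordinary least-squares problem.
A = (np.sqrt(w_toy) * bx_toy)[:, None]
b = np.sqrt(w_toy) * by_toy
coef, *_ = np.linalg.lstsq(A, b, rcond=None)
theta_wls = coef[0]

print(f"closed form = {theta_closed:.6f}, WLS = {theta_wls:.6f}")
```

The two numbers should agree to floating-point precision, which is the sense in which IVW is "a weighted regression with the intercept fixed at 0".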

## 4) MR-Egger

Weighted regression of βY on βX with an intercept; an intercept that differs from 0 suggests directional pleiotropy.

In [None]:
#@title MR-Egger
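# Note: MR-Egger is sensitive to allele coding; SNPs are conventionally oriented so that beta_x > 0.
X = sm.add_constant(dat["beta_x"])
model = sm.WLS(dat["beta_y"], X, weights=1/(dat["se_y"]**2)).fit()
intercept, theta_egger = model.params["const"], model.params["beta_x"]
se_int, se_egger = model.bse["const"], model.bse["beta_x"]

# statsmodels scales these SEs by the estimated residual variance and uses a
# t-distribution for the P-values, so they are not directly comparable to the
# fixed-effects, normal-based IVW results above.
p_int = model.pvalues["const"]
p_egger = model.pvalues["beta_x"]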

print(f"Egger intercept = {intercept:.3f} (SE {se_int:.3f}), P = {p_int:.3g}")
print(f"Egger theta     = {theta_egger:.3f} (SE {se_egger:.3f}), P = {p_egger:.3g}")
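
Unlike the IVW slope through the origin, the MR-Egger intercept changes if the effect allele of a SNP is flipped, so the SNPs are usually oriented so that every exposure effect is positive before fitting. The simulation draws βX around 0.06, so negative values are rare there, but a sketch of the orientation step may still be useful. The helper name `orient_to_positive_exposure` and the toy arrays are assumptions for illustration.

```python
import numpy as np

def orient_to_positive_exposure(beta_x, beta_y):
    """Flip the sign of both effects for SNPs whose exposure effect is negative."""
    beta_x = np.asarray(beta_x, dtype=float)
    beta_y = np.asarray(beta_y, dtype=float)
    sign = np.where(beta_x < 0, -1.0, 1.0)
    return beta_x * sign, beta_y * sign

# Hypothetical toy values: the second SNP is coded on the opposite allele.
bx_raw = np.array([0.06, -0.04, 0.05])
by_raw = np.array([0.015, -0.012, 0.010])

bx_or, by_or = orient_to_positive_exposure(bx_raw, by_raw)
print(bx_or, by_or)   # exposure effects are now all non-negative
```

Flipping both effects together leaves each Wald ratio unchanged, so only intercept-based methods such as MR-Egger are affected by this step.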

## 5) Weighted median

A simple implementation using ratio estimates and their inverse-variance weights.

In [None]:
#@title Weighted median
def weighted_median(values, weights):
    """Return the first sorted value whose cumulative weight share reaches 50% (no interpolation)."""
    order = np.argsort(values)
    v = np.asarray(values)[order]
    w = np.asarray(weights)[order]
    cw = np.cumsum(w) / np.sum(w)
    return v[np.searchsorted(cw, 0.5)]

theta_wmed = weighted_median(dat["ratio"].to_numpy(), dat["w_ratio"].to_numpy())
print(f"Weighted median theta ≈ {theta_wmed:.3f}")
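
A small worked example may help show why the weighted median is less sensitive to a single outlying ratio than a weighted mean. The function is repeated so the block runs on its own, and the four ratios and weights are made-up numbers chosen for illustration.

```python
import numpy as np

def weighted_median(values, weights):
    # Same logic as the notebook cell above, repeated so this block is self-contained.
    order = np.argsort(values)
    v = np.asarray(values)[order]
    w = np.asarray(weights)[order]
    cw = np.cumsum(w) / np.sum(w)
    return v[np.searchsorted(cw, 0.5)]

# Hypothetical ratios: three agree, one imprecise outlier sits far away.
ratios_toy = np.array([0.20, 0.25, 0.30, 5.00])
weights_toy = np.array([1.0, 1.0, 1.0, 0.1])

wmed_toy = weighted_median(ratios_toy, weights_toy)     # 0.25
wmean_toy = np.average(ratios_toy, weights=weights_toy) # about 0.403
print(f"weighted median = {wmed_toy:.3f}, weighted mean = {wmean_toy:.3f}")
```

The cumulative weight shares are roughly 0.32, 0.65, 0.97 and 1.00, so the 50% point falls on the second value; the outlier moves the mean but not the median.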

## 6) Heterogeneity (Cochran's Q) and leave-one-out

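Large heterogeneity (Q well above its K - 1 degrees of freedom) suggests pleiotropy or otherwise invalid instruments. Because the simulation adds pleiotropic effects with SD 0.03 even when their mean is 0, a large Q is expected here.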

In [None]:
#@title Heterogeneity Q (IVW) + leave-one-out IVW
resid = y - theta_ivw * x
Q = np.sum(w * resid**2)
df = K - 1
p_Q = stats.chi2.sf(Q, df=df)
print(f"Cochran Q = {Q:.2f} (df={df}), P = {p_Q:.3g}")

loo = []
for i in range(K):
    # drop SNP i and recompute the fixed-effects IVW estimate on the rest
    mask = np.ones(K, dtype=bool)
    mask[i] = False
    w_i, x_i, y_i = w[mask], x[mask], y[mask]
    th = np.sum(w_i * x_i * y_i) / np.sum(w_i * x_i * x_i)
    se = np.sqrt(1 / np.sum(w_i * x_i * x_i))
    loo.append((dat.loc[i,"SNP"], th, se))
loo_df = pd.DataFrame(loo, columns=["SNP","theta_ivw_loo","se"])
loo_df.head()
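
Cochran's Q is often summarized as I², the share of the variation in the per-SNP estimates that exceeds what sampling error alone would produce. The sketch below is one common way to derive it from Q and its degrees of freedom; the helper name `i_squared` and the example Q values are assumptions for illustration, not results from this notebook.

```python
def i_squared(q_stat, dof):
    """I^2 = max(0, (Q - df) / Q); returns 0 when Q is not positive."""
    if q_stat <= 0:
        return 0.0
    return max(0.0, (q_stat - dof) / q_stat)

# Hypothetical Q values with 29 degrees of freedom (K = 30 instruments).
i2_high = i_squared(58.0, 29)   # 0.5: half of the variation is excess heterogeneity
i2_none = i_squared(20.0, 29)   # 0.0: Q is below its degrees of freedom
print(i2_high, i2_none)
```

In the notebook this could be called as `i_squared(Q, df)` right after the Q statistic is printed.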

## 7) Plots

Scatter (βX vs βY), forest of ratios, funnel plot of ratios vs precision.

In [None]:
#@title Scatter plot
plt.figure(figsize=(5,5))
plt.scatter(dat["beta_x"], dat["beta_y"], s=40, alpha=0.8)
xx = np.linspace(dat["beta_x"].min()*0.9, dat["beta_x"].max()*1.1, 50)
plt.plot(xx, theta_ivw*xx, linestyle="--", label="IVW (no intercept)")
plt.plot(xx, intercept + theta_egger*xx, linestyle=":", label="MR-Egger")
plt.xlabel("SNP effect on exposure (beta_x)")
plt.ylabel("SNP effect on outcome (beta_y)")
plt.title("MR scatter plot")
plt.legend()
plt.show()

In [None]:
#@title Forest plot (Wald ratios)
dfp = dat.sort_values("ratio").reset_index(drop=True)
ypos = np.arange(len(dfp))

plt.figure(figsize=(6,8))
plt.errorbar(dfp["ratio"], ypos, xerr=1.96*dfp["se_ratio"], fmt="o")
plt.axvline(theta_ivw, linestyle="--", label="IVW")
plt.axvline(theta_wmed, linestyle=":", label="Weighted median")
plt.yticks(ypos, dfp["SNP"])
plt.xlabel("Wald ratio (beta_y / beta_x)")
plt.title("Per-SNP causal estimates")
plt.legend()
plt.gca().invert_yaxis()
plt.show()

In [None]:
#@title Funnel plot
# Uses dat directly so this cell does not depend on the forest-plot cell having run
precision = 1 / dat["se_ratio"]
plt.figure(figsize=(5,5))
plt.scatter(dat["ratio"], precision, s=40, alpha=0.8)
plt.axvline(theta_ivw, linestyle="--")
plt.xlabel("Wald ratio")
plt.ylabel("Precision (1/SE)")
plt.title("Funnel plot")
plt.show()

## Optional: use real summary data via IEU OpenGWAS

If you want a *real-data* demo later, consider the **IEU OpenGWAS API**, which can be queried from Python through the `ieugwaspy` wrapper. It lets you show an end-to-end MR workflow without distributing large files. See the official API and `ieugwaspy` documentation for details.


In [None]:
#@title (Optional) Install and import ieugwaspy
# Uncomment if you want to try real-data queries.
# !pip -q install ieugwaspy
# import ieugwaspy

print("Optional section - not executed by default.")