# Experience 6

**Name**:  Tina Tu
**Time Spent**: 15 hrs

**Note:**
- `import random` for simulations  
- Use `np.random.seed(301404)` as the random seed

In [3]:
import numpy as np
import pandas as pd
import itertools as it
from scipy.stats import chi2
import random

## (1) 

(a) Simulate a set of 1,000 SNPs with minor allele frequencies (in the entire population) randomly sampled to lie between 5% and 50% (uniform distribution).  Simulate genotypes for 200 (diploid) individuals at these SNPs.  How many SNPs are polymorphic in the first 50 individuals? How many SNPs are polymorphic in the first 100 individuals? How many SNPs are polymorphic in all 200 individuals? 
(b) Simulate a set of 1,000 SNPs with minor allele frequencies (in the entire population) randomly sampled to lie between 0.1% and 50% (uniform distribution).  Simulate genotypes for 200 (diploid) individuals at these SNPs.  How many SNPs are polymorphic in the first 50 individuals? How many SNPs are polymorphic in the first 100 individuals? How many SNPs are polymorphic in all 200 individuals? 
(c) Simulate a set of 1,000 SNPs with minor allele frequencies (in the entire population) randomly sampled to lie between 0.1% and 50% (with probability density inversely proportional to the minor allele frequency, e.g. 1% is 10x more likely than 10%; use rejection sampling as in Experience 5 Problem 4).  Simulate genotypes for 200 (diploid) individuals at these SNPs.  How many SNPs are polymorphic in the first 50 individuals? How many SNPs are polymorphic in the first 100 individuals? How many SNPs are polymorphic in all 200 individuals?


### Solution

In [29]:
# Set seed for reproducibility
np.random.seed(301404)

# Parameters
num_snps = 1000
num_individuals = 200

# Step 1: Simulate minor allele frequencies between 0.05 and 0.5
mafs = np.random.uniform(0.05, 0.5, size=num_snps)

# Step 2: Simulate genotypes (each individual has 2 alleles per SNP)
genotypes = np.array([
    np.random.binomial(2, maf, size=num_individuals)
    for maf in mafs
])

# Function to count polymorphic SNPs in a subset of individuals
def count_polymorphic_snps(genotype_matrix):
    return np.sum([len(np.unique(snp)) > 1 for snp in genotype_matrix])

# Step 3: Count polymorphic SNPs in subsets
poly_50 = count_polymorphic_snps(genotypes[:, :50])
poly_100 = count_polymorphic_snps(genotypes[:, :100])
poly_200 = count_polymorphic_snps(genotypes[:, :200])

print(f"Polymorphic SNPs in first 50 individuals: {poly_50}")
print(f"Polymorphic SNPs in first 100 individuals: {poly_100}")
print(f"Polymorphic SNPs in all 200 individuals: {poly_200}")

Polymorphic SNPs in first 50 individuals: 1000
Polymorphic SNPs in first 100 individuals: 1000
Polymorphic SNPs in all 200 individuals: 1000


In [30]:
np.random.seed(301404)

# Step 1: Simulate minor allele frequencies between 0.001 and 0.5
maf2 = np.random.uniform(0.001, 0.5, size=num_snps)

# Step 2: Simulate genotypes (each individual has 2 alleles per SNP)
genotypes_2 = np.array([
    np.random.binomial(2, maf, size=num_individuals)
    for maf in maf2
])

# Step 3: Count polymorphic SNPs in subsets
poly_50 = count_polymorphic_snps(genotypes_2[:, :50])
poly_100 = count_polymorphic_snps(genotypes_2[:, :100])
poly_200 = count_polymorphic_snps(genotypes_2[:, :200])

print(f"Polymorphic SNPs in first 50 individuals: {poly_50}")
print(f"Polymorphic SNPs in first 100 individuals: {poly_100}")
print(f"Polymorphic SNPs in all 200 individuals: {poly_200}")

Polymorphic SNPs in first 50 individuals: 989
Polymorphic SNPs in first 100 individuals: 995
Polymorphic SNPs in all 200 individuals: 1000


In [31]:
np.random.seed(301404)

# Step 1: Rejection sampling to get MAFs ∝ 1/maf
def sample_inverse_maf(n, lower, upper, max_attempts):
    mafs = []
    attempts = 0
    while len(mafs) < n and attempts < max_attempts:
        x = np.random.uniform(lower, upper)
        y = np.random.uniform(0, 1 / lower) 
        if y <= 1 / x:
            mafs.append(x)
        attempts += 1
    return np.array(mafs)

maf3 = sample_inverse_maf(num_snps, 0.001, 0.5, 100000)

# Step 2: Simulate genotypes
genotypes_3 = np.array([
    np.random.binomial(2, maf, size=num_individuals)
    for maf in maf3
])

poly_50 = count_polymorphic_snps(genotypes_3[:, :50])
poly_100 = count_polymorphic_snps(genotypes_3[:, :100])
poly_200 = count_polymorphic_snps(genotypes_3[:, :200])

print(f"Polymorphic SNPs in first 50 individuals: {poly_50}")
print(f"Polymorphic SNPs in first 100 individuals: {poly_100}")
print(f"Polymorphic SNPs in all 200 individuals: {poly_200}")

Polymorphic SNPs in first 50 individuals: 706
Polymorphic SNPs in first 100 individuals: 802
Polymorphic SNPs in all 200 individuals: 877


*comments*

In the first simulation (MAF 5%–50%), all SNPs are relatively common, so they remain polymorphic even in small samples. In the second scenario (MAF 0.1%–50%), some SNPs are very rare, leading to a few SNPs becoming monomorphic (i.e., only one allele observed) in smaller samples. In the third simulation, where rare variants are even more enriched, a substantial number of SNPs are monomorphic. Overall, SNPs with higher minor allele frequencies (common variants) are almost always polymorphic, meaning both alleles are consistently observed even in small samples (e.g., 50 individuals). In contrast, rare variants (low MAF) are often monomorphic in small samples and require much larger sample sizes to be observed.

## (2) 

Simulate a set of 100 SNPs with minor allele frequencies (in the entire population) randomly sampled to lie between 0.1% and 1% (uniform distribution).  Simulate genotypes for 10,000 individuals at these SNPs.  (Note that the real HapMap3 data set used in this course contains only common SNPs.)  Assign all 100 SNPs as causal SNPs.  Simulate quantitative phenotypes for the 10,000 individuals by assuming that causal SNPs have effect size per normalized genotype = ±0.01, where normalized genotype is defined using the MAF in the entire population (not the sample MAF in the finite sample); the probability of having a positive effect sign (i.e. +0.01 rather than −0.01) varies linearly from 0.5 for MAF=1% SNPs to 1 for MAF=0.1% SNPs; and normally distributed environmental noise (unrelated to the 100 SNPs) with variance 0.99 is added to each phenotype.  
(a) Viewing the 100 SNPs as lying in a single simulated gene, use the Fixed threshold burden test, at various choices of MAF threshold (for the MAF in the finite sample), to test the gene for association to phenotype. Which MAF threshold produces the most significant test statistic?
(b) use the Madsen & Browning Weighted test to test the gene for association to phenotype.  Are results more significant than the Fixed threshold test?


### Solution

In [8]:
# Generate random effect sizes
def sample_normal(n, std):
    samples = []
    while len(samples) < n:
        x = np.random.uniform(-10, 10)
        accept_prob = np.exp(-x**2 / 2)  
        if np.random.uniform(0, 1) < accept_prob:
            samples.append(x * std)
    return np.array(samples)

def sample_maf(genotypes):
    return np.mean(genotypes, axis=1) / 2

def burden_t_stat(burden, phenotype):
    if len(burden) != len(phenotype) or len(burden) == 0:
        return 0.0
    r = np.corrcoef(burden, phenotype)[0, 1]
    return 0.0 if np.isnan(r) else len(burden) * r**2

In [49]:
np.random.seed(301404)

num_snps = 100
num_individuals = 10000

# Sample MAFs uniformly between 0.001 and 0.01
mafs = np.random.uniform(0.001, 0.01, size=num_snps)

# Simulate genotypes (0, 1, 2 minor alleles)
genotypes = np.array([
    np.random.binomial(2, maf, size=num_individuals)
    for maf in mafs
]) 

# Normalized genotype
norm_genotypes = np.array([
    (genotypes[j] - 2 * mafs[j]) / np.sqrt(2 * mafs[j] * (1 - mafs[j]))
    for j in range(num_snps)
])

# Step 4: Assign effect sizes: ±0.01 with P(+) from 0.5 (at MAF=1%) to 1 (at MAF=0.1%)
prob_positive = 0.5 + (0.01 - mafs) / 0.009 * 0.5
effect_signs = np.where(np.random.uniform(size=num_snps) < prob_positive, 1, -1)
effect_sizes = 0.01 * effect_signs  

# Step 5: Simulate phenotypes
genetic_component = np.dot(effect_sizes, norm_genotypes)  
environmental_effect = np.random.normal(0, np.sqrt(0.99), size=num_individuals)
phenotypes = genetic_component + environmental_effect

In [51]:
# Define candidate MAF thresholds
thresholds = [0.0005, 0.001, 0.005, 0.01, 0.02] 
chi_stats = []

for threshold in thresholds:
    mafs_sample = sample_maf(genotypes)
    selected_snps = genotypes[mafs_sample < threshold]

    num_selected_snps = selected_snps.shape[0]  # Number of SNPs selected

    if num_selected_snps > 0:
        burden_scores = np.sum(selected_snps, axis=0, dtype=np.float64)

        # Get both t_stat and chi-square equivalent
        chi_equiv = burden_t_stat(burden_scores, phenotypes)
        
        print(f"Threshold {threshold:.4f}: {num_selected_snps} SNPs selected, chi-square statistic = {chi_equiv:.4f}")
        chi_stats.append(chi_equiv)
    else:
        chi_stats.append(np.nan)
        print(f"Threshold {threshold:.4f}: no SNPs below MAF threshold.")

# Find most significant
best_threshold = thresholds[np.nanargmax(chi_stats)]
print(f"\nMost significant MAF threshold: {best_threshold}")

Threshold 0.0005: no SNPs below MAF threshold.
Threshold 0.0010: no SNPs below MAF threshold.
Threshold 0.0050: 46 SNPs selected, chi-square statistic = 37.1334
Threshold 0.0100: 99 SNPs selected, chi-square statistic = 9.8027
Threshold 0.0200: 100 SNPs selected, chi-square statistic = 13.4206

Most significant MAF threshold: 0.005


In [53]:
weights = 1 / np.sqrt(mafs * (1 - mafs) )

# Step 3: compute weighted burden score
weighted_genotypes = weights[:, np.newaxis] * genotypes  
burden_scores_weighted = np.sum(weighted_genotypes, axis=0)

# Step 4: run burden test
chi_weighted = burden_t_stat(burden_scores_weighted, phenotypes)

print(f"Weighted Burden Test: chi-square statistic (weighted) = {chi_weighted}")

Weighted Burden Test: chi-square statistic (weighted) = 23.173854355756475


*comments*

Based on the results, the 0.005 MAF threshold performed best with the most significant test statistic in the fixed threshold burden test. When the threshold was larger than 0.005, more SNPs were included. Hence, we were mixing SNPs with very different effect size signs, and collapsed them would weaken the association because the effects were cancelled out. When the threshold was smaller than 0.05, we included fewer SNPs, so the signal strength decreased because the burden test lost power. 

The Madsen & Browning weighted burden test weighted SNPs inversely proportional to their MAF. In this simulation, SNPs rarer than ~0.5% MAF tend to have more consistently positive effects, so giving more emphasis to very rare SNPs would theoretically yield a higher test statistic. In this case, the number is slightly lower  than the best fixed threshold (0.005) but higher than 0.01 and 0.02 thresholds. 

## (3)  

Consider a population with 1,000 haploid individuals.  Perform forward simulations in which the allele frequency of a neutral new mutation in the population (initial allele frequency = 0.1%) is simulated for 100 subsequent generations (in each generation, sample each individual’s haploid genotype using the allele frequency from the previous generation).  Repeat 1,000 times.
(a) What proportion of the time does the new mutation become relatively common (MAF ≥ 1%)?
(b) Perform simulations with a selection coefficient of s = −0.01 (in each generation, sample each individual’s haploid genotype from the previous generation’s haploid individuals such that individuals with the mutation are 1+s times as likely to be sampled).  Repeat 1,000 times.  
What proportion of the time does the new mutation become relatively common (MAF ≥ 1%)?


### Solution

In [27]:
np.random.seed(301404)

# Parameters
N = 1000
generations = 100
af = 0.001

# Neutral
def simulate_one_run(N, generations, af):
    freq = af
    for _ in range(generations):
        # Sample number of derived alleles in next generation
        count = np.random.binomial(N, freq)
        freq = count / N
    return freq

# Run simulation 1,000 times
n_runs = 1000
final_freqs = [simulate_one_run(N, generations, af) for _ in range(n_runs)]

# Count how many runs end with freq >= 0.01 (1%)
num_common = sum(f >= 0.01 for f in final_freqs)
proportion = num_common / n_runs

print(f"Proportion where allele (neutral mutation) becomes common (MAF ≥ 1%): {proportion}")

# Negative 
np.random.seed(301404)
s = -0.01   

def selected_forward(N, generations, af, s):
    freq = af
    for _ in range(generations):
        n_mut = int(freq * N)
        n_wt = N - n_mut

        # Compute total fitness weights
        total_fitness = n_mut * (1 + s) + n_wt
        prob_mut = (n_mut * (1 + s)) / total_fitness

        # Sample next generation
        n_mut_next = np.random.binomial(N, prob_mut)
        freq = n_mut_next / N

        if freq == 0:
            break 

    return freq

# Run 1000 simulations
final_freqs = [selected_forward(N, generations, af, s) for _ in range(n_runs)]

# Count how many runs ended with MAF ≥ 1%
num_neg = sum(f >= threshold for f in final_freqs)
proportion_neg = num_neg / n_runs

print(f"Proportion where allele (negative mutation) becomes common (MAF ≥ 1%): {proportion_neg}")

Proportion where allele (neutral mutation) becomes common (MAF ≥ 1%): 0.016
Proportion where allele (negative mutation) becomes common (MAF ≥ 1%): 0.007


*comments*

If the new mutation is neutral, MAF is only influenced by genetic drift (random fluctuations). Thus, ~1.9% of neutral mutations managed to drift up to ≥1% MAF. However, when the selection coefficient of s = −0.01, the new variant is less likely to drift up (negative selection consistently pushes allele frequencies down). 

## (4)

(a) Consider a set of 100 linked SNPs in the 112 CEU individuals. (For example, you could choose the first 100 SNPs). Simulate quantitative phenotypes for CEU individuals by assuming that all 100 SNPs are causal, and effect sizes per normalized genotype are sampled from a normal distribution with variance 0.015 for even SNPs (0, 2, …) and a normal distribution with variance 0.005 for odd SNPs (1, 3, …).  What is the true functional enrichment of even SNPs as compared to all 100 SNPs?  Does the (average χ2 association statistic – 1) for even SNPs as compared to all 100 SNPs accurately reflect the true functional enrichment?  Why or why not? 
(b) Repeat the simulation in (a) using a set of 100 unlinked SNPs in the 112 CEU individuals.  (For example, you could choose every 125th SNP of the first 12,500 SNPs). Does the (average χ2 association statistic – 1) for even SNPs as compared to all 100 SNPs accurately reflect the true functional enrichment?  Why or why not?


### Solution

In [30]:
# please set the path to your data directory here
path = "./EPI511/"

# please use the following function (or something like it) to read files
def pname(name):
    '''Prepend the path to the filename'''
    return path + '/' + name

def popen(name):
    '''Open file in the path'''
    return open(pname(name))

def read_geno(file):
    '''Reads a geno file into a masked numpy matrix'''
    return(np.genfromtxt(
        file,               # the file
        dtype='uint8',      # read the data in as 1-byte integers
        delimiter=1,        # 1-byte width data
        missing_values=9,   # 9 indicates missing data
        usemask=True        # return a masked array
    ))

# Reads a slice of a geno file into a numpy matrix
def read_geno_pop_islice(pop, *args):
    f = popen(pop + '.geno') 
    s = it.islice(f, *args)  
    return read_geno(s)

# Calculate allele frequency
def calculate_af(geno):
    return np.ma.mean(geno, axis=1).filled(-1) / 2

# Normalize geno matrix
def normalize_geno(geno, p=None):
    if p is None: p = calculate_af(geno)
    return ( (geno - (2 * p)[:, np.newaxis]) / np.sqrt(2 * p * (1 - p))[:, np.newaxis]).filled(0)

# Armitage trend test
def armitage_trend_test(genotype, phenotype):
    if np.ma.isMaskedArray(genotype):
        valid = ~genotype.mask
    else:
        valid = np.ones_like(genotype, dtype=bool)

    valid_genotype = genotype[valid]
    valid_phenotype = phenotype[valid] 

    if len(valid_genotype) == 0:
        return np.nan

    r = np.corrcoef(valid_genotype, valid_phenotype)[0, 1]
    if np.isnan(r):
        return np.nan

    N = len(valid_genotype)
    return N * r**2

In [55]:
np.random.seed(301404)
CEU_geno = read_geno_pop_islice('CEU', 0, 100, 1)
num_snps, num_individuals = CEU_geno.shape

# Normalize genotypes (per SNP)
CEU_geno_norm = normalize_geno(CEU_geno)

# Simulate effect sizes
effect_sizes = np.zeros(num_snps)
even_idx = np.arange(0, num_snps, 2)
odd_idx = np.arange(1, num_snps, 2)

effect_sizes[even_idx] = sample_normal(len(even_idx), np.sqrt(0.015))
effect_sizes[odd_idx] = sample_normal(len(odd_idx), np.sqrt(0.005))

# Simulate phenotype
phenotype = np.dot(effect_sizes, CEU_geno_norm)  
genetic_variance = np.var(phenotype)

# True functional enrichment
even_genetic_var = np.var(np.dot(effect_sizes[even_idx], CEU_geno_norm[even_idx, :]))
heritability_even= even_genetic_var / genetic_variance
enrichment = heritability_even / (len(even_idx) / num_snps)
print(f"True functional enrichment (even SNPs/all SNPs): {enrichment}")

# Step 5: Association test using Armitage trend test
chi2_stats = []
for i in range(num_snps):
    x = CEU_geno_norm[i]
    chi2 = armitage_trend_test(x, phenotype)
    chi2_stats.append(chi2)

chi2_stats = np.array(chi2_stats)

# Step 6: Compare average χ² - 1
mean_chi2_all = np.mean(chi2_stats)
mean_chi2_even = np.mean(chi2_stats[even_idx])
enrichment_chi2 = (mean_chi2_even - 1) / (mean_chi2_all - 1)

print(f"Enrichment estimate from χ²: {enrichment_chi2}")

True functional enrichment (even SNPs/all SNPs): 1.5687145221945467
Enrichment estimate from χ²: 1.1212114010523564


In [35]:
CEU_geno_2 = read_geno_pop_islice('CEU', 0, 12500, 125)
num_snps, num_individuals = CEU_geno.shape

# Step 1: Normalize genotypes (per SNP)
CEU_norm_2 = normalize_geno(CEU_geno_2)

# Step 2: Simulate phenotype
phenotype_2 = np.dot(effect_sizes, CEU_norm_2)  
genetic_var_2 = np.var(phenotype_2)

# Step 4: True functional enrichment
even_var_2 = np.var(np.dot(effect_sizes[even_idx], CEU_norm_2[even_idx, :]))

heritability_even= even_var_2 / genetic_var_2
enrichment = heritability_even / (len(even_idx) / num_snps)
print(f"True functional enrichment (even SNPs/all SNPs): {enrichment}")

chi2_unlinked = []
for i in range(num_snps):
    x = CEU_norm_2[i]
    chi2 = armitage_trend_test(x, phenotype_2)
    chi2_unlinked.append(chi2)

chi2_unlinked = np.array(chi2_unlinked)

# Compare average χ² - 1
mean_chi2_all = np.mean(chi2_unlinked)
mean_chi2_even = np.mean(chi2_unlinked[even_idx])
enrichment_unlinked = (mean_chi2_even - 1) / (mean_chi2_all - 1)

print(f"Enrichment estimate from χ²: {enrichment_unlinked}")

True functional enrichment (even SNPs/all SNPs): 1.6832496877899932
Enrichment estimate from χ²: 1.6473640500638613


*comments*

The linked SNPs show a true functional enrichment of approximately 1.568, but the enrichment estimated from χ² statistics is lower (~1.121). This discrepancy likely arises from the small sample size (n = 112 CEU individuals) and residual linkage disequilibrium (LD) between SNPs. When SNPs are in LD, their χ² statistics reflect not only their own effects but also the effects of neighboring variants, causing a dilution of the true causal signal. Consequently, LD leads to an underestimation of enrichment when relying on χ²-based estimates compared to variance-based enrichment.

For unlinked SNPs, the true enrichment was estimated at 1.683, closely matching the enrichment estimated from χ² (~1.647), suggesting that unlinked SNPs provide more accurate enrichment estimation. 

## (5)

Consider a set of 100 linked SNPs in the 112 CEU individuals.  (For example, you could choose the first 100 SNPs).  Compute the LD score (sum of r2 across the set of 100 SNPs) of each of the 100 SNPs.  Label the 50 SNPs with highest LD score as “high-LD” SNPs, and label the 50 SNPs with lowest LD score as “low-LD” SNPs.  Simulate quantitative phenotypes for CEU individuals by assuming that all 100 SNPs are causal, and effect sizes per normalized genotype are sampled from a normal distribution with variance 0.015 for low-LD SNPs and a normal distribution with variance 0.005 for high-LD SNPs. What is the true heritability enrichment of low-LD SNPs as compared to all 100 SNPs?  Does the (average χ2 association statistic – 1) for low-LD SNPs as compared to all 100 SNPs accurately reflect the true heritability enrichment?  Why or why not? 

### Solution

In [45]:
def compute_r2(snp1, snp2):
    gA = snp1.astype(float)
    gB = snp2.astype(float)
    
    if np.std(gA) == 0 or np.std(gB) == 0:
        return np.nan

    # Mean genotype
    gA_mean = np.mean(gA)
    gB_mean = np.mean(gB)

    var_a = np.var(gA)
    var_b = np.var(gB)

    # Compute r^2
    numerator = (np.mean(gA * gB) - gA_mean * gB_mean) ** 2
    denominator = var_a * var_b
    
    if denominator == 0:
        return np.nan
    return numerator / denominator


def compute_all_ld_scores(geno_data):
    num_snps = geno_data.shape[0]
    ld_scores = np.zeros(num_snps)

    for snp_idx in range(num_snps):
        # Compute LD score for each SNP
        target_snp = geno_data[snp_idx, :]
        
        r_squared_values = np.array([
            compute_r2(target_snp, geno_data[i, :]) for i in range(num_snps)])
        
        # Set self-correlation to 1
        r_squared_values[snp_idx] = 1.0
        
        # Sum all r^2 values
        ld_scores[snp_idx] = np.nansum(r_squared_values)

    return ld_scores

In [61]:
ld_scores = compute_all_ld_scores(CEU_geno_norm)  
sorted_indices = np.argsort(ld_scores)  

# Label low-LD and high-LD SNPs
low_ld_indices = sorted_indices[:50]    # 50 SNPs with lowest LD scores
high_ld_indices = sorted_indices[-50:]  # 50 SNPs with highest LD scores

np.random.seed(301404)

# Initialize effect sizes
effect_sizes = np.zeros(num_snps)

effect_sizes[low_ld_indices] = sample_normal(len(low_ld_indices), np.sqrt(0.015))
effect_sizes[high_ld_indices] = sample_normal(len(high_ld_indices), np.sqrt(0.005))

# Simulate phenotype (pure genetic effect)
phenotype_LD = np.dot(effect_sizes, CEU_geno_norm) 

genetic_var_LD = np.var(phenotype_LD)

low_ld_var = np.var(np.dot(effect_sizes[low_ld_indices], CEU_geno_norm[low_ld_indices, :]))

heritability_LD = low_ld_var / genetic_var_LD

enrichment = heritability_LD / (len(low_ld_indices) / num_snps)
print(f"True functional enrichment for low LD SNPs: {enrichment}")

chi2_LD = []
for i in range(num_snps):
    x = CEU_geno_norm[i]
    chi2 = armitage_trend_test(x, phenotype_LD)
    chi2_LD.append(chi2)

chi2_LD = np.array(chi2_LD)

# Compare average χ² - 1
mean_chi2_all = np.mean(chi2_LD)
mean_chi2_low_ld = np.mean(chi2_LD[low_ld_indices])
enrichment_LD = (mean_chi2_low_ld - 1) / (mean_chi2_all - 1)

print(f"Enrichment estimate from χ²: {enrichment_LD}")

True functional enrichment for low LD SNPs: 1.5375945766566217
Enrichment estimate from χ²: 1.2567029028812229


*comments*

The low-LD SNPs exhibit a true functional enrichment of approximately 1.538, but the enrichment estimated from χ² statistics is slightly lower (~1.257). Given the small sample size (112 individuals), the χ² statistics are expected to be noisy. Moreover, despite separating SNPs into high- and low-LD groups, there may still be residual LD within the low-LD set. Additionally, the variance around the effect sizes (~0.015) contributes to random variability. As a result, the enrichment estimate based on χ² is slightly attenuated relative to the true value. Nevertheless, the enrichment calculated from χ² here is closer to the enrichment calculated from χ² in Q4. 