# Tutorial: eQTL Analysis with JaxQTL and TensorQTL using `cellink`

This tutorial demonstrates how to perform eQTL analysis with the external tools JaxQTL and TensorQTL through the `cellink` package. `cellink` provides a unified interface to both QTL mapping tools, which simplifies genetic analyses on single-cell datasets. JaxQTL offers fast, GPU-accelerated analysis, while TensorQTL provides cis- and trans-QTL mapping.

This notebook assumes familiarity with single-cell data processing and basic statistical genetics concepts. The `cellink` wrapper functions handle data preparation and formatting for these external tools. JaxQTL is currently not available via `pip` or `conda`; please follow the installation instructions [here](https://github.com/mancusolab/jaxqtl). TensorQTL can be installed via `pip install 'cellink[tensorqtl]'` and additionally requires `plink2`.

## Environment Setup

We begin by importing the necessary libraries and defining the key parameters of the analysis. The wrapper functions `run_jaxqtl` and `run_tensorqtl` automatically handle data formatting and preparation.

In [1]:
import numpy as np
import pandas as pd
import scanpy as sc

import cellink as cl
from cellink._core import DAnn, GAnn
from cellink.resources import get_onek1k
from cellink.tl.external import run_jaxqtl, run_tensorqtl

# Analysis parameters
n_gpcs = 20
n_epcs = 15
batch_e_pcs_n_top_genes = 2000
chrom = 22
cis_window = 500_000
cell_type = "CD8 Naive"
celltype_key = "predicted.celltype.l2"
original_donor_col = "donor_id"

## Load and Prepare Data

We load the OneK1K dataset, which contains both genotype and single-cell expression data. We also add gene annotations from Ensembl, which are essential for defining genomic positions and cis-windows.

In [2]:
# Load the dataset
dd = get_onek1k(config_path='../../src/cellink/resources/config/onek1k.yaml', verify_checksum=False)
print(f"Dataset shape: {dd.shape}")

# Add gene annotations from Ensembl
def _get_ensembl_gene_id_start_end_chr():
    from pybiomart import Server
    server = Server(host='http://www.ensembl.org')
    dataset = (server.marts['ENSEMBL_MART_ENSEMBL'].datasets['hsapiens_gene_ensembl'])
    ensembl_gene_id_start_end_chr = dataset.query(
        attributes=['ensembl_gene_id', 'start_position', 'end_position', 'chromosome_name']
    )
    ensembl_gene_id_start_end_chr = ensembl_gene_id_start_end_chr.set_index("Gene stable ID")
    ensembl_gene_id_start_end_chr = ensembl_gene_id_start_end_chr.rename(columns={
        "Gene start (bp)": GAnn.start,
        "Gene end (bp)": GAnn.end,
        "Chromosome/scaffold name": GAnn.chrom,
    })
    return ensembl_gene_id_start_end_chr

ensembl_gene_id_start_end_chr = _get_ensembl_gene_id_start_end_chr()
dd.C.var = dd.C.var.join(ensembl_gene_id_start_end_chr)

# Set up donor information
dd.C.obs[DAnn.donor] = dd.C.obs[original_donor_col]
dd.G.obsm["gPCs"] = dd.G.obsm["gPCs"][dd.G.obsm["gPCs"].columns[:n_gpcs]]

INFO:root:/Users/larnoldt/cellink_data/onek1k/onek1k_cellxgene.h5ad already exists
INFO:root:/Users/larnoldt/cellink_data/onek1k/OneK1K.noGP.vcf.gz already exists
INFO:root:/Users/larnoldt/cellink_data/onek1k/OneK1K.noGP.vcf.gz.csi already exists
INFO:root:/Users/larnoldt/cellink_data/onek1k/gene_counts_Ensembl_105_phenotype_metadata.tsv.gz already exists


Dataset shape: (981, 10595884, 1248980, 36469)
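
The gene start and end coordinates joined above are what later define each gene's cis-window. The following is a minimal sketch of that idea, assuming the window is measured symmetrically around the gene start (the external tools may anchor on a strand-aware TSS instead). The tables, the column names `chrom`, `start`, `end`, `pos`, and the helper `variants_in_cis_window` are hypothetical and not part of `cellink`.

```python
import pandas as pd

# Hypothetical toy annotations (made up, not taken from OneK1K).
genes = pd.DataFrame(
    {"chrom": ["22", "22"], "start": [1_000_000, 3_000_000], "end": [1_050_000, 3_020_000]},
    index=["geneA", "geneB"],
)
variants = pd.DataFrame(
    {"chrom": ["22", "22", "22"], "pos": [600_000, 1_400_000, 2_400_000]},
    index=["v1", "v2", "v3"],
)


def variants_in_cis_window(gene, variants, window=500_000):
    """Return IDs of variants within +/- `window` bp of the gene start."""
    lo = max(gene["start"] - window, 0)
    hi = gene["start"] + window
    mask = (variants["chrom"] == gene["chrom"]) & variants["pos"].between(lo, hi)
    return variants.index[mask].tolist()


# Map every gene to the variants that fall into its cis-window.
cis_map = {gene_id: variants_in_cis_window(row, variants) for gene_id, row in genes.iterrows()}
print(cis_map)
```

Genes without any variant in their window (here `geneB`) cannot be tested in cis mode, which is why the mapping tools drop such phenotypes.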



## Data Preprocessing

We filter the dataset to focus on a specific cell type and prepare the data for eQTL analysis.


In [3]:
dd.aggregate(obs=["donor_id", "sex", "age"], func="first", add_to_obs=True)

In [4]:
# Filter to specific cell type
dd = dd[..., dd.C.obs[celltype_key] == cell_type, :].copy()
print(f"After cell type filtering: {dd.shape}")

# Add donor-level metadata
dd.G.obs["donor_sex"] = dd.G.obs["sex"] 
dd.G.obs["donor_age"] = dd.G.obs["age"]

# Generate random labels for demonstration (replace with real phenotypes)
dd.G.obs["donor_labels"] = np.random.randint(2, size=len(dd.G.obs))

# Filter to specific chromosome for faster analysis
dd = dd.sel(G_var=dd.G.var.chrom == str(chrom), C_var=dd.C.var.chrom == str(chrom)).copy()
print(f"After chromosome {chrom} filtering: {dd.shape}")

After cell type filtering: (980, 10595884, 52538, 36469)
After chromosome 22 filtering: (980, 136776, 52538, 880)


To speed up the computation, we also reduce the number of SNPs by keeping only variants located before position 16,584,955 on the chromosome.

In [5]:
dd = dd[:, dd.G.var["pos"] < 16584955, :, :].copy()

## JaxQTL Analysis

JaxQTL is a fast, GPU-accelerated tool for QTL mapping. It supports several statistical models and handles large-scale genomic data efficiently. JaxQTL can be run in the following modes: `["nominal", "cis", "cis_acat", "fitnull", "covar", "trans", "estimate_ld_only"]`. Here we demonstrate the `cis`, `cis_acat`, and `trans` modes.

### Running JaxQTL with Default Parameters

In [6]:
# Basic JaxQTL analysis
results_jaxqtl_basic = run_jaxqtl(
    dd,
    prefix="jaxqtl_basic",
    mode="cis",
    model="NB",  # Negative binomial model
    window=cis_window,
    additional_covariates=["gPCs"],
    run=True
)

INFO:cellink._core.donordata:Aggregated X to PB
INFO:cellink._core.donordata:Observation found for 980 donors.
Writing BED: 100%|██████████| 1/1 [00:00<00:00, 10.83it/s]

Writing FAM... done.
Writing BIM... done.



top-level pandera module will be **removed in a future version of pandera**.
If you're using pandera to validate pandas objects, we highly recommend updating
your import:

```
# old import
import pandera as pa

# new import
import pandera.pandas as pa
```

If you're using pandera to validate objects from other compatible libraries
like pyspark or polars, see the supported libraries section of the documentation
for more information on how to import pandera:

https://pandera.readthedocs.io/en/stable/supported_libraries.html


```
```

2025-09-24 20:12:54 | [INFO] Finished loading raw data.
INFO:2025-09-24 20:12:54,996:jax._src.xla_bridge:752: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: dlopen(libtpu.so, 0x0001): tried: 'libtpu.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibtpu.so' (no such file), '/opt/miniconda3/envs/tensorqtl/bin/../lib/libtpu.so' (no such file), '/usr/lib/libtpu.so' (no such file, not in dyld cache), 'libtpu.so' (no such file)

In [7]:
results_jaxqtl_basic

Unnamed: 0,phenotype_id,chrom,num_var,variant_id,pos,tss_distance,ma_count,af,beta_shape1,beta_shape2,beta_converged,opt_status,true_nc,pval_nominal,slope,slope_se,pval_beta,alpha_cov,model_converged
0,ENSG00000198445,22,83,22_16579817_G_A,16579817,-10935.0,33.0,0.983163,470.337917,0.592327,1.0,True,1.94289e-16,0.999984,-0.014877,315.737477,0.938358,0.000766,1.0
1,ENSG00000288024,22,83,22_16584696_T_TA,16584696,-140377.0,343.0,0.825,79.592339,0.30503,1.0,True,1.94289e-16,,7.673026,,,,0.0
2,ENSG00000237689,22,83,22_16570885_T_G,16570885,-298549.0,533.0,0.728061,277.582342,0.471572,1.0,True,1.94289e-16,0.999707,-0.670713,197.744299,0.663357,4e-06,1.0
3,ENSG00000215568,22,59,22_16584696_T_TA,16584696,-377241.0,343.0,0.825,2.040014,0.556468,1.0,True,1.94289e-16,,61.349477,,,,0.0
4,ENSG00000273442,22,13,22_16584696_T_TA,16584696,-496006.0,343.0,0.825,389.741765,0.498558,1.0,True,1.94289e-16,,5.575901,,,,0.0


### Advanced JaxQTL Analysis with Custom Parameters

In [8]:
# Advanced JaxQTL analysis with more parameters
results_jaxqtl_advanced = run_jaxqtl(
    dd,
    prefix="jaxqtl_advanced",
    mode="cis_acat",  # ACAT-combined p-values
    model="gaussian",  # Gaussian model
    window=1000000,  # 1Mb window
    nperm=10000,  # Number of permutations
    test_method="wald",  # Wald test
    additional_covariates=["gPCs"],
    addpc=5,  # Number of genotype PCs to add
    standardize=True,
    verbose=True,
    run=True
)

INFO:cellink._core.donordata:Aggregated X to PB
INFO:cellink._core.donordata:Observation found for 980 donors.
Writing BED: 100%|██████████| 1/1 [00:00<00:00, 21.96it/s]

Writing FAM... done.
Writing BIM... done.




2025-09-24 20:13:55 | [INFO] Finished loading raw data.
INFO:2025-09-24 20:13:55,850:jax._src.xla_bridge:752: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: dlopen(libtpu.so, 0x0001): tried: 'libtpu.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibtpu.so' (no such file), '/opt/miniconda3/envs/tensorqtl/bin/../lib/libtpu.so' (no such file), '/usr/lib/libtpu.so' (no such file, not in dyld cache), 'libtpu.so' (no such file)

In [9]:
results_jaxqtl_advanced

Unnamed: 0,chrom,snp,pos,a0,a1,i,phenotype_id,tss,tss_distance,af,ma_count,pval_nominal,slope,slope_se,converged,alpha,pval_acat
0,22,22_16579817_G_A,16579817,G,A,64,ENSG00000069998,17137512.0,-557695.0,0.983163,33.0,0.02431077,-0.024451,0.010839,True,0.0,0.5671563
1,22,22_16584189_T_A,16584189,T,A,80,ENSG00000093072,17178791.0,-594602.0,0.795918,400.0,0.003675108,-0.008623,0.002961,True,0.0,0.1674492
2,22,22_16584189_T_A,16584189,T,A,80,ENSG00000177663,17084955.0,-500766.0,0.795918,400.0,0.03673914,-0.007921,0.003787,True,0.0,0.4458623
3,22,22_16577636_A_T,16577636,A,T,56,ENSG00000182902,17563451.0,-985815.0,0.987245,25.0,0.0123955,-0.001598,0.000638,True,0.0,0.6746757
4,22,22_16575148_G_A,16575148,G,A,49,ENSG00000183307,17116298.0,-541150.0,0.32551,638.0,0.2564433,-6.4e-05,5.6e-05,True,0.0,0.9776282
5,22,22_16409561_G_A,16409561,G,A,20,ENSG00000185837,17159339.0,-749778.0,0.979592,40.0,0.1716546,0.001778,0.0013,True,0.0,0.9125839
6,22,22_16388891_G_A,16388891,G,A,0,ENSG00000198445,16590752.0,-201861.0,0.939796,118.0,0.1052398,2.8e-05,1.7e-05,True,0.0,0.9182592
7,22,22_16524992_C_T,16524992,C,T,24,ENSG00000206195,15784887.0,740105.0,0.987755,24.0,6.700489e-09,-0.045234,0.00773,True,0.0,5.561415e-07
8,22,22_16569887_T_A,16569887,T,A,34,ENSG00000215568,16961937.0,-392050.0,0.67551,636.0,0.2790505,7.8e-05,7.2e-05,True,0.0,0.7000951
9,22,22_16570885_T_G,16570885,T,G,36,ENSG00000235478,17119520.0,-548635.0,0.728061,533.0,0.2911097,-2.1e-05,2e-05,True,0.0,0.7779322
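
The `pval_acat` column holds an ACAT-combined p-value. As a rough illustration of the aggregated Cauchy association test, the sketch below combines a set of p-values with equal weights by default. It shows the general formula only and is not taken from JaxQTL's source, which may use different weights or numerical safeguards. The function name `acat` is hypothetical.

```python
import numpy as np


def acat(pvals, weights=None):
    """Combine p-values with the Cauchy combination (ACAT) formula.

    Inputs must lie strictly between 0 and 1.
    """
    p = np.asarray(pvals, dtype=float)
    if weights is None:
        w = np.full(p.size, 1.0 / p.size)
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()  # normalise weights to sum to one
    # Each p-value is mapped to a standard Cauchy variate and averaged.
    stat = np.sum(w * np.tan((0.5 - p) * np.pi))
    # Convert the statistic back to a p-value via the Cauchy tail.
    return 0.5 - np.arctan(stat) / np.pi


combined = acat([1e-8, 0.5, 0.9])
print(combined)
```

A single very small p-value dominates the combined statistic, which makes the test sensitive to sparse signals within a cis-window.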


### JaxQTL Trans-QTL Analysis

In [10]:
# Trans-QTL analysis with JaxQTL
results_jaxqtl_trans = run_jaxqtl(
    dd,
    prefix="jaxqtl_trans",
    mode="trans",
    model="gaussian",
    additional_covariates=["gPCs"],
    perm_seed=42,  # For reproducibility
    run=True
)

INFO:cellink._core.donordata:Aggregated X to PB
INFO:cellink._core.donordata:Observation found for 980 donors.
Writing BED: 100%|██████████| 1/1 [00:00<00:00, 18.35it/s]

Writing FAM... done.
Writing BIM... done.




2025-09-24 20:14:10 | [INFO] Finished loading raw data.
INFO:2025-09-24 20:14:10,275:jax._src.xla_bridge:752: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: dlopen(libtpu.so, 0x0001): tried: 'libtpu.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibtpu.so' (no such file), '/opt/miniconda3/envs/tensorqtl/bin/../lib/libtpu.so' (no such file), '/usr/lib/libtpu.so' (no such file, not in dyld cache), 'libtpu.so' (no such file)

In [11]:
results_jaxqtl_trans

Unnamed: 0,chrom,snp,pos,a0,a1,i,phenotype_id,tss,tss_distance,af,ma_count,pval_nominal,slope,slope_se,converged,alpha
0,22,22_16388891_G_A,16388891,G,A,0,ENSG00000280341,15282558.0,1106333.0,0.939796,118.0,0.105065,-0.000352,0.000217,1.0,0.0
1,22,22_16388968_C_T,16388968,C,T,1,ENSG00000280341,15282558.0,1106410.0,0.939796,118.0,0.105065,-0.000352,0.000217,1.0,0.0
2,22,22_16389525_A_G,16389525,A,G,2,ENSG00000280341,15282558.0,1106967.0,0.939796,118.0,0.105065,-0.000352,0.000217,1.0,0.0
3,22,22_16390411_G_A,16390411,G,A,3,ENSG00000280341,15282558.0,1107853.0,0.939796,118.0,0.105065,-0.000352,0.000217,1.0,0.0
4,22,22_16391555_G_C,16391555,G,C,4,ENSG00000280341,15282558.0,1108997.0,0.939796,118.0,0.105065,-0.000352,0.000217,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
52617,22,22_16583746_T_TA,16583746,T,TA,78,ENSG00000079974,50767502.0,-34183756.0,0.802551,387.0,0.156083,-0.005625,0.003966,1.0,0.0
52618,22,22_16583995_C_T,16583995,C,T,79,ENSG00000079974,50767502.0,-34183507.0,0.894898,206.0,0.121485,-0.008289,0.005353,1.0,0.0
52619,22,22_16584189_T_A,16584189,T,A,80,ENSG00000079974,50767502.0,-34183313.0,0.795918,400.0,0.338743,-0.003690,0.003858,1.0,0.0
52620,22,22_16584657_C_CTCTA,16584657,C,CTCTA,81,ENSG00000079974,50767502.0,-34182845.0,0.894898,206.0,0.121485,-0.008289,0.005353,1.0,0.0


## TensorQTL Analysis

TensorQTL provides QTL mapping with support for several analysis modes and statistical approaches. It can be run in the following modes: `["cis_nominal", "cis_independent", "cis", "trans", "cis_susie", "trans_susie"]`. Here we demonstrate the `cis`, `cis_nominal`, and `trans` modes.

### Basic Cis-QTL Analysis with TensorQTL

In [6]:
# Basic cis-QTL mapping
results_tensorqtl_cis = run_tensorqtl(
    dd,
    prefix="tensorqtl_cis",
    mode="cis",
    window=cis_window,
    additional_covariates=["gPCs"],
    permutations=10000,
    run=True
)

INFO:cellink._core.donordata:Aggregated X to PB
INFO:cellink._core.donordata:Observation found for 980 donors.
INFO:cellink.tl.external._tensorqtl:Performing z-normalization of age.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  variant_df["index"] = range(len(variant_df))
  phenotype_write_df = phenotype_write_df.groupby("#chr", sort=False, group_keys=False).apply(
Writing BED: 100%|██████████| 1/1 [00:00<00:00, 12.75it/s]

Writing FAM... done.
Writing BIM... done.





PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025)            cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to tensorqtl_cis.log.
Options in effect:
  --bfile tensorqtl_cis
  --make-pgen
  --out tensorqtl_cis

Start time: Wed Sep 24 20:08:19 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
980 samples (0 females, 416 males, 564 ambiguous; 980 founders) loaded from
tensorqtl_cis.fam.
83 variants loaded from tensorqtl_cis.bim.
Note: No phenotype data present.
Writing tensorqtl_cis.psam ... done.
Writing tensorqtl_cis.pvar ... done.
Writing tensorqtl_cis.pgen ... done.
End time: Wed Sep 24 20:08:19 2025
[Sep 24 20:08:23] Running TensorQTL v1.0.10: cis-QTL mapping
  * reading phenotypes (tensorqtl_cis_phenotype.bed.gz)
  * cis-window det

  pos_df.groupby('chr', sort=False, group_keys=False).apply(lambda x: x.sort_values(['start', 'end']))


  * loading genotypes
cis-QTL mapping: empirical p-values for phenotypes
  * 980 samples
  * 880 phenotypes
  * 23 covariates
  * 83 variants
  * cis-window: ±500,000
    ** dropping 246 constant phenotypes
  * checking phenotypes: 634/634
    ** dropping 629 phenotypes without variants in cis-window
  * computing permutations
    processing phenotype 5/5
  Time elapsed: 0.01 min
done.
  * writing output
[Sep 24 20:08:24] Finished mapping


In [7]:
results_tensorqtl_cis

Unnamed: 0,phenotype_id,num_var,beta_shape1,beta_shape2,true_df,pval_true_df,variant_id,start_distance,end_distance,ma_samples,ma_count,af,pval_nominal,slope,slope_se,pval_perm,pval_beta
0,ENSG00000198445,83,0.506245,1.70642,361.03,0.299517,22_16388891_G_A,-201861,-203919,118,118,0.939796,0.091396,2.9e-05,1.7e-05,0.531347,0.692493
1,ENSG00000288024,83,0.484266,1.76579,394.72,0.470672,22_16584189_T_A,-140884,-163572,353,400,0.795918,0.26165,-3.5e-05,3.1e-05,0.905309,0.844779
2,ENSG00000237689,83,0.473477,1.7242,393.566,0.429277,22_16569887_T_A,-299547,-301394,523,636,0.67551,0.218047,2.5e-05,2e-05,0.819918,0.81663
3,ENSG00000215568,59,0.313608,1.99266,835.181,0.270351,22_16569887_T_A,-392050,-438335,523,636,0.67551,0.238511,8.5e-05,7.2e-05,0.768123,0.81461
4,ENSG00000273442,13,0.336019,2.4171,1342.66,0.39194,22_16583120_A_G,-497582,-498336,359,414,0.788776,0.47032,7e-06,1e-05,0.958804,0.91032


### Nominal Cis-QTL Analysis

In [8]:
# Nominal cis-QTL analysis (all variant-gene pairs)
results_tensorqtl_nominal = run_tensorqtl(
    dd,
    prefix="tensorqtl_nominal",
    mode="cis_nominal",
    window=cis_window,
    pval_threshold=1e-5,
    additional_covariates=["gPCs"],
    batch_size=20000,
    run=True
)

# Results contain multiple outputs for nominal mode
cis_qtl_pairs, cis_qtl_signif_pairs, cis_qtl_top_assoc = results_tensorqtl_nominal

INFO:cellink._core.donordata:Aggregated X to PB
INFO:cellink._core.donordata:Observation found for 980 donors.
INFO:cellink.tl.external._tensorqtl:Performing z-normalization of age.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  variant_df["index"] = range(len(variant_df))
  phenotype_write_df = phenotype_write_df.groupby("#chr", sort=False, group_keys=False).apply(
Writing BED: 100%|██████████| 1/1 [00:00<00:00, 15.71it/s]

Writing FAM... done.
Writing BIM... done.





PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025)            cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to tensorqtl_nominal.log.
Options in effect:
  --bfile tensorqtl_nominal
  --make-pgen
  --out tensorqtl_nominal

Start time: Wed Sep 24 20:08:25 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
980 samples (0 females, 416 males, 564 ambiguous; 980 founders) loaded from
tensorqtl_nominal.fam.
83 variants loaded from tensorqtl_nominal.bim.
Note: No phenotype data present.
Writing tensorqtl_nominal.psam ... done.
Writing tensorqtl_nominal.pvar ... done.
Writing tensorqtl_nominal.pgen ... done.
End time: Wed Sep 24 20:08:25 2025
[Sep 24 20:08:28] Running TensorQTL v1.0.10: cis-QTL mapping
  * reading phenotypes (tensorqtl_nominal_

  pos_df.groupby('chr', sort=False, group_keys=False).apply(lambda x: x.sort_values(['start', 'end']))


  * checking phenotypes: 634/634
    ** dropping 629 phenotypes without variants in cis-window
  * Computing associations
    Mapping chromosome 22
    processing phenotype 5/5
    time elapsed: 0.00 min
    * writing output
done.
[Sep 24 20:08:28] Finished mapping


In [9]:
cis_qtl_pairs

Unnamed: 0,phenotype_id,variant_id,start_distance,end_distance,af,ma_samples,ma_count,pval_nominal,slope,slope_se
0,ENSG00000198445,22_16388891_G_A,-201861,-203919,0.939796,118,118,0.091396,0.000029,0.000017
1,ENSG00000198445,22_16388968_C_T,-201784,-203842,0.939796,118,118,0.091396,0.000029,0.000017
2,ENSG00000198445,22_16389525_A_G,-201227,-203285,0.939796,118,118,0.091396,0.000029,0.000017
3,ENSG00000198445,22_16390411_G_A,-200341,-202399,0.939796,118,118,0.091396,0.000029,0.000017
4,ENSG00000198445,22_16391555_G_C,-199197,-201255,0.939796,118,118,0.091396,0.000029,0.000017
...,...,...,...,...,...,...,...,...,...,...
316,ENSG00000273442,22_16583746_T_TA,-496956,-497710,0.802551,347,387,0.471734,0.000007,0.000010
317,ENSG00000273442,22_16583995_C_T,-496707,-497461,0.894898,201,206,0.571142,0.000008,0.000014
318,ENSG00000273442,22_16584189_T_A,-496513,-497267,0.795918,353,400,0.478926,0.000007,0.000010
319,ENSG00000273442,22_16584657_C_CTCTA,-496045,-496799,0.894898,201,206,0.571142,0.000008,0.000014
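
The table of all variant-gene pairs can be summarised with plain `pandas`. The sketch below selects the top variant per phenotype and the pairs below a nominal threshold, using made-up values and only the column names shown above. It is independent of what `run_tensorqtl` returns as its second and third outputs.

```python
import pandas as pd

# Made-up pairs in the shape of the nominal output (not real results).
pairs = pd.DataFrame(
    {
        "phenotype_id": ["g1", "g1", "g2", "g2"],
        "variant_id": ["v1", "v2", "v1", "v3"],
        "pval_nominal": [0.20, 0.01, 0.50, 0.03],
    }
)

# Top association per phenotype: the row with the smallest nominal p-value.
top_idx = pairs.groupby("phenotype_id")["pval_nominal"].idxmin()
top = pairs.loc[top_idx].reset_index(drop=True)

# Pairs passing an illustrative nominal threshold.
signif = pairs[pairs["pval_nominal"] < 0.05]

print(top)
print(signif)
```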


### Trans-QTL Analysis with TensorQTL

In [10]:
# Trans-QTL mapping
results_tensorqtl_trans = run_tensorqtl(
    dd,
    prefix="tensorqtl_trans",
    mode="trans",
    pval_threshold=1e-6,  # Stricter threshold for trans
    additional_covariates=["gPCs"],
    batch_size=10000,  # Smaller batches for trans analysis
    return_r2=True,  # Include R² statistics
    run=True
)

INFO:cellink._core.donordata:Aggregated X to PB
INFO:cellink._core.donordata:Observation found for 980 donors.
INFO:cellink.tl.external._tensorqtl:Performing z-normalization of age.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  variant_df["index"] = range(len(variant_df))
  phenotype_write_df = phenotype_write_df.groupby("#chr", sort=False, group_keys=False).apply(
Writing BED: 100%|██████████| 1/1 [00:00<00:00, 16.43it/s]

Writing FAM... done.
Writing BIM... done.





PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025)            cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to tensorqtl_trans.log.
Options in effect:
  --bfile tensorqtl_trans
  --make-pgen
  --out tensorqtl_trans

Start time: Wed Sep 24 20:08:29 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
980 samples (0 females, 416 males, 564 ambiguous; 980 founders) loaded from
tensorqtl_trans.fam.
83 variants loaded from tensorqtl_trans.bim.
Note: No phenotype data present.
Writing tensorqtl_trans.psam ... done.
Writing tensorqtl_trans.pvar ... done.
Writing tensorqtl_trans.pgen ... done.
End time: Wed Sep 24 20:08:29 2025
[Sep 24 20:08:31] Running TensorQTL v1.0.10: trans-QTL mapping
  * reading phenotypes (tensorqtl_trans_phenotype.bed.gz

  pos_df.groupby('chr', sort=False, group_keys=False).apply(lambda x: x.sort_values(['start', 'end']))


[Sep 24 20:08:32] Finished mapping


In [11]:
results_tensorqtl_trans

Unnamed: 0,variant_id,phenotype_id,pval,b,b_se,r2,af
0,22_16393930_G_T,ENSG00000272689,4.686541e-07,-0.002914,0.000574,0.026248,0.977041
1,22_16393930_G_T,ENSG00000100350,8.754948e-07,-0.010521,0.002125,0.025019,0.977041
2,22_16393939_G_C,ENSG00000272689,4.686541e-07,-0.002914,0.000574,0.026248,0.977041
3,22_16393939_G_C,ENSG00000100350,8.754948e-07,-0.010521,0.002125,0.025019,0.977041
4,22_16409410_G_A,ENSG00000211660,1.123725e-07,-0.004519,0.000845,0.029059,0.975
5,22_16409410_G_A,ENSG00000211670,6.375461e-08,-0.000886,0.000163,0.030175,0.975
6,22_16409410_G_A,ENSG00000273350,1.297329e-07,-0.000791,0.000149,0.028777,0.975
7,22_16409561_G_A,ENSG00000211642,8.098972e-07,-0.000185,3.7e-05,0.025172,0.979592
8,22_16409561_G_A,ENSG00000211670,1.45498e-09,-0.001091,0.000179,0.037612,0.979592
9,22_16409561_G_A,ENSG00000273350,2.712563e-09,-0.000981,0.000163,0.036388,0.979592
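
Trans mapping tests every variant against every phenotype, which is why a stricter `pval_threshold` is used. As a back-of-the-envelope sketch, a Bonferroni threshold can be derived from the number of tests. The variant count comes from the PLINK log above; the phenotype count is an assumption taken from the number of non-constant phenotypes reported in the cis run.

```python
n_variants = 83      # variants loaded according to the PLINK log
n_phenotypes = 634   # assumption: non-constant phenotypes from the cis log
alpha = 0.05

# Every variant is paired with every phenotype in a trans scan.
n_tests = n_variants * n_phenotypes
bonferroni_threshold = alpha / n_tests

print(f"{n_tests} tests, Bonferroni threshold = {bonferroni_threshold:.2e}")
```

For a genome-wide scan the number of tests grows by several orders of magnitude, so the threshold becomes correspondingly smaller.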


## Advanced Usage: Dry Run and Command Generation

Both the JaxQTL and TensorQTL wrappers can generate the command without executing it, which is useful for debugging or for running on compute clusters. This behaviour is controlled by the argument `run`.

In [12]:
# Generate JaxQTL command without running
jaxqtl_command = run_jaxqtl(
    dd,
    prefix="jaxqtl_cluster",
    mode="cis",
    additional_covariates=["gPCs"],
    run=False  # Don't execute, just return command
)

print("JaxQTL command:")
print(jaxqtl_command)

INFO:cellink._core.donordata:Aggregated X to PB
INFO:cellink._core.donordata:Observation found for 980 donors.
Writing BED: 100%|██████████| 1/1 [00:00<00:00, 19.59it/s]

Writing FAM... done.
Writing BIM... done.
JaxQTL command:
jaxqtl --geno jaxqtl_cluster --covar jaxqtl_cluster_donor_features.tsv --pheno jaxqtl_cluster_phenotype.bed.gz --model NB --mode cis --test-method score --window 500000 --nperm 1000 --addpc 2 --out jaxqtl_cluster --standardize





In [12]:
# Generate TensorQTL command without running  
tensorqtl_command = run_tensorqtl(
    dd,
    prefix="tensorqtl_cluster",
    mode="cis",
    additional_covariates=["gPCs"],
    run=False  # Don't execute, just return command
)

print("\nTensorQTL command:")
print(tensorqtl_command)

INFO:cellink._core.donordata:Aggregated X to PB
INFO:cellink._core.donordata:Observation found for 980 donors.
INFO:cellink.tl.external._tensorqtl:Performing z-normalization of age.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  variant_df["index"] = range(len(variant_df))
  phenotype_write_df = phenotype_write_df.groupby("#chr", sort=False, group_keys=False).apply(
Writing BED: 100%|██████████| 1/1 [00:00<00:00, 16.17it/s]

Writing FAM... done.
Writing BIM... done.
PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025)            cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to tensorqtl_cluster.log.
Options in effect:
  --bfile tensorqtl_cluster
  --make-pgen
  --out tensorqtl_cluster

Start time: Wed Sep 24 20:08:33 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
980 samples (0 females, 416 males, 564 ambiguous; 980 founders) loaded from
tensorqtl_cluster.fam.
83 variants loaded from tensorqtl_cluster.bim.
Note: No phenotype data present.
Writing tensorqtl_cluster.psam ... done.
Writing tensorqtl_cluster.pvar ... done.
Writing tensorqtl_cluster.pgen ... done.
End time: Wed Sep 24 20:08:33 2025

TensorQTL command:
python -m tensorqtl tensorqtl_cluster ten
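
A generated command can then be wrapped in a job script for a cluster. The sketch below assumes that `run=False` returns the command as a plain string, as the printed output suggests. It uses the JaxQTL command printed above; the helper `write_job_script` and the scheduler directive are hypothetical and need to be adapted to the local cluster.

```python
import tempfile
from pathlib import Path

# Command string as printed by the JaxQTL dry run above.
command = (
    "jaxqtl --geno jaxqtl_cluster --covar jaxqtl_cluster_donor_features.tsv "
    "--pheno jaxqtl_cluster_phenotype.bed.gz --model NB --mode cis "
    "--test-method score --window 500000 --nperm 1000 --addpc 2 "
    "--out jaxqtl_cluster --standardize"
)


def write_job_script(command, path, job_name="qtl_job"):
    """Write a small bash script that runs `command` (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",  # assumed SLURM-style directive
        "set -euo pipefail",
        command,
    ]
    path = Path(path)
    path.write_text("\n".join(lines) + "\n")
    return path


# Write into a temporary directory to keep the example side-effect free.
script = write_job_script(command, Path(tempfile.mkdtemp()) / "run_jaxqtl.sh")
script_text = script.read_text()
print(script_text)
```

The input files referenced in the command (genotypes, covariates, phenotypes) are written by the wrapper even when `run=False`, so they need to be copied to the cluster along with the script.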


