# Tutorial: eQTL Analysis with JaxQTL and TensorQTL using `cellink`

This tutorial demonstrates how to perform eQTL analysis with the external tools JaxQTL and TensorQTL through the `cellink` package. `cellink` provides a unified interface to both QTL mapping tools, which simplifies genetic analyses of single-cell datasets. Both tools detect quantitative trait loci (QTLs) in genomic data: JaxQTL offers fast, GPU-accelerated analysis, and TensorQTL provides comprehensive cis- and trans-QTL mapping.

This notebook assumes familiarity with single-cell data processing and basic statistical genetics concepts. The `cellink` package provides convenient wrapper functions that handle data preparation and formatting for these external tools. JaxQTL is currently not available via `pip` or `conda`; please follow the installation instructions [here](https://github.com/mancusolab/jaxqtl). TensorQTL can be installed via `pip install 'cellink[tensorqtl]'` and additionally requires `plink2`. To visualize QTL calling results, see the [Tutorial: Pseudobulk eQTL Analysis with `cellink`](https://cellink-docs.readthedocs.io/en/latest/tutorials/pseudobulk_eqtl.html).
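
Before running the tutorial, it can help to confirm which of these external dependencies are visible from the current environment. The snippet below is a minimal sketch using only the standard library. It assumes that the Python import names are `jaxqtl` and `tensorqtl` and that the PLINK 2 executable is called `plink2` on your `PATH`; the helper `check_qtl_dependencies` is hypothetical and not part of `cellink`.

```python
import importlib.util
import shutil


def check_qtl_dependencies():
    """Report which optional QTL backends appear to be installed."""
    return {
        # find_spec returns None when a top-level module cannot be imported
        "jaxqtl": importlib.util.find_spec("jaxqtl") is not None,
        "tensorqtl": importlib.util.find_spec("tensorqtl") is not None,
        # shutil.which returns None when the executable is not on PATH
        "plink2": shutil.which("plink2") is not None,
    }


status = check_qtl_dependencies()
for name, available in status.items():
    print(f"{name}: {'found' if available else 'missing'}")
```

A missing entry only indicates that the tool was not found under the assumed name; it does not check versions or GPU support.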

## Environment Setup

We begin by importing necessary libraries and defining key parameters for our analysis. The `cellink` package provides wrapper functions for JaxQTL and TensorQTL that automatically handle data formatting and preparation.

In [1]:
import numpy as np

from cellink.resources import get_dummy_onek1k
from cellink.tl.external import run_jaxqtl, run_tensorqtl

# Analysis parameters
n_gpcs = 20
n_epcs = 15
batch_e_pcs_n_top_genes = 2000
chrom = 22
cis_window = 500_000
cell_type = "CD8 Naive"
celltype_key = "predicted.celltype.l2"
original_donor_col = "donor_id"



## Load and Prepare Data

We load the OneK1K dataset, which contains both genotype and single-cell expression data. We also add gene annotations from Ensembl, which are essential for defining genomic positions and cis-windows. Note that this is a subset of the full OneK1K dataset; the full dataset can be downloaded and prepared using `get_onek1k()`.
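
To illustrate why gene positions matter, the following minimal sketch shows how a variant could be assigned to a gene's cis-window. It assumes a symmetric window around a single anchor position (called `tss` here) and a signed distance defined as variant position minus anchor. The function names and coordinates are illustrative only and are not part of `cellink`; the underlying tools may define the window relative to gene start and end instead.

```python
def tss_distance(variant_pos: int, tss: int) -> int:
    """Signed distance from the anchor position to the variant."""
    return variant_pos - tss


def in_cis_window(variant_pos: int, tss: int, window: int = 500_000) -> bool:
    """True if the variant lies within +/- `window` bp of the anchor."""
    return abs(tss_distance(variant_pos, tss)) <= window


# Illustrative coordinates for a hypothetical gene anchored at 17,084,955
tss = 17_084_955
near_variant = 17_025_474
far_variant = 16_388_891

print(tss_distance(near_variant, tss), in_cis_window(near_variant, tss))
print(tss_distance(far_variant, tss), in_cis_window(far_variant, tss))
```

With a 500 kb window, the first variant falls inside the window and the second does not.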

In [2]:
# Load the dataset
dd = get_dummy_onek1k(config_path="../../src/cellink/resources/config/dummy_onek1k.yaml", verify_checksum=False)
print(f"Dataset shape: {dd.shape}")

dd.G.obsm["gPCs"] = dd.G.obsm["gPCs"][dd.G.obsm["gPCs"].columns[:n_gpcs]]

[2025-12-29 03:06:35,629] INFO:root: /Users/larnoldt/cellink_data/dummy_onek1k/dummy_onek1k.dd.h5 already exists
[2025-12-29 03:06:36,994] INFO:root: Loaded dummy OneK1K dataset: (100, 146939, 125366, 34073)
Dataset shape: (100, 146939, 125366, 34073)



## Data Preprocessing

We first aggregate donor-level metadata, then filter the dataset to a specific cell type and chromosome to prepare the data for eQTL analysis.


In [3]:
dd.aggregate(obs=["donor_id", "sex", "age"], func="first", add_to_obs=True)

In [4]:
# Filter to specific cell type
dd = dd[..., dd.C.obs[celltype_key] == cell_type, :].copy()
print(f"After cell type filtering: {dd.shape}")

# Add donor-level metadata
dd.G.obs["donor_sex"] = dd.G.obs["sex"]
dd.G.obs["donor_age"] = dd.G.obs["age"]

# Generate random labels for demonstration (replace with real phenotypes)
dd.G.obs["donor_labels"] = np.random.randint(2, size=len(dd.G.obs))

# Filter to specific chromosome for faster analysis
dd = dd.sel(G_var=dd.G.var.chrom == str(chrom), C_var=dd.C.var.chrom == str(chrom)).copy()
print(f"After chromosome {chrom} filtering: {dd.shape}")

After cell type filtering: (100, 146939, 4756, 34073)
After chromosome 22 filtering: (100, 136776, 4756, 871)


To speed up the computation, we also reduce the number of SNPs by keeping only variants with a position below 17,584,955.

In [5]:
dd = dd[:, dd.G.var["pos"] < 17584955, :, :].copy()

## JaxQTL Analysis

JaxQTL is a fast, GPU-accelerated tool for QTL mapping. It supports several statistical models and handles large-scale genomic data efficiently. JaxQTL can be run in the following modes: `["nominal", "cis", "cis_acat", "fitnull", "covar", "trans", "estimate_ld_only"]`.

### Running JaxQTL with Default Parameters
The basic `cis` mode performs permutation-based cis-QTL mapping with empirical FDR estimation. This is the standard approach for identifying cis-eQTLs with appropriate multiple testing correction.
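
The idea behind permutation-based mapping can be sketched in a few lines of plain Python. The example below is a simplified illustration, not JaxQTL's implementation: it uses the absolute Pearson correlation as the test statistic, ignores covariates, and omits the Beta approximation that production tools apply on top of the permutations. All function names and the toy data are hypothetical.

```python
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant genotype or phenotype carries no signal
    return abs(cov) / (var_x * var_y) ** 0.5


def permutation_pvalue(genotypes, phenotype, n_perm=1000, seed=0):
    """Gene-level empirical p-value for the best variant in a cis-window.

    genotypes: list of per-variant dosage lists (one value per donor)
    phenotype: list of expression values (one value per donor)
    """
    rng = random.Random(seed)
    observed = max(abs_correlation(g, phenotype) for g in genotypes)
    shuffled = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype link
        best = max(abs_correlation(g, shuffled) for g in genotypes)
        if best >= observed:
            exceed += 1
    # The +1 terms keep the estimate strictly positive
    return (exceed + 1) / (n_perm + 1)


# Toy data: 30 donors, two variants, expression driven by variant_a
n_donors = 30
variant_a = [i % 3 for i in range(n_donors)]
variant_b = [(i // 3) % 3 for i in range(n_donors)]
expression = [2.0 * g + 0.1 * ((2 * i) % 5 - 2) for i, g in enumerate(variant_a)]

p_value = permutation_pvalue([variant_a, variant_b], expression, n_perm=200)
print(f"empirical p-value: {p_value:.4f}")
```

Taking the maximum statistic across all variants in each permutation is what turns many per-variant tests into a single gene-level p-value.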

In [6]:
# Basic JaxQTL analysis
results_jaxqtl_basic = run_jaxqtl(
    dd,
    prefix="jaxqtl_basic",
    mode="cis",
    model="NB",  # Negative binomial model
    window=cis_window,
    additional_covariates=["gPCs"],
    run=True,
)
results_jaxqtl_basic

[2025-12-29 03:02:45,533] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:02:45,533] INFO:cellink._core.donordata: Observation found for 100 donors.


Writing BED: 100%|██████████| 1/1 [00:00<00:00, 180.51it/s]

Writing FAM... done.
Writing BIM... done.




2025-12-29 03:02:49 | [INFO] Finished loading raw data.
INFO:2025-12-29 03:02:49,638:jax._src.xla_bridge:752: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: dlopen(libtpu.so, 0x0001): tried: 'libtpu.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibtpu.so' (no such file), '/opt/miniconda3/envs/tensorqtl/bin/../lib/libtpu.so' (no such file), '/usr/lib/libtpu.so' (no such file, not in dyld cache), 'libtpu.so' (no such file)

Unnamed: 0,phenotype_id,chrom,num_var,variant_id,pos,tss_distance,ma_count,af,beta_shape1,beta_shape2,beta_converged,opt_status,true_nc,pval_nominal,slope,slope_se,pval_beta,alpha_cov,model_converged
0,ENSG00000177663,22,4161,22_17025474_T_C,17025474,-59481.0,35.0,0.175,52.109502,54.426431,1.0,True,1.94289e-16,0.482593,-0.744504,1.098446,0.447015,1.871933e-07,1.0
1,ENSG00000069998,22,4114,22_17212956_A_ATTTT,17212956,75444.0,5.0,0.975,21.068593,33.666325,1.0,True,1.94289e-16,0.42272,-1.353229,1.771991,0.721527,5.184935e-07,1.0
2,ENSG00000185837,22,4096,22_17035794_T_G,17035794,-123545.0,14.0,0.93,,,0.0,True,3.267188,0.99975,-23.416329,348.036692,,1.780708e-06,1.0
3,ENSG00000093072,22,4081,22_16933780_G_A,16933780,-245011.0,1.0,0.995,21.659233,31.581654,1.0,True,1.94289e-16,0.391825,-2.492299,3.536479,0.418729,4.305565e-08,1.0
4,ENSG00000182902,22,2647,22_17577481_A_C,17577481,14030.0,1.0,0.995,2179.394106,3.329034,1.0,True,1.94289e-16,0.998652,-7.683877,333.268516,0.514215,0.002774556,1.0
5,ENSG00000236754,22,2608,22_17239813_C_T,17239813,-339574.0,7.0,0.965,1512.785525,2.894406,1.0,True,1.94289e-16,0.998596,-3.354731,201.541181,0.616794,1.314982e-05,1.0
6,ENSG00000131100,22,2563,22_17563390_G_C,17563390,-28747.0,69.0,0.345,20.036668,37.267971,1.0,True,1.94289e-16,0.282472,0.517867,0.489794,0.14164,9.33417e-05,1.0
7,ENSG00000099968,22,2407,22_17219920_C_T,17219920,-408936.0,59.0,0.295,13.350913,24.180304,1.0,True,1.94289e-16,0.334455,0.638422,0.675893,0.405157,7.940555e-05,1.0
8,ENSG00000015475,22,1724,22_17407251_T_C,17407251,-326888.0,3.0,0.985,36.027044,41.509726,1.0,True,1.94289e-16,0.537514,-1.237648,2.082027,0.900735,8.893963e-05,1.0
9,ENSG00000269220,22,1525,22_17351364_TTC_T,17351364,-425954.0,2.0,0.99,17.517949,19.103963,1.0,True,1.94289e-16,0.408392,-2.148595,2.861632,0.199174,1e-08,1.0


### Advanced JaxQTL Analysis with Custom Parameters
The `cis_acat` mode uses the Aggregated Cauchy Association Test to combine p-values across multiple variants, providing a powerful approach for detecting associations when there are multiple causal variants in a region.
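
For intuition, the Cauchy combination behind ACAT can be written as a short function. This is a minimal sketch of the published formula with equal weights by default; it is not necessarily how JaxQTL implements it, and it assumes all p-values lie strictly between 0 and 1 (very small p-values usually need a separate approximation for numerical stability, which is omitted here).

```python
import math


def acat(pvalues, weights=None):
    """Combine p-values with the Cauchy combination (ACAT) statistic."""
    if weights is None:
        weights = [1.0 / len(pvalues)] * len(pvalues)
    total = sum(weights)
    # Each p-value is mapped to a Cauchy-distributed quantity and averaged
    stat = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues)
    ) / total
    # Under the null, the statistic is approximately standard Cauchy
    return 0.5 - math.atan(stat) / math.pi


combined = acat([1e-6, 0.5, 0.9])
print(f"combined p-value: {combined:.2e}")
```

A single p-value is returned unchanged, and one very small p-value dominates the combined result, which is why the method remains sensitive when only a few variants in a region carry signal.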

In [7]:
# Advanced JaxQTL analysis with more parameters
results_jaxqtl_advanced = run_jaxqtl(
    dd,
    prefix="jaxqtl_advanced",
    mode="cis_acat",  # ACAT-combined p-values
    model="gaussian",  # Gaussian model
    window=1000000,  # 1Mb window
    nperm=10000,  # Number of permutations
    test_method="wald",  # Wald test
    additional_covariates=["gPCs"],
    addpc=5,  # Number of genotype PCs to add
    standardize=True,
    verbose=True,
    run=True,
)
results_jaxqtl_advanced

[2025-12-29 03:03:39,828] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:03:39,829] INFO:cellink._core.donordata: Observation found for 100 donors.


Writing BED: 100%|██████████| 1/1 [00:00<00:00, 61.59it/s]

Writing FAM... done.
Writing BIM... done.




2025-12-29 03:03:44 | [INFO] Finished loading raw data.
INFO:2025-12-29 03:03:44,390:jax._src.xla_bridge:752: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: dlopen(libtpu.so, 0x0001): tried: 'libtpu.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibtpu.so' (no such file), '/opt/miniconda3/envs/tensorqtl/bin/../lib/libtpu.so' (no such file), '/usr/lib/libtpu.so' (no such file, not in dyld cache), 'libtpu.so' (no such file)

Unnamed: 0,chrom,snp,pos,a0,a1,i,phenotype_id,tss,tss_distance,af,ma_count,pval_nominal,slope,slope_se,converged,alpha,pval_acat
0,22,22_17009000_T_C,17009000,T,C,1345,ENSG00000015475,17734139.0,-725139.0,0.92,16.0,0.002877574,-0.060722,0.019666,True,0.0,0.5803468
1,22,22_17524093_G_A,17524093,G,A,3994,ENSG00000069998,17137512.0,386581.0,0.985,3.0,0.0001970923,-0.144284,0.036742,True,0.0,0.5543843
2,22,22_17545374_C_CAAA,17545374,C,CAAA,4109,ENSG00000093072,17178791.0,366583.0,0.96,8.0,1.307348e-05,-0.075339,0.016078,True,0.0,0.05631898
3,22,22_16849850_C_CA,16849850,C,CA,734,ENSG00000099968,17628856.0,-779006.0,0.98,4.0,0.00013023,-0.146788,0.036267,True,0.0,0.03125808
4,22,22_17019092_T_A,17019092,T,A,1388,ENSG00000131100,17592137.0,-573045.0,0.995,1.0,7.546834e-05,-0.397131,0.094481,True,0.0,0.1132195
5,22,22_17349688_C_T,17349688,C,T,3007,ENSG00000177663,17084955.0,264733.0,0.955,9.0,0.0009499886,0.060594,0.017568,True,0.0,0.5397122
6,22,22_16702145_A_G,16702145,A,G,236,ENSG00000182902,17563451.0,-861306.0,0.98,4.0,4.909324e-09,-0.004212,0.000633,True,0.0,8.408108e-06
7,22,22_17547967_G_A,17547967,G,A,4129,ENSG00000183785,18110101.0,-562134.0,0.985,3.0,1.428733e-09,-0.036549,0.005256,True,0.0,3.574742e-06
8,22,22_17225888_G_A,17225888,G,A,2512,ENSG00000184979,18149844.0,-923956.0,0.995,1.0,1.143822e-14,-0.18768,0.019322,True,0.0,2.643713e-11
9,22,22_17031458_ATT_A,17031458,ATT,A,1450,ENSG00000185837,17159339.0,-127881.0,0.995,1.0,8.238284e-16,-0.062261,0.006021,True,0.0,3.682221e-12


### JaxQTL Trans-QTL Analysis
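Trans-QTL analysis identifies associations between variants and distant genes, typically located on a different chromosome or far outside the cis-window on the same chromosome. Because this demonstration dataset is restricted to chromosome 22, all pairs tested here lie on the same chromosome. Trans-QTLs typically have smaller effect sizes than cis-QTLs and require larger sample sizes to detect.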

In [8]:
# Trans-QTL analysis with JaxQTL
results_jaxqtl_trans = run_jaxqtl(
    dd, prefix="jaxqtl_trans", mode="trans", model="gaussian", additional_covariates=["gPCs"], run=True
)
results_jaxqtl_trans

[2025-12-29 03:04:00,048] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:04:00,048] INFO:cellink._core.donordata: Observation found for 100 donors.


Writing BED: 100%|██████████| 1/1 [00:00<00:00, 144.11it/s]

Writing FAM... done.
Writing BIM... done.




2025-12-29 03:04:04 | [INFO] Finished loading raw data.
INFO:2025-12-29 03:04:04,279:jax._src.xla_bridge:752: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: dlopen(libtpu.so, 0x0001): tried: 'libtpu.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibtpu.so' (no such file), '/opt/miniconda3/envs/tensorqtl/bin/../lib/libtpu.so' (no such file), '/usr/lib/libtpu.so' (no such file, not in dyld cache), 'libtpu.so' (no such file)

Unnamed: 0,chrom,snp,pos,a0,a1,i,phenotype_id,tss,tss_distance,af,ma_count,pval_nominal,slope,slope_se,converged,alpha
0,22,22_16388891_G_A,16388891,G,A,0,ENSG00000177663,17084955.0,-696064.0,0.945,11.0,0.296205,-0.021325,0.020414,1.0,0.0
1,22,22_16388968_C_T,16388968,C,T,1,ENSG00000177663,17084955.0,-695987.0,0.945,11.0,0.296205,-0.021325,0.020414,1.0,0.0
2,22,22_16389525_A_G,16389525,A,G,2,ENSG00000177663,17084955.0,-695430.0,0.945,11.0,0.296205,-0.021325,0.020414,1.0,0.0
3,22,22_16390411_G_A,16390411,G,A,3,ENSG00000177663,17084955.0,-694544.0,0.945,11.0,0.296205,-0.021325,0.020414,1.0,0.0
4,22,22_16391555_G_C,16391555,G,C,4,ENSG00000177663,17084955.0,-693400.0,0.945,11.0,0.296205,-0.021325,0.020414,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2105019,22,22_17583056_C_CA,17583056,C,CA,4272,ENSG00000079974,50767502.0,-33184446.0,0.830,34.0,0.504663,0.009504,0.014245,1.0,0.0
2105020,22,22_17583078_A_G,17583078,A,G,4273,ENSG00000079974,50767502.0,-33184424.0,0.830,34.0,0.504663,0.009504,0.014245,1.0,0.0
2105021,22,22_17584465_GAGAGAGAA_G,17584465,GAGAGAGAA,G,4274,ENSG00000079974,50767502.0,-33183037.0,0.995,1.0,0.761113,-0.022471,0.073914,1.0,0.0
2105022,22,22_17584467_GAGAGAA_G,17584467,GAGAGAA,G,4275,ENSG00000079974,50767502.0,-33183035.0,0.970,6.0,0.649968,-0.014370,0.031666,1.0,0.0


## TensorQTL Analysis

TensorQTL provides comprehensive QTL mapping capabilities with support for various analysis modes and statistical approaches. TensorQTL can be run in various modes: `["cis_nominal", "cis_independent", "cis", "trans", "cis_susie", "trans_susie"]`. 

### Basic Cis-QTL Analysis with TensorQTL

The basic `cis` mode in TensorQTL performs permutation-based cis-QTL mapping, similar to JaxQTL's `cis` mode but with PyTorch-based GPU acceleration.

In [9]:
# Basic cis-QTL mapping
results_tensorqtl_cis = run_tensorqtl(
    dd,
    prefix="tensorqtl_cis",
    mode="cis",
    window=cis_window,
    additional_covariates=["gPCs"],
    permutations=10000,
    run=True,
)
results_tensorqtl_cis

[2025-12-29 03:06:59,460] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:06:59,461] INFO:cellink._core.donordata: Observation found for 100 donors.
[2025-12-29 03:06:59,715] INFO:cellink.tl.external._tensorqtl: Performing z-normalization of age.


Writing BED: 100%|██████████| 1/1 [00:00<00:00, 142.76it/s]

Writing FAM... done.
Writing BIM... done.
PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025)            cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to tensorqtl_cis.log.
Options in effect:
  --bfile tensorqtl_cis
  --make-pgen
  --out tensorqtl_cis

Start time: Mon Dec 29 03:06:59 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
100 samples (0 females, 40 males, 60 ambiguous; 100 founders) loaded from
tensorqtl_cis.fam.
4277 variants loaded from tensorqtl_cis.bim.
Note: No phenotype data present.
Writing tensorqtl_cis.psam ... done.
Writing tensorqtl_cis.pvar ... done.




[Dec 29 03:07:06] Running TensorQTL v1.0.10: cis-QTL mapping
  * reading phenotypes (tensorqtl_cis_phenotype.bed.gz)
  * cis-window detected as [start - 500,000, end + 500,000]
  * reading covariates (tensorqtl_cis_donor_features.tsv)
  * loading genotypes
cis-QTL mapping: empirical p-values for phenotypes
  * 100 samples
  * 871 phenotypes
  * 23 covariates
  * 4277 variants
  * cis-window: ±500,000
    ** dropping 375 constant phenotypes
  * checking phenotypes: 496/496
    ** dropping 480 phenotypes without variants in cis-window
  * computing permutations




    processing phenotype 16/16




  Time elapsed: 0.04 min
done.
  * writing output
Computing q-values
  * Number of phenotypes tested: 16
  * Correlation between Beta-approximated and empirical p-values: nan
  * Proportion of significant phenotypes (1-pi0): 0.00
  * QTL phenotypes @ FDR 0.05: 0
[Dec 29 03:07:09] Finished mapping


Unnamed: 0,phenotype_id,num_var,beta_shape1,beta_shape2,true_df,pval_true_df,variant_id,start_distance,end_distance,ma_samples,ma_count,af,pval_nominal,slope,slope_se,pval_perm,pval_beta,qval
0,ENSG00000177663,4161,0.99728,407.175,61.4251,0.000759284,22_17404016_TTGGGAGATG_T,319061,288323,71,87,0.565,0.000196523,0.031859,0.008135,0.273173,0.267237,0.7283
1,ENSG00000069998,4114,1.01566,138.48,38.9329,0.00358876,22_17524093_G_A,386581,358806,3,3,0.985,5.02932e-05,-0.174374,0.040528,0.382562,0.384575,0.769149
2,ENSG00000185837,4096,0.797147,12.4749,9.12607,0.00470983,22_17031458_ATT_A,-127881,-133993,1,1,0.995,1.20061e-16,-0.061286,0.005758,0.10489,0.108845,0.7283
3,ENSG00000093072,4081,0.926989,47.0226,25.1531,0.00447654,22_17560607_G_GCTCTCC,381816,302372,3,3,0.985,7.816e-07,-0.199937,0.037089,0.259374,0.220129,0.7283
4,ENSG00000182902,2647,,,75.0,,22_17577481_A_C,14030,-13514,1,1,0.995,,-0.012346,,0.052395,,
5,ENSG00000236754,2608,,,75.0,4.74117e-08,22_17239813_C_T,-339574,-349379,7,7,0.965,4.74117e-08,-0.001256,0.000207,0.787621,,
6,ENSG00000131100,2563,1.00307,119.937,43.1606,0.00874977,22_17360965_T_C,-231172,-267784,30,30,0.85,0.000531551,-0.08839,0.024414,0.652235,0.650188,0.962637
7,ENSG00000099968,2407,0.763298,18.4066,19.7395,0.0990824,22_17219920_C_T,-408936,-510935,49,59,0.295,0.00117595,0.039427,0.011686,0.909409,0.903363,0.962637
8,ENSG00000015475,1724,1.03867,146.927,55.1653,0.0226351,22_17345140_C_T,-388999,-429630,2,2,0.99,0.00778865,-0.137268,0.050196,0.961504,0.962637,0.962637
9,ENSG00000269220,1525,0.664014,9.55225,16.9234,0.00756568,22_17351364_TTC_T,-425954,-429155,2,2,0.99,1.30941e-08,-0.144193,0.022597,0.244576,0.186541,0.7283


### Nominal Cis-QTL Analysis
Nominal mode tests all variant-gene pairs within the specified window and returns comprehensive association statistics. This mode can also output top associations per gene and significant pairs based on a p-value threshold.
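
The two derived outputs mentioned above, significant pairs and top associations per gene, can be thought of as simple reductions over the full table of pairs. The sketch below illustrates this with plain dictionaries. The record layout mirrors the `phenotype_id`, `variant_id`, and `pval_nominal` columns, but the function `summarize_pairs`, the strict `<` comparison, and the toy records are assumptions made for illustration.

```python
def summarize_pairs(pairs, pval_threshold=1e-5):
    """Return (significant pairs, best pair per gene) from nominal results."""
    significant = [p for p in pairs if p["pval_nominal"] < pval_threshold]
    top_per_gene = {}
    for pair in pairs:
        gene = pair["phenotype_id"]
        current = top_per_gene.get(gene)
        # Keep the pair with the smallest nominal p-value for each gene
        if current is None or pair["pval_nominal"] < current["pval_nominal"]:
            top_per_gene[gene] = pair
    return significant, top_per_gene


# Toy records with made-up identifiers and p-values
toy_pairs = [
    {"phenotype_id": "gene_A", "variant_id": "var_1", "pval_nominal": 3e-7},
    {"phenotype_id": "gene_A", "variant_id": "var_2", "pval_nominal": 0.04},
    {"phenotype_id": "gene_B", "variant_id": "var_3", "pval_nominal": 0.20},
    {"phenotype_id": "gene_B", "variant_id": "var_4", "pval_nominal": 0.01},
]

significant, top_per_gene = summarize_pairs(toy_pairs)
print(len(significant), {g: p["variant_id"] for g, p in top_per_gene.items()})
```

Every gene receives a top association even when none of its pairs passes the threshold, which is why the two outputs can differ considerably in size.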

In [10]:
# Nominal cis-QTL analysis (all variant-gene pairs)
results_tensorqtl_nominal = run_tensorqtl(
    dd,
    prefix="tensorqtl_nominal",
    mode="cis_nominal",
    window=cis_window,
    pval_threshold=1e-5,
    additional_covariates=["gPCs"],
    batch_size=20000,
    run=True,
)

# Results contain multiple outputs for nominal mode
cis_qtl_pairs, cis_qtl_signif_pairs, cis_qtl_top_assoc = results_tensorqtl_nominal
print(f"Total variant-gene pairs tested: {len(cis_qtl_pairs)}")
print(f"Significant pairs (p < {1e-5}): {len(cis_qtl_signif_pairs) if cis_qtl_signif_pairs is not None else 0}")
cis_qtl_pairs

[2025-12-29 03:07:09,832] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:07:09,834] INFO:cellink._core.donordata: Observation found for 100 donors.
[2025-12-29 03:07:10,153] INFO:cellink.tl.external._tensorqtl: Performing z-normalization of age.


Writing BED: 100%|██████████| 1/1 [00:00<00:00, 124.11it/s]

Writing FAM... done.
Writing BIM... done.





PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025)            cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to tensorqtl_nominal.log.
Options in effect:
  --bfile tensorqtl_nominal
  --make-pgen
  --out tensorqtl_nominal

Start time: Mon Dec 29 03:07:10 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
100 samples (0 females, 40 males, 60 ambiguous; 100 founders) loaded from
tensorqtl_nominal.fam.
4277 variants loaded from tensorqtl_nominal.bim.
Note: No phenotype data present.
Writing tensorqtl_nominal.psam ... done.
Writing tensorqtl_nominal.pvar ... done.



Total variant-gene pairs tested: 34114
Significant pairs (p < 1e-05): 0


Unnamed: 0,phenotype_id,variant_id,start_distance,end_distance,af,ma_samples,ma_count,pval_nominal,slope,slope_se
0,ENSG00000177663,22_16585130_T_C,-499825,-530563,0.980,4,4,0.339680,0.032829,0.034164
1,ENSG00000177663,22_16585144_G_A,-499811,-530549,0.905,17,19,0.402192,0.011152,0.013236
2,ENSG00000177663,22_16585510_C_G,-499445,-530183,0.500,71,100,0.109400,-0.012162,0.007507
3,ENSG00000177663,22_16585603_G_A,-499352,-530090,0.740,44,52,0.395744,0.007709,0.009025
4,ENSG00000177663,22_16585810_T_C,-499145,-529883,0.730,43,54,0.478267,0.006274,0.008803
...,...,...,...,...,...,...,...,...,...,...
34109,ENSG00000215193,22_17583056_C_CA,-494868,-522340,0.830,30,34,0.274086,0.013370,0.012135
34110,ENSG00000215193,22_17583078_A_G,-494846,-522318,0.830,30,34,0.274086,0.013370,0.012135
34111,ENSG00000215193,22_17584465_GAGAGAGAA_G,-493459,-520931,0.995,1,1,0.649537,-0.028478,0.062419
34112,ENSG00000215193,22_17584467_GAGAGAA_G,-493457,-520929,0.970,6,6,0.833054,0.005817,0.027500


### Cis-Independent QTL Analysis
The `cis_independent` mode identifies multiple independent association signals for each gene by performing conditional analysis. This is crucial for understanding the genetic architecture when multiple causal variants affect the same gene.
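
Conditional analysis can be illustrated with a small forward stepwise scan: pick the strongest variant, regress it out of the phenotype and of all remaining variants, and rescan. The sketch below is a simplified stand-in for what TensorQTL does. It uses a fixed r² cutoff instead of permutation-based significance thresholds, ignores covariates, and omits the backward pruning step; all names and the toy data are hypothetical.

```python
def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def residualize(y, x):
    """Residuals of y after least-squares regression on x (with intercept)."""
    yc, xc = center(y), center(x)
    var_x = sum(a * a for a in xc)
    if var_x < 1e-12:
        return yc
    slope = sum(a * b for a, b in zip(xc, yc)) / var_x
    return [b - slope * a for a, b in zip(xc, yc)]


def r_squared(x, y):
    """Squared Pearson correlation; 0.0 if either input is (near) constant."""
    xc, yc = center(x), center(y)
    var_x = sum(a * a for a in xc)
    var_y = sum(b * b for b in yc)
    if var_x < 1e-12 or var_y < 1e-12:
        return 0.0
    return sum(a * b for a, b in zip(xc, yc)) ** 2 / (var_x * var_y)


def independent_signals(genotypes, phenotype, r2_threshold=0.2):
    """Forward stepwise conditional scan over a dict variant_id -> dosages.

    r2_threshold is a stand-in for a permutation-based significance cutoff.
    """
    remaining = {v: list(g) for v, g in genotypes.items()}
    residual = list(phenotype)
    selected = []
    while remaining:
        best = max(remaining, key=lambda v: r_squared(remaining[v], residual))
        if r_squared(remaining[best], residual) < r2_threshold:
            break
        selected.append(best)
        chosen = remaining.pop(best)
        # Condition on the chosen variant: remove its contribution from the
        # phenotype and from every variant that is still a candidate
        residual = residualize(residual, chosen)
        remaining = {v: residualize(g, chosen) for v, g in remaining.items()}
    return selected


# Toy data: 54 donors, three uncorrelated variants, two of them causal
n = 54
g1 = [i % 3 for i in range(n)]
g2 = [(i // 3) % 3 for i in range(n)]
g3 = [(i // 9) % 3 for i in range(n)]
noise = [0.1 * ((2 * i) % 5 - 2) for i in range(n)]
expression = [2.0 * a + 1.5 * b + e for a, b, e in zip(g1, g2, noise)]

selected = independent_signals({"v1": g1, "v2": g2, "v3": g3}, expression)
print(selected)
```

Residualizing the remaining variants as well as the phenotype matters when variants are in LD: a variant that merely tags an already selected signal loses its association after conditioning.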

In [11]:
# First run basic cis mode to get top associations
results_tensorqtl_cis = run_tensorqtl(
    dd,
    prefix="tensorqtl_cis",
    mode="cis",
    window=cis_window,
    additional_covariates=["gPCs"],
    permutations=10000,
    fdr=0.9,  # Very permissive for demo
    run=True,
)

# Now identify independent signals conditioning on the top hits
results_tensorqtl_independent = run_tensorqtl(
    dd,
    prefix="tensorqtl_independent",
    mode="cis_independent",
    window=cis_window,
    cis_output="tensorqtl_cis.cis_qtl.txt.gz",  # Use output from cis mode
    additional_covariates=["gPCs"],
    fdr=0.9,  # Very permissive for demo
    pval_threshold=0.1,  # Very permissive for demo
    run=True,
)

print(f"Independent signals identified: {len(results_tensorqtl_independent)}")
results_tensorqtl_independent

[2025-12-29 03:07:14,748] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:07:14,749] INFO:cellink._core.donordata: Observation found for 100 donors.
[2025-12-29 03:07:14,916] INFO:cellink.tl.external._tensorqtl: Performing z-normalization of age.


Writing BED: 100%|██████████| 1/1 [00:00<00:00, 124.89it/s]

Writing FAM... done.
Writing BIM... done.





PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025)            cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to tensorqtl_cis.log.
Options in effect:
  --bfile tensorqtl_cis
  --make-pgen
  --out tensorqtl_cis

Start time: Mon Dec 29 03:07:15 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
100 samples (0 females, 40 males, 60 ambiguous; 100 founders) loaded from
tensorqtl_cis.fam.
4277 variants loaded from tensorqtl_cis.bim.
Note: No phenotype data present.
Writing tensorqtl_cis.psam ... done.
Writing tensorqtl_cis.pvar ... done.



    processing phenotype 16/16




  Time elapsed: 0.04 min
done.
  * writing output
Computing q-values
  * Number of phenotypes tested: 16
  * Correlation between Beta-approximated and empirical p-values: nan
  * Proportion of significant phenotypes (1-pi0): 0.00
  * QTL phenotypes @ FDR 0.90: 7
  * min p-value threshold @ FDR 0.9: 0.518721
[Dec 29 03:07:22] Finished mapping
[2025-12-29 03:07:22,502] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:07:22,502] INFO:cellink._core.donordata: Observation found for 100 donors.
[2025-12-29 03:07:22,668] INFO:cellink.tl.external._tensorqtl: Performing z-normalization of age.


Writing BED: 100%|██████████| 1/1 [00:00<00:00, 125.85it/s]

Writing FAM... done.
Writing BIM... done.





PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025)            cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to tensorqtl_independent.log.
Options in effect:
  --bfile tensorqtl_independent
  --make-pgen
  --out tensorqtl_independent

Start time: Mon Dec 29 03:07:22 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
100 samples (0 females, 40 males, 60 ambiguous; 100 founders) loaded from
tensorqtl_independent.fam.
4277 variants loaded from tensorqtl_independent.bim.
Note: No phenotype data present.
Writing tensorqtl_independent.psam ... done.
Writing tensorqtl_independent.pvar ... done.



  Time elapsed: 0.04 min
done.
  * writing output
[Dec 29 03:07:29] Finished mapping
Independent signals identified: 9


Unnamed: 0,phenotype_id,num_var,beta_shape1,beta_shape2,true_df,pval_true_df,variant_id,start_distance,end_distance,ma_samples,ma_count,af,pval_nominal,slope,slope_se,pval_perm,pval_beta,rank
0,ENSG00000177663,4194,1.01511,423.352,60.1167,4e-06,22_17404016_TTGGGAGATG_T,319061,288323,71,87,0.565,3.45295e-07,0.037838,0.006747,0.0018,0.001449,1
1,ENSG00000177663,4194,1.0036,440.105,60.746,6.2e-05,22_17546485_C_CA,461530,430792,2,2,0.99,1.12444e-05,-0.161322,0.034204,0.023898,0.026704,2
2,ENSG00000177663,4194,1.00804,429.604,60.4213,0.000533,22_17214059_A_G,129104,98366,27,28,0.86,0.000138995,0.039808,0.009897,0.20078,0.201398,3
3,ENSG00000069998,4114,1.03922,127.939,37.8991,0.004069,22_17524093_G_A,386581,358806,3,3,0.985,5.02932e-05,-0.174374,0.040528,0.390561,0.387522,1
4,ENSG00000185837,4096,0.77305,13.0242,9.41844,0.004061,22_17031458_ATT_A,-127881,-133993,1,1,0.995,1.20061e-16,-0.061286,0.005758,0.10459,0.108415,1
5,ENSG00000093072,4081,0.938737,45.7047,24.6602,0.004894,22_17560607_G_GCTCTCC,381816,302372,3,3,0.985,7.816e-07,-0.199937,0.037089,0.265473,0.22631,1
6,ENSG00000269220,1525,0.655935,9.81804,17.3017,0.006905,22_17351364_TTC_T,-425954,-429155,2,2,0.99,1.30941e-08,-0.144193,0.022597,0.237676,0.183322,1
7,ENSG00000286990,1477,0.522711,4.71656,8.67813,0.021041,22_17435684_T_TAA,-365039,-366351,3,4,0.98,3.72023e-12,-0.020749,0.00251,0.247375,0.319513,1
8,ENSG00000234913,393,0.473676,4.08297,14.3416,0.009296,22_17560607_G_GCTCTCC,-443664,-446701,3,3,0.985,1.65204e-09,-0.015233,0.002218,0.133487,0.230412,1


### Trans-QTL Analysis with TensorQTL
Trans-QTL mapping with TensorQTL tests associations between variants and distant genes. Due to the large number of tests, stricter p-value thresholds are typically required.
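
To get a feeling for the multiple-testing burden, the sketch below computes a Bonferroni-style per-test threshold from the number of variant-gene pairs. The counts are taken from the cis-QTL log earlier in this tutorial (4277 variants and 496 non-constant phenotypes); treating these as the number of tests in trans mode is an assumption made for illustration, and Bonferroni correction is used here as a simple reference point rather than as TensorQTL's own procedure.

```python
def bonferroni_threshold(n_variants, n_phenotypes, alpha=0.05):
    """Per-test p-value threshold when every variant is tested against every gene."""
    n_tests = n_variants * n_phenotypes
    return alpha / n_tests, n_tests


threshold, n_tests = bonferroni_threshold(n_variants=4277, n_phenotypes=496)
print(f"{n_tests:,} tests -> per-test threshold {threshold:.2e}")
```

Even for this small demonstration subset, the number of tests runs into the millions, which is why a threshold such as `1e-6` is still permissive compared with a full Bonferroni correction.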

In [12]:
# Trans-QTL mapping
results_tensorqtl_trans = run_tensorqtl(
    dd,
    prefix="tensorqtl_trans",
    mode="trans",
    pval_threshold=1e-6,  # Stricter threshold for trans
    additional_covariates=["gPCs"],
    batch_size=10000,  # Smaller batches for trans analysis
    return_r2=True,  # Include R² statistics
    run=True,
)
results_tensorqtl_trans

[2025-12-29 03:07:29,508] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:07:29,508] INFO:cellink._core.donordata: Observation found for 100 donors.
[2025-12-29 03:07:29,677] INFO:cellink.tl.external._tensorqtl: Performing z-normalization of age.


Writing BED: 100%|██████████| 1/1 [00:00<00:00, 146.61it/s]

Writing FAM... done.
Writing BIM... done.





PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025)            cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to tensorqtl_trans.log.
Options in effect:
  --bfile tensorqtl_trans
  --make-pgen
  --out tensorqtl_trans

Start time: Mon Dec 29 03:07:29 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
100 samples (0 females, 40 males, 60 ambiguous; 100 founders) loaded from
tensorqtl_trans.fam.
4277 variants loaded from tensorqtl_trans.bim.
Note: No phenotype data present.
Writing tensorqtl_trans.psam ... done.
Writing tensorqtl_trans.pvar ... done.

  pos_df.groupby('chr', sort=False, group_keys=False).apply(lambda x: x.sort_values(['start', 'end']))


|  | variant_id | phenotype_id | pval | b | b_se | r2 | af |
|---|---|---|---|---|---|---|---|
| 0 | 22_16393930_G_T | ENSG00000273350 | 2.124748e-07 | -0.002581 | 0.000452 | 0.303167 | 0.980 |
| 1 | 22_16393939_G_C | ENSG00000273350 | 2.124748e-07 | -0.002581 | 0.000452 | 0.303167 | 0.980 |
| 3 | 22_16414733_G_A | ENSG00000133460 | 6.571755e-08 | -0.264812 | 0.044160 | 0.324078 | 0.985 |
| 4 | 22_16414733_G_A | ENSG00000099999 | 1.474357e-07 | -0.062907 | 0.010844 | 0.309742 | 0.985 |
| 5 | 22_16414733_G_A | ENSG00000273076 | 1.771229e-07 | -0.062787 | 0.010907 | 0.306448 | 0.985 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2475 | 22_17582333_G_A | ENSG00000128165 | 7.672292e-07 | -0.053877 | 0.009986 | 0.279612 | 0.995 |
| 2477 | 22_17582867_C_T | ENSG00000100156 | 1.227405e-09 | -0.002649 | 0.000382 | 0.390730 | 0.985 |
| 2478 | 22_17582867_C_T | ENSG00000286381 | 2.763245e-07 | -0.054595 | 0.009666 | 0.298404 | 0.985 |
| 2480 | 22_17584465_GAGAGAGAA_G | ENSG00000128165 | 7.672292e-07 | -0.053877 | 0.009986 | 0.279612 | 0.995 |
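Several rows in the trans-QTL results share the same phenotype, and some neighbouring variants have identical statistics, which may reflect variants in strong LD. A common first step is to reduce the table to the most significant variant per phenotype. The sketch below does this with plain pandas on a small toy frame; only the `variant_id`, `phenotype_id` and `pval` columns from the results are used, and the rows are copied from the first lines of the output.

```python
import pandas as pd

# Toy frame with the same column names as the trans-QTL output
trans_hits = pd.DataFrame(
    {
        "variant_id": ["22_16393930_G_T", "22_16393939_G_C", "22_16414733_G_A", "22_16414733_G_A"],
        "phenotype_id": ["ENSG00000273350", "ENSG00000273350", "ENSG00000133460", "ENSG00000099999"],
        "pval": [2.124748e-07, 2.124748e-07, 6.571755e-08, 1.474357e-07],
    }
)

# Number of significant variants reported for each phenotype
n_hits = trans_hits.groupby("phenotype_id").size()

# Row label of the smallest p-value per phenotype (first row wins on ties)
lead_idx = trans_hits.groupby("phenotype_id")["pval"].idxmin()
lead_hits = trans_hits.loc[lead_idx].sort_values("pval").reset_index(drop=True)

print(n_hits)
print(lead_hits)
```

The same two `groupby` calls should work on `results_tensorqtl_trans` directly, assuming it is a pandas DataFrame with these columns, as the displayed output suggests.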


### SuSiE Finemapping with TensorQTL

`SuSiE` (Sum of Single Effects) is a finemapping method that identifies credible sets of candidate causal variants for each independent association signal. It accounts for LD structure and reports a posterior inclusion probability (PIP) for each variant, that is, the probability that the variant is causal under the model.

#### Cis-SuSiE Finemapping

Cis-SuSiE finemaps cis-QTL signals to identify credible sets of potentially causal variants. It requires the output of a previous cis-QTL run.
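To make the notion of a credible set concrete, here is a minimal sketch for a single association signal: variants are ranked by posterior probability and added until the cumulative probability reaches the requested coverage. It illustrates the concept only and is not how SuSiE or TensorQTL is implemented (SuSiE fits several such single effects jointly); the function name, variant names and probabilities are invented for the example.

```python
import numpy as np


def credible_set(variant_ids, posterior_probs, coverage=0.95):
    """Smallest set of top-ranked variants whose probabilities sum to at least `coverage`."""
    probs = np.asarray(posterior_probs, dtype=float)
    order = np.argsort(-probs)  # indices from highest to lowest probability
    cumulative = np.cumsum(probs[order])
    # First position where the running total reaches the coverage level
    n_keep = int(np.searchsorted(cumulative, coverage)) + 1
    n_keep = min(n_keep, len(probs))  # coverage not reachable: keep every variant
    return [variant_ids[i] for i in order[:n_keep]]


# Hypothetical single-effect probabilities for four variants
cs = credible_set(["v1", "v2", "v3", "v4"], [0.3, 0.6, 0.04, 0.06])
print(cs)
```

A credible set with one variant and a probability near 1 points to a single well-resolved candidate, while a large set usually means several variants in LD cannot be distinguished.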

In [13]:
# First, run standard cis-QTL analysis if not already done
results_tensorqtl_cis = run_tensorqtl(
    dd,
    prefix="tensorqtl_cis_for_susie",
    mode="cis",
    window=cis_window,
    additional_covariates=["gPCs"],
    permutations=10000,
    run=True,
)

# Perform SuSiE finemapping on cis-QTL signals
results_tensorqtl_cis_susie = run_tensorqtl(
    dd,
    prefix="tensorqtl_cis_susie",
    mode="cis_susie",
    window=cis_window,
    cis_output="tensorqtl_cis_for_susie.cis_qtl.txt.gz",  # Input from cis mode
    additional_covariates=["gPCs"],
    max_effects=10,  # Maximum number of causal effects per gene
    fdr=0.9,  # FDR threshold for selecting genes to finemap; very permissive for this demo
    run=True,
)

# Results contain SuSiE model objects and summary statistics
susie_models, susie_summary = results_tensorqtl_cis_susie

print(f"Genes with finemapping results: {len(susie_models)}")
print("\nSummary statistics columns:")
print(susie_summary.columns.tolist())
print(f"\nTotal credible sets identified: {len(susie_summary)}")

# Examine credible sets for a specific gene
if len(susie_summary) > 0:
    example_gene = susie_summary["phenotype_id"].iloc[0]
    gene_credible_sets = susie_summary[susie_summary["phenotype_id"] == example_gene]
    print(f"\nCredible sets for {example_gene}:")
    print(gene_credible_sets)

[2025-12-29 03:07:34,596] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:07:34,596] INFO:cellink._core.donordata: Observation found for 100 donors.
[2025-12-29 03:07:34,861] INFO:cellink.tl.external._tensorqtl: Performing z-normalization of age.


  phenotype_write_df = phenotype_write_df.groupby("#chr", sort=False, group_keys=False).apply(
Writing BED: 100%|██████████| 1/1 [00:00<00:00, 129.00it/s]

Writing FAM... done.
Writing BIM... done.





PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025)            cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to tensorqtl_cis_for_susie.log.
Options in effect:
  --bfile tensorqtl_cis_for_susie
  --make-pgen
  --out tensorqtl_cis_for_susie

Start time: Mon Dec 29 03:07:34 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
100 samples (0 females, 40 males, 60 ambiguous; 100 founders) loaded from
tensorqtl_cis_for_susie.fam.
4277 variants loaded from tensorqtl_cis_for_susie.bim.
Note: No phenotype data present.
Writing tensorqtl_cis_for_susie.psam ... done.
Writing tensorqtl_cis_for_susie.pvar ... (progress output truncated)

  pos_df.groupby('chr', sort=False, group_keys=False).apply(lambda x: x.sort_values(['start', 'end']))




  slope_se = np.abs(slope) / np.sqrt(tstat2)
  return get_t_pval(np.sqrt(tstat2), dof, log=logp)
  tstat2 = dof * r2 / (1 - r2)
  return (1.0-shape1)*np.sum(np.log(x)) + (1.0-shape2)*np.sum(np.log(1.0-x)) + len(x)*logbeta
  tstat2 = dof * r2 / (1 - r2)
  return get_t_pval(np.sqrt(tstat2), dof, log=logp)




  return (1.0-shape1)*np.sum(np.log(x)) + (1.0-shape2)*np.sum(np.log(1.0-x)) + len(x)*logbeta


    processing phenotype 16/16


  tstat2 = dof * r2 / (1 - r2)
  return (1.0-shape1)*np.sum(np.log(x)) + (1.0-shape2)*np.sum(np.log(1.0-x)) + len(x)*logbeta
  np.max(np.abs(fsim[0] - fsim[1:])) <= fatol):


  Time elapsed: 0.04 min
done.
  * writing output
Computing q-values
  * Number of phenotypes tested: 16
  * Correlation between Beta-approximated and empirical p-values: nan
  * Proportion of significant phenotypes (1-pi0): 0.00
  * QTL phenotypes @ FDR 0.05: 0
[Dec 29 03:07:41] Finished mapping
[2025-12-29 03:07:42,336] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:07:42,336] INFO:cellink._core.donordata: Observation found for 100 donors.
[2025-12-29 03:07:42,499] INFO:cellink.tl.external._tensorqtl: Performing z-normalization of age.


  phenotype_write_df = phenotype_write_df.groupby("#chr", sort=False, group_keys=False).apply(
Writing BED: 100%|██████████| 1/1 [00:00<00:00, 123.86it/s]

Writing FAM... done.
Writing BIM... done.





PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025)            cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to tensorqtl_cis_susie.log.
Options in effect:
  --bfile tensorqtl_cis_susie
  --make-pgen
  --out tensorqtl_cis_susie

Start time: Mon Dec 29 03:07:42 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
100 samples (0 females, 40 males, 60 ambiguous; 100 founders) loaded from
tensorqtl_cis_susie.fam.
4277 variants loaded from tensorqtl_cis_susie.bim.
Note: No phenotype data present.
Writing tensorqtl_cis_susie.psam ... done.
Writing tensorqtl_cis_susie.pvar ... (progress output truncated)

  pos_df.groupby('chr', sort=False, group_keys=False).apply(lambda x: x.sort_values(['start', 'end']))


  Time elapsed: 0.01 min
done.
[Dec 29 03:07:47] Finished mapping
Genes with finemapping results: 5

Summary statistics columns:
['phenotype_id', 'variant_id', 'pip', 'af', 'cs_id']

Total credible sets identified: 12

Credible sets for ENSG00000185837:
      phenotype_id         variant_id  pip     af cs_id
0  ENSG00000185837  22_17031458_ATT_A  1.0  0.995     1
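When summarising finemapping output it helps to separate variant rows from credible sets. Assuming each row of the summary is one variant belonging to a credible set (as the `variant_id`, `pip` and `cs_id` columns suggest), the row count gives the number of variant entries, while the number of distinct credible sets is the number of unique (`phenotype_id`, `cs_id`) pairs. The sketch below shows the difference on a toy frame; all values are invented.

```python
import pandas as pd

# Toy summary with the same columns as the SuSiE summary; values are invented
summary = pd.DataFrame(
    {
        "phenotype_id": ["geneA", "geneA", "geneA", "geneB"],
        "variant_id": ["v1", "v2", "v3", "v4"],
        "pip": [0.55, 0.42, 0.97, 1.0],
        "af": [0.10, 0.11, 0.30, 0.05],
        "cs_id": [1, 1, 2, 1],
    }
)

# One entry per credible set, with the number of variants it contains
cs_sizes = summary.groupby(["phenotype_id", "cs_id"]).size().rename("n_variants")
n_credible_sets = len(cs_sizes)

# Number of credible sets found for each gene
sets_per_gene = summary.groupby("phenotype_id")["cs_id"].nunique()

print(f"Variant rows: {len(summary)}, credible sets: {n_credible_sets}")
print(cs_sizes)
print(sets_per_gene)
```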


## Advanced Usage: Dry Run and Command Generation

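The wrappers for both JaxQTL and TensorQTL can generate commands without executing them, which is useful for debugging or for running jobs on compute clusters. This is controlled by the `run` argument: with `run=False`, the wrapper still writes the input files but returns the command instead of running it.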

### JaxQTL Command Generation

In [14]:
# Generate JaxQTL command without running
jaxqtl_command = run_jaxqtl(
    dd,
    prefix="jaxqtl_cluster",
    mode="cis",
    additional_covariates=["gPCs"],
    run=False,  # Don't execute, just return command
)

print("JaxQTL command:")
print(jaxqtl_command)

[2025-12-29 03:05:56,865] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:05:56,866] INFO:cellink._core.donordata: Observation found for 100 donors.


Writing BED: 100%|██████████| 1/1 [00:00<00:00, 221.49it/s]

Writing FAM... done.
Writing BIM... done.
JaxQTL command:
jaxqtl --geno jaxqtl_cluster --covar jaxqtl_cluster_donor_features.tsv --pheno jaxqtl_cluster_phenotype.bed.gz --model NB --mode cis --test-method score --window 500000 --nperm 1000 --addpc 2 --out jaxqtl_cluster --standardize
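A command generated this way can be executed later, for example from a job script. The sketch below shows one way to run such a command from Python with the standard library, assuming the returned value is a single command-line string like the one printed above. `run_command` is a hypothetical helper, and a harmless stand-in command is used so that the example runs without JaxQTL installed.

```python
import shlex
import subprocess
import sys


def run_command(command: str) -> subprocess.CompletedProcess:
    """Split a command-line string into arguments and run it, raising on failure."""
    args = shlex.split(command)
    return subprocess.run(args, check=True, capture_output=True, text=True)


# Stand-in for a generated command; replace with the string returned by the wrapper
demo_command = f"{shlex.quote(sys.executable)} -c \"print('ok')\""
result = run_command(demo_command)
print(result.stdout.strip())
```

The input files written during command generation (genotype, covariate and phenotype files with the chosen prefix) must be reachable from wherever the command is eventually run.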





### TensorQTL Command Generation

In [15]:
# Generate TensorQTL command without running
tensorqtl_command = run_tensorqtl(
    dd,
    prefix="tensorqtl_cluster",
    mode="cis",
    additional_covariates=["gPCs"],
    run=False,  # Don't execute, just return command
)

print("\nTensorQTL command:")
print(tensorqtl_command)

[2025-12-29 03:08:06,239] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:08:06,240] INFO:cellink._core.donordata: Observation found for 100 donors.
[2025-12-29 03:08:06,429] INFO:cellink.tl.external._tensorqtl: Performing z-normalization of age.


  phenotype_write_df = phenotype_write_df.groupby("#chr", sort=False, group_keys=False).apply(
Writing BED: 100%|██████████| 1/1 [00:00<00:00, 106.27it/s]

Writing FAM... done.
Writing BIM... done.





PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025)            cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to tensorqtl_cluster.log.
Options in effect:
  --bfile tensorqtl_cluster
  --make-pgen
  --out tensorqtl_cluster

Start time: Mon Dec 29 03:08:06 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
100 samples (0 females, 40 males, 60 ambiguous; 100 founders) loaded from
tensorqtl_cluster.fam.
4277 variants loaded from tensorqtl_cluster.bim.
Note: No phenotype data present.
Writing tensorqtl_cluster.psam ... done.
Writing tensorqtl_cluster.pvar ... (progress output truncated)

#### Saving Commands to Files
You can also save the generated commands to files with the `save_cmd_file` argument, for batch submission to compute clusters:

In [16]:
# Save JaxQTL command to file
jaxqtl_command = run_jaxqtl(
    dd,
    prefix="jaxqtl_cluster",
    mode="cis_acat",
    model="NB",
    window=cis_window,
    additional_covariates=["gPCs"],
    run=False,
    save_cmd_file="jaxqtl_job.sh",
)

[2025-12-29 03:06:08,165] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:06:08,166] INFO:cellink._core.donordata: Observation found for 100 donors.


Writing BED: 100%|██████████| 1/1 [00:00<00:00, 189.11it/s]

Writing FAM... done.
Writing BIM... done.





In [16]:
# Save TensorQTL command to file
tensorqtl_command = run_tensorqtl(
    dd,
    prefix="tensorqtl_cluster",
    mode="cis_susie",
    cis_output="previous_cis_results.txt.gz",  # Placeholder: path to the output of an earlier cis run
    additional_covariates=["gPCs"],
    run=False,
    save_cmd_file="tensorqtl_susie_job.sh",
)

[2025-12-29 03:08:11,429] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:08:11,431] INFO:cellink._core.donordata: Observation found for 100 donors.
[2025-12-29 03:08:11,735] INFO:cellink.tl.external._tensorqtl: Performing z-normalization of age.


  phenotype_write_df = phenotype_write_df.groupby("#chr", sort=False, group_keys=False).apply(
Writing BED: 100%|██████████| 1/1 [00:00<00:00, 80.85it/s]

Writing FAM... done.
Writing BIM... done.





PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025)            cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to tensorqtl_cluster.log.
Options in effect:
  --bfile tensorqtl_cluster
  --make-pgen
  --out tensorqtl_cluster

Start time: Mon Dec 29 03:08:11 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
100 samples (0 females, 40 males, 60 ambiguous; 100 founders) loaded from
tensorqtl_cluster.fam.
4277 variants loaded from tensorqtl_cluster.bim.
Note: No phenotype data present.
Writing tensorqtl_cluster.psam ... done.
Writing tensorqtl_cluster.pvar ... (progress output truncated)
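For batch submission, a generated command usually needs to be wrapped with scheduler directives. The sketch below builds a minimal SLURM batch script around a command string. It assumes a SLURM cluster; `write_sbatch_script`, the job name, the time limit and the placeholder command are all illustrative choices, and the exact contents of the file written by `save_cmd_file` are not shown in this tutorial, so the example starts from a plain command string instead.

```python
import tempfile
from pathlib import Path


def write_sbatch_script(command: str, path: Path, job_name: str = "qtl", time_limit: str = "01:00:00") -> Path:
    """Write a minimal SLURM batch script that runs a single command."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={time_limit}",
        "set -euo pipefail",  # stop the job on the first error
        command,
    ]
    path.write_text("\n".join(lines) + "\n")
    return path


# Placeholder command; in practice this would be the string returned with run=False
with tempfile.TemporaryDirectory() as tmp:
    script = write_sbatch_script(
        "echo placeholder-command",
        Path(tmp) / "job.sbatch",
        job_name="tensorqtl_susie",
    )
    script_text = script.read_text()

print(script_text)
```

Resource requests such as memory, CPUs or GPUs depend on the cluster and the dataset size, and would be added as further `#SBATCH` lines.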