## Environment Setup
Before starting, make sure the **MetaXcan repo** has been cloned and the **"imlabtools" env** has been created.<br>
If either is missing, open your **terminal** and set them up with the following commands:<br>

<h3>Clone the MetaXcan repo</h3>
<pre><code>git clone https://github.com/hakyimlab/MetaXcan
</code></pre>

<h3>Change directory</h3>
<pre><code>cd MetaXcan/software
</code></pre>

<h3>Create and activate the imlabtools Conda environment</h3>
<pre><code>conda activate base
conda env create -f conda_env.yaml
conda activate imlabtools
</code></pre>
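
Before running the commands above, it can help to confirm that `git` and `conda` are reachable from your shell. The following is a minimal sketch using only the Python standard library; the helper name `find_tools` is hypothetical and not part of MetaXcan, and on some setups `conda` is exposed as a shell function rather than an executable, in which case this check may report it as missing even though it works.

```python
import shutil

def find_tools(names=("git", "conda")):
    """Return a dict mapping each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}

if __name__ == "__main__":
    for tool, path in find_tools().items():
        status = path if path else "NOT FOUND (install it or fix your PATH)"
        print(f"{tool}: {status}")
```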

In [None]:
!pip install requests

In [13]:
import os
import requests
import gzip
import shutil
import tarfile
from concurrent.futures import ThreadPoolExecutor

# Initialize a session for HTTP requests
session = requests.Session()

# Download and preprocess GWAS data
def download_and_process(url, download_dir):
    filename = os.path.basename(url)
    filepath = os.path.join(download_dir, filename)
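    # Drop the ".gz" suffix explicitly: str.rstrip('.gz') strips a set of characters, not a suffix
    temp_file = filepath[:-3] if filepath.endswith('.gz') else filepath + '.tmp'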
    
    # Stream download file if it does not exist
    if not os.path.exists(filepath):
        print(f"Downloading {filename}...")
        with session.get(url, stream=True) as r:
            r.raise_for_status()
            with open(filepath, 'wb') as f:
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
        print(f"Downloaded {filename} successfully.")
    
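    # Process the file: rewrite the colon-separated ID in the fifth column (index 4)
    # as "chr{a}_{b}_{c}_{d}_b38"
    print(f"Processing {filename}...")
    with gzip.open(filepath, 'rt') as f_in, open(temp_file, 'w') as f_out:
        first_line = True
        for line in f_in:
            if first_line:
                # Copy the header row unchanged
                f_out.write(line)
                first_line = False
            else:
                parts = line.split('\t')
                snp_info = parts[4].split(':')
                # Leave the ID untouched if it does not have four colon-separated fields
                # (for example, when the file was already processed on an earlier run)
                if len(snp_info) >= 4:
                    parts[4] = f"chr{snp_info[0]}_{snp_info[1]}_{snp_info[2]}_{snp_info[3]}_b38"
                f_out.write('\t'.join(parts))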
                
    # Compress the processed file back
    with open(temp_file, 'rb') as f_in, gzip.open(filepath, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
    os.remove(temp_file)
    print(f"Completed processing for {filename}.")

# Download tissue models
def download_models(url, download_dir):
    filename = os.path.basename(url)
    filepath = os.path.join(download_dir, filename)
    
    # Stream download file if it does not exist
    if not os.path.exists(filepath):
        print(f"Downloading {filename}...")
        with session.get(url, stream=True) as r:
            r.raise_for_status()
            with open(filepath, 'wb') as f:
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
        print(f"Downloaded {filename} successfully.")
    
    # Extract if it's a tar file
    if filepath.endswith('.tar'):
        print(f"Extracting {filename}...")
        with tarfile.open(filepath) as tar:
            tar.extractall(path=download_dir)
        print(f"Completed extracting {filename}.")

# Define directories and URLs
base_dir = 'MetaXcan/software'
data_dir = os.path.join(base_dir, "data_covid")
model_dir = os.path.join(base_dir, "predi_models")
os.makedirs(data_dir, exist_ok=True)
os.makedirs(model_dir, exist_ok=True)

data_urls = [
"https://storage.googleapis.com/covid19-hg-public/20200915/results/20201020/COVID19_HGI_A1_ALL_20201020.txt.gz",
"https://storage.googleapis.com/covid19-hg-public/20200915/results/20201020/COVID19_HGI_A2_ALL_leave_23andme_20201020.txt.gz",
"https://storage.googleapis.com/covid19-hg-public/20200915/results/20201020/COVID19_HGI_B1_ALL_20201020.txt.gz",
"https://storage.googleapis.com/covid19-hg-public/20200915/results/20201020/COVID19_HGI_B2_ALL_leave_23andme_20201020.txt.gz",
"https://storage.googleapis.com/covid19-hg-public/20200915/results/20201020/COVID19_HGI_C1_ALL_leave_23andme_20201020.txt.gz",
"https://storage.googleapis.com/covid19-hg-public/20200915/results/20201020/COVID19_HGI_C2_ALL_leave_23andme_20201020.txt.gz",
"https://storage.googleapis.com/covid19-hg-public/20200915/results/20201020/COVID19_HGI_D1_ALL_20201020.txt.gz"
]

model_urls = [
"https://zenodo.org/record/3518299/files/mashr_eqtl.tar",
"https://zenodo.org/record/3518299/files/gtex_v8_expression_mashr_snp_smultixcan_covariance.txt.gz"
]

# Run up to four worker threads in parallel (adjust max_workers to suit your machine).
# list(...) consumes the results so that any exception raised in a worker is re-raised here.
with ThreadPoolExecutor(max_workers=4) as executor:
    list(executor.map(lambda url: download_and_process(url, data_dir), data_urls))

with ThreadPoolExecutor(max_workers=4) as executor:
    list(executor.map(lambda url: download_models(url, model_dir), model_urls))

Downloading COVID19_HGI_A1_ALL_20201020.txt.gz...
Downloading COVID19_HGI_A2_ALL_leave_23andme_20201020.txt.gz...
Downloading COVID19_HGI_B1_ALL_20201020.txt.gz...
Downloading COVID19_HGI_B2_ALL_leave_23andme_20201020.txt.gz...
Downloaded COVID19_HGI_A1_ALL_20201020.txt.gz successfully.
Processing COVID19_HGI_A1_ALL_20201020.txt.gz...
Downloaded COVID19_HGI_A2_ALL_leave_23andme_20201020.txt.gz successfully.
Processing COVID19_HGI_A2_ALL_leave_23andme_20201020.txt.gz...
Downloaded COVID19_HGI_B1_ALL_20201020.txt.gz successfully.
Processing COVID19_HGI_B1_ALL_20201020.txt.gz...
Downloaded COVID19_HGI_B2_ALL_leave_23andme_20201020.txt.gz successfully.
Processing COVID19_HGI_B2_ALL_leave_23andme_20201020.txt.gz...
Completed processing for COVID19_HGI_A1_ALL_20201020.txt.gz.
Downloading COVID19_HGI_C1_ALL_leave_23andme_20201020.txt.gz...
Downloaded COVID19_HGI_C1_ALL_leave_23andme_20201020.txt.gz successfully.
Processing COVID19_HGI_C1_ALL_leave_23andme_20201020.txt.gz...
Completed processi
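
The preprocessing step in the cell above rewrites each ID in the fifth column into the `chr..._b38` form built by its f-string. A minimal sketch of that transformation on a single ID, assuming the input has four colon-separated fields; the function name `to_b38_id` and the sample ID are made up for illustration. The sketch returns IDs that lack four fields unchanged, so an already converted ID passes through safely.

```python
def to_b38_id(snp_id):
    """Convert 'a:b:c:d' into 'chra_b_c_d_b38'; return the input unchanged otherwise."""
    fields = snp_id.split(":")
    if len(fields) < 4:
        return snp_id
    return "chr{}_{}_{}_{}_b38".format(*fields[:4])

print(to_b38_id("1:12345:A:G"))         # chr1_12345_A_G_b38
print(to_b38_id("chr1_12345_A_G_b38"))  # chr1_12345_A_G_b38 (unchanged)
```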

## Run MetaXcan pipeline
Please make sure **ipykernel** and **notebook** are installed, then register the **imlabtools** environment as a Jupyter kernel by running the following commands in a **terminal** with that environment activated:<br>

<h3>Register the Conda environment as a Jupyter Notebook kernel</h3>
<pre><code>pip install notebook
pip install ipykernel
python -m ipykernel install --user --name=imlabtools
</code></pre>

<h3>Then, reopen your Jupyter notebook and make sure the kernel is set to imlabtools.</h3>
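
One way to confirm that the kernel switch took effect is to print the interpreter the notebook is running on. This is a minimal sketch using only the standard library; it assumes the environment's install path contains the name `imlabtools`, which is typical for Conda but not guaranteed, and the helper name `kernel_uses_env` is hypothetical.

```python
import sys

def kernel_uses_env(env_name="imlabtools", prefix=None):
    """Return True if the interpreter's install prefix mentions env_name."""
    prefix = sys.prefix if prefix is None else prefix
    return env_name in prefix

print("Interpreter:", sys.executable)
print("Looks like imlabtools:", kernel_uses_env())
```

If the second line prints `False`, the notebook is probably still running on a different kernel.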


In [1]:
# Calling runSPrediXcan.sh
!bash runSPrediXcan.sh

Running SPrediXcan for Lung with COVID19_HGI_A1_ALL_20201020.txt.gz...
INFO - MetaXcan/software/output/spredixcan/COVID19_HGI_A1_ALL_20201020__PM__Lung.csv already exists, move it or delete it if you want it done again
SPrediXcan processing completed for COVID19_HGI_A1_ALL_20201020.txt.gz and tissue Lung.
Running SPrediXcan for Whole_Blood with COVID19_HGI_A1_ALL_20201020.txt.gz...
INFO - MetaXcan/software/output/spredixcan/COVID19_HGI_A1_ALL_20201020__PM__Whole_Blood.csv already exists, move it or delete it if you want it done again
SPrediXcan processing completed for COVID19_HGI_A1_ALL_20201020.txt.gz and tissue Whole_Blood.
Running SPrediXcan for Lung with COVID19_HGI_A2_ALL_leave_23andme_20201020.txt.gz...
INFO - MetaXcan/software/output/spredixcan/COVID19_HGI_A2_ALL_leave_23andme_20201020__PM__Lung.csv already exists, move it or delete it if you want it done again
SPrediXcan processing completed for COVID19_HGI_A2_ALL_leave_23andme_20201020.txt.gz and tissue Lung.
Running SPrediXc
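
The log above shows that a result file that already exists is skipped rather than regenerated. The sketch below reports which SPrediXcan result files are already present, so you can move or delete them before a rerun. The output directory and the `<GWAS name>__PM__<tissue>.csv` naming pattern are read off the log lines above; the helper name `spredixcan_output` and the idea of checking from Python are illustrative assumptions, not part of MetaXcan.

```python
import os

# Output directory as it appears in the log messages
OUTPUT_DIR = os.path.join("MetaXcan", "software", "output", "spredixcan")

def spredixcan_output(gwas_file, tissue, output_dir=OUTPUT_DIR):
    """Build the result path seen in the log for one GWAS file and tissue."""
    name = os.path.basename(gwas_file)
    if name.endswith(".txt.gz"):
        name = name[:-len(".txt.gz")]
    return os.path.join(output_dir, f"{name}__PM__{tissue}.csv")

for tissue in ("Lung", "Whole_Blood"):
    path = spredixcan_output("COVID19_HGI_A1_ALL_20201020.txt.gz", tissue)
    state = "exists" if os.path.exists(path) else "missing"
    print(f"{path}: {state}")
```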

In [None]:
# Calling runSMultiXcan.sh
!bash runSMultiXcan.sh

Running SMultiXcan for COVID19_HGI_A1_ALL_20201020...
INFO - MetaXcan/software/output/smultixcan/COVID19_HGI_A1_ALL_20201020_smultixcan.csv already exists, you have to move it or delete it if you want it done again
SMultiXcan processing completed for COVID19_HGI_A1_ALL_20201020.
Running SMultiXcan for COVID19_HGI_A2_ALL_leave_23andme_20201020...
INFO - MetaXcan/software/output/smultixcan/COVID19_HGI_A2_ALL_leave_23andme_20201020_smultixcan.csv already exists, you have to move it or delete it if you want it done again
SMultiXcan processing completed for COVID19_HGI_A2_ALL_leave_23andme_20201020.
Running SMultiXcan for COVID19_HGI_B1_ALL_20201020...
INFO - MetaXcan/software/output/smultixcan/COVID19_HGI_B1_ALL_20201020_smultixcan.csv already exists, you have to move it or delete it if you want it done again
SMultiXcan processing completed for COVID19_HGI_B1_ALL_20201020.
Running SMultiXcan for COVID19_HGI_B2_ALL_leave_23andme_20201020...
INFO - MetaXcan/software/output/smultixcan/COVID19