git clone https://github.com/tanyaphung/neoantigens_prioritization.git
- Python version 3.6.11
- Pandas version 1.1.3
- Each sample has its own directory, for example `patient_3466` (under the directory `example_data`).
  - Sub-directories inside the sample directory are the directories of the HLAs, for example `HLA-A01-01`. The HLA directory name must match the corresponding entry in the HLA file that is passed as an argument to the script.
    - Sub-directories inside the HLA directory are called `9_mers`, `10_mers`, or `11_mers`.
      - Inside sub-directory `9_mers` (or `10_mers`), there have to be 2 files:
        - `netmhc.xsl`: output from netMHCpan4.0. Modify line 90 if your file is not in this format.
        - `netmhcstab.xsl`: output from netMHCpanstab. Modify line 91 if your file is not in this format.
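The directory layout described above can be checked before running the script. The following is a minimal sketch and not part of the repository: the helper name `check_layout` is hypothetical, and it assumes the two file names listed above and sub-directories named `<k>_mers`.

```python
from pathlib import Path

# File names taken from the layout described in this README.
EXPECTED_FILES = ("netmhc.xsl", "netmhcstab.xsl")


def check_layout(data_dir, sample_id, hla_types, mers):
    """Return the expected input files that are missing (hypothetical helper)."""
    missing = []
    for hla in hla_types:
        for k in mers:
            # e.g. example_data/patient_3466/HLA-A01-01/9_mers
            mer_dir = Path(data_dir) / sample_id / hla / f"{k}_mers"
            for name in EXPECTED_FILES:
                path = mer_dir / name
                if not path.is_file():
                    missing.append(path)
    return missing


if __name__ == "__main__":
    # Example call using the sample names from this README.
    for path in check_layout("example_data", "patient_3466", ["HLA-A01-01"], [9]):
        print("missing:", path)
```

An empty return value means every HLA and peptide-length combination has both prediction files in place.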
python neoepitope_prediction.py --hla_types_fn example_data/patient_3466/hla.txt --quant_fn example_data/patient_3466/3466_quant.sf --sample_id patient_3466 --mers 9 --data_dir example_data/
- This script outputs a file called `filtered_neoepitopes.tsv` inside the sample directory.
If there's no TPM file available:
- If you don't have RNAseq data and therefore do not have the TPM file, you don't have to provide any input to `--quant_fn`. Instead, this script will filter based on MHC binding affinity and MHC stability only.
python neoepitope_prediction.py --hla_types_fn example_data/patient_3466/hla.txt --sample_id patient_3466 --mers 9 --data_dir example_data/
- This script outputs a file called `filtered_neoepitopes_no_tpm.tsv` inside the sample directory.
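The two modes (with and without a TPM file) can be pictured as one filter with an optional expression criterion. The sketch below is an illustration under stated assumptions, not the script's actual code: the function `passes_filters` and the keys `affinity_nM`, `stability_h`, and `tpm` are hypothetical, the default cutoffs are the ones this README cites from Wells et al. 2020, and the comparison directions (lower affinity value, higher stability, and higher TPM pass) are assumed.

```python
def passes_filters(row, binding_threshold=34.0, stability_threshold=1.4,
                   tpm_threshold=33.0):
    """Return True if a peptide record passes the cutoffs (hypothetical helper)."""
    ok = (row["affinity_nM"] < binding_threshold
          and row["stability_h"] > stability_threshold)
    # Without RNAseq data there is no TPM value, so only the MHC criteria apply.
    if row.get("tpm") is not None:
        ok = ok and row["tpm"] > tpm_threshold
    return ok


# Made-up records for illustration only.
peptides = [
    {"peptide": "PEPTIDE_A", "affinity_nM": 12.0, "stability_h": 2.0, "tpm": 50.0},
    {"peptide": "PEPTIDE_B", "affinity_nM": 500.0, "stability_h": 3.0, "tpm": 80.0},
    {"peptide": "PEPTIDE_C", "affinity_nM": 20.0, "stability_h": 1.8, "tpm": 5.0},
]

# With TPM values: all three criteria are applied.
with_tpm = [p["peptide"] for p in peptides if passes_filters(p)]

# Without TPM values: drop the key and filter on binding and stability only.
no_tpm_records = [{k: v for k, v in p.items() if k != "tpm"} for p in peptides]
without_tpm = [p["peptide"] for p in no_tpm_records if passes_filters(p)]

print(with_tpm)
print(without_tpm)
```

In this toy data the peptide with low expression is kept only when no TPM value is supplied, which is why the no-TPM output is generally less stringent.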
- `--hla_types_fn`: a file listing one HLA type per line. The name of each HLA type must match the corresponding sub-directory inside the sample directory.
- `--sample_id`: the sample id must match the name of the sample directory.
- `--mers`: Enter a number (e.g. 9) or a comma-separated list of numbers (e.g. 9,10,11).
- `--data_dir`: Enter the parent directory that hosts all the data. For example, because the directory `patient_3466` is under the directory `example_data`, give the path to the `example_data` directory.
- `--quant_fn`: give the path to the output from salmon.
- `binding_threshold`: Default is 34 nM (Wells et al. 2020). Input a different number if you want to change the default threshold.
- `stability_threshold`: Default is 1.4 h (Wells et al. 2020). Input a different number if you want to change the default threshold.
- `tpm_threshold`: Default is 33 TPM (Wells et al. 2020). Input a different number if you want to change the default threshold.
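To make the expected argument formats concrete, here is a small sketch of how a `--mers` value and an HLA file could be read. How the script itself parses these inputs is not stated here, so both helper names (`parse_mers`, `read_hla_types`) are hypothetical and the behaviour shown is an assumption.

```python
def parse_mers(value):
    """Turn '9' or '9,10,11' into a list of integers (hypothetical helper)."""
    return [int(part) for part in value.split(",") if part.strip()]


def read_hla_types(path):
    """Read one HLA type per line, skipping blank lines (hypothetical helper)."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


if __name__ == "__main__":
    print(parse_mers("9"))
    print(parse_mers("9,10,11"))
```

Each name returned by `read_hla_types` would then be expected to exist as a sub-directory of the sample directory, as described in the layout section.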