# Part 1: Run PRS-CSx

## Setting Up PRS-CSx

- **0. Create a folder to serve as your working directory. The rest of this guide calls it `user_test`.**
    <br>

- **1. From inside `user_test`, clone the PRS-CSx repository with the following git command**:
    ```bash
    git clone https://github.com/getian107/PRScsx.git
    ```

    Alternatively, download the source files from the [GitHub website](https://github.com/getian107/PRScsx) to `user_test`.
    <br>

- **2. In `user_test`, create a sub-folder named `ref`. Download the LD reference panels to `ref` and extract files**:
    <br>
    LD reference panels constructed using the 1000 Genomes Project phase 3 samples:

    - [EAS reference](https://www.dropbox.com/s/7ek4lwwf2b7f749/ldblk_1kg_eas.tar.gz?dl=0) (~4.33G): 
        ```bash
        wget -O ldblk_1kg_eas.tar.gz "https://www.dropbox.com/s/7ek4lwwf2b7f749/ldblk_1kg_eas.tar.gz?dl=1"
        tar -zxvf ldblk_1kg_eas.tar.gz
        ```

    - [EUR reference](https://www.dropbox.com/s/mt6var0z96vb6fv/ldblk_1kg_eur.tar.gz?e=1&dl=0) (~4.56G): 
        ```bash
        wget -O ldblk_1kg_eur.tar.gz "https://www.dropbox.com/s/mt6var0z96vb6fv/ldblk_1kg_eur.tar.gz?e=1&dl=1"
        tar -zxvf ldblk_1kg_eur.tar.gz
        ```

    Note that these files are identical to the reference panels used in PRS-CS, so there is no need to download them again if you already use PRS-CS.

    For regions that don't have access to Dropbox, reference panels can be downloaded from the [alternative download site](https://personal.broadinstitute.org/hhuang/public/PRS-CSx/Reference/).
    <br>
    <br>

- **3. Download the SNP information file and place it in the same folder as the reference panels (`ref`)**:
    - [1000 Genomes reference: SNP info](https://www.dropbox.com/s/rhi806sstvppzzz/snpinfo_mult_1kg_hm3?dl=0) (~106M): 
        ```bash
        wget -O snpinfo_mult_1kg_hm3 "https://www.dropbox.com/s/rhi806sstvppzzz/snpinfo_mult_1kg_hm3?dl=1"
        ```
    <br>

- **4. PRS-CSx requires the Python packages `scipy` and `h5py` to be installed**:
    - [scipy](https://www.scipy.org/)
    - [h5py](https://www.h5py.org/)
    <br>
    <br>

- **5. Once Python and its dependencies are installed, run the following from the folder containing `PRScsx.py` to print the list of command-line options**:
    ```bash
    ./PRScsx.py --help 
    # or 
    ./PRScsx.py -h
    ```
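
The setup steps above can be checked with a short script before running anything. This is a minimal sketch, not part of PRS-CSx: the function names are made up for illustration, `PYTHON` is a hypothetical override variable, and the folder names assume each tarball extracts to a folder named after itself.

```bash
# Check that the reference folder holds what steps 2 and 3 put there.
# Assumption: each tarball extracts to a folder named after itself
# (ldblk_1kg_eas, ldblk_1kg_eur); adjust the names if yours differ.
check_ref_dir() {
    local ref_dir="$1" item missing=0
    for item in ldblk_1kg_eas ldblk_1kg_eur snpinfo_mult_1kg_hm3; do
        if [ ! -e "${ref_dir}/${item}" ]; then
            echo "missing: ${ref_dir}/${item}"
            missing=1
        fi
    done
    return "${missing}"
}

# Check that scipy and h5py import under the interpreter used to run PRScsx.py.
# PYTHON is a hypothetical override variable, not something PRS-CSx reads.
check_python_deps() {
    "${PYTHON:-python}" -c "import scipy, h5py"
}
```

From `user_test`, `check_ref_dir ref && check_python_deps && echo "setup looks complete"` prints the confirmation only when both checks pass. If the import check fails, `python -m pip install scipy h5py` installs both packages for that interpreter.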

## Using PRS-CSx with Test Data
The test data contain EUR and EAS GWAS summary statistics and a bim file for 1,000 SNPs on chromosome 22. To run PRS-CSx on the test data:
```bash
python PRScsx.py \
    --ref_dir=path_to_ref \
    --bim_prefix=path_to_bim/test \
    --sst_file=path_to_sumstats/EUR_sumstats.txt,path_to_sumstats/EAS_sumstats.txt \
    --n_gwas=200000,100000 \
    --pop=EUR,EAS \
    --chrom=22 \
    --phi=1e-2 \
    --out_dir=path_to_output \
    --out_name=test
```
The test analysis finishes in approximately 1 minute with 8 GB of RAM.
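
In the command above, `--sst_file`, `--n_gwas`, and `--pop` are comma-separated lists given in the same order (EUR first, then EAS), so the values for each GWAS line up by position. The sketch below keeps each GWAS's values together on one line and builds the three options from them. `build_list_args` is a hypothetical helper written for this guide, not part of PRS-CSx.

```bash
# Build --sst_file, --n_gwas and --pop from "POP N FILE" records read on stdin,
# one record per line, so the three lists always share the same order.
# Assumption: file paths contain no whitespace.
build_list_args() {
    local pops="" ns="" files="" pop n file
    while read -r pop n file; do
        [ -n "$pop" ] || continue          # skip blank lines
        pops="${pops:+$pops,}$pop"         # append with a comma after the first item
        ns="${ns:+$ns,}$n"
        files="${files:+$files,}$file"
    done
    echo "--sst_file=$files --n_gwas=$ns --pop=$pops"
}

build_list_args <<'EOF'
EUR 200000 path_to_sumstats/EUR_sumstats.txt
EAS 100000 path_to_sumstats/EAS_sumstats.txt
EOF
```

The printed options match the ones in the example command and can be pasted into it alongside the remaining flags.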

For example, if the reference panels are in the `ref` folder created during setup and the current working directory is the one containing `PRScsx.py`:
1. Create a directory to store output:
    ```bash
    mkdir -p ../output
    ```
    
2. Run PRS-CSx:
    ```bash
    python PRScsx.py \
        --ref_dir=../ref \
        --bim_prefix=./test_data/test \
        --sst_file=./test_data/EUR_sumstats.txt,./test_data/EAS_sumstats.txt \
        --n_gwas=200000,100000 \
        --pop=EUR,EAS \
        --chrom=22 \
        --phi=1e-2 \
        --out_dir=../output \
        --out_name=test
    ```
    The output will be stored in the `output` folder created earlier.
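
To confirm that the run produced results, the files in the output folder can be listed with their line counts. This is a minimal sketch: `summarize_output` is a hypothetical helper, and the glob assumes the output file names start with the `--out_name` prefix followed by an underscore and end in `.txt`. Adjust the pattern if your version names files differently.

```bash
# Print "<file name><TAB><line count>" for each output file that matches the prefix.
# Returns non-zero when no matching file exists, which usually means the run failed.
summarize_output() {
    local out_dir="$1" prefix="$2" f found=0
    for f in "${out_dir}/${prefix}"_*.txt; do
        [ -e "$f" ] || continue            # the glob matched nothing
        found=1
        printf '%s\t%s\n' "$(basename "$f")" "$(wc -l < "$f" | tr -d ' ')"
    done
    [ "$found" -eq 1 ]
}
```

For the example above, `summarize_output ../output test` lists the files written by the run.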