# To start a PyCBC run:

Everything you need to start your runs is in this directory **which you need to copy** to where you'd like to run from:

In [None]:
cp -r /home/hannah.griggs/nu/pynu_tests/o2grbs/pynuruns . # Run this from the directory you want to run from; "." copies pynuruns into it.

Once you copy that directory, you will need to adjust some of the files to point to your directories/namespace. To see all of the places in here that point to MY directory, run 

In [None]:
grep -i "hannah.griggs" *

This prints every file in the current directory that contains "hannah.griggs", along with each matching line. Yay grep! Add `-r` to the command if you also want to search subdirectories.

**You only need to change the INI_LOC and STATISTIC_FILE lines in runhlo2.sh**. 

The STATISTIC_FILE should be changed to the location of the custom dtdp PDF that you made.

#### Refer to DtDpPDFTutorial for how to generate the TPA PDF for your GRB

**Launch the necessary environment (pynumods)**.

You must deactivate the (igwn) environment first, then source the modified pynu environment like this with the "source" command.

In [None]:
conda deactivate # Get out of the (igwn) environment. Make sure to have no active environment at all before sourcing.
source src/nu-dev/pynumods/bin/activate # Activate the pynu environment.

## Part 1: Setting up your GRB

For the O2 rerun experiments, we are rerunning the O2 PyCBC boxes with targeted TPA PDFs. 

**Gather your GRB information**

Visit https://docs.google.com/spreadsheets/d/1FuLUsVUoQGJPYha1vU2znyChYIt1mCyerwVhiiYxeBU/edit?usp=sharing to see the list of GRBs for O2. This contains the information you will use to generate the sky-phase-amplitude PDFs, following the TPA PDF tutorial.

Once the PDF is generated, identify which PyCBC offline chunk the GRB GPS time falls into.

Chunk times are gathered into this directory:

In [None]:
/home/hannah.griggs/nu/pynu_tests/o2grbs/chunkinis

I also copied this directory into pynuruns. The files in it are the full configurations for the offline O2 PyCBC runs; to see the GPS time span for each one, run:

In [None]:
head ch2

Replace `ch2` with the chunk you want to look at. The `head` command prints the first 10 lines of a file. Likewise, `tail` prints the last 10 lines.

Once you find the correct chunk, note the start and end times. You will use these for the analysis.
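
If you would rather check the chunk membership programmatically, here is a minimal sketch. It assumes you have already read the start and end times out of the chunk files by hand; the dictionary below is not read from `chunkinis`. The `ch2` span comes from the chunk 2 file name used later in this tutorial (start 1164556817, duration 1929600 s), the second entry is a made-up label for illustration, and `find_chunk` is a hypothetical helper, not part of PyCBC.

```python
from typing import Dict, Optional, Tuple

# Chunk spans as (GPS start, GPS end), typed in by hand.
# "ch2" uses the start time and duration from the chunk 2 file name in this
# tutorial. The second label is hypothetical and only there for illustration.
CHUNK_SPANS: Dict[str, Tuple[int, int]] = {
    "ch2": (1164556817, 1164556817 + 1929600),
    "hypothetical_next_chunk": (1166486417, 1169107218),
}


def find_chunk(gps_time: float, spans: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Return the name of the chunk whose [start, end) span contains gps_time."""
    for name, (start, end) in spans.items():
        if start <= gps_time < end:
            return name
    return None  # The GPS time is outside every listed chunk.


# GRB 161212652 T0 from this tutorial.
print(find_chunk(1165408451, CHUNK_SPANS))  # prints "ch2"
```

The half-open interval `[start, end)` means a time that sits exactly on a chunk boundary is assigned to the later chunk only.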

## Part 2: Running an analysis with reused data:

In the pynuruns directory are ini files for the pycbc workflow. 

### **runhlo2.sh** will need to be edited as:

In [None]:
WORKFLOW_NAME=mmatest ## you can keep this the same
CONFIG_TAG=v2.3.2.3  ## keep this
GITLAB_URL="https://git.ligo.org/pycbc/offline-analysis/-/raw/${CONFIG_TAG}/production/o4/broad/config"
ID=grb161212652 ## Change this to your GRB name
RUNID=_4 ## If this is a rerun, use this line to indicate which rerun. Otherwise this can be blank.

In [None]:
INI_LOC="/home/hannah.griggs/nu/pynu_tests/o2grbs/" # Change to your pynuruns location
STATISTIC_FILE="/home/hannah.griggs/nu/pynu_tests/skyloc/dtdphase/L1H1-stat-GRB161212652.hdf" # Change to your custom PDF location (use an absolute path)

### To rerun an analysis using existing results, we need to use a **cache file**

In the `maps` directory in pynuruns, you will notice a file called `chunk2.map`. This is an example of a cache file that tells PyCBC the jobs it doesn't need to redo. In that file are two `HDF_TRIGGER_MERGE` files, one for each IFO.

With the GPS start and end times you identified for the PyCBC chunk corresponding to your GRB, locate the `HDF_TRIGGER_MERGE` file that matches the chunk times in this archive directory:

In [None]:
/home/ian.harry/aLIGO/O2/analyses/ALL_TRIGGER_FILES/

Copy the file to your `chunkinis` directory to make transferring more efficient (by trial and error, it seems like transfers succeed more often if the file is in your namespace):

In [None]:
cp /home/ian.harry/aLIGO/O2/analyses/ALL_TRIGGER_FILES/H1-HDF_TRIGGER_MERGE_FULL_DATA-1164556817-1929600.hdf chunkinis
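
To double-check that a trigger file matches your chunk, you can pull the times out of its name. This is a small sketch that assumes the name follows the `IFO-HDF_TRIGGER_MERGE_FULL_DATA-START-DURATION.hdf` pattern seen in the example above, with the last two fields being the GPS start time and the duration in seconds. `parse_trigger_filename` is a hypothetical helper, not a PyCBC function.

```python
import re
from typing import NamedTuple


class TriggerFile(NamedTuple):
    ifo: str
    start: int      # GPS start time
    duration: int   # seconds

    @property
    def end(self) -> int:
        """GPS end time, computed as start + duration."""
        return self.start + self.duration


# Assumed naming pattern: <IFO>-HDF_TRIGGER_MERGE_FULL_DATA-<start>-<duration>.hdf
_PATTERN = re.compile(
    r"^(?P<ifo>[A-Z]\d)-HDF_TRIGGER_MERGE_FULL_DATA-(?P<start>\d+)-(?P<duration>\d+)\.hdf$"
)


def parse_trigger_filename(name: str) -> TriggerFile:
    """Split a trigger-merge file name into detector, start time, and duration."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unexpected file name: {name!r}")
    return TriggerFile(match["ifo"], int(match["start"]), int(match["duration"]))


info = parse_trigger_filename("H1-HDF_TRIGGER_MERGE_FULL_DATA-1164556817-1929600.hdf")
print(info.ifo, info.start, info.end)  # prints "H1 1164556817 1166486417"
```

Compare `info.start` and `info.end` against the start and end times you noted from the chunk file.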

Copy the `chunk2.map` cache file and name it after the chunk you are working with. 

The cache file has entries that tell PyCBC where to find the files that can be reused, like:

In [None]:
L1-HDF_TRIGGER_MERGE_FULL_DATA-1164556817-1929600.hdf /home/hannah.griggs/nu/pynu_tests/o2grbs/chunkinis/L1-HDF_TRIGGER_MERGE_FULL_DATA-1164556817-1929600.hdf pool="local"

**Replace the GPS times in the file with those matching the `HDF_TRIGGER_MERGE` file you found.**
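
If you prefer to generate the entries rather than edit them by hand, something like the following sketch would work. It assumes each entry is one line of the form `<file name> <full path> pool="local"`, as in the example entry above, and that you want one line per IFO. `cache_entry` is a hypothetical helper and the directory is only an example; point it at your own `chunkinis`.

```python
from pathlib import PurePosixPath


def cache_entry(ifo: str, start: int, duration: int, directory: str) -> str:
    """Build one cache-file line: logical name, physical path, pool tag."""
    name = f"{ifo}-HDF_TRIGGER_MERGE_FULL_DATA-{start}-{duration}.hdf"
    path = PurePosixPath(directory) / name
    return f'{name} {path} pool="local"'


# Example values from this tutorial; replace with your chunk and your directory.
chunkinis_dir = "/home/hannah.griggs/nu/pynu_tests/o2grbs/chunkinis"
for ifo in ("H1", "L1"):
    print(cache_entry(ifo, 1164556817, 1929600, chunkinis_dir))
```

Redirect the printed lines into your new `.map` file, or paste them in.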

Edit `runhlo2.sh` to point to your cache file instead of `chunk2.map`.

In [None]:
  --cache-file maps/chunk2.map \

If you happen to need chunk 2, then these steps are done for you.

**Change the start and end times to reflect the chunk start and end times in `runhlo2.sh`:**

In [None]:
  --config-overrides \
      results_page:output-path:"/home/${USER}/public_html/pynu/o3/runs/${ID}/${ID}${RUNID}" \
      workflow:start-time:"1166486417" \
      workflow:end-time:"1169107218" \

### Good to go! Run the analysis with:

In [None]:
./runhlo2.sh

You will be prompted to enter your password, then it'll be off. 

## Part 3: Troubleshooting if jobs are struggling:

You'll need to babysit the jobs since they've been having issues with disk space. Your run will live in a directory named after your GRB, like `output<GRBNAME_RUNID>`.
Check how the queue is doing from within the run directory with:

In [None]:
./status

**If a small cluster of jobs fails**, let the analysis get as far as it can until the status updates to (FAILURE).

### Restarting a job that failed

Once it fails, edit the "start" script (in your run output directory) to include the preamble for the run.sh script (for authentication reasons):

In [None]:
ecp-get-cert --destroy
htdestroytoken
kinit hannah.griggs ## REMEMBER TO CHANGE TO YOUR NAME
unset XDG_RUNTIME_DIR
htgettoken -a vault.ligo.org -i igwn --scopes dqsegdb.read,gwdatafind.read,read:/frames,read:/ligo,read:/virgo,read:/kagra
condor_vault_storer -v igwn
export GWDATAFIND_SERVER="datafind.ligo.org:443"
PEGASUS_PYTHON=/home/ian.harry/conda_envs/pegasus_python/bin/python PATH=/home/ian.harry/conda_envs/pegasus_python/bin/:${PATH}

pegasus-run /local/hannah.griggs/pycbc-tmp_u_1hqa3g/work $@

Then you can restart the job with:

In [None]:
./start

### Restarting a job that's held

**If jobs are getting held**, see the reason with:

In [None]:
condor_q -better-analyze

This will tell you which job requirements are insufficient and by how much. If memory or disk space are the problem, update held jobs like this:

In [None]:
condor_qedit -constraint "JOBSTATUS==5" RequestDisk=newrequestamount

Change RequestDisk to RequestMemory as needed, and only request a little over what the jobs seem to need.

Release jobs again with 

In [None]:
condor_release -constraint "JOBSTATUS==5"

## Part 4: When the run is done

The run is done when a file called:

In [None]:
H1L1-PAGE_FOREGROUND_FULL_DATA-......html

appears in the `output<GRBNAME_RUNID>/results/8._open_box_result` directory. The file will be tagged with the GPS start time and duration of the chunk you used.

**Once this file populates**, copy it to the `results` directory that I put in pynuruns. Rename it to indicate the GRB it reflects, as:

In [None]:
cp ${PWD}/output${GRBNAME}/results/8._open_box_result/H1L1-PAGE_FOREGROUND_FULL_DATA-1239800000-200000.html ${RESULTS_PATH}/results/output${GRBNAME}_FG.html

Adjust the specifics of the copy command to reflect the html file you wish to copy, the location of the `results` directory to which you want to copy, and the name you want it to have. 

Now, **copy the PyCBC all-sky file from the corresponding chunk** from this directory archived by Derek Davis:

In [None]:
/home/derek.davis/public_html/cbc/O2/clean_data_runs/

For example, **if I am working with Chunk 2**, I would copy this file into my `results` directory and rename it:

In [None]:
cp /home/derek.davis/public_html/cbc/O2/clean_data_runs/o2-c02-clean-analysis-2-v1.9.1/7._open_box_result/H1L1-PAGE_FOREGROUND_HTML-1164556817-1929600.html /home/hannah.griggs/<path-to-results>/outputallskychunk2_FG.html

#### Put the relevant information into CSV format:

Edit `csv_maker.py` to reflect your GRB, the chunk you are working with, and the path to your `results` directory in these lines:

In [None]:
# Directory containing HTML files
input_directory = '/home/hannah.griggs/nu/pynu_tests/skyloc/run_testing/events_test/o3events/results/'

# Chunk number and GRB name as appears in your .html file
chunk='2'
grbid='grb161212652'

Run `csv_maker.py` to output a merged CSV file with the PyNu results compared to the all-sky PyCBC results:

In [None]:
python3 csv_maker.py

## Part 5: Calculating a p-value for your box

With your new CSV results file, you can now calculate a modified z-score and p-value for triggers within a range of target times.

If you want to keep things organized, I would suggest making a new directory within `results` to store all the files we'll make here to run these scripts from. I called mine `pvals`, but you'll get a chance to change it.

There are two scripts to run here, `backgroundpval.py` and `foregroundpval.py`. `background` calculates the incidence of a range of modified Z-scores in the full box. With this frequency of Z-values established, we can compare the significance of our trigger Z-scores to the background, which is what `foreground` does.
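
To make the idea concrete, here is a rough sketch of the two steps. It is not the code in `backgroundpval.py` or `foregroundpval.py`. It assumes the common definition of the modified Z-score, 0.6745 × (x − median) / MAD, and an empirical p-value equal to the fraction of background values at or above the observed one. The function names and the toy numbers are made up for illustration, and the real scripts may differ in the details.

```python
from statistics import median
from typing import List, Sequence


def modified_z_scores(values: Sequence[float]) -> List[float]:
    """Modified Z-score: 0.6745 * (x - median) / MAD (assumed definition)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; the modified Z-score is undefined.")
    return [0.6745 * (v - med) / mad for v in values]


def empirical_p_value(background_z: Sequence[float], observed_z: float) -> float:
    """Fraction of background Z-scores that are >= the observed Z-score."""
    n_at_least = sum(1 for z in background_z if z >= observed_z)
    return n_at_least / len(background_z)


# Toy ranking statistics, not real PyCBC output.
toy_stats = [1.0, 2.0, 3.0, 4.0, 100.0]
z = modified_z_scores(toy_stats)
print(empirical_p_value(z, z[3]))  # prints 0.4
```

Using the median and MAD instead of the mean and standard deviation keeps a single loud outlier (like the 100.0 above) from distorting the scores of everything else.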

**Before running**, adjust the initializations in both of these files to reflect the grb you ran and to point to your directories.

In `backgroundpval.py`:

In [None]:
# Chunk number and GRB name as appears in your .html file
chunk='2'
grbid='grb161212652'
# Directory containing CSV files
input_directory = '/home/hannah.griggs/nu/pynu_tests/o2grbs/results'
results_directory = 'pvals'

Here, you can change to your chunk, your GRB name, and call your results directory whatever you want.

In `foregroundpval.py`:

In [None]:
# GRB information
grbid='grb161212652'
center_end_time = 1165408451 # GRB T0
trigger_timewindow = 10 # Generous physically-motivated time delay between GW and GRB

# Load your data
# Directory containing CSV files
input_directory = '/home/hannah.griggs/nu/pynu_tests/o2grbs/results/pvals'

Here, you need to input the GRB's central time T0 as well as your GRB name. Don't forget as well to change the result directory path as it applies.
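
For intuition about how `center_end_time` and `trigger_timewindow` work together, here is a minimal sketch of a symmetric window selection. It assumes triggers are available as `(end_time, z_score)` pairs and that an empty window should give `nan`. `top_trigger_in_window` is a hypothetical helper and the single trigger below is a toy input, so treat this as an illustration rather than what `foregroundpval.py` actually does.

```python
import math
from typing import Iterable, Tuple


def top_trigger_in_window(
    triggers: Iterable[Tuple[float, float]],
    center_end_time: float,
    trigger_timewindow: float,
) -> Tuple[float, float]:
    """Return the (end_time, z_score) pair with the highest Z-score whose
    end time lies within +/- trigger_timewindow of center_end_time."""
    in_window = [
        (t, z) for t, z in triggers if abs(t - center_end_time) <= trigger_timewindow
    ]
    if not in_window:
        return (math.nan, math.nan)  # No trigger close enough to the GRB time.
    return max(in_window, key=lambda pair: pair[1])


# One toy trigger, about 568 s before the GRB T0 used in this tutorial.
toy_triggers = [(1165407882.6, 0.67)]
print(top_trigger_in_window(toy_triggers, 1165408451, 10))    # prints (nan, nan)
print(top_trigger_in_window(toy_triggers, 1165408451, 1000))  # prints (1165407882.6, 0.67)
```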

Run these files with:

In [None]:
python3 backgroundpval.py
python3 foregroundpval.py

### This will print four lines which you need to save in any way you see fit. **Please report the p-value printed, the trigger time identified as most significant for the GRB, and its modified Z-score in the spreadsheet by your GRB.**

For example, the run for GRB 161212652 prints the following:

In [None]:
Top Z-score for signal end time 1165408451: 0.6744712302670444 at 1165407882.6
Number of non-signal end times with Z-score >= 0.6744712302670444: 13243
Probability of another end time having a Z-score >= 0.6744712302670444: 0.3645

So, I would say that the trigger at time **1165407882.6** received a **Z-score of 0.6744712302670444** and a **p-value of 0.3645**.

Note that this trigger time is very far away from the GRB T0. In reality, within a generous +/-10 second allowance, there was no trigger above a ranking statistic of 5, which is the lowest that PyCBC saved back in O2. So, for illustration, I expanded the trigger_timewindow to +/-1000 seconds.

If there's no matching time, it will look more like:

In [None]:
Top Z-score for signal end time 1165408451: nan at nan
Number of non-signal end times with Z-score >= nan: 0
Probability of another end time having a Z-score >= nan: 0.0000

In O3, triggers were saved down to far lower ranking statistics, so this shouldn't be a problem for the O3 GRB analysis.

## That's all! Please reach out with any questions or if things are not working.