Use shell=False for subprocess calls #58

nwiltsie merged 5 commits into main from nwiltsie-subprocess-shell on Jan 12, 2024

nwiltsie
Member

Description

This PR switches both instances of subprocess.Popen (launching Nextflow and launching the custom assert script) to use shell=False.

It's generally recommended to use shell=False to avoid any potential shell injections.
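
As a minimal illustration of the concern (not code from this PR), the sketch below passes a hostile-looking string as a single element of an argument list. With shell=False it reaches the child process verbatim instead of being interpreted by a shell. The `echo_args` helper and the sample string are hypothetical, and the current Python interpreter stands in for a real program such as nextflow.

```python
import subprocess
import sys


def echo_args(*args: str) -> str:
    """Run a child Python process that prints its arguments, one per line."""
    # Hypothetical helper for illustration; sys.executable stands in for
    # any real program.
    cmd = [sys.executable, "-c", "import sys; print('\\n'.join(sys.argv[1:]))", *args]
    # shell=False is the default, but stating it makes the intent explicit.
    with subprocess.Popen(cmd, shell=False, stdout=subprocess.PIPE, text=True) as proc:
        stdout, _ = proc.communicate()
    return stdout


# The semicolon and the command after it are never interpreted by a shell;
# they arrive in the child as one literal argument.
hostile = "out.bam; echo INJECTED"
result = echo_args(hostile)
print(result)
```

Had the same string been interpolated into a command string and run with shell=True, the shell would have treated the text after the semicolon as a second command.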

One subtle change associated with this is that we have to set the NXF_WORK environment variable via the env argument of subprocess.Popen, rather than embedding it in the command string. env replaces the child's entire environment rather than adding to it, so we have to merge the current process's environment with our changes ({**os.environ, **envmod}).
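
The environment merge can be sketched as follows. This is an illustration under assumptions, not the PR's actual code: `envmod` is the name used in the description above, while `launch_with_env` and the use of the current Python interpreter as a stand-in for nextflow are hypothetical.

```python
import os
import subprocess
import sys


def launch_with_env(cmd: list[str], envmod: dict[str, str]) -> str:
    """Run cmd without a shell, overlaying envmod on the current environment."""
    # env replaces the child's whole environment, so start from os.environ
    # and let the entries in envmod win on any key collision.
    merged_env = {**os.environ, **envmod}
    with subprocess.Popen(
        cmd, shell=False, env=merged_env, stdout=subprocess.PIPE, text=True
    ) as proc:
        stdout, _ = proc.communicate()
    return stdout


# Stand-in child process that reports the variable it received.
child = [sys.executable, "-c", "import os; print(os.environ['NXF_WORK'])"]
output = launch_with_env(child, {"NXF_WORK": "./test/work"})
print(output.strip())
```

Passing only `envmod` as env would drop every inherited variable (PATH, for example) from the child process, which is why the merge is needed.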

I tested this by adding a custom comparison script to pipeline-recalibrate-BAM's NFTest suite:

$ git diff nftest.yml
diff --git a/nftest.yml b/nftest.yml
index afb3919..f577708 100644
--- a/nftest.yml
+++ b/nftest.yml
@@ -14,13 +14,13 @@ cases:
     asserts:
       - actual: recalibrate-BAM-1.0.0-rc.4/TWGSAMIN000001/GATK-4.2.4.1/output/BWA-MEM2-2.2.1_GATK-4.2.4.1_A-mini_S2-v1.1.5.bam
         expect: /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/output/BWA-MEM2-2.2.1_GATK-4.2.4.1_A-mini_S2-v1.1.5.bam
-        method: md5
+        script: ./compare.py
       - actual: recalibrate-BAM-1.0.0-rc.4/TWGSAMIN000001/GATK-4.2.4.1/output/BWA-MEM2-2.2.1_GATK-4.2.4.1_A-mini_S2-v1.1.5.bam.bai
         expect: /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/output/BWA-MEM2-2.2.1_GATK-4.2.4.1_A-mini_S2-v1.1.5.bam.bai
-        method: md5
+        script: ./compare.py
       - actual: recalibrate-BAM-1.0.0-rc.4/TWGSAMIN000001/GATK-4.2.4.1/output/BWA-MEM2-2.2.1_GATK-4.2.4.1_A-mini_S2-v1.1.5.bam.sha512
         expect: /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/output/BWA-MEM2-2.2.1_GATK-4.2.4.1_A-mini_S2-v1.1.5.bam.sha512
-        method: md5
+        script: ./compare.py
       - actual: recalibrate-BAM-1.0.0-rc.4/TWGSAMIN000001/GATK-4.2.4.1/output/BWA-MEM2-2.2.1_GATK-4.2.4.1_A-mini_S2-v1.1.5.bam.bai.sha512
         expect: /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/output/BWA-MEM2-2.2.1_GATK-4.2.4.1_A-mini_S2-v1.1.5.bam.bai.sha512
-        method: md5
+        script: ./compare.py

The NFTest log file (/hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/main/log-nftest-20240111T231156Z.log) shows that both the Nextflow invocation...

2024-01-11 23:11:56,768 - NFTest - INFO - NXF_WORK=./test/work nextflow run ./main.nf -c /hot/code/nwiltsie/pipelines/pipeline-recalibrate-BAM/test/nftest.config -params-file ./test/single.yaml --output_dir /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/main/A-mini-n2

... and the comparison invocations...

2024-01-11 23:25:25,339 - NFTest - DEBUG - ./compare.py /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/main/A-mini-n2/recalibrate-BAM-1.0.0-rc.4/TWGSAMIN000001/GATK-4.2.4.1/output/BWA-MEM2-2.2.1_GATK-4.2.4.1_A-mini_S2-v1.1.5.bam /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/output/BWA-MEM2-2.2.1_GATK-4.2.4.1_A-mini_S2-v1.1.5.bam
2024-01-11 23:25:26,298 - NFTest - DEBUG - /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/main/A-mini-n2/recalibrate-BAM-1.0.0-rc.4/TWGSAMIN000001/GATK-4.2.4.1/output/BWA-MEM2-2.2.1_GATK-4.2.4.1_A-mini_S2-v1.1.5.bam
2024-01-11 23:25:26,322 - NFTest - DEBUG - /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/output/BWA-MEM2-2.2.1_GATK-4.2.4.1_A-mini_S2-v1.1.5.bam
2024-01-11 23:25:26,322 - NFTest - DEBUG - Assertion passed

... are well-formatted and function appropriately (compare.py just prints the inputs and exits with 0).
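
For reference, a stand-in script with the behavior described (print the inputs, exit with 0) might look like the sketch below. The actual compare.py is not shown in this PR, so this is an assumed reconstruction; the `compare` function name is hypothetical, and the two positional arguments mirror the actual and expect paths visible in the log.

```python
#!/usr/bin/env python3
import sys


def compare(actual: str, expect: str) -> int:
    """Print both input paths and report success unconditionally."""
    print(actual)
    print(expect)
    # Status 0 makes the assertion pass regardless of file contents.
    return 0


# Only act when invoked with exactly the two expected paths; any other
# invocation falls through and also exits with status 0.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(compare(sys.argv[1], sys.argv[2]))
```

A real comparison script would replace the unconditional `return 0` with a check of the two files and return a non-zero status on mismatch.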

Checklist

  • This PR does NOT contain Protected Health Information (PHI). A repo may need to be deleted if such data is uploaded.
    Disclosing PHI is a major problem [1]. Even a small leak can be costly [2].

  • This PR does NOT contain germline genetic data [3], RNA-Seq, DNA methylation, microbiome or other molecular data [4].

  • This PR does NOT contain other non-plain text files, such as: compressed files, images (e.g. .png, .jpeg), .pdf, .RData, .xlsx, .doc, .ppt, or other output files.

  To automatically exclude such files using a .gitignore file, see here for example.

  • I have read the code review guidelines and the code review best practice on GitHub check-list.

  • I have set up or verified the main branch protection rule following the github standards before opening this pull request.

  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].

  • I have added the major changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.

Footnotes

  1. UCLA Health reaches $7.5m settlement over 2015 breach of 4.5m patient records

  2. The average healthcare data breach costs $2.2 million, despite the majority of breaches releasing fewer than 500 records.

  3. Genetic information is considered PHI.
    Forensic assays can identify patients with as few as 21 SNPs

  4. RNA-Seq, DNA methylation, microbiome, or other molecular data can be used to predict genotypes (PHI) and reveal a patient's identity.

@nwiltsie
Member Author

Odd, I thought I disabled the cross-file similarity checker... https://github.com/uclahs-cds/docker-CICD-base/pull/68

@nwiltsie
Member Author

Okay, I had to pull the duplicated selector-based code into a shared method to make the linter happy (apparently that comment being broken across different lines made all the difference to it), but things still work as expected.

/hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/main/log-nftest-20240112T005925Z.log

@yashpatel6 yashpatel6 self-assigned this Jan 12, 2024
@yashpatel6
Contributor

Changes look good! I think the tests will need to be updated to reflect the selector-based code being moved around.

@nwiltsie
Member Author

Ah yeah, I didn't run the tests myself (there should be an action for that...), but I'll do so this morning.

@nwiltsie
Member Author

Okay, the tests now run correctly:

$ git rev-parse HEAD
aca1840371ee339fea01af8703768265dd5bf76b
$ pytest
============================= test session starts ==============================
platform linux -- Python 3.10.7, pytest-7.4.4, pluggy-1.3.0
rootdir: /hot/code/nwiltsie/tools/tool-NFTest
collected 17 items

test/unit/test_NFTestAssert.py .........                                 [ 52%]
test/unit/test_NFTestCase.py ..                                          [ 64%]
test/unit/test_NFTestEnv.py ..                                           [ 76%]
test/unit/test_NFTestRunner.py .                                         [ 82%]
test/unit/test_common.py ...                                             [100%]

============================== 17 passed in 0.70s ==============================

Contributor

@yashpatel6 yashpatel6 left a comment

Looks good!

@nwiltsie nwiltsie merged commit dcf0a86 into main Jan 12, 2024
1 check passed
@nwiltsie nwiltsie deleted the nwiltsie-subprocess-shell branch January 12, 2024 17:48