v1.3.0
Public Health Bioinformatics v1.3.0 Release Notes
This minor release introduces two new workflows, improves several existing workflows, and resolves various bugs.
Full release notes can be found here.
🆕 New workflows:
- TheiaCoV_FASTA_Batch_PHB
  - This workflow runs TheiaCoV_FASTA on many SARS-CoV-2 samples at once.
  - This is a set-level workflow that writes its results to a sample-level data table in Terra.bio.
  - Currently, this workflow only runs Pangolin4 and Nextclade.
  - Import the workflow from Dockstore.
- Rename_FASTQ_PHB
  - This workflow is a utility for quickly renaming a set of FASTQ files, either paired-end or single-end.
  - Import the workflow from Dockstore.
🚀 Changes to existing workflows:
- TheiaCoV_ONT_PHB
  - Influenza is now supported. Use "flu" for the organism optional input String parameter. The "sars-cov-2" and "HIV" tracks are unchanged.
- TheiaProk Workflow Series
  - If the user-input taxon (expected_taxon) or the taxon predicted by Gambit belongs to the Shigella genus, the Extensively Drug-Resistant phenotype is predicted using the new resfinder pointfinder database.
  - If the user-input taxon (expected_taxon) or the taxon predicted by Gambit is the Mycobacterium tuberculosis species, bcftools indexes and merges all potential VCF files created by TbProfiler (both .bcf and .gz files).
  - Kraken2 has been added as an optional module (except for TheiaProk_ONT_PHB). If call_kraken is true, a database must be provided through kraken_db.
  - Two new optional inputs were added to control ANIm behaviour: ani_threshold (default 85.00) and percent_bases_aligned_threshold (default 70.00).
- TheiaCoV_FASTA_PHB
  - The list of allowed values for the organism input now includes "sars-cov-2" (default), "rsv_a", "rsv_b", "WNV", "MPXV" and "flu".
- TheiaCoV_Illumina_PE_PHB
  - If organism is set to "flu", the workflow searches for antiviral resistance mutations in the HA, NA, PA, PB1 and PB2 assembly segments, targeting the following 10 antivirals: A_315675, compound_367, Favipiravir, Fludase, L_742_001, Laninamivir, Peramivir, Pimodivir, Xofluza and Zanamivir.
- All Illumina SE and PE Workflows
  - A new optional input, read_qc, allows the user to choose between fastq_scan and fastqc for the evaluation of read quality. The affected workflows are: TheiaCoV_Illumina_PE_PHB, TheiaCoV_Illumina_SE_PHB, TheiaProk_Illumina_SE_PHB, TheiaProk_Illumina_PE_PHB, TheiaMeta_Illumina_PE_PHB and Freyja_FASTQ_PHB.
- CZGenEpi_Prep_PHB
  - Instead of being extracted, the sample_is_private_column_name and gisaid_id_column_name columns are now generated by the program using already-provided inputs and the new is_private Boolean variable, which sets the value for all samples in the set. The field "GISAID ID (Public ID) - Optional" now reflects the GISAID syntax for Virus Name.
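The input constraints listed above can be sketched as a few small checks. This is an illustrative Python sketch only, not code from the PHB repository: the function names are hypothetical, the allowed values and defaults are copied from these notes, and the ANIm comparison assumes a hit passes when both values are greater than or equal to their thresholds (the notes do not state whether the comparison is inclusive).

```python
from typing import Optional

# Values copied from the release notes above.
ALLOWED_ORGANISMS = ("sars-cov-2", "rsv_a", "rsv_b", "WNV", "MPXV", "flu")  # TheiaCoV_FASTA_PHB
ALLOWED_READ_QC = ("fastq_scan", "fastqc")  # Illumina SE and PE workflows
DEFAULT_ANI_THRESHOLD = 85.00
DEFAULT_PERCENT_BASES_ALIGNED_THRESHOLD = 70.00


def check_organism(organism: str = "sars-cov-2") -> bool:
    """Hypothetical helper: True if the organism is in the allowed list."""
    return organism in ALLOWED_ORGANISMS


def check_read_qc(read_qc: str) -> bool:
    """Hypothetical helper: True if read_qc names one of the two QC tools."""
    return read_qc in ALLOWED_READ_QC


def check_kraken(call_kraken: bool, kraken_db: Optional[str]) -> bool:
    """Hypothetical helper: a database is required only when call_kraken is true."""
    return (not call_kraken) or bool(kraken_db)


def ani_hit_passes(
    ani: float,
    percent_bases_aligned: float,
    ani_threshold: float = DEFAULT_ANI_THRESHOLD,
    percent_bases_aligned_threshold: float = DEFAULT_PERCENT_BASES_ALIGNED_THRESHOLD,
) -> bool:
    """Assumed semantics: a hit must meet or exceed both thresholds."""
    return (
        ani >= ani_threshold
        and percent_bases_aligned >= percent_bases_aligned_threshold
    )
```

For example, `check_kraken(True, None)` returns `False`, mirroring the requirement that kraken_db be supplied whenever call_kraken is true. Each check applies to a different workflow, so they are kept as separate functions instead of one combined validator.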
Docker container updates:
- AMRFinderPlus has been updated to version v3.11.20 and database 2023-09-26.1
- tbp-parser has been updated to version 1.2.0
- Freyja has been updated to version 1.4.8
- ts_mlst database has been updated as of January 2024
- Gambit has been updated to version 1.3.0, including its database files
- Pangolin4 has been updated to version 4.3.1-pdata-1.23.1
- IRMA has been updated to version 1.1.3
Tag updates:
- The SARS-CoV-2 Nextclade dataset tag has been updated to 2023-12-03T12:00:00Z
🐛 Bug fixes and small improvements:
- kSNP3_PHB: The ksnp3_core_vcf output has been renamed to ksnp3_vcf_ref_genome for readability. Additionally, two new outputs are provided: ksnp3_vcf_snps_not_in_ref and ksnp3_vcf_ref_samplename.
- TheiaProk Workflow Series: The MIDAS task was adjusted to reduce logging and therefore the size of the log file, which aids debugging and reduces storage costs.
- TheiaMeta_Illumina_PE_PHB: A new Krona task was added for the visualization of the Kraken2 reports.
- Mercury_Prep_N_Batch: The excluded_samples.tsv file is now printed to the execution log file, aiding debugging.
- TheiaCoV Workflow Series: The nextclade_lineage output now populates correctly for SARS-CoV-2. Additionally, the nexclade_qc field is now exposed as an output.
- Augur_PHB: The Augur refine input clock_filter_iqd has been reverted to its previous default value of 4.
- Kraken Standalone Workflows: A new Krona task was added for the visualization of the Kraken2 reports.
- TheiaValidate_PHB: TheiaValidate now outputs a table with validation-criteria failures only. Additionally, a new input was added that can translate different column names between tables to enable comparison.
- TheiaCoV_ONT_PHB: If a sample fails the read-screening quality check, the workflow no longer fails. Instead, it finishes with an appropriate message.
- Samples_To_Ref_Tree_PHB: The organism input has been renamed to nextclade_dataset_name for clarity.
- Various workflows: Call caching was disabled in the following workflows: BaseSpace_Fetch_PHB, Transfer_Column_Content_PHB, Assembly_Fetch_PHB, Snippy_Streamline_PHB and TheiaValidate_PHB.
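Scripts that still refer to the old names may need updating after the two renames described above. The following is a minimal Python sketch of one way to do that, assuming records are held as plain dictionaries keyed by input or output name; the `RENAMES` table and the `migrate_names` helper are hypothetical and are not part of PHB.

```python
# Renames taken from the release notes above, grouped per workflow so that
# "organism" is only renamed where the notes say it was.
RENAMES = {
    "kSNP3_PHB": {"ksnp3_core_vcf": "ksnp3_vcf_ref_genome"},
    "Samples_To_Ref_Tree_PHB": {"organism": "nextclade_dataset_name"},
}


def migrate_names(workflow: str, record: dict) -> dict:
    """Return a copy of record with old key names replaced by the new ones.

    Keys without a rename for the given workflow are left unchanged.
    """
    renames = RENAMES.get(workflow, {})
    return {renames.get(key, key): value for key, value in record.items()}
```

Grouping the renames by workflow keeps the organism input untouched for workflows such as TheiaCoV_FASTA_PHB, where that name is unchanged.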
What's Changed
- updated VCF output file renaming in kSNP3 task by @kapsakcj in #207
- reduce unnecessary logging in MIDAS task by @kapsakcj in #210
- update default amrfinderplus docker image to v3.11.20 and db 2023-09-26.1 by @kapsakcj in #229
- TheiaCoV_ONT_PHB Influenza Track by @jrotieno in #233
- TheiaCoV_FASTA_Batch: TheiaCoV_FASTA, for many samples at once by @sage-wright in #238
- Add krona task to TheiaMeta_Illumina_PE by @cimendes in #213
- added 2 QC thresholds to ANI task to reduce false positives by @kapsakcj in #168
- Resfinder improvements, added support for Shigella spp., added XDR Shigella prediction by @kapsakcj in #159
- disable call caching for various workflows by @kapsakcj in #251
- Mercury_Prep_N_Batch: print the excluded_samples.tsv and update Docker to avoid Google SDK warning by @sage-wright in #220
- Nextclade Output Added by @DOH-HNH0303 in #239
- TheiaCoV_FASTA: Adding five new organisms by @jrotieno in #194
- Update task_augur_refine iqd back to 4 by @jrotieno in #268
- TheiaCoV Illumina PE: Identify Influenza Antiviral Resistance Mutations in Assemblies by @jrotieno in #252
- [New Utility] Workflow to rename FASTQ files (non-destructive) by @cimendes in #267
- [TheiaCoV_Fasta_Batch] Substitute FASTA concatenating task to ensure proper sample_id propagation by @cimendes in #274
- Kraken2 Standalone: add krona visualisation by @cimendes in #225
- TheiaValidate_PHB: new features and new Docker image from TheiaValidate repository by @sage-wright in #255
- TheiaProk TB: new VCF output and modification to the coverage report by @sage-wright in #245
- TheiaCoV_ONT: prevent failure by coercing files into strings by @sage-wright in #288
- update default freyja docker image to 1.4.8 for multiple tasks by @kapsakcj in #289
- FastQC added as an optional module in all Illumina_PE and Illumina_SE workflows by @sage-wright in #260
- update docker to version tag 2.23.0-2024-01 by @cimendes in #293
- [TheiaProk Workflows] Add Kraken2 as optional module by @cimendes in #286
- CZGenEpi_Prep_PHB: implementing user-requested changes by @sage-wright in #244
- Update Gambit database files to version 1.3.0 by @kevinlibuit in #292
- [PHB Release 1.3.0] update version and docker tags (nexclade sc2, pangolin, tbp-parser 1.1.7) by @cimendes in #296
- [PR Template Update] Updating template per identified dev process improvements by @kelseykropp in #300
- [TheiaProk suite] Patch fix: change type of kraken2_report to be string in taxon_table task by @cimendes in #297
- Samples_To_Ref_Tree_PHB: changed "organism" input to "nextclade_dataset_name" by @jrotieno in #303
- theiacov_fasta wf logic change for flu by @kapsakcj in #305
- restore vadr_num_alerts string output to theiacov_fasta workflow by @kapsakcj in #307
New Contributors
- @DOH-HNH0303 made their first contribution in #239
- @kelseykropp made their first contribution in #300
Full Changelog: v1.2.1...v1.3.0