
Import results

Import functions allow users to import a JSON file (extension .json), or multiple files, with their own plasmid data. These files are generated by pATLASflow, a pipeline to run mapping, mash screen and assembly methods for pATLAS. They can also be generated through FlowCraft recipes.

The JSON files can be imported using the Upload file... button or by dragging and dropping the files onto the text box to the right of this button.

To generate these files, you can use two different programs:

  • pATLASflow - This approach assumes that the user has already performed QC analysis, assemblies and any other required preprocessing before running the mash dist, mash screen and mapping approaches provided here.
  • FlowCraft - Here you can start from raw reads and feed them into an assembly, mapping or mash screen approach. The pipeline will handle QC analysis and trimming with the default parameters described in the FlowCraft documentation, and then perform the desired analysis (either mash dist / assembly, mash screen or mapping).

Note: also check the redundancy removal rules described at the end of this file.

pATLASflow

Download and install requirements to run the pipeline

pATLASflow is a Nextflow pipeline.

Requirements

Conda recipe for Nextflow

Nextflow can be installed through Bioconda:

conda install nextflow

Mapping

The mapping pipeline can be run with the following command:

nextflow run tiagofilipe12/pATLASflow --mapping --reads "your_folder/*.fastq"

The resulting JSON file can then be provided to pATLAS in the Mapping menu.

Mash screen

The mash screen pipeline can be run with the following command:

nextflow run tiagofilipe12/pATLASflow --mash_screen --reads "your_folder/*.fastq"

The resulting JSON file can then be provided to pATLAS in the Mash screen menu.

Assembly

The assembly pipeline can be run with the following command:

nextflow run tiagofilipe12/pATLASflow --assembly --fasta "your_folder/*.fasta"

The resulting JSON file can then be provided to pATLAS in the Assembly menu.

Consensus

The consensus approach combines the Mash screen and Mapping results. To generate this JSON input, run the following command:

nextflow run tiagofilipe12/pATLASflow --mapping --mash_screen --reads "your_folder/*.fastq"

The resulting JSON file can then be provided to pATLAS in the Consensus menu.


FlowCraft

Download and install requirements

In order to download and install FlowCraft please follow the official instructions.

Use FlowCraft recipes

There are four pATLAS recipes available in FlowCraft:

  • Mapping

First build the pipeline script with this command:

flowcraft.py build -r plasmids_mapping -o pipeline

And then execute the pipeline by running nextflow in the script:

nextflow run pipeline.nf
  • Assembly / Mash Dist

First build the pipeline script with this command:

flowcraft.py build -r plasmids_assembly -o pipeline

And then execute the pipeline by running nextflow in the script:

nextflow run pipeline.nf
  • Mash Screen

First build the pipeline script with this command:

flowcraft.py build -r plasmids_mash -o pipeline

And then execute the pipeline by running nextflow in the script:

nextflow run pipeline.nf
  • All

This will run all the above pipelines in a single command and generate a separate output for each approach.

First build the pipeline script with this command:

flowcraft.py build -r plasmids -o pipeline

And then execute the pipeline by running nextflow in the script:

nextflow run pipeline.nf

Import results from FlowCraft

Results will be available in a folder named results inside the current working directory. These files can be uploaded to their respective menus within the pATLAS sidebar menu.

You can also use the flowcraft.py report module to generate interactive reports that send requests to pATLAS directly, without importing a file.


Redundancy removal

After loading the files through any of these popup menus and setting the desired cutoffs, a new popup will appear asking if the user wants to use the redundancy option for importing results into the pATLAS matrix.

The rationale

This option was created because plasmids are highly chimeric and modular by nature, so results often contain redundant information. Consider the following examples:

  • Two plasmids are highly related (and thus they are linked in pATLAS) and the results show that the HTS data has 100% identity with both, but one of them is larger than the other (say, 5 kb versus 50 kb). In this case the larger plasmid, at the same % identity, is the more likely plasmid to be present in our data.

  • HTS data suggest that we may have:

    • one plasmid with 100% identity and sequence length of 5kb.
    • another plasmid with 90% identity and sequence length of 50kb.
    • both plasmids are highly related (and thus they are linked in pATLAS matrix).

In the second case, although the first plasmid presents a higher identity, the second plasmid covers far more sequence overall (0.90 × 50 kb = 45 kb versus 1.00 × 5 kb = 5 kb), and thus the second plasmid should be the more likely plasmid to be contained in the sequencing data.

Hence, this option was added to help deal with this problem by "guessing" the most likely plasmids instead of reporting all hits from the pipelines described above.

The calculation

All linked plasmids are compared with each other to determine the best hit within each group of linked plasmids. Plasmids that are not linked are never compared. So, if the hits fall into two separate groups of plasmids, it is likely that the HTS data contains two plasmids.
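The grouping step above amounts to finding connected components in the link graph of hit plasmids. The sketch below illustrates the idea with hypothetical plasmid names and a hand-built link set; the actual pATLAS implementation may differ in detail:

```python
# Sketch of the redundancy-removal grouping step.
# Plasmid names and links are hypothetical, for illustration only.

def connected_groups(hits, links):
    """Group hit plasmids into connected components of the link graph.

    hits: iterable of plasmid identifiers that had a hit
    links: set of frozensets {a, b}, meaning plasmids a and b are linked
    """
    groups = []
    unvisited = set(hits)
    while unvisited:
        stack = [unvisited.pop()]
        group = set(stack)
        while stack:
            node = stack.pop()
            # pull every still-unvisited plasmid linked to this one
            for other in list(unvisited):
                if frozenset((node, other)) in links:
                    unvisited.remove(other)
                    stack.append(other)
                    group.add(other)
        groups.append(group)
    return groups

hits = ["pA", "pB", "pC"]
links = {frozenset(("pA", "pB"))}  # pA and pB are related; pC is not
# Two groups result: {pA, pB} and {pC}, so the data likely
# contains two distinct plasmids.
print(connected_groups(hits, links))
```

Only plasmids inside the same group are compared pairwise with the formulas below; each group then contributes its own best hit.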

However, each import type has its own calculation to "guess" the best hit for the plasmids within each group, since the results are generated by different approaches and pipelines.

Therefore, each pair of linked plasmids is compared as described below for each import type:

  • Mapping

plasmid1 percentage * plasmid1 length - plasmid2 percentage * plasmid2 length

Interpretation: if the result is > 0, plasmid1 is the best hit; if it is < 0, plasmid2 is the better hit; if it is = 0, both are equally good hits.

Note: percentage is the percentage of the queried plasmid that is covered by HTS data, resulting from mapping.

  • Mash screen

plasmid1 identity * plasmid1 length - plasmid2 identity * plasmid2 length

Note: identity is the percentage identity, from the mash screen output, between the queried plasmid and the HTS data.

Interpretation: if the result is > 0, plasmid1 is the best hit; if it is < 0, plasmid2 is the better hit; if it is = 0, both are equally good hits.

  • Assembly

plasmid1 identity * plasmid1 shared hashes * plasmid1 length - plasmid2 identity * plasmid2 shared hashes * plasmid2 length

Note: identity is the percentage identity, from the mash dist output, between the queried plasmid and the HTS data.

Note 2: shared hashes is a measure of the percentage of sequence shared between the HTS data and the plasmid. This is useful because mash dist reports the identity of the smaller sequence against the larger one.

Interpretation: if the result is > 0, plasmid1 is the best hit; if it is < 0, plasmid2 is the better hit; if it is = 0, both are equally good hits.
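The three comparison formulas above can be sketched as small scoring functions. The function names and arguments here are illustrative, not part of the pATLAS code; the numeric example reuses the 5 kb / 50 kb case from the rationale section:

```python
# Hypothetical helpers implementing the three pairwise comparison
# formulas described above (illustrative names, not the pATLAS schema).

def mapping_score(percentage, length):
    # percentage: fraction of the queried plasmid covered by the HTS reads
    return percentage * length

def mash_screen_score(identity, length):
    # identity: mash screen percentage identity against the HTS data
    return identity * length

def assembly_score(identity, shared_hashes, length):
    # shared_hashes corrects for mash dist reporting the identity of the
    # smaller sequence against the larger one
    return identity * shared_hashes * length

def best_hit(score1, score2):
    """Positive difference -> plasmid1, negative -> plasmid2, zero -> both."""
    diff = score1 - score2
    if diff > 0:
        return "plasmid1"
    if diff < 0:
        return "plasmid2"
    return "both"

# The rationale's example: 100% identity over 5 kb loses to
# 90% identity over 50 kb (5000 < 45000).
print(best_hit(mash_screen_score(1.00, 5000),
               mash_screen_score(0.90, 50000)))  # -> plasmid2
```

Within each group of linked plasmids, applying `best_hit` to every pair in this way leaves only the most likely plasmid(s) as the reported result.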