# Resistome profiling

One of the main concerns during the suspected outbreak of *E. cloacae* was the possibility of antimicrobial resistance (AMR), particularly carbapenem resistance.

For this step in the workflow, we'll be sketching with:

* MinHash

***

## Quick look

To quickly tell if there are any AMR genes in these samples, we can use [GROOT](https://github.com/will-rowe/groot). GROOT is designed for metagenomes but does the job for isolates too. It works by building variation graphs for clusters of genes, then indexing each graph traversal using MinHash sketches.
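
To make the sketching idea concrete, here is a minimal MinHash example in Python. It is an illustration only and not GROOT's implementation: the bottom-s scheme, the `blake2b` hash, the function names, and the `k` and `size` defaults are all assumptions chosen for readability.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash64(kmer):
    """Deterministic 64-bit hash of a k-mer (a stand-in for a faster hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(seq, k=7, size=16):
    """Bottom-s sketch: the `size` smallest distinct k-mer hashes, sorted."""
    return sorted({hash64(km) for km in kmers(seq, k)})[:size]


def jaccard_estimate(a, b, size=16):
    """Estimate the Jaccard similarity of two sequences from their sketches."""
    merged = sorted(set(a) | set(b))[:size]  # bottom-s sketch of the union
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


# Two toy sequences that differ by a single base (made up for this example)
a = minhash_sketch("ATGGCGTACGTTAGCTAGGCTAACGTAGCTAGCTAGGATCC")
b = minhash_sketch("ATGGCGTACGTTAGCTAGGCTAACGTTGCTAGCTAGGATCC")
print(jaccard_estimate(a, b))
```

The point is that two small sets of hashes stand in for the full k-mer sets, so a similarity estimate only needs the sketches and not the sequences themselves.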

Because we just want a quick look, we will use the raw sequence data from our samples. This is to demonstrate that sketching is quite useful when you want an answer quickly. GROOT also has read QC and trimming built in, so bad reads will be handled.

* download a reference AMR database and index it:

In [10]:
```
# download the ResFinder database
!groot get -d resfinder

# index the database
!groot index -i ./resfinder.90 -l 150 -o resfinder-index
```

```
downloading the pre-clustered resfinder database...
unpacking...
database saved to: ./resfinder.90
now run `groot index -i ./resfinder.90` or `groot index --help` for full options
```


> `-l` specifies the length of the windows that are sketched along each graph; it should be similar to the read length.
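
If you aren't sure what read length your data has, a quick check like the one below can suggest a starting value for `-l`. This is a sketch and not part of GROOT: the helper names `read_lengths` and `suggest_window` are made up for this example, and it assumes plain four-line FASTQ records (no wrapped sequence lines).

```python
import gzip
from collections import Counter


def read_lengths(fastq_path, max_reads=10000):
    """Count read lengths in the first `max_reads` records of a FASTQ file."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    lengths = Counter()
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_reads:
                break
            if i % 4 == 1:  # the sequence is the 2nd line of each 4-line record
                lengths[len(line.strip())] += 1
    return lengths


def suggest_window(lengths):
    """Return the most common read length as a starting point for -l."""
    if not lengths:
        raise ValueError("no reads found")
    return lengths.most_common(1)[0][0]
```

Calling `suggest_window(read_lengths(path))` on one of the FASTQ files under `../data/reads/` gives the most common read length in the sampled records. Treat it as a starting point only; the guidance above is simply that `-l` should be similar to the read length.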

* align the reads to the reference graphs:

In [11]:
```
# align the reads
!gunzip -c ../data/reads/ERX168346_*.fastq.gz | groot align -i resfinder-index --trim -q 20 -o ERX168346-graphs > ERX168346.bam
```

> The `align` subcommand produces a BAM file containing all graph traversals for each read. Each BAM file essentially contains the reads derived from antimicrobial resistance genes (ARGs).

> The GFA variation graphs that had reads align to them are also kept and can be viewed in Bandage or a similar graph viewer.

* now, report what genes are present:

In [12]:
```
!groot report -i ERX168346.bam --lowCov
```

```
sul2_2_GQ421466	55	816	813M3D
sul1_2_CP002151	132	927	904M23D
blaSHV-12_1_AF462395	42	861	4D845M12D
sul2_10_AM183225	55	819	816M3D
dfrA1_1_X00926	153	474	10D463M1D
sul2_14_AJ514834	53	819	803M16D
blaTEM-1A_4_HM749966	89	861	4D851M6D
aadA1_4_JQ480156	262	789	6D777M6D
```


> The `--lowCov` flag is used because we are running GROOT on isolates. It lets GROOT report genes whose first or last few bases are uncovered, which usually happens because there are not enough reads to completely span the gene (GROOT doesn't count partial read alignments).
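
The report is easy to post-process in Python. The sketch below assumes the four columns are gene name, read count, gene length and a CIGAR-style coverage string; the `M` and `D` runs in the output above sum to the gene length in every row, which is consistent with that reading. `Hit` and `parse_report` are hypothetical names for this example, not part of GROOT.

```python
import re
from typing import NamedTuple

CIGAR_OP = re.compile(r"(\d+)([MD])")


class Hit(NamedTuple):
    gene: str
    read_count: int
    gene_length: int
    cigar: str

    @property
    def covered_fraction(self):
        """Fraction of the gene covered, taken as the M runs over gene length."""
        matched = sum(int(n) for n, op in CIGAR_OP.findall(self.cigar) if op == "M")
        return matched / self.gene_length


def parse_report(text):
    """Parse whitespace-separated report lines, skipping anything else."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        gene, reads, length, cigar = fields
        hits.append(Hit(gene, int(reads), int(length), cigar))
    return hits


# Two rows copied from the report above
report = """\
sul2_2_GQ421466\t55\t816\t813M3D
blaSHV-12_1_AF462395\t42\t861\t4D845M12D
"""
for hit in parse_report(report):
    print(f"{hit.gene}\t{hit.read_count}\t{hit.covered_fraction:.3f}")
```

Under these assumptions, the `D` runs at either end of a coverage string are the uncovered bases that `--lowCov` tolerates, so `covered_fraction` gives a quick sense of how complete each hit is.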