Some theory:
Assembler:
- Number of contigs (we care only about large contigs)
- Length of contigs ~ coverage
- Criteria: N50, GC content, reference length ( [https://en.wikipedia.org/wiki/Sequence_alignment] [http://topicpageswiki.plos.org/wiki/K-mer] )
- If there is an extremely different contig -> it is probably a plasmid
- How can we fix errors: look at the k-mer distribution (Trimmomatic) or use several k-mer sizes
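To make the N50 and GC content criteria above concrete, here is a minimal Python sketch. The function names `n50` and `gc_content` and the toy contigs are made up for illustration; a real assembly would be read from the assembler's FASTA output instead.

```python
def n50(lengths):
    """Return N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def gc_content(seq):
    """Return the fraction of G and C among the A/C/G/T bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


# Toy contigs (hypothetical data, not from a real assembly).
contigs = ["ATGCGC" * 10, "ATAT" * 5, "GGCC" * 2]
lengths = [len(c) for c in contigs]

print("number of contigs:", len(contigs))
print("N50:", n50(lengths))
for c in contigs:
    print(len(c), round(gc_content(c), 2))
```

A contig whose GC content stands far apart from the rest is the kind of "extremely different contig" the notes mention as a plasmid candidate.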
Homework:
- How many k-mers? (see the k-mer distribution)
- Why does alignment not work? (assembly, find relatives, find differences, how they evolved, how they became pathogens?)
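For the k-mer question, a rough sketch of how a k-mer distribution can be computed in plain Python. The names `count_kmers` and `kmer_spectrum`, the toy reads, and k = 4 are assumptions for illustration only; real reads need a much larger k and a dedicated k-mer counter.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across all reads, skipping k-mers that contain N."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def kmer_spectrum(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


# Toy reads (hypothetical); the last base of the third read mimics an error.
reads = ["ACGTACGT", "CGTACGTA", "ACGTACGA"]
counts = count_kmers(reads, 4)
spectrum = kmer_spectrum(counts)

print("distinct k-mers:", len(counts))
for multiplicity in sorted(spectrum):
    print(multiplicity, spectrum[multiplicity])
```

K-mers seen only once are often sequencing errors, while the main peak of the distribution sits near the k-mer coverage of the genome.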
We will use SPAdes and BLAST, then annotate the assembly.
Tasks:
- What is the genome sequence of E. coli X?
- What strain of E. coli is E. coli X most similar to? (Where did it come from?)
- What are the genes that E. coli X contains?
- Which of these genes make E. coli X distinct?
- How did E. coli X evolve to obtain these genes?
- How did E. coli X become pathogenic?