wchnicholas/H3N2L194P


This README describes the scripts used for the sequence analysis in:
A structural explanation for the low effectiveness of the seasonal influenza H3N2 vaccine

H3N2 egg-adaptive substitution L194P

This analysis is adapted from McWhite et al. 2016

Input File

  • Fasta/pdmH1N1_All.fa: 2009 pandemic H1N1 (swine flu) HA sequences downloaded from GISAID
  • Fasta/HumanH3N2_All.fa: Human H3N2 HA sequences downloaded from GISAID
    • Because GISAID limits the number of sequences that can be downloaded at once, sequences for this project were first downloaded separately by continent. The sequences from the different continents were then combined into a single Fasta file.
  • Fasta/Bris07_fromNCBI.fa: 11 Bris07 sequences from the NCBI protein database were obtained by searching "A/Brisbane/10/2007", hemagglutinin.
  • Fasta/HK14_fromGISAID.fa: 8 HK14 sequences from GISAID were obtained by searching "A/Hong Kong/4801/2014".
  • Fasta/Sing16_fromGISAID.fa: 4 Sing16 sequences from GISAID were obtained by searching "A/Singapore/INFIMH-16-0019/2016".
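
The step of combining per-continent downloads into one Fasta file can be sketched as below. This is a minimal illustration, not the script used for the project: the function names `read_fasta` and `combine_fasta` are hypothetical, and dropping records with a repeated header is an assumption about how overlapping downloads might be handled.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def combine_fasta(inputs, output):
    """Concatenate FASTA files, keeping the first record seen for each header.

    Returns the number of records written.
    """
    seen = set()
    kept = 0
    with open(output, "w") as out:
        for path in inputs:
            for header, seq in read_fasta(path):
                if header in seen:
                    continue  # assumption: identical headers are duplicates
                seen.add(header)
                out.write(f">{header}\n{seq}\n")
                kept += 1
    return kept
```

For example, `combine_fasta(["Asia.fa", "Europe.fa"], "Fasta/HumanH3N2_All.fa")` would write one combined file; the per-continent file names here are placeholders.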

Protocol

Analyzing HA sequences from egg-passaged isolates downloaded from GISAID

  1. Multiple sequence alignment (MSA) using MAFFT version 7.157b
  • mafft --auto Fasta/pdmH1N1_All.fa > Fasta/pdmH1N1_All.aln
  • mafft --auto Fasta/HumanH3N2_All.fa > Fasta/HumanH3N2_All.aln
  2. Parse MSA files to extract information on egg-passaged isolates
  • python script/ParseGISAIDaln.py:
    • Input files:
      • Fasta/pdmH1N1_All.aln
      • Fasta/HumanH3N2_All.aln
    • Output files:
      • result/HumanH3N2_Pos194YearVsPSG.tsv
      • result/HumanH3N2_EggOri.fa
      • result/HumanH3N2_PSG.tsv
      • result/pdmH1N1_Pos194YearVsPSG.tsv
      • result/pdmH1N1_EggOri.fa
      • result/pdmH1N1_PSG.tsv
  3. Plot the frequency of different amino acids observed at residue 194 in different years
  • Rscript script/Plot_YearVsPSG.R
    • Input files:
      • result/HumanH3N2_Pos194YearVsPSG.tsv
      • result/pdmH1N1_Pos194YearVsPSG.tsv
    • Output files:
      • graph/H3N2_YearVsAA_resi194.png
      • graph/pdmH1N1_YearVsAA_resi194.png
  4. Plot the frequency of L194P against the number of passages in eggs
  • Rscript script/Plot_ProVsPSG.R
    • Input file:
      • result/HumanH3N2_PSG.tsv
    • Output file:
      • graph/HumanH3N2_ProVsPSG.png
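
The quantity plotted in the last step, the fraction of sequences carrying proline at residue 194 for each number of egg passages, can be sketched as follows. This is an illustration under stated assumptions, not the logic of script/ParseGISAIDaln.py: the names `count_egg_passages` and `proline_frequency` are hypothetical, and the passage annotation format (an "E" followed by a count, such as "E3" or "E2/E1", with counts summed) is an assumption about how GISAID passage histories are written.

```python
import re
from collections import defaultdict

# Assumption: egg passages appear as "E" followed by a count (e.g. "E3"),
# possibly more than once in a single annotation (e.g. "E2/E1").
EGG_PASSAGE = re.compile(r"\bE(\d+)", re.IGNORECASE)


def count_egg_passages(annotation):
    """Sum the egg passage counts found in a passage annotation string."""
    return sum(int(n) for n in EGG_PASSAGE.findall(annotation))


def proline_frequency(records):
    """Map egg passage count to the fraction of sequences with P at the site.

    `records` is an iterable of (passage_annotation, residue) pairs, where
    `residue` is the one-letter amino acid observed at residue 194.
    """
    totals = defaultdict(int)
    prolines = defaultdict(int)
    for annotation, residue in records:
        n = count_egg_passages(annotation)
        if n == 0:
            continue  # no recognizable egg passage in this annotation
        totals[n] += 1
        if residue == "P":
            prolines[n] += 1
    return {n: prolines[n] / totals[n] for n in sorted(totals)}
```

Real passage annotations are free text and often inconsistent, so any such pattern would need checking against the downloaded metadata before use.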

Analyzing HA sequences from A/Brisbane/10/2007 (H3N2) deposited in NCBI

  1. Multiple sequence alignment (MSA) using MAFFT version 7.157b
  • mafft --auto Fasta/Bris07_fromNCBI.fa > Fasta/Bris07_fromNCBI.aln
  2. Parse MSA files to extract amino-acid identity at residue 194
  • python script/ParseNCBIseq.py
    • Input file:
      • Fasta/Bris07_fromNCBI.aln
    • Standard Output
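
Reading the amino acid at a given residue from an alignment requires mapping the residue number onto an alignment column, because gaps shift the columns. The sketch below shows one way to do that. It is not the code in script/ParseNCBIseq.py: the function names are hypothetical, the alignment is assumed to be already loaded as a dictionary of equal-length aligned strings, and the position is counted along the ungapped reference sequence as given, so any numbering offset (for example a signal peptide) is not handled here.

```python
def alignment_column(aligned_reference, position):
    """Return the 0-based alignment column for a 1-based ungapped position.

    `aligned_reference` is the reference sequence as it appears in the
    alignment, with "-" marking gaps.
    """
    count = 0
    for column, char in enumerate(aligned_reference):
        if char != "-":
            count += 1
            if count == position:
                return column
    raise ValueError(f"reference has fewer than {position} residues")


def residues_at(alignment, reference_id, position):
    """Return {sequence name: character} at the column matching `position`.

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    column = alignment_column(alignment[reference_id], position)
    return {name: seq[column] for name, seq in alignment.items()}
```

With a toy alignment `{"ref": "MK-TL", "a": "MKATP"}`, `residues_at(alignment, "ref", 4)` looks up the fourth ungapped residue of `ref`, which sits in the fifth alignment column.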
