
Annotations on Structures

Michelle Gill edited this page Apr 12, 2020 · 24 revisions

Mapping sequence data onto structures

The SWISS-MODEL team is currently involved in an EU project to combat COVID-19.

As part of that project, we provide protein structures as a starting point for further analysis (see here). Given the current outbreak, we would like to accelerate our plan to map relevant annotations onto those structures. We are therefore very interested in tools and platforms that can automatically generate such annotations from the latest data.

The main goals of this hackathon (see the GitHub repo for details) are to:

  1. Find/generate relevant sequence data to be displayed on structures
  2. Write reusable scripts to map the sequence data onto the frame of reference of the proteins (this may require translating positions on the genome into positions on the SARS-CoV-2 proteins as listed here)
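
The coordinate translation mentioned in goal 2 can be sketched roughly as follows. This is only a minimal illustration, not the project's actual script: the `CDS` type and the `genome_to_protein` function are hypothetical names, and the spike (S) coordinates are assumed to follow the NC_045512.2 reference annotation and should be checked against the protein list before use. The sketch also assumes a single contiguous coding region, so it does not handle the ribosomal frameshift in ORF1ab or mature peptides cleaved from a polyprotein.

```python
from typing import NamedTuple, Optional, Tuple


class CDS(NamedTuple):
    """A contiguous coding region on the genome (hypothetical helper type)."""
    name: str
    start: int  # 1-based, inclusive genome position of the first coding base
    end: int    # 1-based, inclusive genome position of the last base (stop codon included)


def genome_to_protein(pos: int, cds: CDS) -> Optional[Tuple[int, int]]:
    """Map a 1-based genome position to (1-based residue number, 0-based codon offset).

    Returns None if the position lies outside the coding region.
    Positions inside the stop codon map to a residue number one past
    the end of the protein; callers may want to filter those out.
    """
    if not cds.start <= pos <= cds.end:
        return None
    offset = pos - cds.start
    return offset // 3 + 1, offset % 3


# Assumed coordinates for the spike gene; verify against the reference annotation.
spike = CDS("S", 21563, 25384)

# Genome position 23403 falls in the second base of codon 614.
print(genome_to_protein(23403, spike))
```

Keeping the codon offset alongside the residue number makes it easier to later decide whether a nucleotide change alters the amino acid, which matters when annotations come from genome-level variant data.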

Additional topics of interest:

  • Alternative ways to visualize the protein structures
  • For RDF/JSON-LD experts: define an RDF ontology and map our JSON data (example) to RDF for use in other knowledge graph efforts, e.g. building on this JSON-LD context

We will provide an interface to display annotations in a similar fashion to our SWISS-MODEL Repository (as in this example). We will also extend the structural coverage of the SARS-CoV-2 proteome using protein structure predictions from colleagues participating in CASP.

Participation

Participants

  • Gerardo Tauriello (on behalf of the SWISS-MODEL team) (coordinator)
  • Tomas Masson (Some Python programming, but I would like to contribute)
  • Sara Vilella (Interested in protein structure. Studying master's degree in Bioinformatics. Some R and Python experience :))
  • Laura Blum (interested undergrad; beginner-intermediate Python)
  • Barbara Terlouw (PhD student in bioinformatics; experience with structural bioinformatics, in particular protein modelling and natural product structure prediction. Advanced in Python programming.)
  • Jan T Kim -- bioinformatician with experience in various areas (including virus bioinformatics) but not a molecular modeller. Have worked on modelling and reconstructing evolution, plant morphogenesis, modelling gene regulatory networks. Computer skills include programming (mainly Python, Java, R), software development / scientific computing (make, git, shell, Linux etc.). My toy project to warm myself up ended up related to the "Annotation of Structures" topic, I think; please comment on Slack channel.
  • Janani Durairaj (PhD student in structural bioinformatics; experience with protein modelling, structure alignment, and machine learning. Advanced in Python programming.)
  • Mehmet Akdel (PhD student in Bioinformatics; experience in algorithm development and protein structure alignment. Advanced in Python programming.)
  • Didier Barradas (PhD in biomedicine; experienced in structural biology, especially protein-protein interactions; familiar with machine learning and some deep learning; Python programmer)
  • Vasilis Promponas (Expertise in protein sequence analysis, sequence repeats, compositional bias, intrinsic disorder, etc)
  • Karel Berka (Structural bioinformatics, specialization in protein structures, molecular modeller)
  • Thomas Lütteke (Structural bioinformatics, glycoinformatics, head of glycosciences.de portal)
  • Michelle Gill (Structural biology, machine learning, proteomics, Python)

Context

Protein structure predictions of SARS-CoV-2 have already proven useful to several research projects. To list a few examples which used our models: