Step 1. Creating a Reference Database with makestructuraldb

Victoria edited this page Apr 19, 2023 · 1 revision

In the first part of the tutorial, we will generate a protein structural database related to two human transcripts with Ensembl IDs ENST00000367182 and ENST00000374005.

Build a BLAST Protein Database

To build a BLAST protein database, follow these steps:

  1. Retrieve the human proteome in FASTA format. You can use a public repository such as UniProt or Ensembl.

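  2. Generate the BLAST protein database. makeblastdb does not read gzip-compressed input, so decompress the FASTA file first, then run makeblastdb in the terminal:

gunzip -c UP000005640_9606.fasta.gz > UP000005640_9606.fasta
makeblastdb -in UP000005640_9606.fasta -dbtype prot -out human_proteome_uniprot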

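This command writes the database files to the current directory under the name "human_proteome_uniprot", with extensions .phr, .pin, and .psq. Recent BLAST+ releases may write additional index files alongside these three. The tutorial's copy of these files is in the human_proteome_blastdb folder.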
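To confirm that the database was built, you can check that the three files listed above exist. The following is a minimal sketch: check_blast_db is a hypothetical helper written for this page, not part of BLAST+, and it tests only the .phr, .pin, and .psq extensions.

```shell
# Hypothetical helper (not part of BLAST+).
# Return 0 if the .phr, .pin, and .psq files exist for the given BLAST
# protein database prefix; otherwise report each missing file on stderr
# and return 1.
check_blast_db() {
    prefix="$1"
    status=0
    for ext in phr pin psq; do
        if [ ! -f "${prefix}.${ext}" ]; then
            echo "missing: ${prefix}.${ext}" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_blast_db human_proteome_blastdb/human_proteome_uniprot && echo "database found"` checks the copy in the tutorial folder.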

Download PDB files

If you want to retrieve structural data for proteins that currently have no structure in the PDB, you can download the full PDB archive and rely on sequence homology to find structural homologs. To reproduce our example, we downloaded the PDB entries and AlphaFold2 models related to the transcripts of interest. You can find these files in the pdbs folder.
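If you prefer to fetch individual experimental structures yourself, a minimal sketch is shown below. It assumes curl is installed and uses the RCSB file download URL pattern. rcsb_pdb_url and fetch_pdb are hypothetical helper names, and the PDB IDs you need depend on your own transcripts.

```shell
# Hypothetical helpers for fetching single PDB entries from RCSB.

# Print the RCSB download URL for a PDB ID (ID is upper-cased).
rcsb_pdb_url() {
    id=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    printf 'https://files.rcsb.org/download/%s.pdb\n' "$id"
}

# Download one entry into a target directory.
# Requires curl and network access; fails if the entry does not exist.
fetch_pdb() {
    id="$1"
    out_dir="$2"
    mkdir -p "$out_dir"
    curl -fsSL -o "${out_dir}/${id}.pdb" "$(rcsb_pdb_url "$id")"
}
```

A call such as `fetch_pdb 1ABC pdbs` would save the entry to pdbs/1ABC.pdb, where 1ABC is only a placeholder ID.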

Execute makestructuraldb

To execute makestructuraldb, follow these steps:

  1. In Terminal, cd to the 1-makestructuraldb directory.
  2. Execute the following command:
makestructuraldb --pdb input_pdbs.txt --blast_db human_proteome_blastdb/human_proteome_uniprot --pident 95

The inputs are a list of PDB files (input_pdbs.txt) and the human proteome BLAST database. The --pident 95 option sets a 95% sequence identity threshold for filtering BLAST hits, which reduces the number of results.
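The list file passed to --pdb can be generated from the pdbs folder. The sketch below assumes that input_pdbs.txt contains one PDB file path per line and that the structure files use the .pdb extension. Neither assumption is confirmed on this page, so check the makestructuraldb documentation before relying on it. make_pdb_list is a hypothetical helper.

```shell
# Hypothetical helper (not part of makestructuraldb).
# ASSUMPTION: the file given to --pdb holds one PDB file path per line.
# Write the paths of all .pdb files under a directory to a list file,
# sorted so that the list is reproducible between runs.
make_pdb_list() {
    pdb_dir="$1"
    list_file="$2"
    find "$pdb_dir" -type f -name '*.pdb' | sort > "$list_file"
}
```

Running `make_pdb_list pdbs input_pdbs.txt` from the 1-makestructuraldb directory would then produce the list used in the command above.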

Output

The corresponding output can be found in the structural_db folder. For the selected input PDBs, structural information was found for 11 proteins (see structuralDB).