Step 1. Creating a Reference Database with makestructuraldb
In this first part of the tutorial, we generate a protein structural database for two human transcripts with Ensembl IDs ENST00000367182 and ENST00000374005.
To build a BLAST protein database, follow these steps:
- Retrieve the human proteome in FASTA format. You can use a public repository such as UniProt or Ensembl.
- Generate the BLAST protein database using the following command in the terminal:
makeblastdb -in UP000005640_9606.fasta.gz -dbtype prot -out human_proteome_uniprot
This command generates three files that share the base name human_proteome_uniprot and have the extensions .phr, .pin, and .psq. You can find them in the human_proteome_blastdb folder.
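Before moving on, it can help to confirm that the database files described above are actually present. The sketch below is a hypothetical helper (check_blastdb is not part of BLAST+ or of this repository) that only tests for the three extensions named in this tutorial; newer BLAST+ releases may write additional files, which it ignores.

```shell
# Hypothetical helper: verify that a protein BLAST database with the given
# path prefix has the three files named above (.phr, .pin, .psq).
# Returns 0 if all are present, 1 otherwise.
check_blastdb() {
    prefix="$1"
    missing=0
    for ext in phr pin psq; do
        if [ ! -f "${prefix}.${ext}" ]; then
            echo "missing: ${prefix}.${ext}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (run after makeblastdb has finished):
# check_blastdb human_proteome_blastdb/human_proteome_uniprot && echo "database looks complete"
```

If your makeblastdb build rejects the gzipped FASTA file, one common workaround is to stream the decompressed sequences through standard input, for example `gunzip -c UP000005640_9606.fasta.gz | makeblastdb -in - -dbtype prot -title human_proteome_uniprot -out human_proteome_uniprot`. Treat this as a suggestion to verify against your BLAST+ version rather than as the procedure used to build the files in this repository.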
If you want to retrieve structural data for proteins that currently do not have structures in PDB, you can download all the files in PDB and rely on sequence homology to find structural homologs. For this example, we downloaded the PDB structures and AlphaFold2 models related to the transcripts of interest. You can find these files in the pdbs folder.
To execute makestructuraldb, follow these steps:
- In Terminal, cd to the 1-makestructuraldb directory.
- Execute the following command:
makestructuraldb --pdb input_pdbs.txt --blast_db human_proteome_blastdb/human_proteome_uniprot --pident 95
The inputs are a list of PDB files (--pdb) and the human proteome BLAST database (--blast_db). The --pident 95 option sets a 95% sequence identity threshold for filtering BLAST hits, which reduces the number of results.
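To illustrate what a percent identity threshold does, the sketch below filters BLAST tabular output (-outfmt 6), in which the third column is the percent identity. It is not how makestructuraldb is implemented: filter_by_pident is a hypothetical helper, the sample rows are made up, and whether the tool treats a hit at exactly 95% as passing is an assumption here.

```shell
# Hypothetical helper, not part of makestructuraldb: read BLAST tabular
# output (-outfmt 6) on stdin and keep only the rows whose percent
# identity (column 3) is at least the given threshold.
filter_by_pident() {
    threshold="$1"
    awk -F '\t' -v min="$threshold" '$3 + 0 >= min + 0'
}

# Example with two made-up hits (query, subject, percent identity):
# printf 'q1\tP12345\t98.5\nq1\tP67890\t80.0\n' | filter_by_pident 95
```

With a threshold of 95, only the first of the two example rows is kept, which is the sense in which the threshold reduces the number of results.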
The corresponding output can be found in the structural_db folder. For the selected input PDBs, structural information was found for 11 proteins (see structuralDB).