# pypdb demos

This is a set of basic examples showing the usage and outputs of the individual functions included in pypdb. There are generally three types of functions:

+ Functions that perform searches and return lists of PDB IDs
+ Functions that take search keywords and return information such as paper titles
+ Functions that get information about specific PDB IDs

The list of supported search types, as well as the different types of information that can be returned for a given PDB ID, is large (and growing) and is enumerated in the docstrings of pypdb.py. The PDB allows a very wide range of different types of queries, and so any option that is not currently available can likely be implemented based on the structure of the query types that have already been implemented. Please submit feedback and pull requests on GitHub.

### Preamble

In [1]:
%pylab inline
from IPython.display import HTML

## Import from local directory
import sys
sys.path.insert(0, '../pypdb')
from pypdb import *

## Import from installed package
# from pypdb import *

%load_ext autoreload
%autoreload 2

Populating the interactive namespace from numpy and matplotlib


# Search functions that return lists of PDB IDs

#### Get a list of PDBs for a specific search term

In [2]:
found_pdbs = Query("actin network").search()
print(found_pdbs[:10])

['4HMY', '6E5N', '1D7M', '1WAA', '5LPN', '4DCN', '4ETO', '6CM9', '6CRI', '6D83']


#### Search by PubMed ID Number

In [3]:
found_pdbs = Query(27499440, "PubmedIdQuery").search()
print(found_pdbs[:10])

['5IMT', '5IMW', '5IMY']


#### Search by source organism using NCBI TaxId

In [68]:
found_pdbs = Query('6239', 'TreeEntityQuery').search()  # NCBI TaxID for C. elegans
print(found_pdbs[:5])

['4AG7', '4AG9', '2F86', '1SEM', '1SKN']


#### Search by a specific experimental method

In [4]:
found_pdbs = Query('SOLID-STATE NMR', query_type='ExpTypeQuery').search()
print(found_pdbs[:10])

['5LCB', '2RLZ', '6WAP', '2MC7', '2MCX', '2MCW', '2MCV', '2MCU', '2MEX', '2MJZ']


#### Search by protein structure similarity

In [81]:
found_pdbs = Query('2E8D', query_type="structure").search()
print(found_pdbs[:10])

['2E8D', '4OBA', '4OGV', '4JVR', '3LBL', '4QO4', '4JWR', '2WS4', '4ERE', '2CEU']


#### Search by Author

In [80]:
found_pdbs = Query('Perutz, M.F.', query_type='AdvancedAuthorQuery').search()
print(found_pdbs)

['2HHB', '3HHB', '4HHB', '1PBX', '2MHB', '1CQ4', '1HDA', '1FDH', '1GDJ', '2GDM', '2DHB']


#### Search by organism

In [6]:
q = Query("Dictyostelium", query_type="OrganismQuery")
print(q.search()[:10])

['2H84', '4AE3', '3MNQ', '5AN9', '6QKL', '2W94', '2W95', '2WN2', '2WN3', '4AKR']
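Because every search above returns a plain Python list of ID strings, results from separate searches can be combined with ordinary set operations. The following is a minimal sketch, not part of pypdb: `intersect_ids` is a hypothetical helper, and the second input list is made up purely for illustration.

```python
def intersect_ids(*id_lists):
    """Return the IDs present in every list, in the order of the first list."""
    if not id_lists:
        return []
    # IDs common to all lists (with one list, this is just that list's IDs)
    common = set(id_lists[0]).intersection(*id_lists[1:])
    seen = set()
    ordered = []
    for pdb_id in id_lists[0]:
        # Preserve first-list order and skip repeated IDs
        if pdb_id in common and pdb_id not in seen:
            seen.add(pdb_id)
            ordered.append(pdb_id)
    return ordered


# First list: a few IDs taken from the author search output shown earlier
by_author = ['2HHB', '3HHB', '4HHB', '1PBX', '2MHB']
# Second list: hypothetical, invented for this example only
by_other_query = ['4HHB', '9XYZ', '2HHB']

print(intersect_ids(by_author, by_other_query))
```

In practice the two inputs would be the lists returned by two `Query(...).search()` calls.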


# Information Search functions
While the basic functions described in the previous section are useful for looking up individual entries, these functions are intended to be more user-facing: they take search keywords and return lists of information such as paper titles, authors, or dates.

#### Find papers for a given keyword

In [4]:
matching_papers = find_papers('crispr', max_results=10)
print(list(matching_papers)[:10])

['Structures of the Cmr-beta Complex Reveal the Regulation of the Immunity Mechanism of Type III-B CRISPR-Cas.', 'Structures of the Cmr-beta Complex Reveal the Regulation of the Immunity Mechanism of Type III-B CRISPR-Cas', 'Cas1-Cas2 complex formation mediates spacer acquisition during CRISPR-Cas adaptive immunity.', 'An RNA-Induced Conformational Change Required for Crispr RNA Cleavage by the Endoribonuclease Cse3.', 'Structural plasticity and in vivo activity of Cas1 from the type I-F CRISPR-Cas system.']
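The output above contains the same title twice, once with and once without a trailing period. If near-duplicates like this are unwanted, they can be filtered after the fact. The sketch below assumes that titles differing only in case, surrounding whitespace, or a trailing period refer to the same paper; `dedupe_titles` is a hypothetical helper, not a pypdb function.

```python
def dedupe_titles(titles):
    """Drop titles that differ only by case, outer whitespace, or a trailing period."""
    seen = set()
    unique = []
    for title in titles:
        # Normalized key used only for comparison; the original title is kept
        key = title.strip().rstrip('.').lower()
        if key not in seen:
            seen.add(key)
            unique.append(title)
    return unique


# Three of the titles from the output above
titles = [
    'Structures of the Cmr-beta Complex Reveal the Regulation of the Immunity Mechanism of Type III-B CRISPR-Cas.',
    'Structures of the Cmr-beta Complex Reveal the Regulation of the Immunity Mechanism of Type III-B CRISPR-Cas',
    'Cas1-Cas2 complex formation mediates spacer acquisition during CRISPR-Cas adaptive immunity.',
]

unique_titles = dedupe_titles(titles)
print(len(unique_titles))
```

The first occurrence of each title is the one that is kept, so the original ordering of the results is preserved.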


# Functions that return information about single PDB IDs

#### Get the full PDB file

In [9]:
pdb_file = get_pdb_file('4lza', filetype='cif', compression=False)
print(pdb_file[:400])

data_4LZA
# 
_entry.id   4LZA 
# 
_audit_conform.dict_name       mmcif_pdbx.dic 
_audit_conform.dict_version    5.281 
_audit_conform.dict_location   http://mmcif.pdb.org/dictionaries/ascii/mmcif_pdbx.dic 
# 
loop_
_database_2.database_id 
_database_2.database_code 
PDB   4LZA         
RCSB  RCSB081269   
WWPDB D_1000081269 
# 
_pdbx_database_related.db_name        TargetTrack 
_pdbx_database_rela
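The returned file is a single string, so simple single-line `_category.item value` pairs like those above can be picked out with basic string handling. This is only a rough sketch: `parse_simple_items` is a hypothetical helper, and it deliberately ignores `loop_` tables, quoted values, and multi-line values. A dedicated mmCIF parser is a better fit for anything beyond a quick look.

```python
def parse_simple_items(cif_text):
    """Collect single-line '_category.item value' pairs from mmCIF text."""
    items = {}
    for line in cif_text.splitlines():
        parts = line.split(None, 1)
        # Keep only lines that have both a '_name' and a value on the same line;
        # loop_ column headers have no value and are skipped
        if len(parts) == 2 and parts[0].startswith('_'):
            items[parts[0]] = parts[1].strip()
    return items


# Shortened copy of the file excerpt printed above
excerpt = """data_4LZA
#
_entry.id   4LZA
#
_audit_conform.dict_name       mmcif_pdbx.dic
_audit_conform.dict_version    5.281
#
loop_
_database_2.database_id
_database_2.database_code
PDB   4LZA
RCSB  RCSB081269
"""

items = parse_simple_items(excerpt)
print(items['_entry.id'])
```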


#### Get a general description of the entry's metadata

In [11]:
all_info = get_info('4LZA')
print(list(all_info.keys()))

['audit_author', 'cell', 'citation', 'diffrn', 'diffrn_detector', 'diffrn_radiation', 'diffrn_source', 'entry', 'exptl', 'exptl_crystal', 'exptl_crystal_grow', 'pdbx_sgproject', 'pdbx_audit_revision_details', 'pdbx_audit_revision_history', 'pdbx_database_related', 'pdbx_database_status', 'pdbx_vrpt_summary', 'rcsb_accession_info', 'rcsb_entry_container_identifiers', 'rcsb_entry_info', 'rcsb_primary_citation', 'refine', 'refine_hist', 'refine_ls_restr', 'reflns', 'reflns_shell', 'software', 'struct', 'struct_keywords', 'symmetry', 'rcsb_id']


#### Run a Sequence search

This method formerly used BLAST; it now uses MMseqs2.

In [20]:
q = Query("VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTAVAHVDDMPNAL", 
          query_type="sequence", 
          return_type="polymer_entity")

print(q.search())

{'query_id': '90344865-6c1d-448f-ba5b-64a2d727ade6', 'result_type': 'polymer_entity', 'total_count': 782, 'explain_meta_data': {'total_timing': 71, 'sort_timing': 0, 'terminal_node_timings': {'6469': 70}}, 'result_set': [{'identifier': '1C7D_1', 'score': 1.0, 'services': [{'service_type': 'sequence', 'nodes': [{'node_id': 6469, 'original_score': 164.0, 'norm_score': 1.0, 'match_context': [{'sequence_identity': 0.987, 'evalue': 2.053e-47, 'bitscore': 164, 'alignment_length': 80, 'mismatches': 0, 'gaps_opened': 1, 'query_beg': 1, 'query_end': 79, 'subject_beg': 144, 'subject_end': 223, 'query_length': 79, 'subject_length': 284, 'query_aligned_seq': 'VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALT-AVAHVDDMPNAL', 'subject_aligned_seq': 'VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNAL'}]}]}]}, {'identifier': '3OO5_1', 'score': 1.0, 'services': [{'service_type': 'sequence', 'nodes': [{'node_id': 6469, 'original_score': 164.0, 'norm_score':
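The raw response is deeply nested, so it can help to flatten it into one row per hit. The sketch below assumes the response keeps the nesting visible in the output above (`result_set`, then `services`, then `nodes`, then `match_context`); `summarize_hits` is a hypothetical helper, and `sample` is a trimmed copy of the first hit rather than a full response.

```python
def summarize_hits(response):
    """Return (identifier, sequence_identity, evalue) tuples for each match."""
    rows = []
    for hit in response.get('result_set', []):
        for service in hit.get('services', []):
            for node in service.get('nodes', []):
                for ctx in node.get('match_context', []):
                    rows.append((
                        hit['identifier'],
                        ctx.get('sequence_identity'),
                        ctx.get('evalue'),
                    ))
    return rows


# Trimmed copy of the first hit from the output above
sample = {
    'result_type': 'polymer_entity',
    'total_count': 782,
    'result_set': [
        {'identifier': '1C7D_1', 'score': 1.0,
         'services': [{'service_type': 'sequence',
                       'nodes': [{'node_id': 6469,
                                  'match_context': [{'sequence_identity': 0.987,
                                                     'evalue': 2.053e-47,
                                                     'bitscore': 164}]}]}]},
    ],
}

hits = summarize_hits(sample)
print(hits)
```

Using `.get` with empty defaults means a response with no hits simply produces an empty list.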

#### Search by Pfam number

In [128]:
pfam_info = Query("PF00008", query_type="pfam").search()
print(pfam_info[:5])

['4RRW', '4RRX', '4RRY', '4RRZ', '4RUT']
