The INDRA (Integrated Network and Dynamical Reasoning Assembler) Database is a framework for creating, maintaining, and accessing a database of content, readings, and statements. This implementation is currently designed to work primarily with Amazon Web Services RDS running PostgreSQL 9+. Used as a backend to INDRA, the INDRA Database provides a systematic way of scaling the knowledge acquired from other databases, reading, and manual input, and makes that knowledge available through a direct Python client and a REST API.
The INDRA Database currently integrates the following natural language processing systems at scale:
with content drawn from:
- PubMed - ~28 million abstracts
- PubMed Central - ~5 million fulltext
- Elsevier - ~0.7 million fulltext (requires special access)
We also collect information from these databases:
These databases are retrieved using the tools in `indra.sources`. The statements
extracted from all of these sources are stored and updated in the database.
The INDRA Database uses the internal assembly tools available in INDRA, reimplemented for large-scale incremental assembly. The resulting corpus of cleaned and de-duplicated statements, each with fully maintained provenance, is the primary product of the database.
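To make the idea of incremental de-duplication with provenance concrete, here is a minimal sketch in plain Python. It is not the INDRA Database's actual assembly code: the dictionary layout, the `content_key` hash, and the `merge_incrementally` helper are all hypothetical names chosen for illustration, and real assembly in INDRA does considerably more than exact-match merging.

```python
import hashlib


def content_key(stmt_type, subj, obj):
    """Hypothetical content hash: identical (type, subj, obj) triples collide."""
    raw = "|".join((stmt_type, subj, obj))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


def merge_incrementally(corpus, raw_statements):
    """Fold a new batch of raw statements into an existing corpus in place.

    Each unique statement is stored once; every (source, text_ref) pair that
    supports it is kept as evidence, so provenance survives de-duplication.
    """
    for raw in raw_statements:
        key = content_key(raw["type"], raw["subj"], raw["obj"])
        entry = corpus.setdefault(
            key,
            {"type": raw["type"], "subj": raw["subj"], "obj": raw["obj"],
             "evidence": []},
        )
        evidence = (raw["source"], raw["text_ref"])
        if evidence not in entry["evidence"]:
            entry["evidence"].append(evidence)
    return corpus


# Illustrative data only: the sources and references below are made up.
corpus = {}
batch_1 = [
    {"type": "Phosphorylation", "subj": "MAP2K1", "obj": "MAPK1",
     "source": "reader_a", "text_ref": "doc-1"},
    {"type": "Activation", "subj": "BRAF", "obj": "MAP2K1",
     "source": "reader_a", "text_ref": "doc-2"},
]
batch_2 = [
    {"type": "Phosphorylation", "subj": "MAP2K1", "obj": "MAPK1",
     "source": "reader_b", "text_ref": "doc-3"},
]
merge_incrementally(corpus, batch_1)
merge_incrementally(corpus, batch_2)  # later batch reuses the existing entry
```

After both batches, the corpus holds two unique statements, and the phosphorylation statement carries evidence from both hypothetical readers. Because each batch is merged into the existing dictionary, earlier content never needs to be reprocessed.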
For more details on the internal assembly process of INDRA, see the INDRA documentation.
The content in the database can be accessed by those who created it using the
`indra_db.client` submodule. This repo also implements a REST API which can be
used by those without direct access to the database. For access to our REST
API, please contact the authors.
The INDRA Database works only with Python 3 (tested with 3.5 and 3.6).
First, install INDRA, then simply clone this repo, and make sure that it is
visible in your `PYTHONPATH`.
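One quick way to confirm that both packages are visible on your `PYTHONPATH` is to ask Python whether it can locate them without importing them. The sketch below uses only the standard library; the helper name `check_visible` is made up for this example, and it assumes the top-level package names are `indra` and `indra_db`, as the module paths above suggest.

```python
import importlib.util


def check_visible(package_names):
    """Return {name: True/False} for whether each top-level package is findable."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in package_names
    }


# Package names are assumed from the module paths used in this README.
for name, found in check_visible(["indra", "indra_db"]).items():
    status = "found" if found else "NOT found (check your PYTHONPATH)"
    print(f"{name}: {status}")
```

If `indra_db` is reported as not found, the directory containing the cloned repo is most likely missing from `PYTHONPATH`.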