Kevin Yang edited this page Jan 14, 2022 · 20 revisions

Introduction

This page serves (for now) as the starting point for Aletheia, a semantic discovery service for biomedical literature.

The canonical site is https://kongming.app.

Aletheia is the personification of truth (or disclosure) in ancient Greek philosophy and mythology, a fitting name for a discovery service.

Everyone knows and uses keyword-based search services such as PubMed and Google Scholar. However, these tools are not sufficient: word/token matching only skims the surface of semantic meaning, so you may miss part of the relevant results that your work or inspiration depends on.

In contrast, Aletheia is a semantic discovery service: we model the vast corpus of biomedical literature as an interconnected knowledge graph, where nodes are concepts from the UMLS Metathesaurus and edges are predefined predicates (for example, CAUSES or COEXISTS_WITH; complete list here). In essence, we take advantage of the collective biomedical work of the whole research community.

To give you a rough sense of the dataset: as of Aug 2021, we have 32M+ citations (publications) and 110M+ relations in the knowledge graph. The graph contains 360k+ nodes (concepts) and 45M+ unique edges, each modeling a unique hint of discovery.
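One way to read these figures: a relation is a single subject-predicate-object statement extracted from one citation, while a unique edge groups every relation that shares the same triple. The Python sketch that follows assumes exactly that; the `Relation` type, the `aggregate` function and the `citation-N` identifiers are hypothetical, not Aletheia's actual schema or real PMIDs.

```python
from collections import defaultdict
from typing import NamedTuple


class Relation(NamedTuple):
    """One subject-predicate-object statement extracted from one citation (hypothetical schema)."""
    subject: str    # UMLS concept (name or CUI)
    predicate: str  # e.g. CAUSES, COEXISTS_WITH
    obj: str        # UMLS concept (name or CUI)
    citation: str   # placeholder citation identifier, not a real PMID


def aggregate(relations):
    """Group relations into unique directed edges keyed by (subject, predicate, object).

    Each edge maps to the set of citations that support it.
    """
    edges = defaultdict(set)
    for r in relations:
        edges[(r.subject, r.predicate, r.obj)].add(r.citation)
    return dict(edges)


relations = [
    Relation("prozac", "CAUSES", "urticarial vasculitis", "citation-1"),
    Relation("prozac", "CAUSES", "urticarial vasculitis", "citation-2"),
    Relation("copaxone", "COEXISTS_WITH", "prozac", "citation-3"),
]
edges = aggregate(relations)
print(len(relations), "relations ->", len(edges), "unique edges")  # 3 relations -> 2 unique edges
```

Under this reading, the citation set attached to each edge is the kind of data the Reference panel would list when you click that edge.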

Since no one can keep fully up to date with biomedical research, even in their own specialty, a search service can be very useful by expanding a researcher's horizon to the most up-to-date publications. Beyond that, a discovery service can offer researchers novel insights or inspiration they were not aware of or had not thought of. In terms of priority, search is (to some extent) a by-product of discovery, which is the most interesting part of this service.

Aletheia aims to provide search and discovery through an intuitive web interface. The target audience is researchers and doctors at academic institutions and at non-profit or for-profit organizations such as hospitals, clinics and pharmaceutical companies, working in drug development, clinical studies, biomedical research or the healthcare industry.

Quick start

The web interface should be relatively self-explanatory: you enter a query (usually a concept), click "Go!", and the service retrieves the results for visual inspection.

UI

Navigation

If you found this page via "Tutorial" in the navigation bar, this is the full explanation: read on.

The navigation bar also allows you to file a bug report.

Algorithm selection

We offer three algorithms (called guides in this context):

  • spotlight: navigate from a single starting concept, the default tab;
  • scavenger hunt: identify hidden edges/relations between a pair of concepts (the source and destination);
  • leap of confidence: propose or discover novel insights by adding a third concept and following edges similar to those discovered between source and destination; think of this as a process of analogical reasoning.
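To make the three guides more concrete, here is a minimal Python sketch over a toy list of (subject, predicate, object) triples. It is one possible interpretation, not Aletheia's implementation: spotlight returns the edges touching one concept, scavenger hunt runs a breadth-first search for short directed paths from source to destination, and leap of confidence replays the predicate chains of those paths from a third concept. The function names, the `max_hops` limit and the placeholder concepts A to F are all assumptions.

```python
from collections import defaultdict, deque

# Illustrative toy graph of (subject, predicate, object) triples. The concepts
# A..F are placeholders, not real UMLS concepts or real findings.
TRIPLES = [
    ("A", "COEXISTS_WITH", "B"),
    ("B", "CAUSES", "D"),
    ("A", "CAUSES", "D"),
    ("C", "COEXISTS_WITH", "E"),
    ("E", "CAUSES", "F"),
]


def out_index(triples):
    """Index outgoing edges: subject -> list of (predicate, object)."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return index


def spotlight(triples, concept):
    """Single starting concept: every edge where it is the subject or the object."""
    return [t for t in triples if concept in (t[0], t[2])]


def scavenger_hunt(triples, source, destination, max_hops=2):
    """Pair of concepts: directed paths from source to destination, at most max_hops edges long.

    Breadth-first search; a path never revisits a concept.
    """
    index = out_index(triples)
    paths = []
    queue = deque([(source, [])])
    while queue:
        node, path = queue.popleft()
        if len(path) == max_hops:
            continue
        visited = {source} | {o for _, _, o in path}
        for p, o in index[node]:
            step = path + [(node, p, o)]
            if o == destination:
                paths.append(step)
            elif o not in visited:
                queue.append((o, step))
    return paths


def leap_of_confidence(triples, source, destination, third, max_hops=2):
    """Analogy: replay the predicate chains found between source and destination from a third concept."""
    chains = {tuple(p for _, p, _ in path)
              for path in scavenger_hunt(triples, source, destination, max_hops)}
    index = out_index(triples)
    proposals = set()
    for chain in chains:
        frontier = {third}
        for predicate in chain:
            # Follow only edges whose predicate matches the next link in the chain.
            frontier = {o for node in frontier for p, o in index[node] if p == predicate}
        proposals |= frontier
    proposals.discard(third)
    return proposals


print(spotlight(TRIPLES, "A"))
print(scavenger_hunt(TRIPLES, "A", "D"))
print(leap_of_confidence(TRIPLES, "A", "D", "C"))  # {'F'}
```

In the toy graph, the chain COEXISTS_WITH then CAUSES links A to D, so replaying it from C proposes F as a candidate insight by analogy.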

All query boxes accept either free-form text (for example, prozac or alzheimer) or a UMLS concept ID (CUI). Note that free-form text support is still experimental; if you notice anything wrong, don't hesitate to file a bug. Alternatively, you can look up the concept ID with the Metathesaurus Explorer (for example, C0162373 for prozac) and enter that CUI in the query box.
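Since a query box accepts either a CUI or free-form text, a client could tell the two apart by format: UMLS CUIs are the letter C followed by seven digits, as in C0162373. The hypothetical `classify_query` helper below is a minimal sketch of that check, not how Aletheia actually dispatches queries.

```python
import re

# UMLS CUIs are the letter "C" followed by seven digits, e.g. C0162373 (prozac).
CUI_PATTERN = re.compile(r"C\d{7}")


def classify_query(query: str) -> tuple[str, str]:
    """Return ("cui", ID) for a UMLS concept ID, otherwise ("text", query) for free-form text.

    Hypothetical helper; lowercase input such as "c0162373" is normalized to uppercase.
    """
    q = query.strip()
    if CUI_PATTERN.fullmatch(q.upper()):
        return ("cui", q.upper())
    return ("text", q)


print(classify_query("C0162373"))  # ('cui', 'C0162373')
print(classify_query("alzheimer"))  # ('text', 'alzheimer')
```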

Visual inspection panel

If you hover the mouse over any button in the toolbar, a brief tip pops up.

  • The most useful features are zoom in/out and reset.
  • Be careful with the trash can, as it clears the whole canvas.
  • You can change the aspect ratio (of the canvas) if you have a wide screen.
  • The canvas can be locked. Once locked, the content won't be changed by further actions.

The result set returned by the service is rendered visually on the canvas.

The direction of an edge indicates the subject-predicate-object relationship: for example, Prozac CAUSES urticarial vasculitis, or copaxone COEXISTS_WITH Prozac.

On closer inspection, gray edges (as opposed to blue ones) indicate relations marked as trivial because they are mentioned so often. Novel edges are shown in blue.
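The gray/blue distinction can be sketched as a cutoff on how often an edge is mentioned. The `TRIVIAL_THRESHOLD` value and the `color_edges` helper are hypothetical: this page does not document the actual rule Aletheia uses to mark an edge trivial, and the counts below are toy numbers for placeholder concepts.

```python
# Hypothetical cutoff; the rule Aletheia actually uses to mark an edge trivial is not documented here.
TRIVIAL_THRESHOLD = 100


def color_edges(mention_counts, threshold=TRIVIAL_THRESHOLD):
    """Map each (subject, predicate, object) edge to "gray" (trivial) or "blue" (novel).

    An edge mentioned at least `threshold` times is treated as trivial.
    """
    return {edge: "gray" if count >= threshold else "blue"
            for edge, count in mention_counts.items()}


# Toy mention counts for placeholder concepts A, B and C.
counts = {
    ("A", "CAUSES", "B"): 250,       # heavily mentioned -> trivial (gray)
    ("A", "COEXISTS_WITH", "C"): 3,  # rarely mentioned -> novel (blue)
}
print(color_edges(counts))
```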

Click an edge (or the predicate label along it), and the related citations appear in the Reference panel on the right.

Click any concept, and the Reference panel on the right shows a binoculars button, which allows you to start a new query based on that concept.

Feel free to drag concepts or edges to make the visualization less cluttered.

Reference panel

The Reference panel, located on the right side, lists the related citations (publications).

The journal and publication date are provided for each citation.

The related subject, predicate and object are marked in bold within the surrounding snippet, and the full citation is just a click away on the PubMed site. However, access to the full text of a publication currently depends on the user's own subscription, unless the text is freely available.

DISCLAIMER: Data inaccuracies are unavoidable. You may find:

  • ideas in a citation that don't make sense: we are not the original authors and thus not in a position to judge;
  • occasionally, a citation with the wrong publication or date: a data problem outside our control;
  • strange subject-predicate-object extractions from free text: these come from the underlying NLP algorithm, which is outside our control. However, you are still welcome to report them, and we will try to determine the best course of action.

Parting words

I sincerely hope you find value in this service for whatever you are working on.

Note that this service is still in its infancy: some features are still in development and bugs are still lurking. I would really appreciate bug reports, feature requests and general comments from motivated users:

  • Submit bug reports or feature requests here (a GitHub account is required);
  • General comments / questions: yangzh@gmail.com.