Introduction

WikiPathways is a biological pathway database that describes the interactions between biochemical entities in biological processes [Q21092742,Q28090976,Q24082733,Q42896569]. Its data can be downloaded and used in various formats, one of which is the Resource Description Framework (RDF) [Q26261238].

The WikiPathways SPARQL endpoint can be found at http://sparql.wikipathways.org/. SPARQL allows you to query much of the WikiPathways data in a machine-readable way. This kind of access has been used, for example, in the Open PHACTS project [Q27061937,Q54404976].
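Queries can also be sent to the endpoint from a program. Below is a minimal Python sketch that uses only the standard library and the SPARQL 1.1 Protocol, which sends an HTTP GET request with a `query` parameter. It rests on two assumptions. First, it assumes the query service answers at the `/sparql` path. That path is typical for Virtuoso-based endpoints, but the text above gives only the host. Second, it assumes the service can return the standard SPARQL JSON results format. The function names are illustrative and do not come from any WikiPathways tool.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the query path is /sparql; the text only gives http://sparql.wikipathways.org/
ENDPOINT = "http://sparql.wikipathways.org/sparql"


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def bindings_to_rows(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row.

    Variables left unbound in a solution are simply absent from that row.
    """
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(query: str, endpoint: str = ENDPOINT) -> list[dict[str, str]]:
    """Send the query over HTTP and return the rows (needs network access)."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=60) as resp:
        return bindings_to_rows(json.load(resp))
```

For example, `run_query("SELECT * WHERE { ?s ?p ?o } LIMIT 5")` would return up to five rows as plain dictionaries. The request building is kept separate from the network call so you can inspect or reuse it with another HTTP client.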

This book discusses how SPARQL can be used to extract information, using numerous example queries such as the following one, which retrieves metadata about the data loaded into the SPARQL endpoint.

Metadata queries

The following query provides some information about what is currently loaded in the public SPARQL endpoint at http://sparql.wikipathways.org:

metadata

This gives as output:

metadata

Statistics

To give some idea of the content of the SPARQL endpoint, this section presents some overall statistics.

Number of pathways per species

We can list the number of pathways for each species available in WikiPathways with this query:

pathwayCountBySpecies

It shows us that there is a strong bias towards human pathways:

pathwayCountBySpecies
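To put a number on this bias, you can turn per-species counts into shares of the total. The Python sketch below assumes the results have already been fetched as a list of dictionaries. The column names `species` and `count` are hypothetical, so adapt them to the variables the query actually returns. This is an illustration for reading the results, not part of the query itself.

```python
from collections.abc import Iterable


def species_shares(rows: Iterable[dict[str, str]],
                   species_key: str = "species",
                   count_key: str = "count") -> list[tuple[str, int, float]]:
    """Return (species, pathway count, fraction of all pathways), largest first."""
    counts: dict[str, int] = {}
    for row in rows:
        # SPARQL JSON results deliver numbers as strings, so convert them here;
        # repeated species names are summed rather than overwritten.
        name = row[species_key]
        counts[name] = counts.get(name, 0) + int(row[count_key])
    total = sum(counts.values()) or 1  # avoid dividing by zero if all counts are 0
    return sorted(
        ((name, n, n / total) for name, n in counts.items()),
        key=lambda item: item[1],
        reverse=True,
    )
```

Printing `f"{name}: {n} ({share:.1%})"` for each tuple gives a quick overview of how much of the content belongs to each species.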

Number of metabolites per species

Counting metabolites is tricky, because metabolites that are biologically the same (e.g. different charge states) can have different identifiers. A further complication is that not all metabolites in WikiPathways have their stereochemistry defined, for example when it is biologically obvious, as for amino acids. We can, however, count the number of Wikidata identifiers to get a reasonable estimate:

metaboliteCountBySpecies

This tells us:

metaboliteCountBySpecies
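The idea behind this estimate can be illustrated outside SPARQL. Metabolite nodes are grouped by the Wikidata identifier they map to. Several nodes that share one Wikidata identifier are counted only once, and nodes without a Wikidata mapping are not counted at all. The record layout below (`MetaboliteNode`) and the function name are hypothetical. The sketch only mirrors the counting logic described above and makes no attempt to resolve charge states or stereochemistry.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class MetaboliteNode(NamedTuple):
    """Hypothetical flattened view of one metabolite node in a pathway."""
    species: str
    source_id: str           # identifier as given in the pathway, e.g. a ChEBI ID
    wikidata: Optional[str]  # mapped Wikidata identifier, or None if unmapped


def count_by_wikidata(nodes: list[MetaboliteNode]) -> dict[str, int]:
    """Count distinct Wikidata identifiers per species, skipping unmapped nodes."""
    seen: dict[str, set[str]] = defaultdict(set)
    for node in nodes:
        if node.wikidata is not None:
            seen[node.species].add(node.wikidata)
    # Species whose nodes are all unmapped do not appear in the result.
    return {species: len(ids) for species, ids in seen.items()}
```

Each species gets its own set, so the count does not depend on how many pathways or source databases mention the same compound. Unmapped metabolites are simply missed, which is one reason the result is an estimate rather than an exact count.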

References