Martin Linkov edited this page Aug 24, 2018 · 61 revisions

We use the term Schema to refer to the way Unigraph's data is laid out or structured. The latest Unigraph schema is periodically made available for download here. The schema is available in two formats:

In Unigraph, schema and data are combined: all classes and properties are themselves nodes and can be browsed and queried like any other node through the playground.

The .json export has the following structure:

type (
	Property struct {
		NodeData
		ObjectClasses       []NodeData `json:",omitempty"`
		ExpectedClasses     []NodeData `json:",omitempty"`
		SubjectClasses      []NodeData `json:",omitempty"`
		SubPropertyOf       []NodeData `json:",omitempty"`
		MandatoryQualifiers []NodeData `json:",omitempty"`
		Unique              bool
		Distinct            bool
		Symetric            bool // exported under this spelling ("Symetric")
		SubDomainAgnostic   bool `json:",omitempty"`
		Qualifier           bool
		IDFormat            string
		Measures            *NodeData `json:",omitempty"`
		Decode              []string  `json:",omitempty"`
		Encode              []string  `json:",omitempty"`
		Authority           *NodeData `json:",omitempty"`
	}
	Class struct {
		NodeData
		InstanceOf []NodeData `json:",omitempty"`
		SubclassOf []NodeData `json:",omitempty"`
	}
	NodeData struct {
		// quadstore.Term and ug.Identifier come from packages not shown here.
		UID             quadstore.Term
		HRID            string                          `json:",omitempty"`
		Label           string                          `json:",omitempty"`
		Description     string                          `json:",omitempty"`
		OfficialWebsite string                          `json:",omitempty"`
		Identifiers     map[ug.Identifier]ug.Identifier `json:",omitempty"`
	}
)

Jump to:
Properties
Nodes
Identifiers
Distinct & Unique
Qualifiers
Symmetric
SubProperty

Properties

Properties define a HAS A relationship between a node and the value of the property, e.g. Paris {node} has a population {property} of 2229621 {value}. The value of a property can be another node, e.g. Apocalypse Now {node} has a director {property} Francis Ford Coppola {value}, or any other datatype, as described in greater detail here.
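As a rough illustration of the HAS A relationship, the Go sketch below models a claim as a subject, a property and a value, where the value is either a reference to another node or a primitive literal. The `Claim`, `Value`, `NodeRef` and `Literal` types are hypothetical and are not Unigraph's internal representation.

```go
package main

import "fmt"

// Value is the object side of a claim: either a reference to another node
// or a primitive literal. These types are hypothetical illustrations.
type Value interface {
	String() string
}

// NodeRef refers to another node in the graph (by label, for readability).
type NodeRef struct{ Label string }

func (n NodeRef) String() string { return "{node: " + n.Label + "}" }

// Literal holds a primitive value such as text or a number directly.
type Literal struct{ Raw interface{} }

func (l Literal) String() string { return fmt.Sprint(l.Raw) }

// Claim is one node-property-value statement.
type Claim struct {
	Subject  string
	Property string
	Value    Value
}

func (c Claim) String() string {
	return fmt.Sprintf("%s -[%s]-> %s", c.Subject, c.Property, c.Value)
}

func main() {
	claims := []Claim{
		{"Paris", "population", Literal{2229621}},
		{"Apocalypse Now", "director", NodeRef{"Francis Ford Coppola"}},
	}
	for _, c := range claims {
		fmt.Println(c)
	}
}
```

A type switch on `Value` is then enough to tell node-valued claims from primitive ones.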

Let's look at a property:

{
	"UID": "0120f818",
	"HRID": "academic_degree",
	"Label": "academic degree",
	"Description": "academic degree that the person holds",
	"Identifiers": {
		"freebase_id": "m.04qk",
		"wikidata_id": "P512"
	},
	"ObjectClasses": [
		{
			"UID": "12fa02e45d",
			"Label": "academic degree",
			"Description": "college or university diploma",
			"Identifiers": {
				"wikidata_id": "Q189533"
			}
		}
	],
	"ExpectedClasses": [
		{
			"UID": "1017",
			"HRID": "unigraph_node",
			"Label": "Unigraph Node",
			"Description": "Graph element to which claims are assigned"
		}
	],
	"SubjectClasses": [
		{
			"UID": "1205",
			"Label": "human",
			"Description": "common name of Homo sapiens, unique extant species of the genus Homo",
			"Identifiers": {
				"wikidata_id": "Q5"
			}
		}
	],
	"Unique": false,
	"Distinct": false,
	"Symetric": false,
	"Qualifier": false,
	"IDFormat": ""
}

Nodes

Properties are represented as a combination of objects (nodes). Each node has these basic parameters:

  • UID - the node's unique ID in Unigraph, e.g. 0120f818
  • HRID - the node's human-readable ID, used to query the information in the graph, e.g. academic_degree
  • Label - the node's label (name)
  • Description - the node's description
  • Identifiers - the IDs of this node in external data repositories such as Wikidata and Freebase that the node is aligned with, e.g. wikidata_id P512
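Because a node carries several IDs, it can be resolved by any of them. The sketch below is a hypothetical in-memory index (`Index`, `NewIndex` and `Lookup` are illustrative names, not part of Unigraph) that finds a node by its UID, its HRID or one of its external identifiers.

```go
package main

import "fmt"

// NodeData mirrors the basic node fields, with UID as a plain string.
type NodeData struct {
	UID         string
	HRID        string
	Label       string
	Identifiers map[string]string // e.g. "wikidata_id" -> "P512"
}

// Index is a hypothetical in-memory lookup table, not a Unigraph API.
type Index struct {
	byKey map[string]*NodeData
}

// NewIndex registers every node under its UID, its HRID and each of its
// external identifiers. Keys are prefixed with a scheme so that equal IDs
// from different repositories cannot collide.
func NewIndex(nodes []NodeData) *Index {
	idx := &Index{byKey: map[string]*NodeData{}}
	for i := range nodes {
		n := &nodes[i]
		idx.byKey["uid:"+n.UID] = n
		if n.HRID != "" {
			idx.byKey["hrid:"+n.HRID] = n
		}
		for scheme, id := range n.Identifiers {
			idx.byKey[scheme+":"+id] = n
		}
	}
	return idx
}

// Lookup resolves keys such as "hrid:academic_degree" or "wikidata_id:P512".
func (idx *Index) Lookup(key string) (*NodeData, bool) {
	n, ok := idx.byKey[key]
	return n, ok
}

func main() {
	idx := NewIndex([]NodeData{{
		UID:         "0120f818",
		HRID:        "academic_degree",
		Label:       "academic degree",
		Identifiers: map[string]string{"freebase_id": "m.04qk", "wikidata_id": "P512"},
	}})
	if n, ok := idx.Lookup("wikidata_id:P512"); ok {
		fmt.Println(n.UID, n.Label)
	}
}
```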

ObjectClasses, ExpectedClasses and SubjectClasses share the same node structure. They specify which classes the subjects and the objects of the property will receive (if any). In the example above, both the subject class (human) and the object class (academic degree) are themselves nodes in the graph. ExpectedClasses apply only to the object side of the property and denote which Unigraph class is expected for that direction of the relationship. Primitives such as text, numbers and quantities are held directly in their respective object classes. When a quantity is used, the measuring unit is provided like so:

"Measures": {
	"UID": "12fa1e315e",
	"Label": "unit of length",
	"Description": "way of measuring length or distance",
	"Identifiers": {
		"wikidata_id": "Q1978718"
	}
}
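A sketch of how the class lists and Measures could be read from a property, assuming a simplified `NodeData` and class labels in place of full class nodes. `InferClasses` and `DescribeObject` are hypothetical helpers, and the quantity property `height` used below is an invented example.

```go
package main

import (
	"fmt"
	"strings"
)

// NodeData is reduced to the two fields this sketch needs.
type NodeData struct {
	UID   string
	Label string
}

// Property keeps the class lists and the optional unit of measurement.
type Property struct {
	HRID           string
	SubjectClasses []NodeData
	ObjectClasses  []NodeData
	Measures       *NodeData // set only for quantity-valued properties
}

// InferClasses returns the labels of the classes that the subject and the
// object of a claim would receive from property p.
func InferClasses(p Property) (subject, object []string) {
	for _, c := range p.SubjectClasses {
		subject = append(subject, c.Label)
	}
	for _, c := range p.ObjectClasses {
		object = append(object, c.Label)
	}
	return subject, object
}

// DescribeObject summarizes what the object side of p holds.
func DescribeObject(p Property) string {
	if p.Measures != nil {
		return "quantity measured in " + p.Measures.Label
	}
	_, object := InferClasses(p)
	if len(object) == 0 {
		return "no object class"
	}
	return strings.Join(object, ", ")
}

func main() {
	degree := Property{
		HRID:           "academic_degree",
		SubjectClasses: []NodeData{{"1205", "human"}},
		ObjectClasses:  []NodeData{{"12fa02e45d", "academic degree"}},
	}
	height := Property{HRID: "height", Measures: &NodeData{"12fa1e315e", "unit of length"}}
	s, o := InferClasses(degree)
	fmt.Println(s, o)
	fmt.Println(DescribeObject(height))
}
```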

Identifiers

A significant number of properties help us identify and reconcile data across datasets. These properties are called "Identifiers", and they carry additional information such as:

  • IDFormat - the regular expression (if any) used to extract the identifier from the specified resource. When no regular expression is provided, the entire string at the Encode location is extracted.
  • Decode - the URL template(s) (if any, and if different from Encode) used to reconstruct the page about the subject, in the format url/$1, where $1 marks the position where the identifier string is placed.
  • Encode - the URL template(s) used to extract the identifying string, in the format url/$1, where $1 marks the position from which the string is extracted according to the rule set in IDFormat.
  • Authority - information about the issuing authority of the identifier: its UID, Label, Description, OfficialWebsite and Identifiers in other data repositories.
  • SubDomainAgnostic - very few properties are subdomain agnostic. LinkedIn identifiers, for example, are, so that we can work around the different country subdomains LinkedIn uses.

Here is an example:

	"IDFormat": "[\\p{Ll}-\\d]+",
	"Decode": [
		"https://www.crunchbase.com/company/$1",
		"https://www.crunchbase.com/organization/$1"
	],
	"Encode": [
		"https://www.crunchbase.com/organization/$1"
	],
	"Authority": {
		"UID": "1195",
		"Label": "CrunchBase",
		"Description": "database of companies and start-ups, operated by TechCrunch",
		"OfficialWebsite": "http://www.crunchbase.com/",
		"Identifiers": {
			"wikidata_id": "Q10846831"
		}
	}
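The url/$1 templates and IDFormat can be combined into a regular expression that captures the identifier. The sketch below follows the descriptions above: it extracts with an Encode template, rebuilds a page URL with a Decode template, and treats SubDomainAgnostic as "ignore the leftmost host label". The helper names, the `example-co` identifier and the LinkedIn URL template in the test are assumptions for illustration; `strings.Cut` needs Go 1.18 or later.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// templateRegexp turns a url/$1 template into an anchored regexp whose only
// capture group matches the identifier. An empty idFormat captures the whole
// remainder, mirroring the IDFormat description.
func templateRegexp(tmpl, idFormat string) (*regexp.Regexp, error) {
	parts := strings.SplitN(tmpl, "$1", 2)
	if len(parts) != 2 {
		return nil, fmt.Errorf("template %q has no $1", tmpl)
	}
	if idFormat == "" {
		idFormat = ".+"
	}
	return regexp.Compile("^" + regexp.QuoteMeta(parts[0]) +
		"(" + idFormat + ")" + regexp.QuoteMeta(parts[1]) + "$")
}

// stripSubdomain drops the leftmost host label when the host has more than
// two labels ("de.linkedin.com" -> "linkedin.com"). This naive rule is an
// assumption and would mishandle hosts such as "example.co.uk".
func stripSubdomain(rawURL string) string {
	scheme, rest, ok := strings.Cut(rawURL, "://")
	if !ok {
		return rawURL
	}
	host, path, hasPath := strings.Cut(rest, "/")
	if labels := strings.Split(host, "."); len(labels) > 2 {
		host = strings.Join(labels[1:], ".")
	}
	out := scheme + "://" + host
	if hasPath {
		out += "/" + path
	}
	return out
}

// ExtractID returns the identifier found in rawURL by the first matching
// template. subdomainAgnostic plays the role of SubDomainAgnostic.
func ExtractID(rawURL string, templates []string, idFormat string, subdomainAgnostic bool) (string, bool) {
	if subdomainAgnostic {
		rawURL = stripSubdomain(rawURL)
	}
	for _, tmpl := range templates {
		if subdomainAgnostic {
			tmpl = stripSubdomain(tmpl)
		}
		re, err := templateRegexp(tmpl, idFormat)
		if err != nil {
			continue // skip templates that do not compile
		}
		if m := re.FindStringSubmatch(rawURL); m != nil {
			return m[1], true
		}
	}
	return "", false
}

// BuildURL places the identifier at the $1 position of a template.
func BuildURL(tmpl, id string) string {
	return strings.Replace(tmpl, "$1", id, 1)
}

func main() {
	encode := []string{"https://www.crunchbase.com/organization/$1"}
	id, ok := ExtractID("https://www.crunchbase.com/organization/example-co", encode, `[\p{Ll}-\d]+`, false)
	fmt.Println(id, ok)
	fmt.Println(BuildURL("https://www.crunchbase.com/company/$1", id))
}
```

Wrapping IDFormat in its own group keeps any alternation inside it from leaking into the surrounding URL pattern.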

Distinct & Unique

  • Unique - each subject can have only one claim with this property. For example, throughout its life a company has just a single IRS employer number, and at any given time a country has a single capital.
  • Distinct - the value of the property uniquely identifies a subject in Unigraph, meaning that no two subjects can share the same distinct value. Identifiers are distinct: no two companies can have the same IRS employer number.
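The two constraints can be checked over a batch of claims as sketched below. The `Claim` and `Flags` types and the `irs_employer_number` property name are hypothetical, and time qualifiers are ignored.

```go
package main

import "fmt"

// Claim is a hypothetical subject-property-value triple.
type Claim struct {
	Subject, Property, Value string
}

// Flags carries the two constraints from the schema export.
type Flags struct {
	Unique   bool // at most one claim with this property per subject
	Distinct bool // a value may identify at most one subject
}

// Validate reports claims that break Unique or Distinct. Time qualifiers
// (a country has one capital at any given time) are ignored, which is a
// simplifying assumption of this sketch.
func Validate(claims []Claim, props map[string]Flags) []error {
	var errs []error
	count := map[[2]string]int{}    // (subject, property) -> number of claims
	owner := map[[2]string]string{} // (property, value) -> first subject seen
	for _, c := range claims {
		f := props[c.Property]
		if f.Unique {
			k := [2]string{c.Subject, c.Property}
			count[k]++
			if count[k] == 2 { // report each offending subject once
				errs = append(errs, fmt.Errorf("%s has more than one %s", c.Subject, c.Property))
			}
		}
		if f.Distinct {
			k := [2]string{c.Property, c.Value}
			if s, seen := owner[k]; !seen {
				owner[k] = c.Subject
			} else if s != c.Subject {
				errs = append(errs, fmt.Errorf("%s %q already identifies %s", c.Property, c.Value, s))
			}
		}
	}
	return errs
}

func main() {
	props := map[string]Flags{"irs_employer_number": {Unique: true, Distinct: true}}
	claims := []Claim{
		{"Acme", "irs_employer_number", "12-3456789"},
		{"Globex", "irs_employer_number", "12-3456789"},
	}
	for _, err := range Validate(claims, props) {
		fmt.Println(err)
	}
}
```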

Qualifiers

  • Qualifier - some properties can only be used as qualifiers, that is, they can only clarify or add information to a claim. One example is website_username. We discuss this in greater detail in the section on Claims and Qualifiers.

  • MandatoryQualifiers - a few properties can only be stored together with a qualifier. For example, catalog_code cannot be stored without the qualifier catalog: a catalog reference makes no sense without specifying the catalog.
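A minimal sketch of both qualifier rules, assuming a hypothetical `Claim` type whose qualifiers are keyed by property HRID and a schema reduced to the `Qualifier` and `MandatoryQualifiers` fields (the latter as HRIDs rather than full nodes).

```go
package main

import "fmt"

// Property keeps the two qualifier-related fields; MandatoryQualifiers is
// reduced to HRIDs instead of full NodeData entries for brevity.
type Property struct {
	Qualifier           bool
	MandatoryQualifiers []string
}

// Claim is a hypothetical main claim whose qualifiers are keyed by HRID.
type Claim struct {
	Subject, Property, Value string
	Qualifiers               map[string]string
}

// CheckQualifiers rejects qualifier-only properties used as main claims and
// claims that lack one of their mandatory qualifiers.
func CheckQualifiers(c Claim, schema map[string]Property) error {
	p := schema[c.Property]
	if p.Qualifier {
		return fmt.Errorf("%s can only be used as a qualifier", c.Property)
	}
	for _, q := range p.MandatoryQualifiers {
		if _, ok := c.Qualifiers[q]; !ok {
			return fmt.Errorf("%s requires the qualifier %s", c.Property, q)
		}
	}
	return nil
}

func main() {
	schema := map[string]Property{
		"website_username": {Qualifier: true},
		"catalog_code":     {MandatoryQualifiers: []string{"catalog"}},
	}
	c := Claim{Subject: "Some Album", Property: "catalog_code", Value: "ABC-123"}
	fmt.Println(CheckQualifiers(c, schema))
	c.Qualifiers = map[string]string{"catalog": "Some Catalog"}
	fmt.Println(CheckQualifiers(c, schema))
}
```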

Symmetric

  • Symmetric (spelled Symetric in the export) - the property assigns the same class to both its subject and its object. There are very few such properties; spouse is one example.
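Under one reading of the flag, a symmetric property gives its object the same classes as its subject. The sketch below encodes that reading; `ClassesFor` is a hypothetical helper and the `human` class for spouse is an assumed example.

```go
package main

import "fmt"

// Property keeps only what this sketch needs; class lists are labels.
type Property struct {
	HRID           string
	Symetric       bool // spelled as in the export
	SubjectClasses []string
	ObjectClasses  []string
}

// ClassesFor returns the classes given to the subject and the object of a
// claim. For a symmetric property both sides get the subject classes; this
// reading of the flag is an assumption based on the description above.
func ClassesFor(p Property) (subject, object []string) {
	if p.Symetric {
		return p.SubjectClasses, p.SubjectClasses
	}
	return p.SubjectClasses, p.ObjectClasses
}

func main() {
	spouse := Property{HRID: "spouse", Symetric: true, SubjectClasses: []string{"human"}}
	s, o := ClassesFor(spouse)
	fmt.Println(s, o) // both sides receive "human"
}
```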

SubProperty

SubPropertyOf lists more general properties, each in the same structure as any other property, and is used to express more general claims. For example, headquarters_location is a sub property of location, so when a relationship is modeled with "headquarters location", a new claim with the "location" property is automatically generated between the same subject and object.
