4 changes: 2 additions & 2 deletions .circleci/config.yml
@@ -3,9 +3,9 @@ commands:

install-bazel-linux-rbe:
steps:
- run: curl -OL https://raw.githubusercontent.com/graknlabs/build-tools/master/ci/install-bazel-linux.sh
- run: curl -OL https://raw.githubusercontent.com/graknlabs/build-tools/04c69fbe5277bf2ed9e2baf5e9a53ac3c9ebee80/ci/install-bazel-linux.sh
- run: bash ./install-bazel-linux.sh && rm ./install-bazel-linux.sh
- run: curl -OL https://raw.githubusercontent.com/graknlabs/build-tools/master/ci/install-bazel-rbe.sh
- run: curl -OL https://raw.githubusercontent.com/graknlabs/build-tools/04c69fbe5277bf2ed9e2baf5e9a53ac3c9ebee80/ci/install-bazel-rbe.sh
- run: bash ./install-bazel-rbe.sh && rm ./install-bazel-rbe.sh

run-grakn-server:
47 changes: 29 additions & 18 deletions README.md
@@ -8,9 +8,9 @@

[Grakn](https://github.com/graknlabs/grakn) lets us create Knowledge Graphs from our data. But what challenges do we encounter where querying alone won’t cut it? What library can address these challenges?

To respond to these scenarios, KGLIB is the centre of all research projects conducted at Grakn Labs. In particular, its focus is on the integration of machine learning with the Grakn knowledge graph.
To respond to these scenarios, KGLIB is the centre of all research projects conducted at Grakn Labs. In particular, its focus is on the integration of machine learning with the Grakn Knowledge Graph. More on this below, in [*Knowledge Graph Tasks*](https://github.com/graknlabs/kglib#knowledge-graph-tasks).

At present this repo contains one project: [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn).
At present this repo contains one project: [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn). Go there for more info on getting started with a working example.

## Quickstart
**Requirements**
@@ -21,23 +21,38 @@ At present this repo contains one project: [*Knowledge Graph Convolutional Netwo

- The [latest release of Grakn Core](https://github.com/graknlabs/grakn/releases/latest) or [Grakn KGMS](https://dev.grakn.ai/docs/cloud-deployment/kgms) running

**Run**
Take a look at [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) to see a walkthrough of how to use the library.

**Building from source**

To test that all targets can be built:
Clone KGLIB:

```
git clone git@github.com:graknlabs/kglib.git
```

`cd` into the project:

```
cd kglib
```

To build all targets:

```bash
```
bazel build //...
```

To run all tests:
To run all tests (requires Python 3.6+):

```bash
bazel test //... --test_output=streamed --spawn_strategy=standalone --python_version PY3 --python_path $(which python3)
```
bazel test //kglib/... --test_output=streamed --spawn_strategy=standalone --python_version PY3 --python_path $(which python3)
```

To build the pip distribution (find the output in `bazel-bin`):

```bash
```
bazel build //:assemble-pip
```

@@ -76,7 +91,7 @@ Here we term any task which creates new facts for the KG as *Knowledge Graph Com

#### Relation Prediction (a.k.a. Link prediction)

We often want to find new connections in our Knowledge Graphs. Often, we need to understand how two concepts are connected. This is the case of binary Relation prediction, which all existing literature concerns itself with. Grakn is a [Hypergraph](https://en.wikipedia.org/wiki/Hypergraph), where Relations are [Hyperedges](https://en.wikipedia.org/wiki/Glossary_of_graph_theory_terms#hyperedge). Therefore, in general, the Relations we may want to predict may be **ternary** (3-way) or even **[N-ary](https://en.wikipedia.org/wiki/N-ary_group)** (N-way), which goes beyond the research we have seen in this domain.
We often want to find new connections in our Knowledge Graphs. Often, we need to understand how two concepts are connected. This is the case of **binary** Relation prediction, which all existing literature concerns itself with. Grakn is a [Hypergraph](https://en.wikipedia.org/wiki/Hypergraph), where Relations are [Hyperedges](https://en.wikipedia.org/wiki/Glossary_of_graph_theory_terms#hyperedge). Therefore, in general, the Relations we may want to predict may be **ternary** (3-way) or even **[N-ary](https://en.wikipedia.org/wiki/N-ary_group)** (N-way), which goes beyond the research we have seen in this domain.
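
To make the hypergraph point concrete, below is a minimal sketch of matching a **ternary** Relation using [client-python](https://github.com/graknlabs/client-python). The `surgery` Relation, its Roles and the keyspace are hypothetical, invented purely for illustration; the client calls follow typical grakn-client 1.5 usage:

```python
# A sketch only: the `surgery` Relation, its Roles and the keyspace are
# hypothetical. Requires grakn-client (pip install grakn-client==1.5.x).
from grakn.client import GraknClient

# One Relation (hyperedge) connecting three Roleplayers at once -- a
# ternary Relation, which purely binary link prediction cannot represent.
TERNARY_QUERY = """
match $sur (patient: $p, surgeon: $s, location: $h) isa surgery; get $sur;
"""

with GraknClient(uri="localhost:48555") as client:
    with client.session(keyspace="example") as session:
        with session.transaction().read() as tx:
            for answer in tx.query(TERNARY_QUERY):
                print(answer.map().get("sur").id)  # each matched hyperedge
```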

When predicting Relations, several scenarios can arise. When predicting binary Relations between the members of one set and the members of another set, we may need to predict them as:

@@ -88,21 +103,17 @@ When predicting Relations, there are several scenarios we may predict

*Examples:* The problem of predicting which disease(s) a patient has is a one-to-many problem, whereas predicting which drugs in the KG treat which diseases is a many-to-many problem.

We anticipate that solutions working well for the one-to-one case will also be applicable (at least to some extent) to the one-to-many case and cascade also to the many-to-many case.

***In KGLIB*** [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) can help us with one-to-one binary Relation prediction. This requires extra implementation, for which two approaches are apparent:

- Create two KGCNs, one for each of the two Roleplayers in the binary Relation. Extend the neural network to compare the embeddings of each Roleplayer, and classify the pairing according to whether a Relation should exist or not.
Notice also that recommender systems are one use case of one-to-many binary Relation prediction.

- Feed Relations directly to a KGCN, and classify their existence. (KGCNs can accept Relations as the Things of interest just as well as Entities). To do this we also need to create hypothetical Relations, labelled as negative examples, and feed them to the KGCN alongside the positively labelled known Relations. Note that this extends well to ternary and N-ary Relations.
We anticipate that solutions working well for the one-to-one case will also be applicable (at least to some extent) to the one-to-many case and cascade also to the many-to-many case.

Notice also that recommender systems are one use case of one-to-many binary Relation prediction.
***In KGLIB*** [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) performs Relation prediction using an approach based on [Graph Networks](https://github.com/deepmind/graph_nets) from DeepMind. This can be used to predict **binary**, **ternary**, or **N-ary** relations. This is well-supported for the one-to-one case and the one-to-many case.
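
For orientation, the core of the [Graph Networks](https://github.com/deepmind/graph_nets) library that this approach builds on composes learned edge, node and global update functions into a single module. A minimal sketch of that core usage follows; the toy graph and feature sizes are arbitrary, and this is not the KGCN pipeline itself:

```python
# A sketch of core graph_nets usage, not the KGCN pipeline itself.
# Requires TensorFlow 1.x, dm-sonnet and graph_nets.
import graph_nets as gn
import numpy as np
import sonnet as snt

# A toy graph: 3 nodes, 2 directed edges, one global feature vector.
graph_dict = {
    "nodes": np.random.randn(3, 4).astype(np.float32),
    "edges": np.random.randn(2, 4).astype(np.float32),
    "senders": np.array([0, 1]),
    "receivers": np.array([1, 2]),
    "globals": np.random.randn(4).astype(np.float32),
}
input_graphs = gn.utils_tf.data_dicts_to_graphs_tuple([graph_dict])

# Learned update functions for edges, nodes and globals.
graph_net_module = gn.modules.GraphNetwork(
    edge_model_fn=lambda: snt.nets.MLP([16, 16]),
    node_model_fn=lambda: snt.nets.MLP([16, 16]),
    global_model_fn=lambda: snt.nets.MLP([16, 16]))

output_graphs = graph_net_module(input_graphs)  # same structure, new features
```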

#### Attribute Prediction

We would like to predict one or more Attributes of a Thing, which may also include predicting whether that Attribute should be present at all.

***In KGLIB*** [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) can be used to directly learn Attributes for any Thing. Attribute prediction is already fully supported.
***In KGLIB*** [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) can be used to directly learn Attributes for any Thing. This requires some minor additional functionality to be added (we intend to build this imminently).

#### Subgraph Prediction

@@ -114,7 +125,7 @@ Embeddings of Things and/or Types are universally useful for performing other do
These vectors are easy to ingest into other ML pipelines.
The benefit of building general-purpose embeddings is therefore to make use of them in multiple other pipelines. This reduces the expense of traversing the Knowledge Graph, since this task can be performed once and the output re-used more than once.

***In KGLIB*** [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) can be used to build general-purpose embeddings. This requires additional functionality, since a generic loss function is required in order to train the model. At its simplest, this can be achieved by measuring the shortest distance across the KG between two Things. This can be achieved trivially in Grakn using [`compute path`](https://dev.grakn.ai/docs/query/compute-query#compute-the-shortest-path).
***In KGLIB*** [*Knowledge Graph Convolutional Networks* (KGCNs)](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn) can be used to build general-purpose embeddings. This requires additional functionality, since a generic loss function is required in order to train the model in an unsupervised fashion. At its simplest, this can be achieved by measuring the shortest distance across the KG between two Things. This can be achieved trivially in Grakn using [`compute path`](https://dev.grakn.ai/docs/query/compute-query#compute-the-shortest-path).
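
As a sketch of how such a distance signal could be gathered programmatically (the keyspace and concept IDs below are placeholders; the client calls follow typical grakn-client 1.5 usage):

```python
# A sketch, assuming grakn-client 1.5; replace the keyspace and the
# concept IDs with values from your own knowledge graph.
from grakn.client import GraknClient

THING_A, THING_B = "V4104", "V8248"  # placeholder concept IDs

with GraknClient(uri="localhost:48555") as client:
    with client.session(keyspace="example") as session:
        with session.transaction().read() as tx:
            answers = list(tx.query(
                "compute path from {}, to {};".format(THING_A, THING_B)))
            if answers:
                # Each answer holds the concept IDs along one shortest path;
                # the hop count could serve as a crude distance target.
                distance = len(answers[0].list()) - 1
                print(distance)
```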

#### Rule Mining (a.k.a. Association Rule Learning)

4 changes: 2 additions & 2 deletions dependencies/graknlabs/dependencies.bzl
@@ -4,7 +4,7 @@ def graknlabs_build_tools():
git_repository(
name = "graknlabs_build_tools",
remote = "https://github.com/graknlabs/build-tools",
commit = "f50e7a618045c99862bed78f813b1cfbb25a6016", # sync-marker: do not remove this comment, this is used for sync-dependencies by @graknlabs_build_tools
commit = "04c69fbe5277bf2ed9e2baf5e9a53ac3c9ebee80", # sync-marker: do not remove this comment, this is used for sync-dependencies by @graknlabs_build_tools
)


@@ -19,5 +19,5 @@ def graknlabs_client_python():
git_repository(
name = "graknlabs_client_python",
remote = "https://github.com/graknlabs/client-python",
commit = "4f03fc79fba71f216a28a4bc412c084fcef099a0" # sync-marker: do not remove this comment, this is used for sync-dependencies by @graknlabs_client_python
tag = "1.5.4" # sync-marker: do not remove this comment, this is used for sync-dependencies by @graknlabs_client_python
)
Binary file added kglib/kgcn/.images/graph_snippet.png
Binary file added kglib/kgcn/.images/learning.png
103 changes: 100 additions & 3 deletions kglib/kgcn/README.md
@@ -1,3 +1,5 @@


# Knowledge Graph Convolutional Networks

This project introduces a novel model: the *Knowledge Graph Convolutional Network* (KGCN). This project is in its second major iteration since its inception.
@@ -31,10 +33,105 @@ See the [full example](https://github.com/graknlabs/kglib/tree/master/kglib/kgcn
Once you have installed kglib via pip (as above) you can run the example as follows:

1. Start a Grakn server

2. Load [the schema](kglib/utils/grakn/synthetic/examples/diagnosis/schema.gql) for the example into Grakn. The template for the command is `./grakn console -k diagnosis -f path/to/schema.gql`

3. Run the example: `python -m kglib.kgcn.examples.diagnosis.diagnosis`

4. You should observe console output indicating that the pipeline is running and that the model is learning. Afterwards, two plots should be created to visualise the training process and examples of the predictions made.

## Output

### Console

During training, the console will output metrics for the performance on the training and test sets.

You should see output such as this for the diagnosis example:
```
# (iteration number), T (elapsed seconds), Ltr (training loss), Lge (test/generalization loss), Ctr (training fraction nodes/edges labeled correctly), Str (training fraction examples solved correctly), Cge (test/generalization fraction nodes/edges labeled correctly), Sge (test/generalization fraction examples solved correctly)
# 00000, T 8.7, Ltr 2.4677, Lge 2.3044, Ctr 0.2749, Str 0.0000, Cge 0.2444, Sge 0.0000
# 00050, T 11.3, Ltr 0.5098, Lge 0.4571, Ctr 0.8924, Str 0.0000, Cge 0.8983, Sge 0.0000
# 00100, T 14.0, Ltr 0.3694, Lge 0.3340, Ctr 0.8924, Str 0.0000, Cge 0.8983, Sge 0.0000
# 00150, T 16.6, Ltr 0.3309, Lge 0.3041, Ctr 0.9010, Str 0.0000, Cge 0.8919, Sge 0.0000
# 00200, T 19.2, Ltr 0.3125, Lge 0.2940, Ctr 0.9010, Str 0.0000, Cge 0.8919, Sge 0.0000
# 00250, T 21.8, Ltr 0.2975, Lge 0.2790, Ctr 0.9254, Str 0.2000, Cge 0.9178, Sge 0.4333
# 00300, T 24.4, Ltr 0.2761, Lge 0.2641, Ctr 0.9332, Str 0.6000, Cge 0.9243, Sge 0.4333
# 00350, T 27.0, Ltr 0.2653, Lge 0.2534, Ctr 0.9340, Str 0.6000, Cge 0.9243, Sge 0.4333
# 00400, T 29.7, Ltr 0.2866, Lge 0.2709, Ctr 0.9332, Str 0.6000, Cge 0.9178, Sge 0.0000
# 00450, T 32.3, Ltr 0.2641, Lge 0.2609, Ctr 0.9324, Str 0.6000, Cge 0.9178, Sge 0.4333
# 00500, T 34.9, Ltr 0.2601, Lge 0.2544, Ctr 0.9324, Str 0.6000, Cge 0.9178, Sge 0.4333
# 00550, T 37.5, Ltr 0.2571, Lge 0.2501, Ctr 0.9332, Str 0.6000, Cge 0.9243, Sge 0.4333
# 00600, T 40.1, Ltr 0.2530, Lge 0.2404, Ctr 0.9348, Str 0.6000, Cge 0.9373, Sge 0.4333
# 00650, T 42.7, Ltr 0.2508, Lge 0.2363, Ctr 0.9356, Str 0.6000, Cge 0.9438, Sge 0.4333
# 00700, T 45.3, Ltr 0.2500, Lge 0.2340, Ctr 0.9372, Str 0.7333, Cge 0.9503, Sge 0.4333
# 00750, T 48.0, Ltr 0.2493, Lge 0.2307, Ctr 0.9372, Str 0.7333, Cge 0.9567, Sge 0.8000
# 00800, T 50.7, Ltr 0.2488, Lge 0.2284, Ctr 0.9372, Str 0.7333, Cge 0.9567, Sge 0.8000
```

Take note of the key:

- \# - iteration number
- T - elapsed seconds
- Ltr - training loss
- Lge - test/generalization loss
- Ctr - training fraction nodes/edges labeled correctly
- Str - training fraction examples solved correctly
- Cge - test/generalization fraction nodes/edges labeled correctly
- Sge - test/generalization fraction examples solved correctly

The element we are most interested in is `Sge`, the proportion of subgraphs where all elements of the subgraph were classified correctly. This therefore represents an entirely correctly predicted example.
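
If you want to chart these metrics yourself, the log lines are straightforward to parse. A minimal sketch, assuming the format shown above:

```python
import re

LINE = "# 00750, T 48.0, Ltr 0.2493, Lge 0.2307, Ctr 0.9372, Str 0.7333, Cge 0.9567, Sge 0.8000"

def parse_metrics(line):
    """Parse one training log line into a dict of named metrics."""
    metrics = {key: float(value)
               for key, value in re.findall(r"([A-Z][a-z]*) ([\d.]+)", line)}
    metrics["iteration"] = int(re.match(r"# (\d+),", line).group(1))
    return metrics

print(parse_metrics(LINE)["Sge"])  # 0.8
```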

### Diagrams

#### Training Metrics
Upon running the example you will also get plots from matplotlib saved to your working directory.

You will see plots of metrics for the training process (training iteration on the x-axis) for the training set (solid line) and test set (dotted line). From left to right:

- The absolute loss across all of the elements in the dataset
- The fraction of all graph elements predicted correctly across the dataset
- The fraction of completely solved examples (subgraphs extracted from Grakn)

![learning metrics](.images/learning.png)

#### Visualise Predictions

We also receive a plot of some of the predictions made on the test set.

**Blue box:** Ground Truth

- Preexisting (known) graph elements are shown in blue

- Relations and role edges that **should be predicted to exist** are shown in green

- Candidate relations and role edges that **should not be predicted to exist** are shown faintly in red

**Black boxes:** Model Predictions at certain message-passing steps

This uses the same colour scheme as above, but opacity indicates a probability given by the model.

The learner predicts three classes for each graph element. These are:

```
[
Element already existed in the graph (we wish to ignore these elements),
Element does not exist in the graph,
Element does exist in the graph
]
```

In this way we perform relation prediction by proposing negative candidate relations (Grakn's rules help us with this). Then we train the learner to classify these negative candidates as **does not exist** and the correct relations as **does exist**.
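
As an illustration, the three classes might be encoded as one-hot targets like this (a sketch: the ordering follows the list above, but the real pipeline's encoding may differ):

```python
import numpy as np

# Class indices, following the list above (the real encoding may differ):
PREEXISTS, ABSENT, EXISTS = 0, 1, 2

def one_hot(label, num_classes=3):
    """Encode a class index as a one-hot target vector."""
    target = np.zeros(num_classes, dtype=np.float32)
    target[label] = 1.0
    return target

print(one_hot(EXISTS))  # [0. 0. 1.] -- a known, correct relation
print(one_hot(ABSENT))  # [0. 1. 0.] -- a negative candidate relation
```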

These boxes show the score assigned to the class **does exist**.

Therefore, for good predictions we want to see no blue elements, the red elements fading out as more messages are passed, and the green elements becoming more certain.



![predictions made on test set](.images/graph_snippet.png)

This visualisation has some flaws, and will be improved in the future.

## Methodology

The methodology that this implementation uses for Relation prediction is as follows:
@@ -43,7 +140,7 @@ In the case of the diagnosis example, we aim to predict `diagnosis` Relations. W

We then teach the KGCN to distinguish between the positive and negative examples.

###Examples == Subgraphs
### Examples == Subgraphs

We do this by creating *examples*, where each example is a subgraph extracted from a Grakn Knowledge Graph. These subgraphs contain positive and negative instances of the relation to be predicted.
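
The example's real extraction queries are collapsed in this diff, but to give a flavour, positive and negative instances could be gathered with a pair of queries along these lines (type and Role names are illustrative; as noted above, the negative `candidate-diagnosis` Relations would be inserted by Grakn's rules):

```python
# A rough sketch only: the example's real queries are not shown in this
# diff, and the type and Role names below are illustrative.
POSITIVE_QUERY = """
match $p isa person; $d isa disease;
  $diag (patient: $p, diagnosed-disease: $d) isa diagnosis;
get;
"""

# Negative candidates, here assumed to be rule-inserted Relations.
NEGATIVE_CANDIDATE_QUERY = """
match $p isa person; $d isa disease;
  $diag (candidate-patient: $p, candidate-diagnosed-disease: $d)
    isa candidate-diagnosis;
get;
"""
```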

@@ -74,11 +171,11 @@ A single subgraph is extracted from Grakn by making these queries and combining

We can visualise such a subgraph by running these two queries in Grakn Workbase:

![](.images/queried_subgraph.png)
![queried subgraph](.images/queried_subgraph.png)

You can get the relevant version of Workbase from the Assets of the [latest release](https://github.com/graknlabs/workbase/releases/latest).

###Learning
### Learning

A KGCN is a learned message-passing graph algorithm. Neural network components are learned and used to transform the signals passed around the graph. The approach is convolutional because the same transformation is applied to all edges, and another to all nodes; it may help to analogise this to convolution over images, where the same transformation is applied over all pixel neighbourhoods.
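
To illustrate the weight sharing that makes the approach convolutional, here is a dependency-light sketch of a single message-passing step. Random matrices stand in for the learned parameters; the real implementation uses Graph Networks rather than this toy code:

```python
import numpy as np

rng = np.random.RandomState(0)

# A toy graph: 4 nodes with 8-d features and 3 directed edges.
node_feats = rng.randn(4, 8).astype(np.float32)
edge_feats = rng.randn(3, 8).astype(np.float32)
senders = np.array([0, 1, 2])    # edge source node indices
receivers = np.array([1, 2, 3])  # edge target node indices

# Stand-ins for learned parameters: the SAME weights are applied to every
# edge, and the same to every node -- this sharing is the "convolution".
W_edge = rng.randn(24, 8).astype(np.float32)
W_node = rng.randn(16, 8).astype(np.float32)

# Edge update: combine each edge's feature with its endpoints' features.
edge_inputs = np.concatenate(
    [edge_feats, node_feats[senders], node_feats[receivers]], axis=1)
new_edge_feats = np.tanh(edge_inputs @ W_edge)

# Node update: sum incoming messages per node, then transform.
incoming = np.zeros_like(node_feats)
np.add.at(incoming, receivers, new_edge_feats)
new_node_feats = np.tanh(
    np.concatenate([node_feats, incoming], axis=1) @ W_node)
```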
