# Knowledge Graph

This notebook walks through the steps involved in constructing a knowledge graph from unstructured text.

## Required Toolkits and Packages
### Ubuntu packages
1. graphviz (install with: sudo apt-get install graphviz)
2. Python
3. Stanford CoreNLP Toolkit
4. Java 1.8+


### Python Packages
1. pycorenlp - a Python wrapper for the Stanford CoreNLP Toolkit
2. graphviz - toolkit for quickly visualising the triple output in graph form
3. pydot - used below to render the generated DOT file to a PNG image


Before running the code, ensure that the Stanford CoreNLP server is up and running. To start the server, run the command below from the Stanford CoreNLP directory:
> java -mx8g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
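A quick way to confirm that something is listening on the port from the command above is to attempt a TCP connection before annotating. The sketch below is not part of the original notebook; the helper name `is_port_open` is hypothetical, and the check only shows that the port accepts connections, not that the process behind it is CoreNLP.

```python
import socket


def is_port_open(host="localhost", port=9000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open()` with the defaults checks localhost:9000, which matches the `-port 9000` flag used to start the server.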

## Step 1 - Getting Triples from Unstructured Text

In [1]:
from pycorenlp import StanfordCoreNLP
from graphviz import Source
import pydot

In [2]:
nlp = StanfordCoreNLP(server_url="http://localhost:9000")

In [3]:
filename = "/home/shyam/consciousness.txt"
with open(filename, 'r') as myfile:
    text = myfile.read()

In [4]:
properties = {'annotators': 'tokenize,ssplit,pos,lemma,depparse,natlog,ner,coref,openie',
              'outputFormat': 'json', 'openie.resolve_coref': 'true'}
output = nlp.annotate(text.lower(), properties=properties)

In [5]:
type(output)

dict

Once the server returns its output, we parse it to extract the OpenIE section.
For each sentence, we take every triple in the order subject, relation, object and append it to a list.
The final list contains all triples extracted from the text.

In [6]:
triples_set = []  # list of [subject, relation, object] triples
for sentence in output['sentences']:
    for item in sentence['openie']:
        triple = [item['subject'], item['relation'], item['object']]
        triples_set.append(triple)

In [7]:
triples_set[0:5]

[['consciousness', 'is', 'state'],
 ['consciousness', 'is state of', 'awareness'],
 ['it', 'is', 'have'],
 ['it', 'be', 'consciousness'],
 ['many philosophers', 'believe despite', 'difficulty in definition']]
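The extraction loop can also be wrapped in a function and tried without a running server. The sketch below is an illustration rather than the notebook's own code: `extract_triples` and `mock_output` are hypothetical names, the mock dictionary is hand-written to mirror only the fields the loop reads (`sentences`, `openie`, `subject`, `relation`, `object`), and the optional deduplication step is an addition, on the assumption that the same triple may be extracted more than once.

```python
def extract_triples(output, deduplicate=True):
    """Collect (subject, relation, object) tuples from a CoreNLP-style JSON dict."""
    triples = []
    seen = set()
    for sentence in output.get("sentences", []):
        for item in sentence.get("openie", []):
            triple = (item["subject"], item["relation"], item["object"])
            if deduplicate and triple in seen:
                continue  # skip repeats while keeping first-seen order
            seen.add(triple)
            triples.append(triple)
    return triples


# Hand-written stand-in for the server response (assumed shape, not real output).
mock_output = {
    "sentences": [
        {"openie": [
            {"subject": "consciousness", "relation": "is", "object": "state"},
            {"subject": "consciousness", "relation": "is", "object": "state"},
        ]},
        {"openie": []},
    ]
}

print(extract_triples(mock_output))  # [('consciousness', 'is', 'state')]
```

Tuples are used here instead of lists so that triples are hashable and can be tracked in a set.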

### Visualise the triples using graphviz
The code below builds a DOT graph description of the extracted triples and writes it to a file for visualisation.

In [8]:
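import os

tmp_folder = "/tmp/openie/"
os.makedirs(tmp_folder, exist_ok=True)  # the folder may not exist yet

# Build the DOT source line by line
graph = list()
graph.append('digraph prof {')
graph.append('rankdir = LR;')
graph.append('ratio = fill;')
graph.append('node [style=filled];')

for er in triples_set:
    # er is [subject, relation, object]: edge subject -> object, labelled with the relation
    graph.append('"{}" -> "{}" [ label="{}" ];'.format(er[0], er[2], er[1]))
graph.append('}')
out_dot = os.path.join(tmp_folder, 'out.dot')
with open(out_dot, 'w') as output_file:
    output_file.write('\n'.join(graph))  # one statement per line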
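Extracted subjects, relations and objects are free text, so they may contain double quotes, which would end a quoted DOT string early. The sketch below shows one way to guard against that; `dot_quote` and `triples_to_dot` are hypothetical helpers, not part of the original notebook, and only embedded double quotes are escaped (other special characters, such as backslashes, are left untouched).

```python
def dot_quote(text):
    """Wrap text in double quotes for DOT, escaping embedded double quotes."""
    return '"{}"'.format(text.replace('"', '\\"'))


def triples_to_dot(triples, name="prof"):
    """Return DOT source with one labelled edge per (subject, relation, object)."""
    lines = ["digraph {} {{".format(name), "rankdir = LR;"]
    for subject, relation, obj in triples:
        lines.append("{} -> {} [ label={} ];".format(
            dot_quote(subject), dot_quote(obj), dot_quote(relation)))
    lines.append("}")
    return "\n".join(lines)


print(triples_to_dot([("he", 'said "hi" to', "her")]))
```

The returned string can be written to a `.dot` file in the same way as the list of lines built above.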


In [9]:
graph_img = Source.from_file(out_dot)

In [10]:
# graph_from_dot_file returns a list of graphs; the file holds exactly one
(dot_graph,) = pydot.graph_from_dot_file(out_dot)
dot_graph.write_png('graph_result.png')

In [11]:
graph_img

<graphviz.files.Source at 0x7f23dec4fb00>