# Rule Learning on KB Enriched with Embedding-Based and Text-Based Similarity Function
**By:** Vinay Chitepu <br/>
**Advisors**: Daisy Wang, Ali Sadeghian  <br/>
**Course:** CAP4773-Projects in Data Science <br/>
**UF Data Science Research Lab**

## Introduction
Background, project proposal, and summaries of papers


### Background

Knowledge Bases (KBs) are structures that hold a large variety of information about different entities and facts that determine relationships between these entities. While they are large they are not complete i.e. they don’t have all the relationships that are possible (they don’t have all the rules/facts that can be determined). This will focus on RDF knowledge bases which are KBs such as Freebase, YAGO, and DBPedia.

Knowledge Embedding has been used as a way to extract features using a low-dimensional continuous space. It can be used to derive unknown faces from KBs and see if the existing triples in the KBs are also correct. 

Rule Mining is the process of analyzing mined data for patterns and occurrences and finds frequent associations between different entities and previous rules to establish new rules.


### Objectives

Explore enriching KBs with additional links obtained from embedding based methods.
- Look at different embedding methods to determine which to be the best to use in for this purpose.
- Research rule mining systems (AIME paper) to determine which system(s) to use for this project

Analyze the effect of these links on quality and expressiveness of rules.
- Test embeddings and rule minings on smaller dataset to get some practice.
- Test embeddings on FB15K-237 and YAGO.





### Problem Statement

Seeing if rule learning on a KB enriched with embedding based links, such as similar_to or related_to, help with the accuracy of the rules. Today, there exist rule mining systems and software but adding these similarity based links may increase the accuracy of these links.


### Related Work

There are many embedding methods currently in use, tensor based as well as many different rule mining systems in use. Some embedding methods are Structure Embedding, Neural Tensor Networks, Translation-Based, and Bilinear-Diag Model. Some Rule Mining methods include Associative Rule Mining and Logical Rule Mining. 


### Proposed Methods

Use an embedding method such as DistMult to create similarity-based links in the knowledge base and then use a rule mining to determine an ontology for the KB. Test to see if the accuracy of the rules is better than just using only the rule mining method in place today. Using the OpenKE package for the Knowledge Embedding that has implementations of the embedding methods. <br><br>

Another method is to enrich the knowledge base useing text-based embeddings from the English-Wiki using Word2Vec embeddings and cosine similarity


### Evaluation Datasets
- Freebase (FB15K)
- YAGO2s

### Benchmark
Current rule mining software works pretty well but adding the embedding based links might make their accuracy better.

## OpenKE for Embedding
I will use the OpenKE (Open-Source Package for Knowledge Embedding) to implement the embedding models in training and testing

#### Installation for OpenKE

```
$ git clone https://github.com/thunlp/OpenKE 
$ cd OpenKE 
$ bash make.sh
```

#### Installation for Tensorflow

```
$ pip3 install --upgrade tensorflow
```

### Formatting Data
Formatted files are location in directory as this IPython Notebook for reerence

#### Training 
3 Files
- **entity2id.txt:** Entity file; the first line is the number of entities. The following lines contain the entities id followed by unique integer to represent it. 
- **relation2id.txt:** Relation file; the first line represents the number of relationships. The following lines contain the full relationships followed by a unique interger to represent it.

-  **train2id.txt:** Training file; the first line is the number of triples for training. Then the following lines are all in the format (e1, e2, rel) which indicates there is a relation rel between e1 and e2. NOTE: The training file uses the unique integer ids that are set in the entity2id.txt and relation2id.txt files rather than the actual id and relation.
    - e1 = subject
    - e2 = object
    - rel = relation
    
#### Testing
2 Files
- **test2id.txt:** Testing file. The first line is the number of triples for testing. The following lines are all in the format (e1, e2, rel).
- **valid2id.txt:** Validating file. The first line is the number of triples for validating. The following lines are all in the format (e1, e2, rel).

#### Other
- **type_constrain.txt:** Type contrain file. The first line is the number of relations. The following are the constraint type for each relation.


## Word2Vec Models

```
$ pip3 install gensim
```

## Importing Modules, Libraries and Data
Importing Python and OpenKE models


In [2]:
# Ignore Warnings
import warnings
warnings.simplefilter(action='ignore', category=DeprecationWarning)
warnings.simplefilter(action='ignore', category=FutureWarning)

In [3]:
# From OpenKE
from OpenKE import models, config

# From Python3
import os                         # Running terminal commands
import random                     # Random Number Generator
import numpy as np                # Math library
import pandas as pd               # Data Analytics and DataFrames
import seaborn as sns             # Visualization
import multiprocessing            # Parallel Processing
import tensorflow as tf           # Tensorflow ML library
import gensim.models as w2v       # Word2Vec Library
import matplotlib.pyplot as plt   # Plotting
from sklearn import preprocessing # Normalize

In [11]:
train_df = pd.read_csv('FB15K/train.txt', sep = '\t', names = ['Entity', 'Relation', 'Tail'])
test_df = pd.read_csv('FB15K/test.txt', sep = '\t', names = ['Entity', 'Relation', 'Tail'])
valid_df = pd.read_csv('FB15K/valid.txt', sep = '\t', names = ['Entity', 'Relation', 'Tail'])

In [5]:
len(test_df)

59071

In [6]:
len(train_df)

483142

In [7]:
len(valid_df)

50000

## Evaluation Framework
Evaluation framework using Hits@10. Pass output file from java eval file

In [19]:
def eval_frame(file, test_len):
    
    # Open file
    f = open(file)
    
    # Hits counter
    hits = 0
    
    # Loop though all facts in KB
    for x in range(test_len):

        # Read line
        fact = f.readline()
        fact = fact.split(' ')
        if fact != ['']:
            # Get target head and tail
            head_target = fact[0]
            tail_target = fact[2][:-1]


            # Get head predictions
            headpreds = f.readline()
            headpreds = headpreds.split(' ')
            headpreds = headpreds[1].split('\t')
            headpreds.pop()

            # Get tail predictions
            tailpreds = f.readline()
            tailpreds = tailpreds.split(' ')
            tailpreds = tailpreds[1].split('\t')
            tailpreds.pop()


            if (head_target in headpreds) and (tail_target in tailpreds):
                if (len(headpreds) < 10) and (len(tailpreds) < 10):
                    hits+=1
        else:
            print('miss')
                
    return hits/(test_len)

## Baseline for FB15K
Using the AMIE Rule-Mining system to mine rules on the baseline FB15K dataset. Then evaluating the rules using Hits@10

#### Rule Mining using AMIE

In [14]:
os.system('java -XX:-UseGCOverheadLimit -Xmx4g -jar AMIE/amie_plus.jar -minhc 0.0 -mins 0 -minis 0 FB15K/train.txt > rules/baseline_rules.txt')

0

**Note:** Remember to clean 'baseline_rules.txt' so it only includes the rule. Take out output from top and bottom so there are only rules.

#### Evaluating Rules

In [209]:
os.system('java -jar AMIE/ApplyAMIERules.jar rules/baseline_rules.txt FB15K/train.txt FB15K/test.txt FB15K/valid.txt evaluation/baseline_rules_eval.txt')

0

In [210]:
print('Hits@10: ' + str(eval_frame('evaluation/baseline_rules_eval.txt', len(test_df))))

Hits@10: 0.8082815594792707


## DistMult Graph Embeddings
Embedding the entities in FB15K using DistMult. Below is the neural network architecture for DistMult Embedding.

<img src="images/nnarch.png" style="width: 400px;"/>

### Embedding using DistMult in OpenKE Framework

#### Initialize

In [3]:
con = config.Config()

#### Importing Datasets

In [4]:
con.set_in_path('./benchmarks/FB15K/')

In [5]:
con.set_test_link_prediction(True)
con.set_test_triple_classification(True)

#### Allocate CPU Threads

In [6]:
con.set_work_threads(multiprocessing.cpu_count())

#### Configure Parameters for Training

In [7]:
con.set_train_times(500)  # To set the data traversing rounds
con.set_nbatches(100)     # To split the training triples into several batches
con.set_alpha(0.1)        # To set the learning rate
con.set_dimension(100)    # To set the dimensions of the entities and relations at the same time
# con.set_margin(1)         # To set the margin for the loss function

#### Negative Sampling

In [8]:
con.set_bern(0)            # To set negative sampling algorithms, unif (bern = 0) or bern (bern = 1)
con.set_ent_neg_rate(1)   # For each positive triple, we construct rate negative triples by corrupt the entity
con.set_rel_neg_rate(0)    # For each positive triple, we construct rate negative triples by corrupt the relation

#### Gradient Optimization Function 

In [9]:
con.set_opt_method("Adagrad")  # To set the gradient descent optimization algorithm (SGD, Adagrad, Adadelta, Adam)

#### Export Results to File

In [10]:
con.set_export_files("./res/model.vec.tf", 0)  # To set the export file of model paramters, every few rounds
con.set_out_files("./res/embedding.vec.json")  # To export model parameters to json files when training completed

#### Train Models

In [None]:
con.init()                     # Initialize the experimental settings

#### Test Models

In [None]:
con.test()

## Similarity Links based on Graph Embeddings (Cosine Similarity)
Using the DistMult graph embedding and cosine similarity in order to determine and generate similarity links and introduce them into the graph. The method used in shown below:
<img src="images/cossim.png" style="width: 400px;"/>

### Cosine-Similarity Implementation

In [None]:
# Dot Product
def dot(x,y):
    return np.sum(x * y) 

# Vector Magnitude
def mag(x):
    return np.sqrt(np.sum(x * x))

# Cosine Similarity
def cosine_similar_to(h,t,ent_embeddings):
    ent_h = ent_embeddings[h]
    ent_t = ent_embeddings[t]
    cos_sim = np.absolute(dot(ent_h,ent_t)) / (mag(ent_h) * mag(ent_t))
    return(cos_sim)

In [10]:
sim_links_07 = pd.read_csv('./cos_sim_links/cos_sim_links_07.tsv', sep = '\t', index_col=0)
sim_links_085 = pd.read_csv('./cos_sim_links/cos_sim_links_085.tsv', sep = '\t', index_col=0)
sim_links_09 = pd.read_csv('./cos_sim_links/cos_sim_links_09.tsv', sep = '\t', index_col=0)

In [12]:
sim_links_07_enriched = train_df.append(sim_links_07)
sim_links_085_enriched = train_df.append(sim_links_085)
sim_links_09_enriched = train_df.append(sim_links_09)

In [15]:
sim_links_07_enriched.to_csv('./graph_enriched_data/sim_links_07_enriched.txt', sep='\t', index=False)
sim_links_085_enriched.to_csv('./graph_enriched_data/sim_links_085_enriched.txt', sep='\t', index=False)
sim_links_09_enriched.to_csv('./graph_enriched_data/sim_links_09_enriched.txt', sep='\t', index=False)

In [16]:
os.system('java -XX:-UseGCOverheadLimit -Xmx4g -jar AMIE/amie_plus.jar -minhc 0.0 -mins 0 -minis 0 ./graph_enriched_data/sim_links_09_enriched.txt > rules/graph_rules.txt')

0

In [18]:
os.system('java -jar AMIE/ApplyAMIERules.jar rules/graph_rules.txt ./graph_enriched_data/sim_links_09_enriched.txt FB15K/test.txt FB15K/valid.txt evaluation/graph_rules_eval.txt')

0

In [20]:
print('Hits@10: ' + str(eval_frame('evaluation/graph_rules_eval.txt', len(test_df))))

Hits@10: 0.8027119906553131


### Getting Graph Embedding
Using built-in method to obtain graph embedding. This will be used for cosine similarity

In [None]:
embeddings = con.get_parameters('numpy')
embeddings = embeddingsbeddingsbeddings['ent_embeddings']

### Generating similar_to links 

Generates similar_to link based on cosine similiarity above a certain threshold

In [None]:
def gen_cos_sim_links(iterations = 100000, threshold = 0.8, embeddings = embeddings):
    
    h_list = []; t_list = []; acc_list = []
    for _ in range(iterations):
        h = random.randint(1, 14540)
        t = random.randint(1, 14540)
        acc = cosine_similar_to(h,t,embeddings)
        if acc > threshold:
            h_list.append(h)
            t_list.append(t)       
    
    d = {'head': h_list, 'tail': t_list}
    return pd.DataFrame(data = d)

In [None]:
sims = gen_cos_sim_links(iterations=100000, embeddings=embeddings)

## Similarity Links based on Word2Vec Embeddings (Cosine Similarity)
Using text-based information from the English Wikipedia to create a Word2Vec model. Using cosine similairty from before on these embeddings in order to determing similarity links creation. The gensim module used is nice and automatically will do cos-sim for us so we don't have to call additional functions. Below is the network architecture for the skip-gram embedding model.
<img src="images/sgnn.png" style="width: 300px;"/>


### Converting from Freebase RDF m.id to Real Name
Right now all the entities are referenced as m.id from Freebase. We will convert these to thier real names using the freebase mapping file referenced in the code below

In [146]:
fb_mapping_df = pd.read_csv('FB15K/freebase-entities.tsv', '\t', header=None, names=['fb','string'])

keys = list(fb_mapping_df['fb'])
mapping = list(fb_mapping_df['string'])

keys_fin = ['/'+w.replace('.','/') for w in keys]

all_mids = list(set(train_df.Entity) | set(train_df.Tail))

mapping_dict = dict(zip(keys_fin,mapping))

keys_complete = list(set(keys_fin) & set(all_mids))

fb15k_maps = [mapping_dict[x] for x in keys_complete]

mapping_dict = dict(zip(keys_complete,fb15k_maps))



In [147]:
reverse_mapping_dict = dict(zip(fb15k_maps, keys_complete))

In [148]:
mapping_dict['/m/027rn']

'Dominican Republic'

In [149]:
reverse_mapping_dict['Dominican Republic']

'/m/027rn'

In [59]:
tmp = fb_mapping_df.set_index('fb')

In [60]:
test_df_copy = test_df
train_df_copy = train_df
valid_df_copy = valid_df

In [124]:
def translate_mid(df):
    
    # Getting all possible entites
    all_ents = df['Entity']
    all_ents.append(df['Tail'])
    
    # Getting rid of duplicates
    all_ents = list(set(all_ents))
    
    # Filtering out entites missing from the mapping file (not to many just throw them out)
    matches = []
    missing = []
    
    # Checking to see what's actually missing
    for ent in all_ents:
    
        if 'm.' + ent[3:] in tmp.index:
            matches.append(tmp.loc['m.' + ent[3:]])

        else:
            missing.append(ent)
    
    # Checking to see if all entities are accounted for (can comment out if it's annoying)
    if(len(matches) + len(missing) == len(all_ents)):
        
        # Dropping missing entities
        drop_index = []
        
        for i in range(len(df['Entity'])):

            if df['Entity'][i] in missing:
                drop_index.append(i)

            elif df['Tail'][i] in missing:
                drop_index.append(i)

        cleaned_df = df.drop(drop_index, axis = 0)
        
        
        for i in range(len(cleaned_df['Entity'])):
            try:
                tail = tmp.loc['m.' + cleaned_df['Tail'][i][3:]]
                ent = tmp.loc['m.' + cleaned_df['Entity'][i][3:]]
                cleaned_df['Entity'][i] = ent[0]
                cleaned_df['Tail'][i] = tail[0]
            except:
                pass
        
        return cleaned_df
    else:
        print('Something went wrong!!!')
        return

In [None]:
test_cleaned = translate_mid(test_df_copy)

In [21]:
train_cleaned = translate_mid(train_df_copy)

In [35]:
valid_cleaned = translate_mid(valid_df_copy)

In [26]:
test_cleaned.to_csv('convert_data/test_cleaned.tsv', sep = '\t', index = False)

In [27]:
train_cleaned.to_csv('convert_data/train_cleaned.tsv', sep = '\t', index = False)

In [36]:
valid_cleaned.to_csv('convert_data/valid_cleaned.tsv', sep = '\t', index = False)

In [170]:
test_cleaned = pd.read_csv('convert_data/test_cleaned.tsv', sep = '\t')
train_cleaned = pd.read_csv('convert_data/train_cleaned.tsv', sep = '\t')
valid_cleaned = pd.read_csv('convert_data/valid_cleaned.tsv', sep = '\t')

In [171]:
train_cleaned = train_cleaned.iloc[:473750]

In [172]:
test_cleaned = test_cleaned.iloc[:57869]

In [173]:
valid_cleaned = valid_cleaned.iloc[:49026]

### Word2Vec Model using English Wikipedia

In [24]:
wiki_w2v = w2v.KeyedVectors.load_word2vec_format('english_wiki_model.bin', binary=True)

### Converting df entities to wiki model format

I found that all the items in FB15K are proper nouns i.e. names, titles, etc (the few that aren't can be represented as proper nouns). This makes the job pretty easy as we can add the POS constraint into the df so it easily works with the english wiki to generate some links

In [25]:
english_wiki_vocab = list(wiki_w2v.vocab)
english_wiki_vocab_prop  = [x for x in english_wiki_vocab if x[-6:] == '_PROPN']

### Function converts entities into wiki form and back

#### Entity sub-functions

In [127]:
def convert_to_wiki_form(ent):
    ent = ent.replace(' ', '::')
    ent = ent + '_PROPN'
    return ent 

In [128]:
def convert_to_name(ent):
    ent = ent[:-6]
    ent = ent.replace('::', ' ')
    return ent

In [129]:
def name2mid(ent):
    try:
        mid = reverse_mapping_dict[ent]
    except:
        mid = '--'
    return mid

#### Formatting sub-functions

In [130]:
def format_wiki(_df):
    
    df = _df
    # Run function on all entities in the DataFrame
    df.Entity = df.Entity.apply(convert_to_wiki_form)
    df.Tail = df.Tail.apply(convert_to_wiki_form)
        
    return df

In [131]:
def format_back(_df):
    
    df = _df
    df.Entity = df.Entity.apply(convert_to_name)
    df.Tail = df.Tail.apply(convert_to_name)
        
    return df

In [132]:
def format_back_to_names(_df):
    df = _df
    df.Entity = df.Entity.apply(name2mid)
    df.Tail = df.Tail.apply(name2mid)

    return df

#### Formatting to Wiki-Model Form

In [174]:
wiki_test = format_wiki(test_cleaned)

In [175]:
wiki_train = format_wiki(train_cleaned)

In [176]:
wiki_valid = format_wiki(valid_cleaned)

In [177]:
wiki_train.head(3)

Unnamed: 0,Entity,Relation,Tail
0,Dominican::Republic_PROPN,/location/country/form_of_government,Republic_PROPN
1,Mighty::Morphin::Power::Rangers_PROPN,/tv/tv_program/regular_cast./tv/regular_tv_app...,Wendee::Lee_PROPN
2,Michelle::Rodriguez_PROPN,/award/award_winner/awards_won./award/award_ho...,Naveen::Andrews_PROPN


### Creating Similarity Links 
Use the built in most_similar function which uses cos-similarity but in the Word2Vec Embeddings instead of the graph emdbeddings we used before. The goal is to enrich the graph with outside text sources such as wikipedia which Freebase is based off of. 

In [157]:
def create_wiki_sim_links(all_ents):
    
    ents = []; tails = []; conf = []
    
    for i in all_ents:
        if (i in english_wiki_vocab_prop):
            simlinks_for_i = [x for x in wiki_w2v.most_similar(i, topn  = 20)]
            tails_to_add = [x[0] for x in simlinks_for_i]
            conf_to_add = [x[1] for x in simlinks_for_i]
            if len(simlinks_for_i) > 0:
                tails.extend(tails_to_add)
                ents.extend([i]*len(simlinks_for_i))
                conf.extend(conf_to_add)
        else:
            continue
    return pd.DataFrame(data = {'Entity': ents, 'Relation': ['/similar_to'] * len(ents), 'Tail': tails, 'Confidence': conf})

In [158]:
test_sim_links = create_wiki_sim_links(all_ents)

Exception ignored in: 'zmq.backend.cython.message.Frame.__dealloc__'
Traceback (most recent call last):
  File "zmq/backend/cython/checkrc.pxd", line 12, in zmq.backend.cython.checkrc._check_rc
KeyboardInterrupt: 


NameError: name 'all_ents' is not defined

In [None]:
test_sim_links.head()

In [207]:
test_sim_links.to_csv('text_sim_links_fb15k.tsv', sep='\t', index=False)

#### Selecting Threshold for Similarity

In [178]:
def select_sim_links(file = 'text_sim_links_fb15k.tsv', threshold = 0.85):
    links = pd.read_csv('text_sim_links_fb15k.tsv', sep = '\t')
    filtered_links = links[links['Confidence'] > threshold]
    return filtered_links

In [189]:
wiki_train_enriched = wiki_train.append(select_sim_links(threshold=0.90))

In [190]:
wiki_train_enriched = wiki_train_enriched.iloc[:,1:]

In [191]:
len(wiki_train_enriched)

474839

In [167]:
wiki_test_enriched = wiki_test.append(select_sim_links(threshold=0.8))
wiki_test_enriched = wiki_test_enriched.iloc[:,1:]

In [168]:
wiki_test_enriched = wiki_test_enriched[wiki_test_enriched['Entity'] != '--']
wiki_test_enriched = wiki_test_enriched[wiki_test_enriched['Tail'] != '--']

In [192]:
len(wiki_train_enriched[wiki_train_enriched['Relation'] == '/similar_to'])

1089

In [201]:
wiki_train_enriched[wiki_train_enriched['Relation'] == '/similar_to'].sample(10)

Unnamed: 0,Entity,Relation,Tail
89484,Buffalo::Sabres_PROPN,/similar_to,Edmonton::Oilers_PROPN
27621,Salavat::Yulaev::Ufa_PROPN,/similar_to,Ak::Bars::Kazan_PROPN
94186,New::York::Jets_PROPN,/similar_to,Cincinnati::Bengals_PROPN
86103,June_PROPN,/similar_to,August_PROPN
11062,Utah::Jazz_PROPN,/similar_to,Phoenix::Suns_PROPN
121825,Edmonton::Eskimos_PROPN,/similar_to,Saskatchewan::Roughriders_PROPN
106626,Buffalo::Bills_PROPN,/similar_to,Kansas::City::Chiefs_PROPN
76464,October_PROPN,/similar_to,August_PROPN
87080,Dallas::Mavericks_PROPN,/similar_to,Houston::Rockets_PROPN
69140,Chicago::Bears_PROPN,/similar_to,Philadelphia::Eagles_PROPN


#### Converting from Wiki format to /m/id format

In [184]:
def wiki_to_mid(_df):
    df = _df
    df = format_back(df)
    df = format_back_to_names(df)
    return df

In [202]:
wiki_train_enriched = wiki_to_mid(wiki_train_enriched)
wiki_test_enriched = wiki_to_mid(wiki_test_enriched)

In [203]:
wiki_train_enriched.head()

Unnamed: 0,Entity,Relation,Tail
0,/m/027rn,/location/country/form_of_government,/m/06cx9
1,/m/017dcd,/tv/tv_program/regular_cast./tv/regular_tv_app...,/m/06v8s0
2,/m/01sl1q,/award/award_winner/awards_won./award/award_ho...,/m/044mz_
3,/m/0cnk2q,/soccer/football_team/current_roster./sports/s...,/m/02nzb8
4,/m/02_j1w,/sports/sports_position/players./soccer/footba...,/m/01cwm1


In [204]:
wiki_train_enriched = wiki_train_enriched[wiki_train_enriched['Entity'] != '--']
wiki_train_enriched = wiki_train_enriched[wiki_train_enriched['Tail'] != '--']

#### Random Sample

In [188]:
wiki_train_enriched[wiki_train_enriched['Relation'] == '/similar_to'].sample()

Unnamed: 0,Entity,Relation,Tail
147824,/m/05g3b,/similar_to,/m/0289q


#### Saving Wiki-Enriched KB to Folder


In [205]:
wiki_train_enriched.to_csv('wiki_enrich_data/wiki_train_enriched_09.txt', sep = '\t', index = False)
wiki_test_enriched.to_csv('wiki_enrich_data/wiki_test_enriched.txt', sep = '\t', index = False)

In [154]:
wiki_train_enriched.head()

Unnamed: 0,Entity,Relation,Tail
0,/m/027rn,/location/country/form_of_government,/m/06cx9
1,/m/017dcd,/tv/tv_program/regular_cast./tv/regular_tv_app...,/m/06v8s0
2,/m/01sl1q,/award/award_winner/awards_won./award/award_ho...,/m/044mz_
3,/m/0cnk2q,/soccer/football_team/current_roster./sports/s...,/m/02nzb8
4,/m/02_j1w,/sports/sports_position/players./soccer/footba...,/m/01cwm1


### Mining Rules 
Using AMIE again to mine rules from the wiki-enriched KB

In [207]:
os.system('java -XX:-UseGCOverheadLimit -Xmx4g -jar AMIE/amie_plus.jar -minhc 0.0 -mins 0 -minis 0 wiki_enrich_data/wiki_train_enriched.txt > rules/wiki_rules.txt')

33280

**Note:** Remember to clean 'baseline_rules.txt' so it only includes the rule. Take out output from top and bottom so there are only rules.

In [208]:
os.system('java -jar AMIE/ApplyAMIERules.jar rules/wiki_rules.txt wiki_enrich_data/wiki_train_enriched.txt FB15K/test.txt  FB15K/valid.txt evaluation/wiki_rules_eval.txt')

256

In [87]:
print('Hits@10: ' + str(eval_frame('evaluation/_rules_eval.txt', len(test_df))))

Hits@10: 0.7189822417091297


## Results (Hits@10)

### Graph Embeddings
    - Threshold : 0.85  ----  Hits@10 : 
    - Threshold : 0.85  ----  Hits@10 :
### Word2Vec Wiki Embeddings
    - Threshold : 0.85  ----  Hits@10 : 0.71898 
    - Threshold : 0.85  ----  Hits@10 :