Merged
35 commits
e52bb30
Merge remote-tracking branch 'origin/dev' into dev-benchmarks-bofang
libofang Aug 6, 2018
17ebb36
Merge remote-tracking branch 'origin/master' into dev-benchmarks-bofang
libofang Aug 6, 2018
4e122bf
add full command options for benchmark __main__ file
libofang Aug 6, 2018
d166b6d
unify the name embeddings
libofang Sep 20, 2018
0a89164
relation extraction test unit
libofang Sep 20, 2018
4d6dbaf
relation extraction dataset
libofang Sep 20, 2018
b605aeb
relation extraction code
libofang Sep 20, 2018
117f7f2
make the training and test set the same (for testing )
libofang Sep 20, 2018
d36fc76
load only the dataset (embeddings are loaded by vecto)
libofang Sep 20, 2018
2420286
relation extraction code
libofang Sep 26, 2018
f68ed98
Merge remote-tracking branch 'origin/dev' into dev-benchmarks-bofang
libofang Sep 26, 2018
3ecb09b
add keras to requirements
libofang Sep 26, 2018
171f35d
add tensorflow to requirements
libofang Sep 26, 2018
8d5fcdc
add tf-nightly to requirements
libofang Sep 26, 2018
2a981b0
add tensorflow to requirements
libofang Sep 26, 2018
fe83b3a
remove x32 since tensorflow doesn't support it.
libofang Sep 26, 2018
84946fd
test new machine push
libofang Oct 12, 2018
5c9fad7
remove all unit test
libofang Oct 12, 2018
6664beb
test should be the same file
libofang Oct 12, 2018
44d2247
remove all relation extraction unittest
libofang Oct 12, 2018
3f8f52a
remove tensorflow requirement
libofang Oct 12, 2018
3839245
test
libofang Oct 12, 2018
c94d2ee
remove test file
libofang Oct 12, 2018
c270d4b
test
libofang Oct 12, 2018
442b477
remove relation_extraction code
libofang Oct 12, 2018
5228b51
add relation extraction code
libofang Oct 13, 2018
db529c5
add main file
libofang Oct 13, 2018
7f521ad
add init
libofang Oct 13, 2018
ad07fb7
comment all relation_extraction code
libofang Oct 13, 2018
a6e63b8
test
libofang Oct 13, 2018
00a04e1
test
libofang Oct 13, 2018
9f84779
add import
libofang Oct 13, 2018
48ac33e
remove keras import
libofang Oct 13, 2018
d10ac36
add import keras models
libofang Oct 13, 2018
af4b44f
Merge branch 'dev' into dev-benchmarks-bofang
undertherain Jan 6, 2019
6 changes: 3 additions & 3 deletions .appveyor.yml
@@ -3,9 +3,9 @@ environment:
     CMD_IN_ENV: "cmd /E:ON /V:ON /C .\\appveyor\\run_with_env.cmd"
 
   matrix:
-    - PYTHON: "C:\\Python35"
-      PYTHON_VERSION: "3.5.0"
-      PYTHON_ARCH: "32"
+    # - PYTHON: "C:\\Python35"
+    #   PYTHON_VERSION: "3.5.0"
+    #   PYTHON_ARCH: "32"
 
     - PYTHON: "C:\\Python35-x64"
       PYTHON_VERSION: "3.5.0"
2 changes: 2 additions & 0 deletions requirements.txt
@@ -13,4 +13,6 @@ system-query
 tables
 traitlets
 tqdm
+keras
+tensorflow
 requests
32 changes: 32 additions & 0 deletions tests/data/benchmarks/relation_extraction/test.txt
@@ -0,0 +1,32 @@
Component-Whole(e2,e1) 12 15 The system as described above has its greatest application in an arrayed configuration of antenna elements .
Other 1 9 The child was carefully wrapped and bound into the cradle by means of a cord .
Instrument-Agency(e2,e1) 1 7 The author of a keygen uses a disassembler to look at the raw assembly code .
Other 2 6 A misty ridge uprises from the surge .
Member-Collection(e1,e2) 1 2 The student association is the voice of the undergraduate student population of the State University of New York at Buffalo .
Other 4 10 This is the sprawling complex that is Peru 's largest producer of silver .
Cause-Effect(e2,e1) 7 19 The current view is that the chronic inflammation in the distal part of the stomach caused by Helicobacter pylori infection results in an increased acid production from the non-infected upper corpus region of the stomach .
Entity-Destination(e1,e2) 0 6 People have been moving back into downtown .
Content-Container(e1,e2) 1 6 The lawsonite was contained in a platinum crucible and the counter-weight was a plastic crucible with metal pieces .
Entity-Destination(e1,e2) 12 20 The solute was placed inside a beaker and 5 mL of the solvent was pipetted into a 25 mL glass flask for each trial .
Member-Collection(e1,e2) 2 6 The fifty essays collected in this volume testify to most of the prominent themes from Professor Quispel 's scholarly career .
Other 1 5 Their composer has sunk into oblivion .
Message-Topic(e1,e2) 6 9 The Pulitzer Committee issues an official citation explaining the reasons for the award .
Cause-Effect(e2,e1) 1 8 The burst has been caused by water hammer pressure .
Instrument-Agency(e2,e1) 2 6 Even commercial networks have moved into high-definition broadcast .
Message-Topic(e1,e2) 4 10 It was a friendly call to remind them about the bill and make sure they have a copy of the invoice .
Instrument-Agency(e2,e1) 1 8 Texas-born virtuoso finds harmony , sophistication in Appalachian instrument .
Product-Producer(e2,e1) 1 14 The factory 's products have included flower pots , Finnish rooster-whistles , pans , trays , tea pots , ash trays and air moisturisers .
Component-Whole(e2,e1) 7 8 The girl showed a photo of apple tree blossom on a fruit tree in the Central Valley .
Member-Collection(e2,e1) 20 23 They tried an assault of their own an hour later , with two columns of sixteen tanks backed by a battalion of Panzer grenadiers .
Entity-Origin(e1,e2) 1 18 Their knowledge of the power and rank symbols of the Continental empires was gained from the numerous Germanic recruits in the Roman army , and from the Roman practice of enfeoffing various Germanic warrior groups with land in the imperial provinces .
Member-Collection(e2,e1) 4 9 She soon had a stable of her own rescued hounds .
Cause-Effect(e1,e2) 1 14 The singer , who performed three of the nominated songs , also caused a commotion on the red carpet .
Other 5 11 His intellectually engaging books and essays remain pertinent to illuminating contemporary history .
Member-Collection(e2,e1) 7 10 Poor hygiene controls , reports of a brace of gamey grouse and what looked like a skinned fox all amounted to a pie that was unfit for human consumption .
Other 2 7 This sweet dress is made with a blend of cotton and silk , and the crochet flower necklace is the perfect accessory .
Cause-Effect(e1,e2) 0 8 Suicide is one of the leading causes of death among pre-adolescents and teens , and victims of bullying are at an increased risk for committing suicide .
Message-Topic(e1,e2) 1 7 This article gives details on 2004 in music in the United Kingdom , including the official charts from that year .
Message-Topic(e1,e2) 12 16 We have therefore taken the initiative to convene the first international open meeting dedicated solely to rural history .
Component-Whole(e1,e2) 1 4 The timer of the device automatically eliminates wasted `` standby power '' consumption by automatically turn off electronics plugged into the `` auto off '' outlets .
Message-Topic(e2,e1) 5 8 Bob Parks made a similar offer in a phone call made earlier this week .
Cause-Effect(e2,e1) 5 7 He had chest pains and headaches from mold in the bedrooms .
32 changes: 32 additions & 0 deletions tests/data/benchmarks/relation_extraction/train.txt
@@ -0,0 +1,32 @@
Component-Whole(e2,e1) 12 15 The system as described above has its greatest application in an arrayed configuration of antenna elements .
Other 1 9 The child was carefully wrapped and bound into the cradle by means of a cord .
Instrument-Agency(e2,e1) 1 7 The author of a keygen uses a disassembler to look at the raw assembly code .
Other 2 6 A misty ridge uprises from the surge .
Member-Collection(e1,e2) 1 2 The student association is the voice of the undergraduate student population of the State University of New York at Buffalo .
Other 4 10 This is the sprawling complex that is Peru 's largest producer of silver .
Cause-Effect(e2,e1) 7 19 The current view is that the chronic inflammation in the distal part of the stomach caused by Helicobacter pylori infection results in an increased acid production from the non-infected upper corpus region of the stomach .
Entity-Destination(e1,e2) 0 6 People have been moving back into downtown .
Content-Container(e1,e2) 1 6 The lawsonite was contained in a platinum crucible and the counter-weight was a plastic crucible with metal pieces .
Entity-Destination(e1,e2) 12 20 The solute was placed inside a beaker and 5 mL of the solvent was pipetted into a 25 mL glass flask for each trial .
Member-Collection(e1,e2) 2 6 The fifty essays collected in this volume testify to most of the prominent themes from Professor Quispel 's scholarly career .
Other 1 5 Their composer has sunk into oblivion .
Message-Topic(e1,e2) 6 9 The Pulitzer Committee issues an official citation explaining the reasons for the award .
Cause-Effect(e2,e1) 1 8 The burst has been caused by water hammer pressure .
Instrument-Agency(e2,e1) 2 6 Even commercial networks have moved into high-definition broadcast .
Message-Topic(e1,e2) 4 10 It was a friendly call to remind them about the bill and make sure they have a copy of the invoice .
Instrument-Agency(e2,e1) 1 8 Texas-born virtuoso finds harmony , sophistication in Appalachian instrument .
Product-Producer(e2,e1) 1 14 The factory 's products have included flower pots , Finnish rooster-whistles , pans , trays , tea pots , ash trays and air moisturisers .
Component-Whole(e2,e1) 7 8 The girl showed a photo of apple tree blossom on a fruit tree in the Central Valley .
Member-Collection(e2,e1) 20 23 They tried an assault of their own an hour later , with two columns of sixteen tanks backed by a battalion of Panzer grenadiers .
Entity-Origin(e1,e2) 1 18 Their knowledge of the power and rank symbols of the Continental empires was gained from the numerous Germanic recruits in the Roman army , and from the Roman practice of enfeoffing various Germanic warrior groups with land in the imperial provinces .
Member-Collection(e2,e1) 4 9 She soon had a stable of her own rescued hounds .
Cause-Effect(e1,e2) 1 14 The singer , who performed three of the nominated songs , also caused a commotion on the red carpet .
Other 5 11 His intellectually engaging books and essays remain pertinent to illuminating contemporary history .
Member-Collection(e2,e1) 7 10 Poor hygiene controls , reports of a brace of gamey grouse and what looked like a skinned fox all amounted to a pie that was unfit for human consumption .
Other 2 7 This sweet dress is made with a blend of cotton and silk , and the crochet flower necklace is the perfect accessory .
Cause-Effect(e1,e2) 0 8 Suicide is one of the leading causes of death among pre-adolescents and teens , and victims of bullying are at an increased risk for committing suicide .
Message-Topic(e1,e2) 1 7 This article gives details on 2004 in music in the United Kingdom , including the official charts from that year .
Message-Topic(e1,e2) 12 16 We have therefore taken the initiative to convene the first international open meeting dedicated solely to rural history .
Component-Whole(e1,e2) 1 4 The timer of the device automatically eliminates wasted `` standby power '' consumption by automatically turn off electronics plugged into the `` auto off '' outlets .
Message-Topic(e2,e1) 5 8 Bob Parks made a similar offer in a phone call made earlier this week .
Cause-Effect(e2,e1) 5 7 He had chest pains and headaches from mold in the bedrooms .
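Each record in these data files appears to hold four fields: the relation label, the 0-based token positions of the two entity mentions, and the tokenized sentence. preprocess.py splits each line on tabs, so the sketch below assumes tab separators (the rendering above shows them as spaces). `parse_record` is a hypothetical helper for illustration, not part of vecto.

```python
def parse_record(line):
    """Split one dataset line into (label, pos1, pos2, tokens)."""
    # Assumed layout: label <TAB> pos1 <TAB> pos2 <TAB> sentence
    label, pos1, pos2, sentence = line.rstrip("\n").split("\t")
    return label, int(pos1), int(pos2), sentence.split(" ")


line = ("Component-Whole(e2,e1)\t12\t15\tThe system as described above has its "
        "greatest application in an arrayed configuration of antenna elements .")
label, pos1, pos2, tokens = parse_record(line)
# The two positions index the entity head tokens in the tokenized sentence.
print(label, tokens[pos1], tokens[pos2])
```

Read this way, the first record marks "configuration" and "elements" as the two entities of a Component-Whole relation.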
3 changes: 2 additions & 1 deletion vecto/benchmarks/language_modeling/__main__.py
@@ -24,11 +24,12 @@ def main():
                         help='use small test dataset')
     parser.add_argument("--method", default='lstm', choices=['lr', '2FFNN', 'lstm'],
                         help='name of method')
+    parser.add_argument('--normalize', dest='normalize', action='store_true')
     parser.add_argument("--path_out", default=False, help="destination folder to save results")
     args = parser.parse_args()
     embeddings = load_from_dir(args.embeddings)
     # print("embeddings", embeddings)
-    language_modeling = Language_modeling(window_size=args.window_size, method=args.method, test=args.test)
+    language_modeling = Language_modeling(normalize=args.normalize, window_size=args.window_size, method=args.method, test=args.test)
     results = language_modeling.get_result(embeddings)
     if args.path_out:
         if os.path.isdir(args.path_out) or args.path_out.endswith("/"):
1 change: 1 addition & 0 deletions vecto/benchmarks/relation_extraction/__init__.py
@@ -0,0 +1 @@
from .relation_extraction import Relation_extraction
60 changes: 60 additions & 0 deletions vecto/benchmarks/relation_extraction/__main__.py
@@ -0,0 +1,60 @@
import argparse
import json
import logging
import os

from vecto.utils.data import save_json
from vecto.benchmarks.relation_extraction import Relation_extraction
from vecto.embeddings import load_from_dir

logging.basicConfig(level=logging.DEBUG)


def print_json(data):
    print(json.dumps(data, ensure_ascii=False, indent=4, sort_keys=False))


def main():
    # config = load_config()
    # print(config)
    parser = argparse.ArgumentParser()
    parser.add_argument("embeddings")
    parser.add_argument("dataset")

    parser.add_argument('--batchsize', '-b', type=int, default=64,
                        help='Number of sentences in each mini-batch')
    parser.add_argument('--epoch', '-e', type=int, default=1,
                        help='Number of sweeps over the dataset to train')
    parser.add_argument('--nb_filter', '-nf', type=int, default=100,
                        help='Number of convolution filters')
    parser.add_argument('--filter_length', '-fl', type=int, default=3,
                        help='Length of each convolution filter')
    parser.add_argument('--hidden_dims', '-hd', type=int, default=100,
                        help='Dimensionality of the hidden layer')
    parser.add_argument('--position_dims', '-pd', type=int, default=100,
                        help='Dimensionality of the position embeddings')
    parser.add_argument("--path_out", default=False, help="destination folder to save results")
    args = parser.parse_args()
    embeddings = load_from_dir(args.embeddings)
    # print("embeddings", embeddings)
    # print(args.normalize)
    relation_extraction = Relation_extraction(batchsize=args.batchsize,
                                              epoch=args.epoch,
                                              nb_filter=args.nb_filter,
                                              filter_length=args.filter_length,
                                              hidden_dims=args.hidden_dims,
                                              position_dims=args.position_dims,)
    results = relation_extraction.get_result(embeddings, args.dataset)
    if args.path_out:
        if os.path.isdir(args.path_out) or args.path_out.endswith("/"):
            # A directory was given: write to <path_out>/<dataset name>/results.json
            dataset = os.path.basename(os.path.normpath(args.dataset))
            name_file_out = os.path.join(args.path_out, dataset, "results.json")
            save_json(results, name_file_out)
        else:
            save_json(results, args.path_out)
    else:
        print_json(results)


if __name__ == "__main__":
    main()
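The `--path_out` handling in `main()` treats an existing directory, or any path ending in "/", as a folder and otherwise as the output file itself. A minimal sketch of that rule in isolation follows; `resolve_output_path` is a hypothetical name introduced here, and it only computes the path without saving anything.

```python
import os


def resolve_output_path(path_out, path_dataset):
    """Return the file that results would be written to, mirroring main()."""
    if os.path.isdir(path_out) or path_out.endswith("/"):
        # Folder given: nest the results under the dataset's directory name.
        dataset = os.path.basename(os.path.normpath(path_dataset))
        return os.path.join(path_out, dataset, "results.json")
    # Anything else is taken as the output file name itself.
    return path_out


print(resolve_output_path("out/", "data/relation_extraction/"))
```

Note that a folder name without a trailing slash is only recognized as a folder if it already exists on disk.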
150 changes: 150 additions & 0 deletions vecto/benchmarks/relation_extraction/preprocess.py
@@ -0,0 +1,150 @@

from __future__ import print_function
import numpy as np
import gzip
import os
import sys
import pickle as pkl





#Mapping of the labels to integers
labelsMapping = {'Other':0,
'Message-Topic(e1,e2)':1, 'Message-Topic(e2,e1)':2,
'Product-Producer(e1,e2)':3, 'Product-Producer(e2,e1)':4,
'Instrument-Agency(e1,e2)':5, 'Instrument-Agency(e2,e1)':6,
'Entity-Destination(e1,e2)':7, 'Entity-Destination(e2,e1)':8,
'Cause-Effect(e1,e2)':9, 'Cause-Effect(e2,e1)':10,
'Component-Whole(e1,e2)':11, 'Component-Whole(e2,e1)':12,
'Entity-Origin(e1,e2)':13, 'Entity-Origin(e2,e1)':14,
'Member-Collection(e1,e2)':15, 'Member-Collection(e2,e1)':16,
'Content-Container(e1,e2)':17, 'Content-Container(e2,e1)':18}




words = {}
maxSentenceLen = [0,0]


distanceMapping = {'PADDING': 0, 'LowerMin': 1, 'GreaterMax': 2}
minDistance = -30
maxDistance = 30
for dis in range(minDistance,maxDistance+1):
    distanceMapping[dis] = len(distanceMapping)
print(distanceMapping)


def getWordIdx(token, word2Idx):
    """Returns the word index for a given token from the word2Idx table"""
    if token in word2Idx:
        return word2Idx[token]
    elif token.lower() in word2Idx:
        return word2Idx[token.lower()]
    # Unknown tokens fall back to index 0
    return 0

def createTensor(file, word2Idx, maxSentenceLen=100):
    """Creates matrices for the events and sentence for the given file"""
    labels = []
    positionMatrix1 = []
    positionMatrix2 = []
    tokenMatrix = []

    # Each line is tab-separated: label, position of e1, position of e2, sentence
    with open(file) as f:
        for line in f:
            splits = line.strip().split('\t')

            label = splits[0]
            pos1 = splits[1]
            pos2 = splits[2]
            sentence = splits[3]
            tokens = sentence.split(" ")

            # Rows are zero-padded up to maxSentenceLen
            tokenIds = np.zeros(maxSentenceLen)
            positionValues1 = np.zeros(maxSentenceLen)
            positionValues2 = np.zeros(maxSentenceLen)

            for idx in range(0, min(maxSentenceLen, len(tokens))):
                tokenIds[idx] = getWordIdx(tokens[idx], word2Idx)

                # Offset of this token from each entity position
                distance1 = idx - int(pos1)
                distance2 = idx - int(pos2)

                if distance1 in distanceMapping:
                    positionValues1[idx] = distanceMapping[distance1]
                elif distance1 <= minDistance:
                    positionValues1[idx] = distanceMapping['LowerMin']
                else:
                    positionValues1[idx] = distanceMapping['GreaterMax']

                if distance2 in distanceMapping:
                    positionValues2[idx] = distanceMapping[distance2]
                elif distance2 <= minDistance:
                    positionValues2[idx] = distanceMapping['LowerMin']
                else:
                    positionValues2[idx] = distanceMapping['GreaterMax']

            tokenMatrix.append(tokenIds)
            positionMatrix1.append(positionValues1)
            positionMatrix2.append(positionValues2)

            labels.append(labelsMapping[label])

    return np.array(labels, dtype='int32'), np.array(tokenMatrix, dtype='int32'), np.array(positionMatrix1, dtype='int32'), np.array(positionMatrix2, dtype='int32')







def load_data(embeddings, path_dataset):
    files = [os.path.join(path_dataset, 'train.txt'), os.path.join(path_dataset, 'test.txt')]
    # First pass: collect the vocabulary and the longest sentence in each file
    for fileIdx in range(len(files)):
        file = files[fileIdx]
        with open(file) as f:
            for line in f:
                splits = line.strip().split('\t')

                sentence = splits[3]
                tokens = sentence.split(" ")
                maxSentenceLen[fileIdx] = max(maxSentenceLen[fileIdx], len(tokens))
                for token in tokens:
                    words[token.lower()] = True

    print("Max Sentence Lengths: ", maxSentenceLen)

    # :: Read in word embeddings ::
    word2Idx = embeddings.vocabulary.dic_words_ids
    wordEmbeddings = embeddings.matrix

    print("Embeddings shape: ", wordEmbeddings.shape)
    print("Len words: ", len(words))

    # :: Create token matrix ::
    train_set = createTensor(files[0], word2Idx, max(maxSentenceLen))
    test_set = createTensor(files[1], word2Idx, max(maxSentenceLen))

    data = {'wordEmbeddings': wordEmbeddings, 'word2Idx': word2Idx,
            'train_set': train_set, 'test_set': test_set}

    return data



print("Data preprocessing done!")
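The position features in preprocess.py encode each token's offset from an entity as an index into `distanceMapping`, with out-of-range offsets collapsed into two shared buckets. A minimal standalone sketch of that lookup follows; `position_index` and the upper-case constant names are hypothetical and introduced only for illustration.

```python
MIN_DISTANCE = -30
MAX_DISTANCE = 30

# Indices 0-2 are reserved; in-range distances follow in ascending order.
distance_mapping = {'PADDING': 0, 'LowerMin': 1, 'GreaterMax': 2}
for dis in range(MIN_DISTANCE, MAX_DISTANCE + 1):
    distance_mapping[dis] = len(distance_mapping)


def position_index(token_idx, entity_idx):
    """Map the offset of a token from an entity to its position index."""
    distance = token_idx - entity_idx
    if distance in distance_mapping:
        return distance_mapping[distance]
    if distance <= MIN_DISTANCE:
        return distance_mapping['LowerMin']
    return distance_mapping['GreaterMax']


# The entity token itself has distance 0.
print(position_index(12, 12))
```

With these bounds the table has 64 entries, so an embedding layer indexed by these values would need at least that many rows.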