GitHub - volkancirik/conll2eparse: convert conll formatted benchmark into embedding parsing

conll2eparse
============

This repository helps you prepare a CoNLL-formatted benchmark for dependency parsing with embeddings. The repository is structured as follows.

src/ the scripts and source code

bin/ binary files

run/ benchmark folders will be generated here

embeddings/ word embedding files. Each file is space separated, uses *UNKNOWN* as the entry for unknown words, and has the file extension .embeddings

data/ your benchmarks go under this folder. Each benchmark should contain 3 folders: 00 for train, 01 for development, and 02 for test. The file extension should be .dp
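Before running anything, it can help to confirm that a benchmark folder matches the layout described above. The following is only a rough sketch: `check_benchmark` is a hypothetical helper that is not part of this repository, the toy file names are made up, and it assumes the .dp files sit directly inside 00/, 01/ and 02/.

```shell
# Hypothetical helper (not part of this repository): check that a benchmark
# directory has the 00/, 01/ and 02/ folders, each with at least one .dp file.
check_benchmark() {
    dir=$1
    for split in 00 01 02; do
        if [ ! -d "$dir/$split" ]; then
            echo "missing folder: $dir/$split"
            return 1
        fi
        # ls exits non-zero when the glob matches nothing
        if ! ls "$dir/$split"/*.dp >/dev/null 2>&1; then
            echo "no .dp files in: $dir/$split"
            return 1
        fi
    done
    echo "ok: $dir"
}

# Toy benchmark in a temporary directory; the file names are placeholders.
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/toy/00" "$root/toy/01" "$root/toy/02"
touch "$root/toy/00/train.dp" "$root/toy/01/dev.dp" "$root/toy/02/test.dp"
check_benchmark "$root/toy"
```

For the sample shipped here, the equivalent call would be `check_benchmark data/conll-ptb-sample`, run from the repository root.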

See data/ for the sample benchmark conll-ptb-sample, and embeddings/ for sample word vector files. To get started, go to run/ and first generate the binary files.

 make bin

To generate a CoNLL benchmark with word-type embeddings:

 make prepare.type.conll-ptb-sample_cw-rcv1-25-scaled DIM=25

This will generate a directory for the sample benchmark with CW embeddings.
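To illustrate what a word-type lookup amounts to, here is a minimal sketch of fetching a word's vector from a space-separated embeddings file, falling back to the *UNKNOWN* entry when the word is missing. `lookup_vector` is a hypothetical helper and the toy vectors are invented for the example; this is not the code the Makefile runs, and it assumes each line is a word followed by its values.

```shell
# Hypothetical helper (not part of this repository): print the line for a
# word, or the *UNKNOWN* line if the word is not in the file.
lookup_vector() {
    word=$1
    file=$2
    awk -v w="$word" '
        $1 == w           { print; found = 1; exit }
        $1 == "*UNKNOWN*" { unk = $0 }
        END               { if (!found && unk != "") print unk }
    ' "$file"
}

# Toy embeddings file with 3-dimensional vectors (values are made up).
emb=$(mktemp)
trap 'rm -f "$emb"' EXIT
printf '%s\n' \
    '*UNKNOWN* 0.0 0.0 0.0' \
    'the 0.1 0.2 0.3' \
    'parser 0.4 0.5 0.6' > "$emb"

lookup_vector parser "$emb"    # known word: prints its own line
lookup_vector zyzzyva "$emb"   # unseen word: prints the *UNKNOWN* line
```

Because the lookup depends only on the word itself, every occurrence of a word gets the same vector, which is what distinguishes word-type embeddings from the token-based ones.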

To generate a context-dependent (token-based) benchmark, you need to generate substitute distributions, which requires a language model. Train a language model using this repository and put it under data/language_models. Assuming you have a language model wsj.lm.gz under data/language_models, you should be able to generate context-dependent word vectors for the benchmark as follows.

 make prepare.token.conll-ptb-sample_cw-rcv1-25-scaled+scode-wikipedia-25 DIM=50 LM=wsj.lm.gz

Above, cw-rcv1-25-scaled is the word-type embedding and scode-wikipedia-25 is the context embedding. Set these to whichever word vectors you want using the EMB_TYPE and EMB_CONTEXT flags. The DIM flag is 50 because the total embedding dimensionality is 25+25=50. See the Makefile for other flags.
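If you are unsure what value to pass as DIM, one way to work it out is to count the values per line in each embeddings file and add them up. This is a sketch under assumptions: `embedding_dim` is a hypothetical helper, the toy files and their names are invented, and it assumes the files have no header line so the first line is already a word followed by its vector.

```shell
# Hypothetical helper (not part of this repository): number of values per
# vector, taken from the first line (field count minus the word itself).
embedding_dim() {
    awk 'NR == 1 { print NF - 1; exit }' "$1"
}

# Toy files standing in for a word-type and a context embeddings file.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
printf '%s\n' '*UNKNOWN* 0 0 0' 'the 0.1 0.2 0.3' > "$work/type.embeddings"
printf '%s\n' '*UNKNOWN* 0 0' 'the 0.5 0.6' > "$work/context.embeddings"

total=$(( $(embedding_dim "$work/type.embeddings") + $(embedding_dim "$work/context.embeddings") ))
echo "DIM=$total"   # 3 + 2 for the toy files
```

With two 25-dimensional files, as in the command above, the same sum gives 50.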

TODO :

A better README
