Skip to content
Permalink
Branch: master
Find file Copy path
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
executable file 175 lines (156 sloc) 8.04 KB
<!DOCTYPE html>
<html><head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
<link rel="stylesheet" href="ZPar%20_%20Dependency%20Parsing_files/bootstrap.css">
<link rel="stylesheet" href="ZPar%20_%20Dependency%20Parsing_files/bootstrap-responsive.xml">
<style type="text/css">
.nav { }
.nav li { float: left; width: 110px; }
.container { background-color: #ffffff; padding: 30px; }
.content { padding: 30px; }
body, h1, h2, h3, h4 { font-family: "Trebuchet MS","Helvetica Neue",Arial,Helvetica,sans-serif,"Georgia";}
body { background-color: #eeeeee; }
header { width: 100%; }
blockquote p { font-size: 14px; }
</style>
<title>ZPar | Dependency Parsing</title>
</head>
<body>
<div class="table-bordered container">
<div class="content">
<header>
<div class="page-header text-center">
<h1>Dependency Parsing</h1>
</div>
</header>
<h2><a id="How-to-compile">How to compile</a></h2>
<p>
Suppose that ZPar has been downloaded to the directory <code>zpar</code>. To make a dependency parsing system for English, type <code>make english.depparser</code>. This will create a directory <code>zpar/dist/english.depparser</code>, in which there are two files: <code>train</code> and <code>depparser</code>. The file <code>train</code> is used to train a parsing model, and the file <code>depparser</code>
is used to parse new texts using a trained parsing model. Similarly, we
can make a dependency parsing system for Chinese by typing <code>make chinese.depparser</code>. The <code>train</code> and <code>depparser</code> files are created under the directory of <code>zpar/dist/chinese.depparser</code>.
</p>
<h2><a id="Format-of-inputs-and-outputs">Format of inputs and outputs</a></h2>
<p>
The input file to the <code>train</code> executable contains a set of parse trees. An example parse tree is as follows:
</p>
<pre><code>
Ms. NNP 1
Haag NNP 2
plays VBZ -1
Elianti NNP 2
. . 2
</code></pre>
<p>
Here the first column represents the words of the sentence; the second
column contains POS tags of the words; the third column represents the
indices of the heads of the words. Indices start from 0. For example,
the head index of the word <code>Ms.</code> is 1, which means its head is <code>Haag</code>.
The head index for the root word of the sentences is -1. Note that, in
each line tab characters are used to separate a word, a POS tag, and an
index.
</p>
<p>
The input file to the <code>depparser</code> executable contains POS tagged sentences. The formats for English and Chinese are different.
</p>
<strong>English</strong>:
<br>
<pre><code> Ms./NNP Haag/NNP plays/VBZ Elianti/NNP</code></pre>
<strong>Chinese</strong>:
<br>
<pre><code> ZPar_NR 可以_MD 分析_VV 中文_NN 和_CC 英文_NN </code></pre>
<p>
For Chinese, inputs to both <code>train</code> and <code>depparser</code> must be encoded in <code>utf8</code>.
</p>
<h2><a id="How-to-train-a-model">How to train a model</a></h2>
<p>
To train a model, use
</p>
<pre><code> zpar/dist/english.depparser/train &lt;train-file&gt; &lt;model-file&gt; &lt;number of iterations&gt; </code></pre>
<p>
For example, using the English <a href="http://people.sutd.edu.sg/%7Eyue_zhang/doc/doc/eng_dep_files/train.txt">example train file</a>, you can train a model by
</p>
<pre><code> zpar/dist/english.depparser/train train.txt model 1 </code></pre>
<p>
After training is completed, a new file <code>model</code> will be
created in the current directory, which can be used to parse POS-tagged
sentences. The above command performs training with one iteration (see <a href="#tuning">How to tune the performance of a system</a>) using the training file.
</p>
<p>
The commands for training Chinese parsing models are the same. For example, using the Chinese <a href="http://people.sutd.edu.sg/%7Eyue_zhang/doc/doc/chn_dep_files/train.txt">example train file</a>, you can train a model by
</p>
<pre><code> zpar/dist/chinese.depparser/train train.txt model 1 </code></pre>
<h2><a id="How-to-parse-new-texts">How to parse new texts</a></h2>
<p>
To apply an existing model to parse new texts, use
</p>
<pre><code> zpar/dist/english.depparser/depparser &lt;model&gt; &lt;input-file&gt; &lt;output-file&gt;</code></pre>
<p>
For example, using the model we just trained, we can parse <a href="http://people.sutd.edu.sg/%7Eyue_zhang/doc/doc/eng_dep_files/input.txt">an example input</a> by
</p>
<pre><code> zpar/dist/english.depparser/depparser model input.txt output.txt</code></pre>
<p>
The output file contains automatically parsed trees. The commands for parsing Chinese texts are the same. See an example of <a href="http://people.sutd.edu.sg/%7Eyue_zhang/doc/doc/chn_dep_files/input.txt">Chinese input file</a>.
</p>
<h2><a id="Outputs-and-evaluation">Outputs and evaluation</a></h2>
<p>
In order to evaluate the quality of the outputs, we can manually specify
the gold parse trees of a sample, and compare the outputs with the
correct sample.
</p>
<p>
Manually specified parse trees of the input file are given in <a href="http://people.sutd.edu.sg/%7Eyue_zhang/doc/doc/eng_dep_files/reference.txt">this example reference file</a> (find a Chinese reference file <a href="http://people.sutd.edu.sg/%7Eyue_zhang/doc/doc/chn_dep_files/reference.txt">here</a>). Here is a <a href="http://people.sutd.edu.sg/%7Eyue_zhang/doc/doc/eng_dep_files/evaluate.py">Python script</a> that performs automatic evaluation.
</p>
<p>
Using the above <code>output.txt</code> and <code>reference.txt</code>, we can evaluate the accuracies by typing
</p>
<pre><code> python evaluate.py output.txt reference.txt</code></pre>
<p>
You can find the precision, recall, and f-score here. See the explanation of these measures on <a href="http://en.wikipedia.org/wiki/Precision_and_recall">Wikipedia</a>.
</p>
<h2><a id="tuning">How to tune the performance of a system</a></h2>
<p>
The performance of the system after one training iteration may not be
optimal. You can try training a model for another few iterations, after
each you compare the performance. You can choose the model that gives
the highest f-score on your test data. We conventionally call this test
file the development test data, because you develop a parsing model
using this. Here is a <a href="http://people.sutd.edu.sg/%7Eyue_zhang/doc/doc/eng_dep_files/test.sh.txt">a shell script</a>
that automatically trains the parser for 30 iterations, and after the
ith iteration, stores the model file to model.i. You can compare the
f-score of all 30 iterations and choose model.k, which gives the best
f-score, as the final model. In this file, this is a variable called <code>zpar</code>. You need to set this variable to the relative directory of <code>zpar/dist/english.depparser</code> or <code>zpar/dist/chinese.depparser</code>.
</p>
<h2><a id="Source-code">Source code</a></h2>
<p>
The source code for the English dependency parser can be found at
</p>
<pre><code> zpar/src/common/depparser/implementation/ENGLISH_DEPPARSER_IMPL</code></pre>
<p>
where <code>ENGLISH_DEPPARSER_IMPL</code> is a macro defined in <code>Makefile</code>, and specifies a specific implementation for the English dependency parser.
</p>
<p>
The source code for the Chinese dependency parser can be found at
</p>
<pre><code> zpar/src/common/depparser/implementation/CHINESE_DEPPARSER_IMPL</code></pre>
<p>
where <code>CHINESE_DEPPARSER_IMPL</code> is a macro defined in <code>Makefile</code>, and specifies a specific implementation for the Chinese dependency parser.
</p>
<h2><a id="reference">Reference</a></h2>
<ul>
<li>
Yue Zhang and Stephen Clark. 2008. A Tale of Two Parsers: Investigating
and Combining Graph-based And transition-based Dependency Parsing Using
Beam-search. In <em>Proc. of EMNLP</em>, pages 562-571.
</li>
<li>
Yue Zhang and Joakim Nivre. 2011. Transition-based Dependency Parsing with Rich Non-local Features. In <em>Proc. of ACL</em>, pages 188-193.
</li>
</ul>
</div>
</div>
<footer class="text-center">
<p>
ZPar Release 0.7
</p>
</footer>
</body></html>
You can’t perform that action at this time.