Second commit

vpekar committed Sep 14, 2012
1 parent a06e481 commit b943786716d2a044842f5a0a72cef7b885153403
Showing with 56 additions and 1 deletion.
  1. +53 −1 README
  2. +3 −0 stanford.py
54 README
@@ -1 +1,53 @@
-test.
+A Jython interface to the Stanford parser. It includes utilities for working with
+parsed sentences:
+* parsing text containing XML tags,
+* obtaining probabilities for different analyses,
+* extracting dependency relations,
+* extracting subtrees,
+* finding the shortest path between two nodes,
+* printing the parse in various formats.
+
+See the examples under the if __name__ == "__main__": blocks.
+
+
+INSTALLATION:
+
+ 1. Download the parser from http://nlp.stanford.edu/downloads/lex-parser.shtml
+ 2. Unpack it into a local directory and add the path to stanford-parser.jar to the -cp argument in jython.bat.
+ 3. Pass the path to englishPCFG.ser.gz as the parser_file argument to StanfordParser.
+
+USAGE:
+
+1. Produce an FDG-style representation of a parse (a table listing each word with its tags):
+
+ parser = StanfordParser()
+
+ To keep XML tags provided in the input text:
+
+ sentence = parser.parse('This is a test')
+
+ To strip all XML before parsing:
+
+ sentence = parser.parse_xml('This is a <b>test</b>.')
+
+ To print the sentence as a table (one word per line):
+
+ sentence.print_table()
+
+ To print the sentence as a parse tree:
+
+ sentence.print_tree()
+
+2. Retrieve the 5 best parses with associated probabilities for the last-parsed sentence:
+
+ import math
+ parser = StanfordParser()
+ sentence = parser.parse('This is a test')
+ for candidate_tree in parser.lp.getKBestPCFGParses(5):
+     # score() is a log probability; exponentiate it to get a probability
+     print 'Prob:', math.exp(candidate_tree.score())
+     print 'Tree:'
+     s = Sentence(parser.gsf, candidate_tree.object())
+     s.print_table()
+
+On input, the script accepts unicode, UTF-8, or Latin-1 text.
+On output, it produces unicode.
+
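The README describes parse_xml() as stripping XML before parsing. As a rough illustration of what such a pre-processing step can involve, the plain-Python sketch below removes tags and records the character span each element covered, so the tags could in principle be re-attached to tokens after parsing. strip_tags_keep_spans is a hypothetical helper, not a function in stanford.py, and its regex is an assumption: it ignores XML comments, CDATA sections, and '>' inside attribute values.

```python
import re

# Hypothetical, deliberately simple tag pattern: an optional '/', a tag
# name, then anything up to the next '>'. Not a full XML parser.
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^>]*>")


def strip_tags_keep_spans(text):
    """Return (plain_text, spans), where spans lists (tag, start, end)
    character offsets into plain_text for each closed element."""
    plain = []
    spans = []
    open_tags = []  # stack of (tag name, start offset in the plain text)
    pos = 0
    length = 0
    for m in TAG_RE.finditer(text):
        chunk = text[pos:m.start()]
        plain.append(chunk)
        length += len(chunk)
        closing, name = m.group(1), m.group(2)
        if closing:
            # Pop back to the nearest matching opener; stray closers are ignored.
            for i in range(len(open_tags) - 1, -1, -1):
                if open_tags[i][0] == name:
                    spans.append((name, open_tags[i][1], length))
                    del open_tags[i:]
                    break
        elif not m.group(0).endswith("/>"):
            # Self-closing tags such as <br/> cover no text, so skip them.
            open_tags.append((name, length))
        pos = m.end()
    plain.append(text[pos:])
    return "".join(plain), spans


if __name__ == "__main__":
    print(strip_tags_keep_spans('This is a <b>test</b>.'))
```

For the README's input, the plain text 'This is a test.' would be handed to the parser, and the span ('b', 10, 14) identifies "test" as the bolded word.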
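The k-best example in the README exponentiates each score() directly. Those values are probabilities of a whole tree, which shrink rapidly as sentences get longer and can underflow to 0.0. A minimal sketch, assuming the scores are natural-log probabilities (as the README's use of math.e implies), renormalizes them over the k candidates with the log-sum-exp trick. normalize_log_scores is a hypothetical helper name, and the results are probabilities relative to the k returned parses only, not a posterior over all possible parses.

```python
import math


def normalize_log_scores(log_scores):
    """Turn natural-log scores into probabilities that sum to 1 over the list.

    Subtracting the maximum score before exponentiating keeps the largest
    term at exp(0) = 1, so the computation does not underflow even when
    every raw exp(score) would round to 0.0.
    """
    if not log_scores:
        return []
    best = max(log_scores)
    weights = [math.exp(s - best) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Made-up scores standing in for candidate_tree.score() values.
    scores = [-45.2, -46.0, -47.9]
    for rank, p in enumerate(normalize_log_scores(scores), 1):
        print("%d %.3f" % (rank, p))
```

In the README loop, the input would be [t.score() for t in parser.lp.getKBestPCFGParses(5)].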
@@ -52,6 +52,9 @@
On output, the script produces unicode.
"""
+__author__ = "Viktor Pekar <v.pekar@gmail.com>"
+__version__ = "0.1"
+
import sys, re, string, math
try:

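The README's feature list includes finding the shortest path between two nodes of a parse. In a tree that path is unique and always runs through the lowest common ancestor (LCA) of the two nodes: climb from the first node to the LCA, then descend to the second. The sketch below illustrates the idea on a hypothetical minimal Node class rather than the Stanford Tree objects the module works with; it is not the module's own implementation.

```python
class Node(object):
    """Hypothetical minimal tree node: a label and a list of children."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []


def path_from_root(root, target):
    """Return the nodes from root down to target, or None if target is absent."""
    if root is target:
        return [root]
    for child in root.children:
        sub = path_from_root(child, target)
        if sub is not None:
            return [root] + sub
    return None


def shortest_path(root, a, b):
    """Return the nodes on the path from a to b, passing through their LCA."""
    pa = path_from_root(root, a)
    pb = path_from_root(root, b)
    if pa is None or pb is None:
        raise ValueError("both nodes must be in the tree")
    # Length of the shared prefix of the two root paths; its last node is the LCA.
    i = 0
    while i < min(len(pa), len(pb)) and pa[i] is pb[i]:
        i += 1
    up = list(reversed(pa[i - 1:]))  # a ... LCA
    down = pb[i:]                    # LCA's child ... b
    return up + down


if __name__ == "__main__":
    # (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN test))))
    this = Node("This")
    test = Node("test")
    tree = Node("S", [
        Node("NP", [Node("DT", [this])]),
        Node("VP", [Node("VBZ", [Node("is")]),
                    Node("NP", [Node("DT", [Node("a")]), Node("NN", [test])])]),
    ])
    print(" ".join(n.label for n in shortest_path(tree, this, test)))
```

Comparing nodes by identity (is) rather than by label matters here, because a parse tree often contains several nodes with the same label, such as the two NP nodes above.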