A Jython interface to the Stanford parser. Includes various utilities to manipulate
parsed sentences: 
* parsing text containing XML tags, 
* obtaining probabilities for different analyses,
* extracting dependency relations,
* extracting subtrees, 
* finding the shortest path between two nodes, 
* printing the parse in various formats.

See examples after the if __name__ == "__main__" hooks.
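
As an illustration of one utility listed above, finding the shortest path
between two nodes of a parse tree can be sketched as follows. This is a minimal
sketch and not the code in stanford.py: the child-to-parent dictionary, the node
labels, and the function names are all hypothetical. The idea is that in a tree
the shortest path between two nodes always passes through their lowest common
ancestor.

```python
# Hypothetical toy tree for "This is a test", stored as child -> parent.
# The labels are illustrative only, not real parser output.
PARENT = {
    'NP-1': 'S', 'VP': 'S',
    'This': 'NP-1',
    'is': 'VP', 'NP-2': 'VP',
    'a': 'NP-2', 'test': 'NP-2',
}

def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def shortest_path(a, b, parent):
    """Return the node sequence from a to b via their lowest common ancestor."""
    up_a = path_to_root(a, parent)
    up_b = path_to_root(b, parent)
    ancestors_b = set(up_b)
    for i, node in enumerate(up_a):
        if node in ancestors_b:
            # node is the lowest common ancestor: climb from a, descend to b
            j = up_b.index(node)
            return up_a[:i + 1] + up_b[:j][::-1]
    return None  # the nodes are not in the same tree

print(shortest_path('This', 'test', PARENT))
```

For two sibling leaves such as 'a' and 'test', the path contains only the two
leaves and their shared parent.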


INSTALLATION:

    1. Download the parser from http://nlp.stanford.edu/downloads/lex-parser.shtml
    2. Unpack it into a local directory and add the path to stanford-parser.jar to the -cp argument in jython.bat
    3. Pass the path to englishPCFG.ser.gz as the parser_file argument to StanfordParser

USAGE: 

1. Produce an FDG-style representation of a parse (a table listing each word with its tags):

        parser = StanfordParser()

    To keep XML tags provided in the input text:
    
        sentence = parser.parse('This is a test')
    
    To strip all XML before parsing:
    
        sentence = parser.parse_xml('This is a <b>test</b>.')
    
    To print the sentence as a table (one word per line):
    
        sentence.print_table()
    
    To print the sentence as a parse tree:
    
        sentence.print_tree()
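
To make the idea of stripping XML before parsing concrete, here is a minimal
standalone sketch. It is not how parse_xml is implemented; the regular
expression and the strip_tags helper are assumptions made for illustration. A
simple pattern like this does not handle comments, CDATA sections, or attribute
values containing '>'.

```python
import re

# Matches anything that looks like an opening, closing, or empty tag.
TAG_RE = re.compile(r'<[^>]+>')

def strip_tags(text):
    """Remove tag-like substrings and keep the text between them."""
    return TAG_RE.sub('', text)

plain = strip_tags('This is a <b>test</b>.')
print(plain)
```

The resulting plain string is what a parser would then receive as input.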
    
2. Retrieve the 5 best parses with associated probabilities for the last-parsed sentence:

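    import math
    from stanford import StanfordParser, Sentence  # assumes stanford.py is on the path

    parser = StanfordParser()
    sentence = parser.parse('This is a test')
    # score() is a log probability, so exponentiate it to get the probability
    for candidate_tree in parser.lp.getKBestPCFGParses(5):
        print 'Prob:', math.exp(candidate_tree.score())
        print 'Tree:'
        s = Sentence(parser.gsf, candidate_tree.object())
        s.print_table()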
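
The probabilities printed above are typically very small numbers. The sketch
below shows the conversion on its own, assuming the scores are natural-log
probabilities (as the exponentiation in the example suggests). The score values
are made up, and rescaling them to sum to 1 over the k-best list is an addition
for readability, not something the parser does.

```python
import math

# Hypothetical log scores for three candidate parses (not real parser output).
log_scores = [-20.0, -21.0, -23.0]

def to_probabilities(scores):
    """Convert natural-log scores to plain probabilities."""
    return [math.exp(s) for s in scores]

def normalize(probs):
    """Rescale probabilities so that they sum to 1 over the candidates."""
    total = sum(probs)
    return [p / total for p in probs]

probs = to_probabilities(log_scores)
relative = normalize(probs)

for p, r in zip(probs, relative):
    print('Prob: %.3e  share of k-best mass: %.3f' % (p, r))
```

The relative shares only compare the retrieved candidates with one another;
they say nothing about parses outside the k-best list.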

On input, the script accepts unicode strings or byte strings encoded as UTF-8 or Latin-1.
On output, the script produces unicode.
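
One common way to accept all three kinds of input is to try UTF-8 first and
fall back to Latin-1. The sketch below illustrates that approach; it is an
assumption about how such handling can be done, not the code in stanford.py,
and to_unicode is a hypothetical helper. It relies on the bytes type, so it
assumes Jython/Python 2.7 or Python 3.

```python
def to_unicode(text):
    """Return text as a unicode string, decoding byte strings if needed."""
    if not isinstance(text, bytes):
        return text  # already a unicode string
    try:
        return text.decode('utf-8')
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never fails.
        return text.decode('latin-1')

word = u'caf\xe9'
print(to_unicode(word.encode('utf-8')) == word)
print(to_unicode(word.encode('latin-1')) == word)
```

A Latin-1 byte string that happens to be valid UTF-8 would be decoded as UTF-8,
so this fallback is a heuristic rather than true encoding detection.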
