Skip to content
This repository has been archived by the owner on Oct 25, 2021. It is now read-only.

vu3jej/scrapy-corenlp

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 

scrapy-corenlp

PyPI PyPI

A Scrapy middleware to perform Named Entity Recognition (NER) on response with Stanford CoreNLP.

Settings

Option Value Example Value
STANFORD_NER_ENABLED Boolean True
STANFORD_NER_CLASSIFIER absolute path to CRFClassifier '/home/jithesh/stanford-ner-2015-12-09/classifiers/english.muc.7class.distsim.crf.ser.gz'
STANFORD_NER_JAR absolute path to stanford-ner.jar file '/home/jithesh/stanford-ner-2015-12-09/stanford-ner.jar'
STANFORD_NER_FIELD_TO_PROCESS A field or list of Item text fields to use for classification ['title', 'description']
STANFORD_NER_FIELD_OUTPUT scrapy item field to update the result with 'result'

In your settings.py file, add the previously described settings and add CoreNLP to your SPIDER_MIDDLEWARES, e.g.

SPIDER_MIDDLEWARES = {
    'scrapy_corenlp.middlewares.CoreNLP': 543,
}

An example value of the STANFORD_NER_FIELD_OUTPUT field after recognising the entities is:

{"result": {"DATE": ["1963", "2009", "1979", "1663", "1982"], "ORGANIZATION": ["Royal Society", "US National Academy of Science", "University of California", "Home Home About Stephen The Computer Stephen", "the University of Cambridge", "Sally Tsui Wong-Avery Director of Research", "Theoretical Physics", "Leiden University", "Baby Universe", "Department of Applied Mathematics", "Cambridge Lectures Publications Books Images Films", "Briefer History of Time", "ESA", "NASA", "Brief History of Time", "CBE", "Caius College", "The Universe"], "PERSON": ["P. Oesch", "Einstein", "D. Magee", "Stephen Hawking", "George", "Annie", "Isaac Newton", "G. Illingworth", "Dennis Stanton Avery", "R. Bouwens"], "LOCATION": ["London", "Santa Cruz", "Einstein", "Cambridge", "Gonville"]}}

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages