
KNEWS: Knowledge Extraction With Semantics

A Learning by Reading pipeline of NLP and Entity Linking tools.

KNEWS is a composite tool that bridges semantic parsing (using C&C tools and Boxer or Semafor), word sense disambiguation (using UKB or Babelfy) and entity linking (using Babelfy or DBpedia Spotlight) to produce a unified, LOD-compliant abstract representation of meaning.

KNEWS can produce several kinds of output:

  1. Frame instances, based on the FrameBase scheme:
<http://framebase.org/ns/fi-Operate_vehicle_0059a98c-3870-49ed-87e1-f882e11a49f7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://framebase.org/ns/frame-Operate_vehicle-drive.v> .
<http://framebase.org/ns/fi-Operate_vehicle_0059a98c-3870-49ed-87e1-f882e11a49f7> <http://framebase.org/ns/fe-Driver> <http://dbpedia.org/resource/Robot> .
<http://framebase.org/ns/fi-Operate_vehicle_0059a98c-3870-49ed-87e1-f882e11a49f7> <http://framebase.org/ns/fe-Vehicle> <http://wordnet-rdf.princeton.edu/wn31/02961779-n> .
  2. Word-aligned semantics, based on lexicalized Discourse Representation Graphs:
<frameinstances>
  <frameinstance id="Operate_vehicle_9a3fa55e-4d97-406a-ab0d-cf681e277296" type="Operate_vehicle-drive.v" internalvariable="e1">
    <framelexicalization>k3:x1 is driving k3:x2</framelexicalization>
    <instancelexicalization>A robot is driving the car</instancelexicalization>
    <frameelements>
      <frameelement role="Driver" internalvariable="x1">
        <concept>http://dbpedia.org/resource/Robot</concept>
        <roleexicalization>A robot is driving x2</roleexicalization>
        <conceptlexicalization/>
      </frameelement>
      <frameelement role="Vehicle" internalvariable="x2">
        <concept>http://wordnet-rdf.princeton.edu/wn31/02961779-n</concept>
        <roleexicalization>x1 is driving the car</roleexicalization>
        <conceptlexicalization/>
      </frameelement>
    </frameelements>
  </frameinstance>
</frameinstances>
  3. First-order logic formulae with WordNet synsets and DBpedia ids as symbols:
fol(1,some(A,and(02961779-n(A),some(B,some(C,and(r1Theme(B,A),and(r1Agent(B,C),and(01934845-v(B),Robot(C))))))))).
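The N-Triples output shown in item 1 can be consumed without an RDF library. The following is a minimal sketch, not part of KNEWS. It assumes every term is an IRI in angle brackets, as in the example (literals and blank nodes are skipped), and the function name frame_instances is hypothetical.

```python
import re
from collections import defaultdict

# Matches "<subject> <predicate> <object> ." when all three terms are IRIs.
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FE_PREFIX = "http://framebase.org/ns/fe-"

def frame_instances(lines):
    """Group triples into {instance IRI: {"type": frame IRI, "roles": {role: filler}}}."""
    instances = defaultdict(lambda: {"type": None, "roles": {}})
    for line in lines:
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # blank lines or triples this simple sketch does not handle
        subj, pred, obj = match.groups()
        if pred == RDF_TYPE:
            instances[subj]["type"] = obj
        elif pred.startswith(FE_PREFIX):
            # One filler per role: a later triple for the same role overwrites it.
            instances[subj]["roles"][pred[len(FE_PREFIX):]] = obj
    return dict(instances)
```

Usage, assuming output.txt holds N-Triples: `with open("output.txt") as fh: print(frame_instances(fh))`. For output beyond this simple shape, a full RDF parser such as rdflib is the safer choice.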

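The word-aligned XML output in item 2 can likewise be read with Python's standard xml.etree.ElementTree module. This sketch collects only each frame's type and the concept linked to each role, relying on the element and attribute names shown in the example above; the function name read_frame_instances is hypothetical.

```python
import xml.etree.ElementTree as ET

def read_frame_instances(xml_text):
    """Return a list of (frame type, {role: concept IRI or None}) pairs."""
    root = ET.fromstring(xml_text)
    result = []
    for inst in root.iter("frameinstance"):
        roles = {}
        for fe in inst.iter("frameelement"):
            concept = fe.findtext("concept")  # None if absent, "" if empty
            roles[fe.get("role")] = concept.strip() if concept else None
        result.append((inst.get("type"), roles))
    return result
```

The lexicalization elements are ignored here; they can be read the same way with `findtext` if you need the aligned text.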
Online demo

A demo of KNEWS is now available at http://gingerbeard.alwaysdata.net/knews/.

Installation and configuration

After cloning the repository or otherwise downloading the KNEWS source code, you must install the prerequisite Python packages listed in the file requirements.txt. With pip, this is done with:

$ pip install -r requirements.txt
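To check which of the listed packages are still missing, the following rough sketch compares the lines of requirements.txt against the installed distributions. It is not part of KNEWS, it ignores version specifiers, and the helper name missing_requirements is hypothetical.

```python
import re
from importlib import metadata

def missing_requirements(lines):
    """Return names from requirements.txt lines whose distributions are not installed."""
    missing = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or -e
        # Keep only the distribution name, before any extras, version or marker.
        name = re.split(r"[\s<>=!~;\[]", line, maxsplit=1)[0]
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Usage: `with open("requirements.txt") as fh: print(missing_requirements(fh))`. An empty list means every named package is present, though not necessarily at the required version.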

Semantic parsing configuration

KNEWS can work with either Semafor or the C&C tools/Boxer to perform semantic parsing. Semafor is used by default; to switch to Boxer, set the value of semantics->module to boxer in the config/disambiguation.conf file.
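The notation semantics->module suggests a section/option layout. Assuming the .conf files are INI-style files that Python's configparser can read (an assumption, not confirmed by this README), the switch can be scripted as in this sketch; set_option is a hypothetical helper that works on the file's text.

```python
import configparser
import io

def set_option(conf_text, section, option, value):
    """Return conf_text with section->option set to value (INI format assumed)."""
    config = configparser.ConfigParser()
    config.read_string(conf_text)
    if not config.has_section(section):
        config.add_section(section)
    config.set(section, option, value)
    out = io.StringIO()
    config.write(out)
    return out.getvalue()
```

For example, read the configuration file named above, call `set_option(text, "semantics", "module", "boxer")` and write the result back. The same helper would apply to semafor->mode in config/semanticparsing.conf. Note that configparser drops comments when it rewrites a file, so editing by hand keeps the file tidier.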

Installation of Semafor

To install Semafor run:

$ cd ext/
$ ./install_semafor.sh

KNEWS expects Semafor to run in server mode; instructions for starting the server can be found in the Semafor documentation.

To run Semafor locally instead, open the config/semanticparsing.conf file and set the value of semafor->mode to local.

Installation of the C&C tools and Boxer

Alternatively, the C&C tools and Boxer can be used for semantic parsing. The C&C source code (revision v2614) is included in the KNEWS repository, and a shell script is provided to automate compilation and installation. To install the C&C tools locally, run:

$ cd ext/
$ ./install_candc.sh

By default, the script assumes a Unix/Linux system. To compile on other platforms, modify install_candc.sh accordingly. For example, on macOS you should change:

ln -s Makefile.unix Makefile

to

ln -s Makefile.macosx Makefile
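The macOS substitution can also be applied programmatically. This sketch assumes the symlink line appears in ext/install_candc.sh exactly as quoted above; patch_for_platform is a hypothetical name. It relies on platform.system() returning "Darwin" on macOS.

```python
import platform

UNIX_LINK = "ln -s Makefile.unix Makefile"
MACOS_LINK = "ln -s Makefile.macosx Makefile"

def patch_for_platform(script_text, system=None):
    """Swap the Makefile symlink in the install script when running on macOS."""
    system = system or platform.system()
    if system == "Darwin":  # platform.system() reports macOS as "Darwin"
        return script_text.replace(UNIX_LINK, MACOS_LINK)
    return script_text  # other platforms: leave the script untouched
```

Usage: read ext/install_candc.sh, pass its contents to patch_for_platform, write the result back and then run the script as usual.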

Please note: you will need a working installation of SWI-Prolog v6.6.x to compile Boxer correctly.

To check that the installation completed successfully, run the following from the candc/ directory:

$ bin/candc --version
candc v2614 (unix build on 19 April 2016, 11:35:31)
$ bin/boxer --version
boxer v2614 (unix build on 19 April 2016, 11:35:31)
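The same check can be automated, for instance in a setup script. The sketch below runs a binary with --version and parses the first line of its output, assuming the format shown above; the helper names are hypothetical.

```python
import re
import subprocess

# e.g. "candc v2614 (unix build on 19 April 2016, 11:35:31)"
VERSION = re.compile(r"^(\w+) (v\d+) \(")

def parse_version(line):
    """Extract (tool, revision) from a version line, or None if it does not match."""
    match = VERSION.match(line.strip())
    return match.groups() if match else None

def installed_version(binary):
    """Run `<binary> --version` (e.g. bin/candc) and parse its first output line."""
    result = subprocess.run([binary, "--version"], capture_output=True, text=True, check=True)
    text = result.stdout or result.stderr  # fall back to stderr if stdout is empty
    return parse_version(text.splitlines()[0]) if text.strip() else None
```

Run from the candc/ directory, `installed_version("bin/candc")` and `installed_version("bin/boxer")` should both report revision v2614.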

To use the SOAP client/server version of the C&C tools, first start the server with the following command (from the candc/ directory):

$ bin/soap_server --server localhost:8888 --models models/boxer/ --candc-printer boxer
waiting for connections on localhost:8888
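Before starting KNEWS in soap mode, you can confirm that the server is listening. This sketch only checks that some process accepts TCP connections on the given host and port (the defaults are taken from the command above); it does not verify that the process is the C&C SOAP server. The function name is hypothetical.

```python
import socket

def soap_server_up(host="localhost", port=8888, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unknown host, ...
        return False
```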

Next, you must configure how to run the C&C tools. Open the file config/semanticparsing.conf and select a value for boxer->mode:

  • online uses the online API. This is the easiest option, but it is impractical when KNEWS is used to parse large amounts of text.
  • local uses a local installation of the C&C tools (see the installation instructions above).
  • soap uses a local installation of the C&C tools with the SOAP-based client/server architecture, which is convenient for parsing many different files.
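To catch typos in the configuration early, a small check can validate boxer->mode against the three values listed above. This sketch assumes config/semanticparsing.conf is an INI-style file readable by configparser, with a [boxer] section holding mode; boxer_mode is a hypothetical helper.

```python
import configparser

BOXER_MODES = {"online", "local", "soap"}  # the values listed above

def boxer_mode(conf_text):
    """Return boxer->mode from the config text, rejecting unknown values."""
    config = configparser.ConfigParser()
    config.read_string(conf_text)
    mode = config.get("boxer", "mode").strip().lower()
    if mode not in BOXER_MODES:
        raise ValueError(f"boxer->mode must be one of {sorted(BOXER_MODES)}, got {mode!r}")
    return mode
```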

Configuration of the disambiguation tools

You must configure which module to use for word sense disambiguation and entity linking. Open the file config/disambiguation.conf and set a value for wsd->module:

  • babelfy uses the Babelfy online API. Note: a valid API key is needed. You must request it and write it in the config/babelfy.var.properties file.
  • ukb uses the UKB Word Sense Disambiguation system. A script is provided in the ext/ directory to download and install it.
  • lesk uses the Enhanced Lesk WSD algorithm proposed by P. Basile et al. A script is provided in the ext/ directory to download and install it.

You can also configure an entity linking module in the config/disambiguation.conf file:

  • babelfy uses the Babelfy online API. Note: a valid API key is needed. You must request it and write it in the config/babelfy.var.properties file.
  • spotlight uses the DBpedia Spotlight online API.
  • none makes KNEWS skip the entity linking step altogether.
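Since Babelfy appears in both lists above and needs an API key in either role, a consistency check can look at both settings together. The sketch below is hypothetical: the entity linking option name is not given above, so the caller passes the selected values directly, and key_name must be whatever key your copy of config/babelfy.var.properties actually uses. The properties parser handles only simple key=value lines.

```python
WSD_MODULES = {"babelfy", "ukb", "lesk"}
EL_MODULES = {"babelfy", "spotlight", "none"}

def read_properties(text):
    """Parse simple 'key=value' lines; lines starting with '#' or '!' are comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

def check_disambiguation(wsd, el, properties_text, key_name):
    """Raise ValueError for unknown modules or a missing Babelfy API key."""
    if wsd not in WSD_MODULES:
        raise ValueError(f"unknown wsd->module: {wsd!r}")
    if el not in EL_MODULES:
        raise ValueError(f"unknown entity linking module: {el!r}")
    if "babelfy" in (wsd, el) and not read_properties(properties_text).get(key_name):
        raise ValueError(f"Babelfy is selected but {key_name!r} has no value")
```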

Test the installation

$ src/pipeline.py -i input.txt -o output.txt

or, to process every file in a directory:

$ src/pipeline.py -d input/ -o output.txt
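To launch the pipeline from another Python program, a thin wrapper can choose the right flag for you. This sketch assumes it runs from the repository root and that src/pipeline.py is executable, as the commands above imply; the helper names are hypothetical.

```python
import os
import subprocess

def pipeline_command(input_path, output_path, script="src/pipeline.py"):
    """Build the KNEWS command line: -d for a directory, -i for a single file."""
    flag = "-d" if os.path.isdir(input_path) else "-i"
    return [script, flag, input_path, "-o", output_path]

def run_pipeline(input_path, output_path):
    """Run KNEWS and raise CalledProcessError if it exits with an error."""
    subprocess.run(pipeline_command(input_path, output_path), check=True)
```

For example, `run_pipeline("input/", "output.txt")` is equivalent to the second command above.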