xnlp/esalib (forked from ticcky/esalib)

My implementation of Explicit Semantic Analysis (ESA) library that we used at KMi, Open University to produce our submission at the NTCIR-9 CrossLink task.

== WARNING ==
The tool has been verified to yield good results (i.e., correlation with human judgement as reported in the original ESA paper) with the provided prebuilt English Wikipedia ESA background from 2005. I have not succeeded in building an ESA background from recent Wikipedia dumps; please let me know if you manage to.

== Changelog ==
7.12.2013
 - fixed a few mistakes in the tutorial
 - merged a pull request fixing a problem on macOS

15.2.2013
 - found a problem with stemming: the example English background is stemmed with PorterStemmer, but the library uses SnowballStemmer; the mismatch produces many out-of-vocabulary (OOV) words and therefore low similarity scores
 - added an interactive mode to the analyzer: you can now pipe in pairs of texts to compare (one line per text), and ESAAnalyzer outputs their similarity scores
 - added wikixray scripts that were missing from the tutorial
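The input convention of the interactive mode (one text per line, consecutive lines forming a pair) can be illustrated with a minimal Java sketch. `PairReader` and `pairUp` are hypothetical names, not part of esalib, and dropping a trailing unpaired line is an assumption; ESAAnalyzer itself may handle that case differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of esalib): groups piped-in lines into (text1, text2) pairs. */
public class PairReader {

    /** Pairs consecutive lines: 1+2, 3+4, ...; a trailing unpaired line is ignored (assumption). */
    public static List<String[]> pairUp(List<String> lines) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < lines.size(); i += 2) {
            pairs.add(new String[] {lines.get(i), lines.get(i + 1)});
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        // Read all lines from stdin, one text per line.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }
        for (String[] pair : pairUp(lines)) {
            // A real client would hand each pair to the analyzer here; this sketch just echoes it.
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

One consequence of this format is that a text cannot itself contain a newline; multi-line texts have to be joined into a single line before being piped in.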

7.10.2012 
 - fixed a typo in the analyzer bash script that caused only the first words to be analyzed; fixed handling of OOV words; removed the length filter (previously only words 3-100 characters long were considered)

29.9.2012
 - added SQLite support, making the library more convenient for fast prototyping

25.3.2012
 - initial release

== Files ==
 - /example - an ESA background built from a 2005 Wikipedia snapshot, which you can use directly with our tools to assess the semantic similarity of English texts/words.

 - /tutorial - basic instructions for building your own background

 - /lib - Java libraries required to run the tools


== So how to get ESA running in 2 minutes for English? ==
0. Clone the repository
        # git clone https://github.com/ticcky/esalib.git
        # cd esalib

1. Create a symbolic link to the sample database
        # ln -s example/esa_en.db esa_db.db

2. Get a relatedness estimate for two texts:
        # ./run_analyzer "computer" "apple"
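In ESA as described in the original paper, each text is mapped to a weighted vector of Wikipedia concepts, and relatedness is the cosine similarity of the two vectors. The Java sketch below illustrates that idea over a tiny in-memory index. The class `EsaSketch`, the index layout (term -> concept -> weight), and the weights are all hypothetical; this is not how esalib stores or queries its background (which lives in the database linked in step 1), and summing per-token concept vectors is a simplification. It needs Java 9+ for `Map.of`.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy ESA relatedness (hypothetical illustration, not the esalib implementation). */
public class EsaSketch {

    // Hypothetical inverted index: term -> (Wikipedia concept -> weight, e.g. TF-IDF).
    private final Map<String, Map<String, Double>> index;

    public EsaSketch(Map<String, Map<String, Double>> index) {
        this.index = index;
    }

    /** Concept vector of a text: the sum of the concept vectors of its tokens. */
    public Map<String, Double> conceptVector(String text) {
        Map<String, Double> vec = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            Map<String, Double> concepts = index.get(token);
            if (concepts == null) {
                continue; // out-of-vocabulary token: contributes nothing
            }
            for (Map.Entry<String, Double> e : concepts.entrySet()) {
                vec.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        return vec;
    }

    /** Cosine similarity of two sparse vectors; 0 if either vector is empty. */
    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            na += e.getValue() * e.getValue();
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        for (double w : b.values()) {
            nb += w * w;
        }
        if (na == 0 || nb == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public double relatedness(String text1, String text2) {
        return cosine(conceptVector(text1), conceptVector(text2));
    }

    public static void main(String[] args) {
        // Made-up weights, for illustration only.
        Map<String, Map<String, Double>> idx = new HashMap<>();
        idx.put("computer", Map.of("Computer", 0.9, "Apple_Inc.", 0.4));
        idx.put("apple", Map.of("Apple", 0.8, "Apple_Inc.", 0.6));
        EsaSketch esa = new EsaSketch(idx);
        System.out.println(esa.relatedness("computer", "apple"));
    }
}
```

Because OOV tokens contribute nothing in this sketch, a stemming mismatch between the background and the analyzer (see the 15.2.2013 changelog entry) leaves the concept vectors sparse or empty and pushes scores toward 0.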


Please don't hesitate to get in touch if you want to use my library but have trouble with it.

Languages

  • Java 79.8%
  • C 9.7%
  • Python 8.5%
  • TeX 1.7%
  • Other 0.3%