My implementation of an Explicit Semantic Analysis (ESA) library, which we used at KMi, The Open University, to produce our submission to the NTCIR-9 CrossLink task.

== WARNING ==
The tool has been verified to yield good results (meaning correlation with human judgement in line with what the original ESA paper reports) when used with the provided prebuilt English Wikipedia ESA background from 2005. I have not succeeded in building an ESA background from recent Wikipedia dumps. Please let me know if you manage to.
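
If you want to check your own setup against human judgement, one common choice is a rank correlation (such as Spearman's) between the tool's scores and human ratings for the same pairs. The sketch below is a minimal pure-Python version; the function names and the sample numbers are made up for illustration and are not results from this library or from the ESA paper.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: similarity scores from the tool and human ratings
# for the same four word pairs (illustrative numbers only).
tool_scores = [0.62, 0.05, 0.31, 0.18]
human_ratings = [9.1, 1.5, 6.0, 7.2]
rho = spearman(tool_scores, human_ratings)
```

A value of rho close to 1 means the tool orders the pairs much like the human raters do.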

== Changelog ==
7.12.2013
 - fixed a few mistakes in the tutorial
 - merged a pull request fixing a problem on macOS

15.2.2013
 - found a problem with stemming: the example English background is stemmed with PorterStemmer, but my library uses SnowballStemmer; this results in many OOV (out-of-vocabulary) words and therefore low similarity scores
 - added an interactive mode to the analyzer: you can now pipe in pairs of texts to compare (1 line = 1 text) and ESAAnalyzer outputs the similarity scores
 - added wikixray scripts that were missing from the tutorial

7.10.2012 
 - fixed a typo in the analyzer bash script that caused only the first words to be analyzed; fixed handling of OOV words; removed the length filter (previously only words 3-100 characters long were considered)

29.9.2012
 - added support for SQLite, making the library easier to use for fast prototyping

25.3.2012
 - initial release

== Files ==
 - /example - example data, including an ESA background built from a 2005 Wikipedia snapshot, which you can use directly with our tools to assess the semantic similarity of English texts/words.

 - /tutorial - basic instructions for building your own background

 - /lib - Java libraries required to run
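
For readers new to ESA: conceptually, an ESA background maps each (stemmed) term to a weighted vector of Wikipedia concepts, and two texts are compared by the cosine of their summed concept vectors. The toy sketch below illustrates only that idea. It is not how this library stores or queries its background; the terms, concept names, weights, and function names are all invented. It also shows why a stemmer mismatch hurts: a term whose stem is not in the index contributes nothing.

```python
import math
from collections import defaultdict

# Hypothetical toy background: stemmed term -> {concept: weight}.
TOY_BACKGROUND = {
    "comput": {"Computer": 0.9, "Apple Inc.": 0.4},
    "appl": {"Apple Inc.": 0.8, "Apple (fruit)": 0.7},
    "fruit": {"Apple (fruit)": 0.6, "Fruit": 0.9},
}


def concept_vector(stems, background):
    """Sum the concept weights of all known stems; collect unknown stems."""
    vector = defaultdict(float)
    oov = []
    for stem in stems:
        weights = background.get(stem)
        if weights is None:
            oov.append(stem)  # out-of-vocabulary: adds nothing to the vector
            continue
        for concept, weight in weights.items():
            vector[concept] += weight
    return dict(vector), oov


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector (e.g. all terms OOV) is unrelated to anything
    return dot / (norm_u * norm_v)


# Stems that match the index share the "Apple Inc." concept.
vec_a, oov_a = concept_vector(["comput"], TOY_BACKGROUND)
vec_b, oov_b = concept_vector(["appl"], TOY_BACKGROUND)
matched = cosine(vec_a, vec_b)

# A differently stemmed form is not in the index, so the score drops to 0.
vec_c, oov_c = concept_vector(["computer"], TOY_BACKGROUND)
mismatched = cosine(vec_c, vec_b)
```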


== So how to get ESA running in 2 minutes for English? ==
0. Clone the repository
        # git clone https://github.com/ticcky/esalib.git
        # cd esalib

1. Create a symbolic link to the sample database
        # ln -s example/esa_en.db esa_db.db

2. Get relatedness estimate of two texts:
        # ./run_analyzer "computer" "apple"
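
If you would rather call the analyzer from a script, the sketch below wraps the command from step 2 in Python. It makes no assumption about the output format and simply returns whatever the analyzer prints. The second helper formats text pairs for the interactive mode mentioned in the changelog (1 line = 1 text); how that mode is started is not covered here, so check the run_analyzer script before piping anything in. Both function names are made up for this example.

```python
import subprocess


def relatedness_output(text_a, text_b, analyzer="./run_analyzer"):
    """Run the analyzer on two texts (as in step 2) and return its raw output.

    The output is returned unparsed because its exact format is not
    documented in this README.
    """
    result = subprocess.run(
        [analyzer, text_a, text_b],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the analyzer fails
    )
    return result.stdout.strip()


def pairs_to_lines(pairs):
    """Format (text_a, text_b) pairs as input lines, one text per line.

    Whitespace inside a text, including newlines, is collapsed to single
    spaces so that every text stays on exactly one line.
    """
    lines = []
    for text_a, text_b in pairs:
        for text in (text_a, text_b):
            lines.append(" ".join(text.split()))
    return "".join(line + "\n" for line in lines)
```

For example, relatedness_output("computer", "apple") runs the same command as step 2, and the string returned by pairs_to_lines(...) can be written to a file and piped into the analyzer.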


Please don't hesitate to get in touch if you want to use my library but have trouble with it.