My implementation of the Explicit Semantic Analysis (ESA) library that we used at KMi, Open University, to produce our submission for the NTCIR-9 CrossLink task.

== WARNING ==

The tool is verified to yield good results (meaning correlation with human judgement as reported in the original ESA paper) with the provided prebuilt English Wikipedia ESA background from 2005. I have not had success building the ESA background from recent dumps of Wikipedia. Please let me know if you manage.

== Changelog ==

7.12.2013
 - fixed a few mistakes in the tutorial
 - merged pull request fixing a problem on MacOS

15.2.2013
 - found out about a problem with stemming: the example English background is stemmed by PorterStemmer, but my library uses SnowballStemmer; this results in a lot of OOV (out-of-vocabulary) words and therefore low similarity scores
 - added interactive mode to the analyzer: now you can pipe in pairs of texts to compare (1 line = 1 text) and ESAAnalyzer produces the similarity scores
 - added wikixray scripts that were missing from the tutorial

7.10.2012
 - fixed a typo in the analyzer bash script that caused only the first words to be analyzed; fixed handling of OOV words; removed the length filter (only words 3-100 chars long were considered)

29.9.2012
 - added support for SQLite, so that the library is better usable for fast prototyping

25.3.2012
 - initial release

== Files ==

 - /example - example data, including an ESA background built from a Wikipedia snapshot from 2005, which you can use directly in our tools for assessing semantic similarity of English texts/words
 - /tutorial - basic instructions for building your own background
 - /lib - Java libraries required to run

== So how to get ESA running in 2 minutes for English? ==

0. Clone the repository
   # git clone https://github.com/ticcky/esalib.git
   # cd esalib
1. Create a symbolic link to the sample database
   # ln -s example/esa_en.db esa_db.db
2. Get relatedness estimate of two texts:
   # ./run_analyzer "computer" "apple"

Please don't hesitate to get in touch if you want to use my library but have trouble with it.
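
== What does the relatedness score mean? ==

For readers new to ESA, here is a rough sketch of the idea behind the relatedness estimate. In the original ESA paper, each word maps to a weighted vector of Wikipedia concepts, a text is represented by combining the vectors of its words, and two texts are compared by cosine similarity. The Python below is only an illustration of that idea. It is not the code of this library: the function names, the toy background and its weights are all made up for the example, and it is an assumption that the analyzer computes exactly this.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as {concept: weight} dicts."""
    # Iterate over the smaller vector; the result is the same either way.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b.get(concept, 0.0) for concept, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # An empty vector, e.g. when every word was out of vocabulary.
        return 0.0
    return dot / (norm_a * norm_b)


def text_vector(words, background):
    """Sum the concept vectors of all in-vocabulary words; OOV words add nothing."""
    vec = {}
    for word in words:
        for concept, weight in background.get(word, {}).items():
            vec[concept] = vec.get(concept, 0.0) + weight
    return vec


# Hypothetical toy background: word -> {Wikipedia concept: weight}.
# The real background is built from a Wikipedia dump; these numbers are invented.
TOY_BACKGROUND = {
    "computer": {"Computer": 0.9, "Apple Inc.": 0.4},
    "apple": {"Apple Inc.": 0.7, "Apple (fruit)": 0.6},
    "banana": {"Apple (fruit)": 0.3, "Banana": 0.9},
}


def relatedness(text1, text2, background=TOY_BACKGROUND):
    """Relatedness of two texts as the cosine of their concept vectors."""
    return cosine_similarity(
        text_vector(text1.lower().split(), background),
        text_vector(text2.lower().split(), background),
    )


if __name__ == "__main__":
    print(relatedness("computer", "apple"))   # shares the "Apple Inc." concept
    print(relatedness("computer", "banana"))  # no shared concepts in the toy data
```

The sketch also shows why the stemmer mismatch noted in the changelog hurts: a word whose stem is missing from the background contributes nothing to the text vector, so the score drops towards zero.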