My implementation of the Explicit Semantic Analysis (ESA) library, which we used at KMi, Open University to produce our submission to the NTCIR-9 CrossLink task.

== WARNING ==
The tool is verified to yield good results (meaning correlation with human judgement as reported in the original ESA paper) with the provided prebuilt English Wikipedia ESA background from 2005. I have not had success building the ESA background from recent dumps of Wikipedia. Please let me know if you manage.

== Changelog ==
7.12.2013
 - fixed a few mistakes in the tutorial
 - merged pull request fixing a problem on MacOS
15.2.2013
 - found out about a problem with stemming - the example English background is stemmed by PorterStemmer, but my library uses SnowballStemmer; this results in a lot of OOV (out-of-vocabulary) words and therefore low similarity scores
 - added interactive mode to the analyzer - now you can pipe in pairs of texts to compare (1 line = 1 text) and ESAAnalyzer produces the similarity scores
 - added wikixray scripts that were missing from the tutorial
7.10.2012
 - fixed a typo in the analyzer bash script that caused only the first words to be analyzed; fixed handling of OOV words; removed length filter (only words 3-100 chars long were considered)
29.9.2012
 - added support for SQLite, so that the library is better usable for fast prototyping
25.3.2012
 - initial release

== Files ==
 - /example - example data, including an ESA background built from a 2005 Wikipedia snapshot, which you can use directly in our tools for assessing semantic similarity of English texts/words
 - /tutorial - basic instructions for building your own background
 - /lib - Java libraries required to run

== So how to get ESA running in 2 minutes for English? ==
0. Clone the repository
   # git clone https://github.com/ticcky/esalib.git
   # cd esalib
1. Create a symbolic link to the sample database
   # ln -s example/esa_en.db esa_db.db
2. Get relatedness estimate of two texts:
   # ./run_analyzer "computer" "apple"

Please don't hesitate to get in touch if you want to use my library but have trouble with it.
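If you want scores for several pairs at once, here is a minimal sketch that simply repeats the ./run_analyzer call from step 2 for every line of a file. It is an illustration, not part of the library: the function name score_pairs and the tab-separated pairs file are hypothetical, and it assumes the analyzer prints its result to standard output. It does not use the interactive mode mentioned in the changelog.

```shell
#!/usr/bin/env bash
# Sketch only: call the analyzer once per pair, exactly as in step 2.
# Assumption: the pairs file has one pair per line, the two texts separated by a tab.
# Assumption: the analyzer writes its result to stdout; the output is passed through as-is.

# Path to the analyzer script; override by setting ANALYZER before calling score_pairs.
ANALYZER="${ANALYZER:-./run_analyzer}"

score_pairs() {
    local pairs_file="$1"
    local first second score
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
    while IFS=$'\t' read -r first second || [ -n "$first" ]; do
        # Skip blank lines and lines that lack a second text.
        if [ -z "$first" ] || [ -z "$second" ]; then
            continue
        fi
        # Redirect stdin so the analyzer cannot consume the rest of the pairs file.
        score="$("$ANALYZER" "$first" "$second" < /dev/null)"
        printf '%s\t%s\t%s\n' "$first" "$second" "$score"
    done < "$pairs_file"
}
```

For example, with a file pairs.tsv containing a line like "computer<TAB>apple", calling score_pairs pairs.tsv prints each pair followed by whatever the analyzer returned for it. Each call starts the analyzer again, so this is slow for large files.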