A Python project inspired by the research of Chloé Kiddon and Yuriy Brun. Paper available here.
Part of the Funniest Computer Ever Open Source Initiative
This code supported Zarquon Squelchmama III in the chatbotbattles contest. We didn't come anywhere near winning (our coding was all a bit last-minute), but we did get one TWSS in there:
Judge: Hello Zarquon
Judge: How are you?
Zarquon: How do you suppose?
Judge: I suppose you are good but I don't know
Zarquon: That's what she said!
Judge: Hehe. Very funny
libsvm with Python bindings is required: http://www.csie.ntu.edu.tw/~cjlin/libsvm
Apply the patch that lets svm_predict produce quiet output (replace LIBSVM_HOME with your libsvm directory):
cp svmutil.patch LIBSVM_HOME/python && cd LIBSVM_HOME/python && patch < svmutil.patch
[N.B. here's how I add libsvm to my PYTHONPATH: export PYTHONPATH="/Users/samueljoseph/Code/libsvm-3.12/python/:$PYTHONPATH"]
Download the TWSS source data into the data directory of the current project.
You can run some limited unit tests like so
Run python preprocessData.py to tokenise the files and create a shared vocabulary, which is saved in data/vocab.txt. The resulting vocabulary contains about 20k words.
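A minimal sketch of the shared-vocabulary step, assuming a simple lowercase regex tokeniser and a vocab.txt with one word per line. Neither the tokenisation rules nor the file layout are taken from preprocessData.py itself, and tokenise, build_vocab and save_vocab are hypothetical names.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed tokeniser: lowercase word tokens, keeping apostrophes (e.g. "that's").
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenise(text):
    """Split raw text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def build_vocab(texts, min_count=1):
    """Build one shared vocabulary across all source texts.

    Returns the words sorted alphabetically so each word has a stable position.
    """
    counts = Counter()
    for text in texts:
        counts.update(tokenise(text))
    return sorted(word for word, n in counts.items() if n >= min_count)


def save_vocab(vocab, path="data/vocab.txt"):
    """Write the vocabulary one word per line (hypothetical file layout)."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(vocab) + "\n", encoding="utf-8")
```

For example, build_vocab(["That's what she said!", "How are you?"]) returns ['are', 'how', 'said', 'she', "that's", 'what', 'you']. Sorting is a design choice that keeps word positions stable, which is handy once words become feature indices.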
preprocessData also splits the text into sentences and saves the results in pickle files.
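Sentence splitting plus pickling might look roughly like this. The splitter is a naive regex that breaks after ., ! or ? when followed by whitespace (an assumption; the project may use a proper sentence tokeniser), and split_sentences, save_sentences and load_sentences are hypothetical helpers.

```python
import pickle
import re

# Naive, assumed rule: a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Return the non-empty, stripped sentences in text."""
    return [s.strip() for s in SENTENCE_END_RE.split(text) if s.strip()]


def save_sentences(sentences, path):
    """Pickle a list of sentences to path (hypothetical file layout)."""
    with open(path, "wb") as f:
        pickle.dump(sentences, f)


def load_sentences(path):
    """Load a list of sentences pickled by save_sentences."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

For example, split_sentences("Hehe. Very funny") returns ['Hehe.', 'Very funny'].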
Run python generateTrainTestData.py to create the training and test data sets, saved in data/train.pk and data/test.pk. Each is in the form of an array of dictionaries X and a vector y, where X is training instances × features (one dictionary of features per instance) and y has length #training-instances, with 1 for TWSS and -1 for non-TWSS instances.
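To make the X / y format concrete, here is a sketch that turns labelled sentences into libsvm-style sparse feature dictionaries. It assumes binary bag-of-words features indexed by vocabulary position (starting at 1, the usual libsvm convention) and that X and y are pickled together as one tuple. featurise, build_dataset and save_dataset are hypothetical names, not functions from generateTrainTestData.py.

```python
import pickle
import re

# Same assumed tokeniser as used for the vocabulary.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def featurise(sentence, word_index):
    """Map a sentence to a sparse feature dict {index: 1}.

    Binary bag-of-words (assumed); words missing from the vocabulary are ignored.
    """
    features = {}
    for token in TOKEN_RE.findall(sentence.lower()):
        idx = word_index.get(token)
        if idx is not None:
            features[idx] = 1
    return features


def build_dataset(twss, non_twss, vocab):
    """Return (X, y): one feature dict per sentence, labels 1 (TWSS) / -1 (non-TWSS)."""
    # libsvm feature indices conventionally start at 1, so offset positions by one.
    word_index = {word: i + 1 for i, word in enumerate(vocab)}
    X = [featurise(s, word_index) for s in twss + non_twss]
    y = [1] * len(twss) + [-1] * len(non_twss)
    return X, y


def save_dataset(X, y, path):
    """Pickle X and y together as a single tuple (assumed layout)."""
    with open(path, "wb") as f:
        pickle.dump((X, y), f)
```

With vocab = ['are', 'how', 'said', 'she', "that's", 'what', 'you'], build_dataset(["That's what she said!"], ["How are you?"], vocab) gives X equal to [{3: 1, 4: 1, 5: 1, 6: 1}, {1: 1, 2: 1, 7: 1}] and y equal to [1, -1].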
Run python train.py from the command line. THIS MAY TAKE A FEW MINUTES.
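Training could be a thin wrapper around libsvm's Python interface. svm_train and svm_save_model are real svmutil functions; everything else is an assumption: the linear kernel and C=1 in the options string, the tuple layout of train.pk, the model path data/twss.model, and the helper names validate_dataset and train_model. The svmutil import sits inside train_model so the validation helper works without libsvm on the PYTHONPATH.

```python
import pickle


def validate_dataset(X, y):
    """Check that X and y match the format described in this README.

    One feature dict per instance, labels 1 (TWSS) or -1 (non-TWSS), and
    feature indices starting at 1 as libsvm conventionally expects.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have one entry per training instance")
    for features, label in zip(X, y):
        if label not in (1, -1):
            raise ValueError(f"unexpected label {label!r}; expected 1 or -1")
        if any(not isinstance(i, int) or i < 1 for i in features):
            raise ValueError("feature indices must be positive integers")


def train_model(train_path="data/train.pk", model_path="data/twss.model",
                options="-t 0 -c 1 -q"):
    """Train an SVM on the pickled training set and save it with libsvm."""
    # Requires libsvm's python/ directory on PYTHONPATH (see the N.B. above).
    from svmutil import svm_train, svm_save_model

    # Assumed layout: (X, y) pickled together as a single tuple.
    with open(train_path, "rb") as f:
        X, y = pickle.load(f)
    validate_dataset(X, y)

    # -t 0: linear kernel, -c 1: cost parameter, -q: quiet training (assumed choices).
    model = svm_train(y, X, options)
    svm_save_model(model_path, model)
    return model
```

Validating before calling svm_train turns a malformed pickle into a clear error message rather than a confusing failure deep inside libsvm.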
Run python twss.py "<insert your sentence>" to have a little chat with the resulting system.
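Here is a sketch of how a script like twss.py might turn a prediction into a reply. respond accepts any predict callable, which keeps it testable without a trained model; make_libsvm_predict shows one way to back it with a saved libsvm model using svm_load_model and svm_predict (both real svmutil functions). Without the quiet-output patch, svm_predict prints an accuracy line on every call. The binary bag-of-words featurisation, the function names and the model path are all assumptions.

```python
import re

# Assumed tokeniser; must match whatever was used to build the training features.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def featurise(sentence, word_index):
    """Binary bag-of-words feature dict {index: 1}; unknown words are dropped."""
    return {word_index[t]: 1 for t in TOKEN_RE.findall(sentence.lower()) if t in word_index}


def respond(sentence, word_index, predict):
    """Return the punchline if the classifier says TWSS (label 1), else None."""
    label = predict([featurise(sentence, word_index)])[0]
    return "That's what she said!" if label == 1 else None


def make_libsvm_predict(model_path="data/twss.model"):
    """Wrap a saved libsvm model as a predict callable (needs svmutil on PYTHONPATH)."""
    from svmutil import svm_load_model, svm_predict

    model = svm_load_model(model_path)

    def predict(X):
        # svm_predict(y, x, model) returns (labels, accuracy, decision values);
        # the dummy true labels only affect the reported accuracy.
        labels, _, _ = svm_predict([0] * len(X), X, model)
        return labels

    return predict
```

A command-line wrapper would then call respond(sys.argv[1], word_index, make_libsvm_predict()), with word_index built from data/vocab.txt in the same way as during training.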
The code is released under the MIT Licence. The data is released under its own licence.