TWSS

A Python project inspired by the research of Chloé Kiddon and Yuriy Brun. Paper available here.

Thanks to Henry Garner for the original coding initiative, to Marcin Tolysz for awesome super-fast server modifications, and to Andrew Ng for the ML course.

Part of the Funniest Computer Ever Open Source Initiative

This code supported Zarquon Squelchmama III in the chatbotbattles contest. We came nowhere near winning (our coding was all a bit last minute), but we did get one TWSS in there:

Judge: Hello Zarquon

Zarquon: Greetings.

Judge: How are you?

Zarquon: How do you suppose?

Judge: I suppose you are good but I don't know

Zarquon: That's what she said!

Judge: Hehe. Very funny

Note that we now have some other chatbot-related initiatives in the faq and world_model directories. Ultimately we hope these will all be integrated into a single coherent whole :-)

Getting started

libsvm with Python bindings is required: http://www.csie.ntu.edu.tw/~cjlin/libsvm
Apply the patch that lets svm_predict produce quiet output: cp svmutil.patch LIBSVM_HOME/python && cd LIBSVM_HOME/python && patch < svmutil.patch

[N.B. here's how I add libsvm to my PYTHONPATH: export PYTHONPATH="/Users/samueljoseph/Code/libsvm-3.12/python/:$PYTHONPATH"]
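To confirm that the PYTHONPATH setting took effect, a quick check along the following lines may help. This is a minimal sketch, not part of the project: the helper name libsvm_available is hypothetical, and it assumes the bindings expose a top-level svmutil module, as the libsvm python directory referenced above does.

```python
import importlib.util
import sys


def libsvm_available(extra_path=None):
    """Return True if a top-level `svmutil` module can be located.

    `extra_path` is an optional directory (for example LIBSVM_HOME/python)
    to prepend to sys.path before looking. Hypothetical helper, for
    illustration only.
    """
    if extra_path and extra_path not in sys.path:
        sys.path.insert(0, extra_path)
    # find_spec returns None when the module cannot be found
    return importlib.util.find_spec("svmutil") is not None


print("svmutil importable:", libsvm_available())
```

If this prints False, the directory on PYTHONPATH is probably not the one that contains svmutil.py.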

Download TWSS source data into data directory in current project
You can run some limited unit tests like so: python testTokeniseContents.py
Run python preprocessData.py to tokenise the files and create a shared vocabulary, which is saved in data/vocab.txt. The resulting vocabulary contains about 20k words. preprocessData will also split sentences and save the results in pickle files.
Run python generateTrainTestData.py to create the training and test data sets, saved in data/train.pk and data/test.pk. Each is in the form of an array of dictionaries X and a vector y, where X is instances x features and y has one entry per instance: 1 for TWSS and -1 for non-TWSS instances.
Run python train.py from the command line (this may take a few minutes)
Run python twss.py "<insert your sentence>" to have a little chat with the resulting system
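
The data format described in the steps above (a shared vocabulary, an array of feature dictionaries X, and a label vector y of 1 and -1) can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the function names tokenise, build_vocab and to_features, the regular expression, the use of word counts as feature values, and the two example sentences are all made up for this sketch.

```python
import re
from collections import Counter


def tokenise(sentence):
    """Lowercase a sentence and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def build_vocab(sentences):
    """Map each distinct token to a 1-based index, in sorted order."""
    words = sorted({tok for s in sentences for tok in tokenise(s)})
    return {word: i for i, word in enumerate(words, start=1)}


def to_features(sentence, vocab):
    """Return a sparse {index: count} dictionary; unknown words are skipped."""
    counts = Counter(tokenise(sentence))
    return {vocab[w]: float(c) for w, c in counts.items() if w in vocab}


# Toy sentences invented for this example (not taken from the TWSS data)
twss = ["that was harder than I expected"]
non_twss = ["the meeting starts at noon"]

vocab = build_vocab(twss + non_twss)
X = [to_features(s, vocab) for s in twss + non_twss]  # one dict per instance
y = [1] * len(twss) + [-1] * len(non_twss)            # 1 = TWSS, -1 = non-TWSS
```

With the libsvm bindings on the path, lists shaped like these can be passed to svmutil as svm_train(y, X), since that function accepts instances as a list of dictionaries. Whether train.py calls it in exactly this way is not confirmed here.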

Licence

Code is released under the MIT Licence. Data is released under its own licence.

