Add clean_words script and update readme

latest commit 1ef298c32a
Zac Stewart authored