The Lovecraft Corpus
H.P. Lovecraft's works are largely out of copyright and are relatively easy to obtain digitally. However, I was unable to find a corpus in which every story is split out into its own file. This corpus fills that gap.
This corpus is very bare-bones. Almost no processing has been done on it, and it includes no ready-made resources for natural language processing. Those may come later, but for now the corpus contains only the stories.
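Because each story lives in its own file, reading the corpus can be as simple as walking a directory. The following is a minimal sketch, not part of the corpus itself. It assumes the stories are plain-text files with a `.txt` extension in a single directory; the directory name used in the note below and the helper names `load_corpus` and `word_counts` are hypothetical.

```python
from pathlib import Path


def load_corpus(corpus_dir):
    """Return a dict mapping each story's file stem to its full text.

    Assumption: one story per .txt file, all in a single directory.
    """
    stories = {}
    for path in sorted(Path(corpus_dir).glob("*.txt")):
        # errors="replace" keeps the loop going if a file is not valid UTF-8.
        stories[path.stem] = path.read_text(encoding="utf-8", errors="replace")
    return stories


def word_counts(stories):
    """Return a rough whitespace-delimited word count for each story."""
    return {title: len(text.split()) for title, text in stories.items()}
```

Calling `word_counts(load_corpus("corpus"))`, with `"corpus"` replaced by wherever the story files actually sit, would give a quick per-story size overview. Splitting on whitespace is only a crude stand-in for real tokenization.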