To use the OpenSubtitles dataset, first download the archive and unpack it in this folder. The program automatically looks through every subfolder here. Then train with this dataset by running: ./main.py --corpus opensubs
Download the English corpus directly here: http://opus.lingfil.uu.se/download.php?f=OpenSubtitles/en.tar.gz
Full details on the corpus are available here: http://opus.lingfil.uu.se/OpenSubtitles.php
Note that, although this has not been tested, the program should be compatible with other languages as well. Just download the subtitles for the language you want from the OpenSubtitles database website.
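
The download-and-unpack step described above could be scripted roughly as follows. This is only a sketch, not part of the program: the helper names (download_archive, unpack_archive, list_corpus_files, main) and the local file name en.tar.gz are assumptions made for illustration, and the URL is simply the one listed in this file, which may have moved since. The last helper only mimics the idea of looking through every subfolder; it is not the program's actual loader.

```python
import os
import tarfile
import urllib.request

# URL copied from this README; everything else here is illustrative.
CORPUS_URL = "http://opus.lingfil.uu.se/download.php?f=OpenSubtitles/en.tar.gz"


def download_archive(url, archive_path):
    """Fetch the archive to archive_path, skipping the download if it already exists."""
    if not os.path.exists(archive_path):
        urllib.request.urlretrieve(url, archive_path)
    return archive_path


def unpack_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir (only use on archives you trust)."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)


def list_corpus_files(root_dir):
    """Walk every subfolder of root_dir and return the sorted file paths found."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)


def main(data_dir="."):
    """Download, unpack, and report how many files sit under data_dir."""
    archive = download_archive(CORPUS_URL, os.path.join(data_dir, "en.tar.gz"))
    unpack_archive(archive, data_dir)
    print(len(list_corpus_files(data_dir)), "files found under", data_dir)
```

Calling main() from this folder would perform the download and extraction. The count it prints includes every file under the folder, including the archive itself. The archive is large, so the existence check in download_archive avoids fetching it twice.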