Movie script genre classifier.
Install the module with: npm install talkkie
var talkkie = require('talkkie');
var classifier = new talkkie.Classifier();
Pass in multiple scripts to train the classifier.
var script1 = '...';
var script2 = '...';
var script3 = '...';
classifier.train(script1, 'Action');
classifier.train(script2, 'Comedy');
classifier.train(script3, 'Horror');
Classify a script. Optionally pass in an options object.
var unknown = '...';
var genre = classifier.classify(unknown);
// -> 'Action'
Export training data as JSON for later use.
classifier.export('/Desktop/training.json');
- If a path is not specified, training.json will be saved to lib/training.
Import training data from JSON.
classifier.import('/Desktop/training.json');
- If a path is not specified, it will look for training.json in lib/training.
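The README does not document the layout of training.json. As a rough sketch of what exporting and importing amount to, the example below assumes a hypothetical shape (per-genre word counts plus per-genre script counts) and writes it to disk and reads it back with Node's built-in fs module. The helper names exportTraining and importTraining are illustrative only, not talkkie's API.

```javascript
// Sketch only: the data shape and helper names are hypothetical and
// are not taken from talkkie's source or its training.json format.
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write training data to disk as pretty-printed JSON.
function exportTraining(data, filePath) {
  fs.writeFileSync(filePath, JSON.stringify(data, null, 2), 'utf8');
}

// Read training data previously written by exportTraining.
function importTraining(filePath) {
  return JSON.parse(fs.readFileSync(filePath, 'utf8'));
}

// Hypothetical training data: word counts and script counts per genre.
const training = {
  wordCounts: { Action: { explosion: 3, chase: 2 }, Comedy: { laugh: 4 } },
  docCounts: { Action: 1, Comedy: 1 }
};

// Use the OS temp directory so the example runs anywhere.
const file = path.join(os.tmpdir(), 'training.json');
exportTraining(training, file);
const restored = importTraining(file);
console.log(restored.wordCounts.Action.explosion); // prints: 3
```

Plain JSON keeps the saved counts readable and lets a trained model be reloaded without retraining on the full script set.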
See /examples folder.
We use a simple Naive Bayes classifier, based on Bayes' theorem, to sort movie scripts into genres. It works by counting word occurrences for each genre in a training data set, then computing the probability of each word with respect to each genre. To classify a script, we add up the previously calculated probabilities of its words with respect to each genre. The genre with the highest total wins.
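The counting-then-scoring approach described above can be sketched as a small multinomial Naive Bayes classifier. This sketch uses the textbook formulation, which adds up log-probabilities (equivalent to multiplying the per-word probabilities) and applies add-one (Laplace) smoothing so a word unseen in one genre does not rule that genre out; talkkie's own implementation may compute its scores differently. The NaiveBayes class and its methods are hypothetical names for illustration.

```javascript
// Minimal multinomial Naive Bayes sketch. Class and method names are
// hypothetical; this is not talkkie's source.
class NaiveBayes {
  constructor() {
    this.wordCounts = {}; // genre -> { word -> count }
    this.totalWords = {}; // genre -> total words seen for that genre
    this.docCounts = {};  // genre -> number of training scripts
    this.vocab = new Set();
    this.totalDocs = 0;
  }

  // Lowercase and split on anything that is not a letter or apostrophe.
  static tokenize(text) {
    return text.toLowerCase().split(/[^a-z']+/).filter(Boolean);
  }

  // Training: count word occurrences for the given genre.
  train(text, genre) {
    this.wordCounts[genre] = this.wordCounts[genre] || {};
    this.totalWords[genre] = this.totalWords[genre] || 0;
    this.docCounts[genre] = (this.docCounts[genre] || 0) + 1;
    this.totalDocs += 1;
    for (const word of NaiveBayes.tokenize(text)) {
      this.wordCounts[genre][word] = (this.wordCounts[genre][word] || 0) + 1;
      this.totalWords[genre] += 1;
      this.vocab.add(word);
    }
  }

  // Per-genre score: log prior plus the sum of smoothed log likelihoods.
  scores(text) {
    const words = NaiveBayes.tokenize(text);
    const v = this.vocab.size;
    const result = {};
    for (const genre of Object.keys(this.docCounts)) {
      let score = Math.log(this.docCounts[genre] / this.totalDocs);
      for (const word of words) {
        const count = this.wordCounts[genre][word] || 0;
        score += Math.log((count + 1) / (this.totalWords[genre] + v));
      }
      result[genre] = score;
    }
    return result;
  }

  // The genre with the highest total score wins.
  classify(text) {
    const s = this.scores(text);
    return Object.keys(s).reduce((best, g) => (s[g] > s[best] ? g : best));
  }
}

const nb = new NaiveBayes();
nb.train('car chase explosion gun fight explosion', 'Action');
nb.train('joke laugh pratfall joke wedding', 'Comedy');
nb.train('ghost scream blood knife scream', 'Horror');
console.log(nb.classify('another explosion during the car chase')); // prints: Action
```

Working in log space avoids floating-point underflow: a full-length script contains thousands of words, and multiplying that many small probabilities directly would round to zero.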
Some notes:
- When counting word occurrences, we automatically ignore a list of the 100 most common English words.
- A pre-generated training.json is included in lib/training
- A bunch of training scripts can be found here
- Because many scripts fall under more than one genre, the detailed output is helpful.
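Ignoring common words keeps frequent but genre-neutral tokens such as "the" from dominating the counts. A minimal sketch of stop-word filtering during counting follows; the STOP_WORDS set is a short illustrative subset of common English words, not the 100-word list talkkie actually uses, and countWords is a hypothetical helper.

```javascript
// Illustrative subset only; talkkie's real list has 100 words and may differ.
const STOP_WORDS = new Set([
  'the', 'be', 'to', 'of', 'and', 'a', 'in', 'that', 'have', 'i',
  'it', 'for', 'not', 'on', 'with', 'he', 'as', 'you', 'do', 'at'
]);

// Count word occurrences in a script, skipping stop words and empty tokens.
function countWords(script) {
  const counts = new Map();
  for (const word of script.toLowerCase().split(/[^a-z']+/)) {
    if (word === '' || STOP_WORDS.has(word)) continue;
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

const counts = countWords('The killer waits in the dark. The dark is quiet.');
console.log(counts.get('dark')); // prints: 2
console.log(counts.has('the'));  // prints: false
```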
In lieu of a formal style guide, take care to maintain the existing coding style. Add unit tests for any new or changed functionality. Lint and test your code using Grunt.
(Nothing yet)
Copyright (c) 2014 Tim Kendall, Ian Jackson. Licensed under the MIT license.