timkendall/talkkie

talkkie

Movie script genre classifier.

Getting Started

Install the module with: npm install talkkie

var talkkie = require('talkkie');
var classifier = new talkkie.Classifier();

Documentation

Training

Pass in multiple scripts to train the classifier.

var script1 = '...';
var script2 = '...';
var script3 = '...';

classifier.train(script1, 'Action');
classifier.train(script2, 'Comedy');
classifier.train(script3, 'Horror');

Classifying

Classify a script. Optionally pass in an options object.

var unknown = '...';
var genre = classifier.classify(unknown);

// -> 'Action'

Exporting training data

Export training data as JSON for later use.

classifier.export('/Desktop/training.json')
  • If a path is not specified, training.json will be saved to lib/training

Import training data

Import training JSON.

classifier.import('/Desktop/training.json')
  • If a path is not specified, it will look for training.json in lib/training
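
As a rough illustration of what exporting and re-importing training data as JSON involves, here is a minimal sketch using Node's built-in fs module. The helper names (saveTraining, loadTraining) and the data shape (genre -> word -> count) are assumptions made for this example; they are not talkkie's actual file format or internals.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical helper: serialize training data to a JSON file.
function saveTraining(data, filePath) {
  fs.writeFileSync(filePath, JSON.stringify(data, null, 2), 'utf8');
}

// Hypothetical helper: read training data back from a JSON file.
function loadTraining(filePath) {
  return JSON.parse(fs.readFileSync(filePath, 'utf8'));
}

// Round trip through a temporary file.
// Assumed data shape: genre -> { word -> occurrence count }.
var sketchFile = path.join(os.tmpdir(), 'training-sketch.json');
saveTraining({ Action: { gun: 2, chase: 1 } }, sketchFile);
var restored = loadTraining(sketchFile);
console.log(restored.Action.gun); // prints 2
```

Saving the counts rather than the raw scripts keeps the file small and means a classifier does not need to be retrained on every run.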

Examples

See /examples folder.

How it Works

We use a simple Naive Bayes approach to classify movie scripts into genres. This works by counting word occurrences in each genre of a training data set. We then compute the probability of each word with respect to each genre. To classify a script, we add up the previously calculated probabilities of its words for each genre. The genre with the highest total wins.
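
The counting-and-summing approach described above can be sketched roughly as follows. This is an illustration only, not talkkie's actual source: SketchClassifier, tokenize, and scores are hypothetical names, and the tokenizer is a simple assumption. Note that a textbook Naive Bayes classifier multiplies per-word probabilities (usually by summing their logarithms, with smoothing for unseen words) rather than adding raw probabilities; the sketch follows the description above instead.

```javascript
// Hypothetical tokenizer: lowercase the text and split it into words.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

// Hypothetical classifier; not talkkie's real implementation.
function SketchClassifier() {
  this.counts = {}; // genre -> { word -> occurrences }
  this.totals = {}; // genre -> total number of words seen
}

// Count word occurrences for the given genre.
SketchClassifier.prototype.train = function (script, genre) {
  var counts = this.counts[genre] || (this.counts[genre] = Object.create(null));
  var words = tokenize(script);
  words.forEach(function (word) {
    counts[word] = (counts[word] || 0) + 1;
  });
  this.totals[genre] = (this.totals[genre] || 0) + words.length;
};

// For each genre, add up the per-word probabilities (count / total).
SketchClassifier.prototype.scores = function (script) {
  var self = this;
  var words = tokenize(script);
  var result = {};
  Object.keys(self.counts).forEach(function (genre) {
    var sum = 0;
    words.forEach(function (word) {
      sum += (self.counts[genre][word] || 0) / self.totals[genre];
    });
    result[genre] = sum;
  });
  return result;
};

// Return the genre with the highest total, or null if nothing was trained.
SketchClassifier.prototype.classify = function (script) {
  var scores = this.scores(script);
  var genres = Object.keys(scores);
  if (genres.length === 0) return null;
  return genres.reduce(function (best, genre) {
    return scores[genre] > scores[best] ? genre : best;
  });
};

var sketch = new SketchClassifier();
sketch.train('gun chase explosion gun', 'Action');
sketch.train('joke laugh prank joke', 'Comedy');
console.log(sketch.classify('a gun and an explosion')); // prints Action
```

The scores method returns the total for every genre, which is the kind of detailed output that helps when a script plausibly belongs to more than one genre.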

(Some notes):

  • We automatically ignore a list of the 100 most common English words when counting word occurrences.
  • A pre-generated training.json is included in lib/training
  • A bunch of training scripts can be found here
  • Because scripts are classified under multiple genres, the detailed output is helpful
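
To illustrate the common-word filtering mentioned in the notes, here is a minimal sketch. The STOP_WORDS array is a small illustrative subset chosen for this example, not the actual 100-word list the module uses, and countWords is a hypothetical helper name.

```javascript
// Illustrative subset only; the module's real list of 100 words is not reproduced here.
var STOP_WORDS = ['the', 'be', 'to', 'of', 'and', 'a', 'in', 'that', 'have', 'i'];

// Hypothetical helper: count word occurrences, skipping common words.
function countWords(text) {
  var counts = Object.create(null);
  var words = text.toLowerCase().match(/[a-z']+/g) || [];
  words.forEach(function (word) {
    if (STOP_WORDS.indexOf(word) !== -1) return; // ignore common words
    counts[word] = (counts[word] || 0) + 1;
  });
  return counts;
}

var wordCounts = countWords('The hero ran to the car and the car exploded');
console.log(wordCounts.car); // prints 2
```

Dropping very common words keeps them from dominating the counts, since they appear at similar rates in every genre and carry little signal.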

Contributing

In lieu of a formal style guide, take care to maintain the existing coding style. Add unit tests for any new or changed functionality. Lint and test your code using Grunt.

Release History

(Nothing yet)

License

Copyright (c) 2014 Tim Kendall, Ian Jackson. Licensed under the MIT license.
