Import r31 from rubyforge
xaviershay committed Apr 17, 2008
0 parents commit 72cfa08
Showing 20 changed files with 1,771 additions and 0 deletions.
429 changes: 429 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

88 changes: 88 additions & 0 deletions README
@@ -0,0 +1,88 @@
== Welcome to Classifier

Classifier is a general module to allow Bayesian and other types of classifications.

== Download

* http://rubyforge.org/projects/classifier
* gem install classifier
* svn co http://rufy.com/svn/classifier/trunk

== Dependencies
If you install Classifier from source, you'll need to install Martin Porter's stemmer algorithm with RubyGems as follows:
gem install stemmer

If you would like to speed up LSI classification by at least 10x, please install the following libraries:
GNU GSL:: http://www.gnu.org/software/gsl
rb-gsl:: http://rb-gsl.rubyforge.org

LSI works without these libraries, but Classifier will use them as soon as they are installed. No configuration changes are needed; we like to keep things ridiculously easy for you.

== Bayes
A Bayesian classifier by Lucas Carlson. Bayesian Classifiers are accurate, fast, and have modest memory requirements.

=== Usage
require 'classifier'
b = Classifier::Bayes.new 'Interesting', 'Uninteresting'
b.train_interesting "here are some good words. I hope you love them"
b.train_uninteresting "here are some bad words, I hate you"
b.classify "I hate bad words and you" # returns 'Uninteresting'

require 'madeleine'
m = SnapshotMadeleine.new("bayes_data") {
Classifier::Bayes.new 'Interesting', 'Uninteresting'
}
m.system.train_interesting "here are some good words. I hope you love them"
m.system.train_uninteresting "here are some bad words, I hate you"
m.take_snapshot
m.system.classify "I love you" # returns 'Interesting'

Using Madeleine, your application can persist the learned data over time.
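To make the train/classify cycle above more concrete, here is a minimal sketch of a word-count naive Bayes classifier in plain Ruby. It is not the gem's implementation: the `TinyBayes` class, its `train` and `classify` methods, the tokenizer, and the add-one smoothing are all assumptions chosen for illustration, and the gem additionally stems words, which this sketch does not.

```ruby
# Hypothetical TinyBayes: a word-count naive Bayes sketch, not the gem's code.
class TinyBayes
  def initialize(*categories)
    # One word-frequency table per category.
    @counts = {}
    categories.each { |c| @counts[c] = Hash.new(0) }
  end

  # Add the words of +text+ to the frequency table of +category+.
  def train(category, text)
    tokenize(text).each { |word| @counts[category][word] += 1 }
  end

  # Return the category with the highest summed log-likelihood.
  def classify(text)
    words = tokenize(text)
    vocab = @counts.values.flat_map(&:keys).uniq.size
    scores = @counts.map do |category, table|
      total = table.values.sum
      # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
      score = words.sum { |w| Math.log((table[w] + 1.0) / (total + vocab)) }
      [category, score]
    end
    scores.max_by { |_, score| score }.first
  end

  private

  # Assumed tokenizer: lowercase words only, no stemming.
  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

b = TinyBayes.new('Interesting', 'Uninteresting')
b.train 'Interesting', "here are some good words. I hope you love them"
b.train 'Uninteresting', "here are some bad words, I hate you"
puts b.classify("I hate bad words and you") # prints Uninteresting
```

The sketch treats both categories as equally likely a priori; a fuller version would also weight each score by how much training text the category has seen.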

=== Bayesian Classification

* http://www.process.com/precisemail/bayesian_filtering.htm
* http://en.wikipedia.org/wiki/Bayesian_filtering
* http://www.paulgraham.com/spam.html

== LSI
A Latent Semantic Indexer by David Fayram. Latent Semantic Indexing engines
are not as fast or as small as Bayesian classifiers, but are more flexible, providing
fast search and clustering detection as well as semantic analysis of the text that
theoretically simulates human learning.

=== Usage
require 'classifier'
lsi = Classifier::LSI.new
strings = [ ["This text deals with dogs. Dogs.", :dog],
["This text involves dogs too. Dogs! ", :dog],
["This text revolves around cats. Cats.", :cat],
["This text also involves cats. Cats!", :cat],
["This text involves birds. Birds.",:bird ]]
strings.each {|x| lsi.add_item x.first, x.last}

lsi.search("dog", 3)
# returns => ["This text deals with dogs. Dogs.", "This text involves dogs too. Dogs! ",
# "This text also involves cats. Cats!"]

lsi.find_related(strings[2], 2)
# returns => ["This text revolves around cats. Cats.", "This text also involves cats. Cats!"]

lsi.classify "This text is also about dogs!"
# returns => :dog

Please see the Classifier::LSI documentation for more information. It is possible to index, search and classify
with more than just simple strings.
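As a rough intuition for how indexed text can be matched against a query, the sketch below ranks documents by cosine similarity over raw word counts. This is deliberately not LSI: there is no singular value decomposition or dimensionality reduction, so it only matches on shared words. The helper names `term_vector`, `cosine`, and `nearest_category` are hypothetical and do not exist in Classifier.

```ruby
# Hypothetical helpers: plain bag-of-words cosine similarity, with no SVD step.

# Map each lowercase word in +text+ to its number of occurrences.
def term_vector(text)
  counts = Hash.new(0)
  text.downcase.scan(/[a-z]+/).each { |w| counts[w] += 1 }
  counts
end

# Cosine of the angle between two sparse word-count vectors.
def cosine(a, b)
  dot = a.sum { |word, n| n * b.fetch(word, 0) }
  norm = ->(v) { Math.sqrt(v.values.sum { |n| n * n }) }
  denom = norm.(a) * norm.(b)
  denom.zero? ? 0.0 : dot / denom
end

# Return the category of the document most similar to +query+.
def nearest_category(docs, query)
  q = term_vector(query)
  docs.max_by { |text, _| cosine(term_vector(text), q) }.last
end

docs = {
  "This text deals with dogs. Dogs." => :dog,
  "This text revolves around cats. Cats." => :cat,
  "This text involves birds. Birds." => :bird
}

puts nearest_category(docs, "This text is also about dogs!") # prints dog
```

Because this version compares literal words, a query about "puppies" would match nothing; finding such latent relationships is the part that LSI adds on top.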

=== Latent Semantic Indexing
* http://www.c2.com/cgi/wiki?LatentSemanticIndexing
* http://www.chadfowler.com/index.cgi/Computing/LatentSemanticIndexing.rdoc
* http://en.wikipedia.org/wiki/Latent_semantic_analysis

== Authors
* Lucas Carlson (mailto:lucas@rufy.com)
* David Fayram II (mailto:dfayram@gmail.com)
* Cameron McBride (mailto:cameron.mcbride@gmail.com)

This library is released under the terms of the GNU LGPL. See LICENSE for more details.

96 changes: 96 additions & 0 deletions Rakefile
@@ -0,0 +1,96 @@
require 'rubygems'
require 'rake'
require 'rake/testtask'
require 'rake/rdoctask'
require 'rake/gempackagetask'
require 'rake/contrib/rubyforgepublisher'

PKG_VERSION = "1.3.1"

PKG_FILES = FileList[
"lib/**/*", "bin/*", "test/**/*", "[A-Z]*", "Rakefile", "html/**/*"
]

desc "Default Task"
task :default => [ :test ]

# Run the unit tests
desc "Run all unit tests"
Rake::TestTask.new("test") { |t|
t.libs << "lib"
t.pattern = 'test/*/*_test.rb'
t.verbose = true
}

# Make a console, useful when working on tests
desc "Generate a test console"
task :console do
verbose( false ) { sh "irb -I lib/ -r 'classifier'" }
end

# Generate the RDoc documentation
desc "Create documentation"
Rake::RDocTask.new("doc") { |rdoc|
rdoc.title = "Ruby Classifier - Bayesian and LSI classification library"
rdoc.rdoc_dir = 'html'
rdoc.rdoc_files.include('README')
rdoc.rdoc_files.include('lib/**/*.rb')
}

# Generate the package
spec = Gem::Specification.new do |s|

#### Basic information.

s.name = 'classifier'
s.version = PKG_VERSION
s.summary = <<-EOF
A general classifier module to allow Bayesian and other types of classifications.
EOF
s.description = <<-EOF
A general classifier module to allow Bayesian and other types of classifications.
EOF

#### Which files are to be included in this gem? Everything! (Except CVS directories.)

s.files = PKG_FILES

#### Load-time details: library and application (you will need one or both).

s.require_path = 'lib'
s.autorequire = 'classifier'

#### Documentation and testing.

s.has_rdoc = true

#### Dependencies and requirements.

s.add_dependency('stemmer', '>= 1.0.0')
s.requirements << "A porter-stemmer module to split word stems."

#### Author and project details.
s.author = "Lucas Carlson"
s.email = "lucas@rufy.com"
s.homepage = "http://classifier.rufy.com/"
end

Rake::GemPackageTask.new(spec) do |pkg|
pkg.need_zip = true
pkg.need_tar = true
end

desc "Report code statistics (KLOCs, etc) from the application"
task :stats do
require 'code_statistics'
CodeStatistics.new(
["Library", "lib"],
["Units", "test"]
).to_s
end

desc "Publish new documentation"
task :publish do
`ssh rufy update-classifier-doc`
Rake::RubyForgePublisher.new('classifier', 'cardmagic').upload
end
36 changes: 36 additions & 0 deletions bin/bayes.rb
@@ -0,0 +1,36 @@
#!/usr/bin/env ruby

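begin
require 'rubygems'
require 'classifier'
rescue LoadError
# A bare rescue does not catch LoadError, so name it explicitly.
# Fall back to a classifier found on the plain load path.
require 'classifier'
end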

require 'madeleine'

m = SnapshotMadeleine.new(File.expand_path("~/.bayes_data")) {
Classifier::Bayes.new 'Interesting', 'Uninteresting'
}

case ARGV[0]
when "add"
case ARGV[1].downcase
when "interesting"
m.system.train_interesting File.open(ARGV[2]).read
puts "#{ARGV[2]} has been classified as interesting"
when "uninteresting"
m.system.train_uninteresting File.open(ARGV[2]).read
puts "#{ARGV[2]} has been classified as uninteresting"
else
puts "Invalid category: choose between interesting and uninteresting"
exit(1)
end
when "classify"
puts m.system.classify(File.open(ARGV[1]).read)
else
puts "Invalid option: choose add [category] [file] or classify [file]"
exit(-1)
end

m.take_snapshot
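The snapshot step in this script can be pictured as serializing the trained object to disk and reading it back on the next run. The sketch below shows that idea with the standard library's `Marshal` instead of Madeleine. It is a simplification under stated assumptions: the `model` hash is a hypothetical stand-in for a trained classifier, the file name is invented, and Madeleine's own storage format and command logging are not reproduced.

```ruby
require 'tmpdir'

# Hypothetical stand-in for trained classifier state: any Marshal-friendly object.
model = {
  'Interesting'   => { 'love' => 1, 'good' => 1 },
  'Uninteresting' => { 'hate' => 1, 'bad'  => 1 }
}

restored = Dir.mktmpdir do |dir|
  path = File.join(dir, 'bayes_data.snapshot') # hypothetical file name

  # "Take a snapshot": write the serialized object in binary mode.
  File.binwrite(path, Marshal.dump(model))

  # "Next run": load the object back from the snapshot file.
  Marshal.load(File.binread(path))
end

puts restored == model # prints true
```

`Marshal.load` should only be used on files the application wrote itself, since loading untrusted data can instantiate arbitrary objects.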
16 changes: 16 additions & 0 deletions bin/summarize.rb
@@ -0,0 +1,16 @@
#!/usr/bin/env ruby

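begin
require 'rubygems'
require 'classifier'
rescue LoadError
# A bare rescue does not catch LoadError, so name it explicitly.
# Fall back to a classifier found on the plain load path.
require 'classifier'
end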

require 'open-uri'

num = ARGV[1].to_i
num = num < 1 ? 10 : num

text = open(ARGV.first).read
puts text.gsub(/<[^>]+>/,"").gsub(/[\s]+/," ").summary(num)
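The two `gsub` calls in the last line do the text cleanup before summarizing: the first strips anything that looks like an HTML tag, and the second collapses every run of whitespace into a single space. The sketch below applies the same two expressions to a made-up HTML string; the sample input is an assumption, and the gem's `summary` call is left out so the example runs with the standard library alone.

```ruby
# Made-up sample input; the real script reads this from a file or URL.
html = "<html><body><h1>Title</h1>\n<p>First   sentence.</p>\n\n<p>Second one.</p></body></html>"

text = html
  .gsub(/<[^>]+>/, "") # drop anything between < and >
  .gsub(/[\s]+/, " ")  # collapse newlines, tabs and repeated spaces

puts text # prints: Title First sentence. Second one.
```

A regular expression like this is only a rough tag stripper: it leaves script and style bodies and HTML entities in place, which is acceptable for a quick summary but not for general HTML processing.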
50 changes: 50 additions & 0 deletions install.rb
@@ -0,0 +1,50 @@
require 'rbconfig'
require 'find'
require 'ftools'

include Config

# this was adapted from rdoc's install.rb by way of Log4r

$sitedir = CONFIG["sitelibdir"]
unless $sitedir
version = CONFIG["MAJOR"] + "." + CONFIG["MINOR"]
$libdir = File.join(CONFIG["libdir"], "ruby", version)
$sitedir = $:.find {|x| x =~ /site_ruby/ }
if !$sitedir
$sitedir = File.join($libdir, "site_ruby")
elsif $sitedir !~ /#{Regexp.quote(version)}/
$sitedir = File.join($sitedir, version)
end
end

# List every target directory once; reassigning makedirs would keep only the last entry.
makedirs = %w{ classifier classifier/extensions classifier/lsi }
makedirs.each {|f| File::makedirs(File.join($sitedir, *f.split(/\//)))}

Dir.chdir("lib")
begin
require 'rubygems'
require 'rake'
rescue LoadError
puts
puts "Please install Gem and Rake from http://rubyforge.org/projects/rubygems and http://rubyforge.org/projects/rake"
puts
exit(-1)
end

files = FileList["**/*"]

# File::safe_unlink *deprecated.collect{|f| File.join($sitedir, f.split(/\//))}
files.each {|f|
File::install(f, File.join($sitedir, *f.split(/\//)), 0644, true)
}

begin
require 'stemmer'
rescue LoadError
puts
puts "Please install Stemmer from http://rubyforge.org/projects/stemmer or via 'gem install stemmer'"
puts
end
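This installer relies on `ftools` and the top-level `Config` constant, neither of which is available in current Ruby releases. The sketch below shows how the directory setup step might look with `RbConfig` and `FileUtils` instead. It is an assumption-laden illustration rather than a replacement installer: it creates the directories under a temporary directory standing in for the site library directory, so running it does not touch the real Ruby installation.

```ruby
require 'rbconfig'
require 'fileutils'
require 'tmpdir'

# Where a real install would go; only read here, never written to.
sitedir = RbConfig::CONFIG['sitelibdir']
puts "site library directory: #{sitedir}"

dirs = %w[classifier classifier/extensions classifier/lsi]

# Use a temporary root as a stand-in for sitedir (assumption for safe demonstration).
created = Dir.mktmpdir do |root|
  # mkdir_p creates missing parents and ignores directories that already exist.
  dirs.each { |d| FileUtils.mkdir_p(File.join(root, d)) }
  dirs.select { |d| File.directory?(File.join(root, d)) }
end

puts created.inspect
```

For copying files with a fixed mode, `FileUtils.install(src, dest, mode: 0644)` plays the role that `File::install` plays above.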
30 changes: 30 additions & 0 deletions lib/classifier.rb
@@ -0,0 +1,30 @@
#--
# Copyright (c) 2005 Lucas Carlson
#
# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to
# the following conditions:
#
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
# LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
# WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#++
# Author:: Lucas Carlson (mailto:lucas@rufy.com)
# Copyright:: Copyright (c) 2005 Lucas Carlson
# License:: LGPL

require 'rubygems'
require 'classifier/extensions/string'
require 'classifier/bayes'
require 'classifier/lsi'
