SVMKit is a machine learning library in Ruby. SVMKit provides machine learning algorithms with interfaces similar to Scikit-Learn in Python. SVMKit currently supports Linear / Kernel Support Vector Machine, Logistic Regression, Linear Regression, Ridge, Lasso, Factorization Machine, Naive Bayes, Decision Tree, AdaBoost, Random Forest, K-nearest neighbor classifier, K-Means, DBSCAN, Principal Component Analysis, Non-negative Matrix Factorization, and cross-validation.


Add this line to your application's Gemfile:

gem 'svmkit'

And then execute:

$ bundle

Or install it yourself as:

$ gem install svmkit


Example 1. Pendigits dataset classification

SVMKit provides a function for loading dataset files in LIBSVM format. We start by downloading the pendigits training and testing files (pendigits and pendigits.t) from the LIBSVM Data website.

$ wget
$ wget
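
For readers unfamiliar with the LIBSVM text format, the following is a minimal pure-Ruby sketch of how one line of such a file maps to a label and a dense feature vector. It is an illustration only, not SVMKit's loader: the helper name parse_libsvm_line and the sample line are hypothetical, and the sketch assumes integer class labels and 1-based feature indices.

```ruby
# Parse one line of LIBSVM-format text: "<label> <index>:<value> ...".
# Feature indices are 1-based in the file; features not listed are zero.
# NOTE: parse_libsvm_line is a hypothetical helper, not part of SVMKit.
def parse_libsvm_line(line, n_features)
  tokens = line.split
  label = Integer(tokens.shift)
  features = Array.new(n_features, 0.0)
  tokens.each do |token|
    index, value = token.split(':')
    features[Integer(index) - 1] = Float(value)
  end
  [label, features]
end

# Made-up sample line with four features, one of them omitted (sparse).
label, features = parse_libsvm_line('8 1:47 2:100 4:27', 4)
puts label   # 8
p features   # [47.0, 100.0, 0.0, 27.0]
```

Because the format is sparse, the number of features has to be supplied or inferred from the largest index seen in the file.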

The following code trains a classifier using a linear SVM with an RBF kernel feature map.

require 'svmkit'

# Load the training dataset.
samples, labels = SVMKit::Dataset.load_libsvm_file('pendigits')

# If the features consist only of integers, load_libsvm_file returns the samples as Numo::Int32.
# Convert the sample array to Numo::DFloat when the estimator needs floating-point input.
samples = Numo::DFloat.cast(samples)

# Map training data to RBF kernel feature space.
transformer = SVMKit::KernelApproximation::RBF.new(gamma: 0.0001, n_components: 1024, random_seed: 1)
transformed = transformer.fit_transform(samples)

# Train linear SVM classifier.
classifier = SVMKit::LinearModel::SVC.new(reg_param: 0.0001, max_iter: 1000, batch_size: 50, random_seed: 1)
classifier.fit(transformed, labels)

# Save the model.
File.open('transformer.dat', 'wb') { |f| f.write(Marshal.dump(transformer)) }
File.open('classifier.dat', 'wb') { |f| f.write(Marshal.dump(classifier)) }
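
The save step above relies on Ruby's standard Marshal module. As a rough sketch of that round trip in isolation, the snippet below serializes a stand-in object to a file and restores it. ToyModel, save_model, and load_model are hypothetical names chosen for illustration and are not part of SVMKit.

```ruby
require 'tmpdir'

# Hypothetical stand-in for a trained model; not an SVMKit class.
ToyModel = Struct.new(:weights, :bias)

# Serialize any Marshal-compatible object to a binary file.
def save_model(model, path)
  File.binwrite(path, Marshal.dump(model))
end

# Restore an object previously written by save_model.
def load_model(path)
  Marshal.load(File.binread(path))
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'toy_model.dat')
  save_model(ToyModel.new([0.5, -1.25], 0.1), path)
  restored = load_model(path)
  p restored.weights  # [0.5, -1.25]
end
```

Marshal.load can instantiate arbitrary objects, so model files should only be loaded from trusted sources. The class of the dumped object must also be defined in the process that loads it.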

The following code classifies the testing data with the trained classifier.

require 'svmkit'

# Load the testing dataset.
samples, labels = SVMKit::Dataset.load_libsvm_file('pendigits.t')
samples = Numo::DFloat.cast(samples)

# Load the model.
transformer = Marshal.load(File.binread('transformer.dat'))
classifier = Marshal.load(File.binread('classifier.dat'))

# Map testing data to RBF kernel feature space.
transformed = transformer.transform(samples)

# Classify the testing data and evaluate prediction results.
puts("Accuracy: %.1f%%" % (100.0 * classifier.score(transformed, labels)))

# Alternative evaluation approach:
# results = classifier.predict(transformed)
# evaluator = SVMKit::EvaluationMeasure::Accuracy.new
# puts("Accuracy: %.1f%%" % (100.0 * evaluator.score(results, labels)))

Executing the above scripts produces the following output.

$ ruby train.rb
$ ruby test.rb
Accuracy: 98.4%
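
The accuracy reported above is the fraction of test samples whose predicted label equals the true label. A minimal pure-Ruby sketch of that computation follows. The accuracy helper and the toy label arrays are hypothetical and only illustrate the measure; they do not reproduce SVMKit's implementation.

```ruby
# Fraction of positions where the predicted label matches the true label.
# NOTE: hypothetical helper for illustration, not an SVMKit method.
def accuracy(true_labels, predicted_labels)
  unless true_labels.size == predicted_labels.size
    raise ArgumentError, 'label arrays must have the same length'
  end
  raise ArgumentError, 'label arrays must not be empty' if true_labels.empty?

  correct = true_labels.zip(predicted_labels).count { |y, y_hat| y == y_hat }
  correct.to_f / true_labels.size
end

# Toy data: four of the five predictions are correct.
score = accuracy([3, 1, 4, 1, 5], [3, 1, 4, 0, 5])
puts format('Accuracy: %.1f%%', 100.0 * score)  # Accuracy: 80.0%
```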

Example 2. Cross-validation

require 'svmkit'

# Load dataset.
samples, labels = SVMKit::Dataset.load_libsvm_file('pendigits')
samples = Numo::DFloat.cast(samples)

# Define the estimator to be evaluated.
lr = SVMKit::LinearModel::LogisticRegression.new(reg_param: 0.0001, random_seed: 1)

# Define the evaluation measure, splitting strategy, and cross validation.
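ev = SVMKit::EvaluationMeasure::LogLoss.new
kf = SVMKit::ModelSelection::StratifiedKFold.new(n_splits: 5, shuffle: true, random_seed: 1)
cv = SVMKit::ModelSelection::CrossValidation.new(estimator: lr, splitter: kf, evaluator: ev)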

# Perform 5-fold cross-validation.
report = cv.perform(samples, labels)

# Output result.
mean_logloss = report[:test_score].inject(:+) / kf.n_splits
puts("5-CV mean log-loss: %.3f" % mean_logloss)
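
To make the two ingredients of this example more concrete, here is a small pure-Ruby sketch of stratified fold assignment and of the log-loss measure. Both helpers (stratified_folds and log_loss) are hypothetical illustrations and are not SVMKit's implementations. The sketch assumes class labels are integers 0..C-1 that index into each probability row, and it does not shuffle the samples.

```ruby
# Assign each sample index to one of n_splits folds so that every class is
# spread across the folds (round-robin within each class, no shuffling).
# NOTE: hypothetical helper, not part of SVMKit.
def stratified_folds(labels, n_splits)
  folds = Array.new(n_splits) { [] }
  labels.each_index.group_by { |i| labels[i] }.each_value do |indices|
    indices.each_with_index { |idx, pos| folds[pos % n_splits] << idx }
  end
  folds
end

# Mean negative log-probability assigned to the true class.
# probabilities[i][c] is the predicted probability of class c for sample i.
# NOTE: hypothetical helper, not part of SVMKit.
def log_loss(true_labels, probabilities, eps = 1e-15)
  total = true_labels.each_with_index.sum do |label, i|
    -Math.log(probabilities[i][label].clamp(eps, 1.0))
  end
  total / true_labels.size
end

labels = [0, 0, 0, 0, 1, 1, 1, 1]
stratified_folds(labels, 2).each_with_index do |test_ids, k|
  train_ids = (0...labels.size).to_a - test_ids
  puts "fold #{k}: train=#{train_ids.inspect} test=#{test_ids.inspect}"
end

# A classifier that is unsure between two classes scores log(2) per sample.
puts format('%.3f', log_loss([0, 1], [[0.5, 0.5], [0.5, 0.5]]))  # 0.693
```

Lower log-loss is better, and a perfect probabilistic prediction gives 0. Stratification keeps the class proportions in each test fold close to those of the full dataset.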


