# Simple libFM example

- Train on 2 users and 3 items.
- Test on the same 2 users, but with 4 items (the same 3 from training plus one new item).

## Data schema
- Users:
  - 0 is User1
  - 1 is User2
- Items:
  - 2 is Item1
  - 3 is Item2
  - 4 is Item3
  - 5 is Item4
- The categorical feature age:
  - 6 is the category “18-25”
  - 7 is the category “26-40”
  - 8 is the category “40-60”
- The numerical feature price:
  - 9 will represent the price feature

A sample looks like this:

    5 0:1 3:1 6:1 9:20
    # User1, who is 23 years old, gives a rating of 5 to Item2, which costs 20 euros
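
The mapping from the schema to a libFM line can be sketched in Python as follows. The dictionaries copy the indices listed above; the names `encode_sample`, `USER_INDEX`, and so on are illustrative helpers and not part of libFM.

```python
# Feature indices copied from the data schema above.
# All names here are illustrative; libFM itself only sees the text file.
USER_INDEX = {"User1": 0, "User2": 1}
ITEM_INDEX = {"Item1": 2, "Item2": 3, "Item3": 4, "Item4": 5}
AGE_INDEX = {"18-25": 6, "26-40": 7, "40-60": 8}
PRICE_INDEX = 9


def encode_sample(rating, user, item, age_group, price=None):
    """Return one line in libFM's sparse format: 'target index:value ...'."""
    features = [
        (USER_INDEX[user], 1),      # one-hot user
        (ITEM_INDEX[item], 1),      # one-hot item
        (AGE_INDEX[age_group], 1),  # one-hot age category
    ]
    if price is not None:
        # Numerical feature: the value is the price itself.
        features.append((PRICE_INDEX, price))
    parts = [str(rating)] + ["%d:%s" % (index, value) for index, value in features]
    return " ".join(parts)


print(encode_sample(5, "User1", "Item2", "18-25", price=20))
# prints: 5 0:1 3:1 6:1 9:20
```

Leaving `price` as `None` omits feature 9 from the line, which is how a sample with an unknown price is written.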


## Training data
train.libfm

    5 0:1 2:1 6:1 9:12.5
    5 0:1 3:1 6:1 9:20
    4 0:1 4:1 6:1 9:78
    1 1:1 2:1 8:1 9:12.5
    1 1:1 3:1 8:1 9:20

num_features = 10  # the highest feature index (here 9, for the item price) plus 1, because indices start at 0
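
As a quick check of that count, the small helper below scans libFM-formatted lines and returns the highest feature index plus one. It is a sketch only; `count_features` is a hypothetical name, and the lines are the training samples listed above.

```python
def count_features(lines):
    """Return (highest feature index + 1) over libFM-formatted lines."""
    highest = -1
    for line in lines:
        tokens = line.split()
        # tokens[0] is the target; the rest are 'index:value' pairs.
        for token in tokens[1:]:
            index = int(token.split(":")[0])
            highest = max(highest, index)
    return highest + 1


train_lines = [
    "5 0:1 2:1 6:1 9:12.5",
    "5 0:1 3:1 6:1 9:20",
    "4 0:1 4:1 6:1 9:78",
    "1 1:1 2:1 8:1 9:12.5",
    "1 1:1 3:1 8:1 9:20",
]
print(count_features(train_lines))
# prints: 10
```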

test.libfm

    0 1:1 4:1 8:1 9:78
    # User2, who is 41 years old, rates Item3, which costs 78 euros; the target is set to 0 because we don't know the real rating yet
    0 0:1 5:1 6:1
    # We want to know which rating User1, who is 23 years old, would give to the not-yet-seen Item4, whose price we don't know

The test set has two samples for which I want predictions. The lines starting with # are annotations only; the actual file contains just the two data lines. The leading 0 is a placeholder and has no effect on the predictions. It only matters if you supply the true value: libFM then reports the RMSE on the test set, but it does not use that value to train the model.
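
For reference, RMSE (root mean squared error) can be computed as in the sketch below. The function name `rmse` and the example numbers are made up for illustration; they are not results from this run.

```python
import math


def rmse(targets, predictions):
    """Root mean squared error between true targets and predictions."""
    if len(targets) != len(predictions) or not targets:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values: both errors are 2, so the RMSE is 2.
print(rmse([3.0, 3.0], [1.0, 5.0]))
# prints: 2.0
```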

## Run

    ./libFM -task r -method mcmc -train train.libfm -test test.libfm -iter 10 -dim '1,1,2' -out output.libfm

So the model was trained with MCMC (-method mcmc) for 10 iterations (-iter 10), using a bias term, linear (one-way) terms, and pairwise interactions factorized with 2 latent factors (-dim '1,1,2').
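
To make the -dim setting concrete, here is a minimal sketch of how a second-order factorization machine scores one sparse sample from a bias, linear weights, and 2-dimensional factor vectors. The function name `fm_predict` is hypothetical and every parameter value is made up for illustration; none of them are what libFM learned in this run.

```python
def fm_predict(x, w0, w, v):
    """Score one sample with a second-order factorization machine.

    x  : dict mapping feature index -> value (sparse sample)
    w0 : bias
    w  : list of linear weights, one per feature
    v  : list of factor vectors, one per feature (each of length k)
    """
    linear = sum(w[i] * xi for i, xi in x.items())
    k = len(v[0])
    pairwise = 0.0
    for f in range(k):
        # Identity: sum_{i<j} v_if v_jf x_i x_j
        #   = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        total = sum(v[i][f] * xi for i, xi in x.items())
        total_of_squares = sum((v[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (total * total - total_of_squares)
    return w0 + linear + pairwise


# Toy example with 3 features and k = 2 latent factors (made-up numbers).
x = {0: 1, 1: 1, 2: 2}
w0 = 0.5
w = [0.1, 0.2, 0.3]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(fm_predict(x, w0, w, v))
```

In this toy example the bias contributes 0.5, the linear terms 0.9, and the pairwise terms 4, so the score is about 5.4 up to floating-point rounding.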

Predictions are written to the file output.libfm.

In [2]:
%cat data/libfm/train.libfm

5 0:1 2:1 6:1 9:12.5
5 0:1 3:1 6:1 9:20
4 0:1 4:1 6:1 9:78
1 1:1 2:1 8:1 9:12.5
1 1:1 3:1 8:1 9:20

In [3]:
%cat data/libfm/test.libfm

0 1:1 4:1 8:1 9:78
0 0:1 5:1 6:1

In [5]:
bin_dir = '/Users/zhangjun/Documents/libfm-1.42.src/bin'
%ls $bin_dir

convert*   libFM*     transpose*


In [7]:
import os
script_dir = os.path.join(bin_dir, "libFM")
%ls $script_dir

/Users/zhangjun/Documents/libfm-1.42.src/bin/libFM*


In [9]:
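import sys
from subprocess import call

# script_dir holds the full path to the libFM binary (see the previous cell).
# The arguments are passed as a list: splitting a command string on spaces
# would hand the quotes around 1,1,2 to libFM as literal characters.
command_list = [
    script_dir,
    "-task", "r",
    "-method", "mcmc",
    "-train", "data/libfm/train.libfm",
    "-test", "data/libfm/test.libfm",
    "-iter", "10",
    "-dim", "1,1,2",
    "-out", "data/libfm/output.libfm",
]
try:
    retcode = call(command_list)
    if retcode < 0:
        # A negative return code means the child was killed by a signal.
        sys.stderr.write("Child was terminated by signal %d\n" % -retcode)
    else:
        sys.stderr.write("Child returned %d\n" % retcode)
except OSError as e:
    sys.stderr.write("Execution failed: %s\n" % e)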

Child returned 0


In [10]:
%cat data/libfm/output.libfm

1.97353
1.00468
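
The output file holds one prediction per line, which appears to follow the order of the rows in test.libfm. A small sketch for pairing each test row with its prediction is shown below; `pair_predictions` is a hypothetical helper, and the lines are copied from the two cells above.

```python
def pair_predictions(test_lines, prediction_lines):
    """Pair each non-empty test row with the prediction on the same line number."""
    rows = [line.strip() for line in test_lines if line.strip()]
    predictions = [float(line) for line in prediction_lines if line.strip()]
    if len(rows) != len(predictions):
        raise ValueError("test rows and predictions differ in length")
    return list(zip(rows, predictions))


test_lines = ["0 1:1 4:1 8:1 9:78", "0 0:1 5:1 6:1"]
prediction_lines = ["1.97353", "1.00468"]
for row, prediction in pair_predictions(test_lines, prediction_lines):
    print("%s -> %.5f" % (row, prediction))
# prints:
# 0 1:1 4:1 8:1 9:78 -> 1.97353
# 0 0:1 5:1 6:1 -> 1.00468
```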
