This repository has been archived by the owner on Apr 16, 2019. It is now read-only.

Commit 378d709: improve the tutorial
1 parent ce44137
Bee-Chung Chen committed Jan 5, 2012
Showing 4 changed files with 134 additions and 12 deletions.
8 changes: 8 additions & 0 deletions doc/quick-start.tex
@@ -0,0 +1,8 @@

\subsection{Quick Start}
\label{sec:bst-quick-start}

In this section, we describe how to fit latent factor models using this package without requiring much familiarity with R.
\begin{center}
(To be completed)
\end{center}
Binary file modified doc/tutorial.pdf
Binary file not shown.
24 changes: 15 additions & 9 deletions doc/tutorial.tex
@@ -10,7 +10,7 @@
\author{Bee-Chung Chen}
\maketitle

This document describes how you can fit latent factor models using the open source package developed in Yahoo! Labs.
This document describes how you can fit latent factor models (e.g., \cite{rlfm:kdd09,bst:kdd11,gmf:recsys11}) using the open source package developed in Yahoo! Labs.

{\small\begin{verbatim}
Stable repository: https://github.com/yahoo/Latent-Factor-Models
@@ -23,15 +23,18 @@ \section{Preparation}

\subsection{Install R}

To install R, go to {\tt http://www.r-project.org/}. Click CRAN on the left panel. Pick a CRAN mirror. Then, install R from the R source code.
Before installing R, please make sure that you have C/C++ and Fortran compilers (e.g., {\tt gcc} and {\tt gfortran}) installed on your machine.

To install R, go to {\tt http://www.r-project.org/}. Click CRAN on the left panel. Pick a CRAN mirror. Then, install R from the R source code. Being able to build R from source ensures that you can also compile the C/C++ code in this package.
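As a sketch, building R from source follows the standard configure/make sequence. The version number and download URL below are illustrative only; pick the current release from your CRAN mirror.

```shell
# Download, build, and install R from source (version/URL are illustrative).
wget http://cran.r-project.org/src/base/R-2/R-2.14.1.tar.gz
tar xzf R-2.14.1.tar.gz
cd R-2.14.1
./configure          # checks for C/C++ and Fortran compilers, among other things
make                 # failures here usually indicate missing compilers or headers
sudo make install    # installs R, typically under /usr/local
```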

Alternatively, you can install R using Linux's package manager. In this case, please install {\tt r-base}, {\tt r-base-core}, {\tt r-base-dev}, and {\tt r-recommended}.
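On a Debian- or Ubuntu-style system, for example, this might look like the following; the package manager and exact package names depend on your distribution.

```shell
# Install R and the development headers needed to compile packages.
sudo apt-get update
sudo apt-get install r-base r-base-core r-base-dev r-recommended
```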

After installing R, install the following R packages: {\tt Matrix} and {\tt glmnet}. Notice that these two packages are not required if you do not need to handle sparse feature vectors or matrices. To install these R packages, use the following commands in R.\\
{\tt
> install.packages("Matrix");\\
After installing R, enter R by simply typing {\tt R} and install the following R packages: {\tt Matrix} and {\tt glmnet}. Note that these two packages are required only if you need to handle sparse feature vectors or matrices. To install them, use the following commands in R.
{\small\begin{verbatim}
> install.packages("Matrix");
> install.packages("glmnet");
}
\end{verbatim}}
\noindent Make sure that you can run R by simply typing {\tt R}. Otherwise, please create an alias that points {\tt R} to your R executable; this is required for {\tt make} to work properly.
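For example, if R was installed under a non-standard prefix (the path below is hypothetical), an alias in your shell startup file would look like:

```shell
# In ~/.bashrc or ~/.bash_profile; the install path shown is hypothetical.
alias R='/usr/local/R-2.14.1/bin/R'
```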

\subsection{Be Familiar with R}

@@ -44,7 +47,7 @@ \subsection{Compile C/C++ Code}

\section{Bias-Smoothed Tensor Model}

The bias-smoothed tensor (BST) model~\cite{bst:kdd11} includes the regression-based latent factor model (RLFM)~\cite{rlfm:kdd09} and regular matrix factorization models as special cases. In fact, the BST model presented here is a bit more general than the model presented in~\cite{bst:kdd11}. In the following, I demonstrate how to fit such a model and its special cases. The R code of this section can be found in {\tt src/R/examples/tutorial-BST.R}.
The bias-smoothed tensor (BST) model~\cite{bst:kdd11} includes the regression-based latent factor model (RLFM)~\cite{rlfm:kdd09} and regular matrix factorization models as special cases. In fact, the BST model presented here is more general than the model presented in~\cite{bst:kdd11}. In the following, I demonstrate how to fit such a model and its special cases. The R code of this section can be found in {\tt src/R/examples/tutorial-BST.R}.

\subsection{Model}

@@ -173,7 +176,9 @@ \subsection{Toy Dataset}
10 1 -0.91 # 1st feature of line 10 in obs-train.txt =-0.91
\end{verbatim}}

\subsection{Model Fitting}
\input{quick-start}

\subsection{Model Fitting Details}
\label{sec:fitting}

See Example 1 in {\tt src/R/examples/tutorial-BST.R} for the R script. For succinctness, we omit some R commands in the following description.
@@ -382,7 +387,8 @@ \subsection{Other Examples}
In {\tt src/R/examples/tutorial-BST.R}, we also provide a number of additional examples.
\begin{itemize}
\item Example 2: In this example, we demonstrate how to fit the same models as those in Example 1 with sparse features and the glmnet algorithm.
\item Example 3: In this example, we demonstrate how to fit RLFM models with sparse features and the glmnet algorithm. Note that RLFM models do not fit this toy dataset well.
\item Example 3: In this example, we demonstrate how to add more EM iterations to an already fitted model.
\item Example 4: In this example, we demonstrate how to fit RLFM models with sparse features and the glmnet algorithm. Note that RLFM models do not fit this toy dataset well.
\end{itemize}


114 changes: 111 additions & 3 deletions src/R/examples/tutorial-BST.R
@@ -202,7 +202,7 @@ source("src/R/model/multicontext_model_MStep.R");
source("src/R/model/multicontext_model_EM.R");
source("src/R/model/GLMNet.R");
set.seed(2);
out.dir = "/tmp/unit-test/simulated-mtx-uvw-10K";
out.dir = "/tmp/tutorial-BST/example-2";
ans = run.multicontext(
obs=data.train$obs, # Observation table
feature=data.train$feature, # Features
@@ -232,7 +232,115 @@ ans = run.multicontext(


###
### Example 3: Fit the RLFM model with sparse features
### Example 3: Add more EM iterations to an already fitted model
###
### Example scenario: After running Example 2 with 10 EM iterations,
### you feel that the model has not yet converged and want to add
### 5 more EM iterations to the "uvw2" model specified in the
### setting.
###
### To run Example 3, you must first run example 2.
###
### Note: In the following, Step 1 and Step 2 are exactly the same
### as those in Example 2.
###
library(Matrix);
dyn.load("lib/c_funcs.so");
source("src/R/c_funcs.R");
source("src/R/util.R");
source("src/R/model/util.R");
source("src/R/model/multicontext_model_utils.R");
set.seed(0);

# (1) Read input data
input.dir = "test-data/multicontext_model/simulated-mtx-uvw-10K"
# (1.1) Training observations and observation features
obs.train = read.table(paste(input.dir,"/obs-train.txt",sep=""),
sep="\t", header=FALSE, as.is=TRUE);
names(obs.train) = c("src_id", "dst_id", "src_context",
"dst_context", "ctx_id", "y");
x_obs.train = read.table(paste(input.dir,"/sparse-feature-obs-train.txt",
sep=""), sep="\t", header=FALSE, as.is=TRUE);
names(x_obs.train) = c("obs_id", "index", "value");
# (1.2) Test observations and observation features
obs.test = read.table(paste(input.dir,"/obs-test.txt",sep=""),
sep="\t", header=FALSE, as.is=TRUE);
names(obs.test) = c("src_id", "dst_id", "src_context",
"dst_context", "ctx_id", "y");
x_obs.test = read.table(paste(input.dir,"/sparse-feature-obs-test.txt",
sep=""), sep="\t", header=FALSE, as.is=TRUE);
names(x_obs.test) = c("obs_id", "index", "value");
# (1.3) User/item/context features
x_src = read.table(paste(input.dir,"/sparse-feature-user.txt",sep=""),
sep="\t", header=FALSE, as.is=TRUE);
names(x_src) = c("src_id", "index", "value");
x_dst = read.table(paste(input.dir,"/sparse-feature-item.txt",sep=""),
sep="\t", header=FALSE, as.is=TRUE);
names(x_dst) = c("dst_id", "index", "value");
x_ctx = read.table(paste(input.dir,"/sparse-feature-ctxt.txt",sep=""),
sep="\t", header=FALSE, as.is=TRUE);
names(x_ctx) = c("ctx_id", "index", "value");

# (2) Index data: Put the input data into the right form
# Convert IDs into numeric indices and
# Convert some data frames into matrices
data.train = indexData(
obs=obs.train, src.dst.same=FALSE, rm.self.link=FALSE,
x_obs=x_obs.train, x_src=x_src, x_dst=x_dst, x_ctx=x_ctx,
add.intercept=TRUE
);
data.test = indexTestData(
data.train=data.train, obs=obs.test,
x_obs=x_obs.test, x_src=x_src, x_dst=x_dst, x_ctx=x_ctx
);

# (3) Load the "uvw2" model from Example 2.
# If the following file does not exist, run Example 2 first.
load("/tmp/tutorial-BST/example-2_uvw2/model.last");
model = list(factor=factor, param=param);

# (4) Run 5 additional EM iterations
dyn.load("lib/c_funcs.so");
source("src/R/c_funcs.R");
source("src/R/util.R");
source("src/R/model/util.R");
source("src/R/model/multicontext_model_genData.R");
source("src/R/model/multicontext_model_utils.R");
source("src/R/model/multicontext_model_MStep.R");
source("src/R/model/multicontext_model_EM.R");
source("src/R/model/GLMNet.R");
out.dir = "/tmp/tutorial-BST/example-3_uvw2";
set.seed(2);
ans = fit.multicontext(
obs=data.train$obs, # Observation table
feature=data.train$feature, # Features
init.model=model, # Initial model = list(factor, param)
nSamples=200, # Number of samples drawn in each E-step: could be a vector of size nIter.
nBurnIn=20, # Number of burn-in draws before take samples for the E-step: could be a vector of size nIter.
nIter=5, # Number of EM iterations
test.obs=data.test$obs, # Test data: Observations for testing
test.feature=data.test$feature, # Features for testing
IDs=data.test$IDs,
is.logistic=FALSE,
out.level=1, # out.level=1: Save the factor & parameter values to out.dir/model.last and out.dir/model.minTestLoss
out.dir=out.dir, # out.level=2: Save the factor & parameter values of each iteration i to out.dir/model.i
out.overwrite=TRUE,
verbose=1, # Set to 0 to disable console output; Set to 100 to print everything to the console
verbose.M=2,
ridge.lambda=1 # Add diag(lambda) to X'X in linear regression
);

# Check the output
read.table(paste(out.dir,"/summary",sep=""), header=TRUE, sep="\t", as.is=TRUE);

# Load the model
load(paste(out.dir,"/model.last",sep=""));
# It loads param, factor, IDs, prediction
str(param, max.level=2);
str(factor);

###
### Example 4: Fit the RLFM model with sparse features
### glmnet is used to fit prior regression parameters
###
library(Matrix);
@@ -312,7 +420,7 @@ source("src/R/model/multicontext_model_MStep.R");
source("src/R/model/multicontext_model_EM.R");
source("src/R/model/GLMNet.R");
set.seed(2);
out.dir = "/tmp/unit-test/simulated-mtx-uvw-10K";
out.dir = "/tmp/tutorial-BST/example-4";
ans = run.multicontext(
obs=data.train$obs, # Observation table
feature=data.train$feature, # Features
