add case weights to summary function #1

Closed
topepo opened this issue May 16, 2014 · 3 comments


topepo commented May 16, 2014

The original request is from here. This is the request:

I'm using R's caret package to do some grid search and model evaluation. I have a custom evaluation metric that is a weighted average of absolute error. Weights are assigned at the observation level.

X <- c(1,1,2,0,1) #feature 1
w <- c(1,2,2,1,1) #weights
Y <- 1:5 #target, continuous

#assume I run a model using X as features and Y as target and get a vector of predictions

mymetric <- function(predictions, target, weights) {
  v <- sum(abs(target - predictions) * weights) / sum(weights)
  return(v)
}

Here an example is given on how to use summaryFunction to define a custom evaluation metric for caret's train().
To quote:

The trainControl function has an argument called summaryFunction that specifies a function for computing performance. The function should have these arguments:

data is a reference for a data frame or matrix with columns called obs and pred for the observed and predicted outcome values (either numeric data for regression or character values for classification). Currently, class probabilities are not passed to the function. The values in data are the held-out predictions (and their associated reference values) for a single combination of tuning parameters. If the classProbs argument of the trainControl object is set to TRUE, additional columns in data will be present that contain the class probabilities. The names of these columns are the same as the class levels.

lev is a character string that has the outcome factor levels taken from the training data. For regression, a value of NULL is passed into the function.

model is a character string for the model being used (i.e. the value passed to the method argument of train).

I cannot quite figure out how to pass the observation weights to summaryFunction.

topepo added the bug label May 16, 2014
topepo self-assigned this May 16, 2014

topepo commented Jun 4, 2014

Resolved (and test cases added) as of caret version 6.0-29

topepo closed this as completed Jun 4, 2014
zachmayer added a commit that referenced this issue Oct 16, 2014
topepo pushed a commit that referenced this issue Apr 13, 2015
@myloginid

Hi, I am using caret 6.0-58 but am still not able to use a custom summary function. My code and the error are below:

mymetric <- function(predictions, target, weights) {
  v <- sum(abs(target - predictions) * weights) / sum(weights)
  return(v)
}

number = 10
tmethod = "boot"
tc = trainControl(method = "boot",
                  number = ifelse(grepl("cv", tmethod), 10, 25),
                  repeats = ifelse(grepl("cv", tmethod), 1, number),
                  p = 0.75,
                  search = "grid",
                  initialWindow = NULL,
                  horizon = 1,
                  fixedWindow = TRUE,
                  verboseIter = FALSE,
                  returnData = TRUE,
                  returnResamp = "final",
                  savePredictions = FALSE,
                  classProbs = FALSE,
                  # summaryFunction = mymetric,
                  selectionFunction = "best",
                  preProcOptions = list(thresh = 0.95, ICAcomp = 3, k = 5),
                  sampling = NULL,
                  index = NULL,
                  indexOut = NULL,
                  timingSamps = 0,
                  predictionBounds = rep(FALSE, 2),
                  seeds = NA,
                  adaptive = list(min = 5, alpha = 0.05, method = "gls", complete = TRUE),
                  trim = FALSE,
                  allowParallel = TRUE)

tglm = train( x = wip[,c(basef,retf_p1)] , y = wip$Ret_2, method = "glm", weights = wip$Weight_Intraday, trControl = tc)

Error in FUN(left, right) : non-numeric argument to binary operator
7 eval(expr, envir, enclos)
6 eval(f)
5 Ops.data.frame(abs(target - predictions), weights)
4 ctrl$summaryFunction(testOutput, lev, method)
3 evalSummaryFunction(y, wts = weights, ctrl = trControl, lev = classLevels,
metric = metric, method = method)
2 train.default(x = wip[, c(basef, retf_p1)], y = wip$Ret_2, method = "glm",
weights = wip$Weight_Intraday, trControl = tc)
1 train(x = wip[, c(basef, retf_p1)], y = wip$Ret_2, method = "glm",
weights = wip$Weight_Intraday, trControl = tc)

I supplied the weights in the call to train. Please let me know if there is a mistake in the call.

I also tried renaming the function's arguments to match the column names in my data, like this:
mymetric <- function(predictions, Ret_2, Weight_Intraday) {
  v <- sum(abs(Ret_2 - predictions) * Weight_Intraday) / sum(Weight_Intraday)
  return(v)
}

But it still failed.

Thanks,
Manish


topepo commented Dec 3, 2015

The (lack of) details are here.

Basically, when the summary function is called, there is a data frame called data available within the R function that you supply. Normally, it has columns for the holdout data called obs, pred, and rowIndex. If you use weights in the function call to train, then there is an additional column called weights. For example, the data object might look like this:

          pred        obs   weights rowIndex
123  virginica     setosa 0.8394404      107
86  versicolor  virginica 0.7548209      136
35      setosa  virginica 0.4112744       62
17      setosa versicolor 0.4314737      116
121  virginica versicolor 0.3823880       23
137  virginica     setosa 0.3162717        2

Your summary function can use this weight column for its calculations.
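As a rough sketch of what that could look like for the weighted absolute error above: the traceback shows the function being called as summaryFunction(testOutput, lev, method), so a function written as mymetric(predictions, target, weights) receives the whole data frame as predictions and the model name as weights, which would explain the "non-numeric argument" error. The version below takes the (data, lev, model) arguments instead. The names weightedMAE, wMAE and holdout are made up for illustration, and the data frame is only a stand-in for what caret would pass in a regression problem.

```r
# Weighted mean absolute error written against the (data, lev, model)
# signature. `weightedMAE` and the metric name "wMAE" are illustrative
# names, not part of caret.
weightedMAE <- function(data, lev = NULL, model = NULL) {
  # Assumption: fall back to equal weights if train() was called without `weights`
  w <- if ("weights" %in% names(data)) data$weights else rep(1, nrow(data))
  # Return a named numeric vector; the name is what `metric` would refer to
  c(wMAE = sum(abs(data$obs - data$pred) * w) / sum(w))
}

# Hypothetical stand-in for the held-out data frame (regression case)
holdout <- data.frame(
  pred     = c(1.5, 2.0, 2.5, 4.0, 6.0),
  obs      = 1:5,
  weights  = c(1, 2, 2, 1, 1),
  rowIndex = 1:5
)

result <- weightedMAE(holdout)
print(result)
```

With train, this would presumably be wired in through trainControl(summaryFunction = weightedMAE), together with metric = "wMAE" and maximize = FALSE in the train call, since smaller values are better here.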

Please note that not all R model functions can use case weights, so if you want to use a column of your data for case weights, you will have to check the underlying model function using getModelInfo. I'm working on tagging models that can use weights so that a list of them will be available.
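One possible way to do that check is sketched below, assuming caret is installed and using "glm" only as an example model code. Every model's fit wrapper takes a wts argument, so the thing to look for when reading the printed function is whether wts is actually forwarded to the underlying modeling call.

```r
# Sketch: inspect caret's wrapper for a single model to see how it
# handles case weights. "glm" is just an example model code.
if (requireNamespace("caret", quietly = TRUE)) {
  info <- caret::getModelInfo("glm", regex = FALSE)[["glm"]]

  # Print the fitting wrapper and read its body to see whether `wts`
  # is passed on to the underlying model function
  print(info$fit)

  # Tags attached to the model, if any
  print(info$tags)
}
```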

topepo pushed a commit that referenced this issue Dec 28, 2016
topepo pushed a commit that referenced this issue Apr 19, 2017