Request for Rborist support. #418
Is there any way to determine the total list of predictors used in any split among the trees? Also, I'm not sure that ...
The 'predInfo' vector reports the Gini gain across all trees, so those ...

I may be missing your point, but PreTrain is invoked independently of ...
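On the predInfo point above, a hypothetical sketch: assuming the per-predictor Gini gains are exposed on the fitted object (the slot name and its location are assumptions and may differ by Rborist version), the predictors appearing in at least one split are those with strictly positive gain.

```r
library(Rborist)

x <- iris[, 1:4]
y <- iris$Species
rb <- Rborist(x, y)

## Assumed slot: a numeric vector of Gini gains, one per predictor,
## summed across all trees ('predInfo' in the reply above; the exact
## location on the fit object may differ by version).
predInfo <- rb$training$info

## Predictors with positive gain appear in at least one split.
usedPreds <- colnames(x)[predInfo > 0]
```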
On 04/27/2016 02:30 PM, topepo wrote:
I've checked in a draft of the method. Please take a look. Right now, ...

For example, if I create a boosted tree with B = 100 iterations, I can usually get predictions from that same model object for sub-models where B <= 100 (all other parameters being equal). It looks like Rborist would need to re-train the model to do that. With the current framework in ...
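For concreteness, here is the sub-model trick with gbm (used purely as an illustration; it is not one of this thread's packages): one fit at the largest B yields predictions for any smaller B.

```r
library(gbm)

## Binary outcome coded 0/1, as gbm's bernoulli loss expects.
dat <- iris[iris$Species != "virginica", ]
dat$y <- as.integer(dat$Species == "versicolor")
dat$Species <- NULL

## Fit once with the largest number of iterations under consideration.
fit <- gbm(y ~ ., data = dat, distribution = "bernoulli", n.trees = 100)

## Predictions for any sub-model with B <= 100 come from the same
## object; no re-training is needed.
p100 <- predict(fit, dat, n.trees = 100, type = "response")
p50  <- predict(fit, dat, n.trees = 50,  type = "response")
```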
Correct me if I am wrong, but Rborist now accepts class weights. I think that commit 4f6a245 doesn't take this into account. Thank you both for your great packages!
It doesn't but wouldn't need to. Since ...
Great, thank you for the clarification. I thought it was necessary as in ranger #414.
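For reference, a hedged sketch of passing class weights at training time (the classWeight argument and its "balance" option are taken from the Rborist documentation of this period; treat the exact names as assumptions):

```r
library(Rborist)

x <- iris[, 1:4]
y <- iris$Species

## Explicit per-class weights, in the order of levels(y)...
rb <- Rborist(x, y, classWeight = c(1, 2, 2))

## ...or let the package weight by inverse class frequency.
rbBal <- Rborist(x, y, classWeight = "balance")
```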
Thank you for checking in the draft. I have not looked at it yet, but ...

Yes, I think I see the problem: prediction requires the same predictor ...

On a separate topic, the PreTrain object is just meant as a convenience, ...
On 04/28/2016 07:55 AM, topepo wrote:
I'm not sure that I understand this. I'm saying that, when there is a trained model object where we can get predictions from different values of the tuning parameter (e.g. predProb), we can get some time savings using the "sub-model trick".
No, I don't think that it generally has anything to do with that. The bottom-line is that I don't think that train can exploit the PreTrain option since the model has not been trained yet.
On 04/28/2016 12:19 PM, topepo wrote:
Agreed. You were speaking about training, but I had misconstrued your remarks to be about an obscure feature of separate testing.
Agreed again: further down the wrong path.
Agree with the second point, but not the first: "PreTrain" is proving to be an unfortunate choice of terms. It should probably have been "PreFormat" or "PreSort".
Oh, dear. The points to which I was responding do not show up in the most recent reply, despite their being present in the "Sent" image. Let's try again, with some editing:
> I'm not sure that I understand this. I'm saying that, when there is a trained model object where we can get predictions from different values of the tuning parameter (e.g. predProb), we can get some time savings using the "sub-model trick".

Agreed. You were speaking about training, but I had misconstrued your remarks to be about an obscure feature of separate testing. In a nutshell, I had read "submodel" but had mapped it to "subdesign" by mistake.
> No, I don't think that it generally has anything to do with that.

Agreed again: further down the wrong path.

> The bottom-line is that I don't think that train can exploit the PreTrain option since the model has not been trained yet. Once it is trained, the value of predProb is inherent in the model object.

Agree with the second point, but not the first: "PreTrain" is proving to be an unfortunate choice of terms. It should probably have been "PreFormat" or "PreSort". A "pretrained" object is simply a reworking of the data.frame into a format friendlier to the Arborist's internal representation. Multiple models can be trained from the same "PreTrain" object, without each instance having to generate the same internal representation each time. In fact, invocations of the "PreTrain" command accept only a single argument, the observation set. All that Caret need do, assuming the user has independently generated a PreTrain object, is silently pass the object down to the Arborist as an uninteresting optional parameter. In fact, one could even train subdesigns using the "submodel trick" by setting selected predictor weights to zero.
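A sketch of the workflow described above, under the stated assumptions that PreTrain() takes only the observation set and that the resulting object can be passed to Rborist() in place of the raw data.frame:

```r
library(Rborist)

x <- iris[, 1:4]
y <- iris$Species

## Rework the data.frame once into the Arborist-friendly format.
pt <- PreTrain(x)

## Train several models from the same preformatted object; the
## internal representation is not regenerated on each call.
rb1 <- Rborist(pt, y, nTree = 500)
rb2 <- Rborist(pt, y, nTree = 1000)
```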
That reminds me... what happens with ...
It should default to 'predFixed', using a derived quantity.
Is ...
No: 'predFixed' is the default for low predictor counts, using a value similar ...
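To make the two sampling modes concrete, a brief sketch (predFixed and predProb are the arguments discussed above; the particular values are illustrative only):

```r
library(Rborist)

x <- iris[, 1:4]
y <- iris$Species

## Fixed-count sampling: exactly predFixed candidate predictors per
## split, analogous to randomForest's mtry. Per the reply above, this
## is the default at low predictor counts, with a derived value.
rbFixed <- Rborist(x, y, predFixed = 2)

## Bernoulli sampling: each predictor is a split candidate
## independently with probability predProb.
rbProb <- Rborist(x, y, predProb = 0.5)
```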
A final version and test documents are now checked in. |
Please add appropriate hooks and documentation for invoking the Rborist package, an implementation of the Random Forest algorithm.
Please note that when iteratively training over a fixed data set, the PreTrain command can improve performance by caching, rather than recomputing, internal state.
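Until a built-in method exists, one interim route is caret's custom-model interface, where a list of hooks is passed as method = list(...). The sketch below is illustrative only: the slot names follow caret's documented custom-model interface, while the tuning grid, the choice of predFixed as the tuning parameter, and the $yPred slot on Rborist's prediction object are assumptions.

```r
library(caret)
library(Rborist)

## Minimal custom-model hooks for train(); a sketch, not the eventual
## built-in method.
rboristSpec <- list(
  label      = "Rborist",
  library    = "Rborist",
  type       = "Classification",
  parameters = data.frame(parameter = "predFixed", class = "numeric",
                          label = "#Predictors per Split"),
  grid = function(x, y, len = NULL, search = "grid") {
    data.frame(predFixed = unique(round(seq(2, ncol(x), length.out = len))))
  },
  fit = function(x, y, wts, param, lev, last, weights, classProbs, ...) {
    Rborist(x, y, predFixed = param$predFixed, ...)
  },
  predict = function(modelFit, newdata, submodels = NULL) {
    ## Assumed: Rborist's predict() result carries the class calls
    ## in a yPred component.
    predict(modelFit, newdata)$yPred
  },
  prob = NULL,
  sort = function(x) x
)

set.seed(1)
fit <- train(x = iris[, 1:4], y = iris$Species,
             method = rboristSpec,
             trControl = trainControl(method = "cv", number = 5))
```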