vtreat and sklearn pipeline #12
Thanks, I haven't finished the Pipeline integration, but you have given me some good pointers on steps to get there. I'll close this after I add some of the suggestions.
In version 0.3.4 of vtreat, the transforms implement a lot more of the sklearn step interface. Also, we have a new, really neat feature that warns if … Thanks for helping with the package.
Thanks!
Also, …
I am going to stub out the get/set parameter methods until I have specific use cases or applications to code them against (are they tuned during cross-validation? used to build new pipelines? just for display? used to simulate pickling?). I've added some more pretty-printing, but many of these objects are too complicated to be rebuilt from their printed form.
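A minimal sketch of what stubbing out the parameter methods, while keeping a display-only `__repr__`, could look like. `StubbedTreatment` is a hypothetical class, not vtreat's API; `outcome_name` and `cols_to_copy` are borrowed from this thread purely as illustrative settings.

```python
class StubbedTreatment:
    """Hypothetical transformer that stubs out sklearn-style parameter access."""

    def __init__(self, outcome_name, cols_to_copy=None):
        self.outcome_name = outcome_name
        self.cols_to_copy = list(cols_to_copy or [])

    def get_params(self, deep=True):
        # Nothing is exposed for tuning yet, so report no parameters.
        return {}

    def set_params(self, **params):
        # Accept the no-op call, reject any attempt to change settings.
        if params:
            raise ValueError(f"parameters are not settable yet: {sorted(params)}")
        return self

    def __repr__(self):
        # Pretty-print for display only; not meant to rebuild the object.
        return (f"{type(self).__name__}(outcome_name={self.outcome_name!r}, "
                f"cols_to_copy={self.cols_to_copy!r})")
```

One consequence of this design: scikit-learn's `clone` rebuilds an estimator from `get_params(deep=False)`, so with an empty dict it would call the constructor with no arguments, which fails here because `outcome_name` is required.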
Thanks for the explanations!
First, thank you very much for spending so much time giving useful and productive advice. I've tried to incorporate a lot of it into …

Back to your points. Yes, I've decided not to cache the result for the user in a later …

Overall, in the cross-validated mode not only do the …

I did add warnings based on caching the id of the data used in …

I've spent some more time researching …

Regarding parameters, I am still not exposing them. You correctly identified the most interesting one: …

If you strongly disagree, have new ideas, or I have missed something, please do re-open this issue or file another one. If anything is unclear, open an issue and I will be happy to build up more documentation.
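One way the id-based warning could work, sketched under the assumption that the concern is calling `transform()` on the exact data object that `fit_transform()` already saw (presumably because, in a cross-validated mode, `fit_transform()` returns out-of-fold results that `transform()` on the same rows would not reproduce). `IdWarningTreatment` is a hypothetical class, not vtreat's implementation.

```python
import warnings


class IdWarningTreatment:
    """Hypothetical sketch: remember which data object fit_transform() saw."""

    def __init__(self):
        self.last_fit_id_ = None

    def fit_transform(self, X, y=None):
        # Cache only the identity of the training data, not the data itself.
        self.last_fit_id_ = id(X)
        return self._encode(X)

    def transform(self, X):
        if id(X) == self.last_fit_id_:
            warnings.warn(
                "transform() called on the same data passed to fit_transform(); "
                "use the fit_transform() result for the training rows instead",
                UserWarning,
            )
        return self._encode(X)

    def _encode(self, X):
        # Placeholder for the real treatment; here it just copies the rows.
        return list(X)
```

Python may reuse an `id()` after the original object has been garbage-collected, so this check is a heuristic that can occasionally fire on unrelated data, not a guarantee.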
I've got it: configure which parameters are exposed to the pipeline controls during construction. I am going to work on that a bit. |
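A rough sketch of choosing, at construction time, which settings the pipeline controls may see and change. `ConfigurableTreatment`, `settings`, and `params_to_expose` are hypothetical names, and the setting names in the test are made up.

```python
class ConfigurableTreatment:
    """Hypothetical sketch: the constructor decides what get_params exposes."""

    def __init__(self, settings=None, params_to_expose=()):
        self.settings = dict(settings or {})
        unknown = set(params_to_expose) - set(self.settings)
        if unknown:
            raise ValueError(f"cannot expose unknown settings: {sorted(unknown)}")
        self.params_to_expose = tuple(params_to_expose)

    def get_params(self, deep=True):
        # Only the whitelisted settings are visible to pipeline tooling.
        return {k: self.settings[k] for k in self.params_to_expose}

    def set_params(self, **params):
        # Refuse changes to anything that was not explicitly exposed.
        blocked = set(params) - set(self.params_to_expose)
        if blocked:
            raise ValueError(f"not exposed: {sorted(blocked)}")
        self.settings.update(params)
        return self
```

This is not `clone`-compatible as written: scikit-learn's `clone` reconstructs an estimator as `type(est)(**est.get_params(deep=False))`, so a real version would need a constructor that accepts the exposed names as keyword arguments.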
I've worked out an example of … The issues include: …
First of all, this is a really interesting project that could save a lot of repetitive work and provide a good baseline.
I've tried to find an example in the docs that uses `Pipeline` from `scikit-learn`, but I didn't find one, so this is my quick and dirty attempt based on yours: …

In general, it seems to work, but:
- `__repr__`, `get_params`, etc., to have a nice representation in `Pipeline`
- a `get_feature_names` method, to have `clf['preprocessor'].get_feature_names()`
- `cols_to_copy` and drop `y` manually, to avoid leaking `y`
- `vtreat.cross_plan...` could be replaced by validation schemes from `scikit-learn`, like `GridSearchCV`
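To make the leakage concern concrete, here is a small sketch of cross-validated (out-of-fold) impact coding for one categorical column: each row is coded with statistics estimated without that row's outcome. The helper names and the round-robin fold assignment are assumptions for illustration, not `vtreat.cross_plan...`. The plan is just a list of `(train_idx, test_idx)` pairs, the same shape that scikit-learn splitters such as `KFold.split` yield, so one of those could plausibly supply it instead.

```python
from statistics import mean


def k_fold_plan(n_rows, k):
    """Hypothetical cross plan: (train_idx, test_idx) pairs, rows assigned round-robin."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    plan = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        plan.append((train_idx, test_idx))
    return plan


def impact_code(levels, y, train_idx):
    """Per-level mean of y minus the grand mean, estimated on train_idx only."""
    grand = mean(y[j] for j in train_idx)
    by_level = {}
    for j in train_idx:
        by_level.setdefault(levels[j], []).append(y[j])
    return {lvl: mean(vals) - grand for lvl, vals in by_level.items()}


def cross_frame(levels, y, plan):
    """Out-of-fold coding: each row is coded by a table that never saw its y."""
    coded = [0.0] * len(levels)
    for train_idx, test_idx in plan:
        table = impact_code(levels, y, train_idx)
        for j in test_idx:
            # Levels unseen in the training folds get a neutral code of 0.
            coded[j] = float(table.get(levels[j], 0.0))
    return coded
```

With `levels = ['a', 'a', 'b', 'b']`, `y = [1, 0, 1, 1]` and `k_fold_plan(4, 2)`, `cross_frame` gives `[-0.5, 0.0, 0.5, 0.0]`, whereas coding in-sample with `impact_code(levels, y, range(4))` gives `{'a': -0.25, 'b': 0.25}`, where each row's own outcome contributes to its code.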