Question about usage... #41
Comments
Label which you predicted for |
Great. Allow me to explain a bit further so we are on the same page. In your example, X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0), y represents the label of X, which in my case would be the label of train, i.e. SalePrice. So technically you seem to be using only one dataset: in my case, I can pop SalePrice out of train and make it y, and use the remaining data as X, which is what I did when I tested. I am therefore still using a piece of train when I get to S_test, as reflected in your example here: So, technically, by using S_test I have not yet reached the point of introducing my unknown dataset, i.e. the test set that I need to predict a label for... |
In my example, the following lines are used only to create artificial data for demonstration purposes: X, y = boston.data, boston.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0) When we call S_train, S_test = stacking(models, X_train, y_train, X_test, ...) So in your task do not use |
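The workflow discussed above (labeled train data, unlabeled test data, first-level stacking, then a second-level model) can be sketched roughly as follows. This is a hand-rolled approximation using scikit-learn only, not the library's own `stacking` implementation: the synthetic data, the choice of models, and the use of `cross_val_predict` for the out-of-fold features are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_predict

# Synthetic stand-ins (assumption): X_train/y_train play the role of the
# labeled `train` file with SalePrice popped out as y; X_test plays the
# role of the unlabeled `test` file.
X_all, y_all = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_train, y_train = X_all[:200], y_all[:200]
X_test = X_all[200:]  # labels for these rows are deliberately never used

# First-level models (an arbitrary choice for this sketch).
models = [Ridge(alpha=1.0), RandomForestRegressor(n_estimators=50, random_state=0)]
cv = KFold(n_splits=4, shuffle=True, random_state=0)

# S_train: out-of-fold predictions, one column per first-level model.
S_train = np.column_stack(
    [cross_val_predict(m, X_train, y_train, cv=cv) for m in models]
)
# S_test: each model refit on all of train, then applied to the unknown test set.
S_test = np.column_stack(
    [m.fit(X_train, y_train).predict(X_test) for m in models]
)

# Second level: fit on S_train with the train labels, predict from S_test.
meta_model = LinearRegression().fit(S_train, y_train)
y_pred = meta_model.predict(S_test)  # predicted labels for the unknown dataset
```

In this sketch `y_pred` is the answer to "how do I predict the label on the unknown dataset": no separate step is needed after the second-level model, as long as the real unlabeled test features were passed in as `X_test` in the first place.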
In my attempt to follow the recommendations, I tried this approach, which gave a so-so result when scored on Kaggle: Approach #1 (score on Kaggle: 0.12146)
How would you recommend improving the model or the process to boost the accuracy and get a better score on Kaggle? |
Your code and score look OK. The best place to look for model improvements is the Kaggle Notebooks for the specific competition. It is standard practice on Kaggle to study high-scoring notebooks and try to incorporate code from them into your model. If you are particularly interested in stacking, you can also search Notebooks by keyword. There are some general recommendations on how to improve a stacking model, but experimentation is the only real answer:
1). It is very important to remember that stacking is about the quality of models, not the quantity. In this case quality means not only a good score but also low correlation between predictions. A stack of 3 good uncorrelated models can beat a stack of 30 highly correlated models.
2). Sometimes stacking is excessive. Always try simple averaging of the predictions of your individual models first. In some cases this approach may outperform stacking. |
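Both recommendations can be checked with a few lines of NumPy. The sketch below uses made-up predictions (the true target plus independent noise per model, which is an assumption, not data from this thread) to show how to inspect the correlation between model predictions and how to compare a simple average against each individual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical validation target and three hypothetical model predictions.
# Each "model" is the truth plus its own independent noise, so their errors
# are close to uncorrelated (the favorable case described above).
y_true = rng.normal(size=200)
preds = np.column_stack(
    [y_true + rng.normal(scale=0.5, size=200) for _ in range(3)]
)

def rmse(y, y_hat):
    """Root mean squared error between two 1-D arrays."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

# Pairwise correlation of the models' errors: values near 1 mean the models
# make the same mistakes and add little to an ensemble.
errors = preds - y_true[:, None]
error_corr = np.corrcoef(errors, rowvar=False)

# Simple averaging baseline to try before stacking.
avg_pred = preds.mean(axis=1)
rmse_each = [rmse(y_true, preds[:, i]) for i in range(preds.shape[1])]
rmse_avg = rmse(y_true, avg_pred)

print("error correlation matrix:\n", np.round(error_corr, 2))
print("individual RMSE:", np.round(rmse_each, 3))
print("averaged RMSE:  ", round(rmse_avg, 3))
```

With real models the errors are usually far more correlated than in this toy setup, so the gain from averaging tends to be smaller; the same check on held-out predictions shows how much there is to gain.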
Thank you for your excellent recommendations and support. I will definitely spend some time in those areas... |
Good luck in Kaggle competitions! |
I am trying to predict housing prices, where I have a train dataset and a test dataset. The train data has a label, and I need to train on it so that I can later use the trained model to predict the label for the test data, which does not have a label. I followed your process on my train dataset, performed the stacking, and applied the second level to the S_train and S_test variables as indicated in your instructions.
Now that I have done that, how do I proceed to predict the label on the test (unknown) dataset?