Perform transform on test set #17

shaygeller · 2018-12-31T13:55:37Z

Hi,
This project looks really cool and important.
A good methodology for any data transformation, and especially for feature extraction, is to fit it on the train set and then apply the same transformation to the test set. The test set stands in for "real world" data, so it should stay unseen when making any decisions about the transformation.
At the moment your code doesn't offer such an option, but it would be really good to have one.
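A minimal sketch of the fit-on-train, transform-test pattern described above. Standardization is used only as an example transformation, and the helper names `fit_standardizer` and `apply_standardizer` are hypothetical, not part of the project:

```python
import pandas as pd


def fit_standardizer(train):
    """Learn per-column mean and standard deviation from the training set only."""
    return train.mean(), train.std(ddof=0)


def apply_standardizer(df, mean, std):
    """Scale any dataset using statistics learned from the training set."""
    return (df - mean) / std


train = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
test = pd.DataFrame({"x": [4.0, 5.0]})

mean, std = fit_standardizer(train)                  # statistics come from train only
train_scaled = apply_standardizer(train, mean, std)
test_scaled = apply_standardizer(test, mean, std)    # test is transformed, never fitted
```

The test rows are scaled with the training mean and standard deviation, so nothing about the test set influences the transformation itself.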

I know that one-hot encoding can be a problem, because the train and test sets can contain different category values. One option is to create an "other" column for each categorical column and randomly assign some training rows to it. That way, a classifier trained on the most relevant features will not ignore this "other" column for lack of information in it.
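The "other"-bucket idea above could look roughly like this in pandas. This is a sketch under assumptions: the helper names, the `"other"` label, the 10% relabeling fraction, and the fixed random seed are all hypothetical choices, not anything the project provides.

```python
import numpy as np
import pandas as pd

OTHER = "other"  # assumed bucket label; it is excluded from the real categories


def fit_categories(train_col):
    """Categories seen in training, plus a reserved 'other' bucket at the end."""
    seen = set(train_col.dropna()) - {OTHER}
    return sorted(seen) + [OTHER]


def add_other_rows(train, column, frac=0.1, seed=0):
    """Randomly relabel a fraction of training rows as 'other' so the bucket has data."""
    out = train.copy()
    rng = np.random.default_rng(seed)
    n = max(1, int(round(frac * len(out))))
    rows = rng.choice(out.index.to_numpy(), size=n, replace=False)
    out.loc[rows, column] = OTHER
    return out


def encode(df, column, categories):
    """One-hot encode with a fixed category list; values unseen in training become 'other'."""
    values = df[column].where(df[column].isin(categories), OTHER)
    values = values.astype(pd.CategoricalDtype(categories=categories))
    return pd.get_dummies(values, prefix=column)


train = pd.DataFrame({"color": ["red", "blue", "red", "green"] * 5})
test = pd.DataFrame({"color": ["red", "purple"]})

categories = fit_categories(train["color"])  # learned from the training set only
X_train = encode(add_other_rows(train, "color"), "color", categories)
X_test = encode(test, "color", categories)   # "purple" lands in color_other
```

Because the category list comes from the training set only, train and test always get the same dummy columns, and a value never seen in training still has a column to fall into.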

WillKoehrsen · 2019-01-06T23:51:51Z

You should be able to transform the test set after doing feature selection on the training data. Assuming `train` is the training data after selection, you can do this with pandas `DataFrame.align`:

```python
import pandas as pd
test = pd.get_dummies(test)  # one-hot encode the test set
train, test = train.align(test, axis=1, join='inner')  # keep only the shared columns
```

This ensures both dataframes have exactly the same columns: `axis=1` aligns on the columns, and `join='inner'` keeps only the columns present in both dataframes.
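A small, self-contained illustration of the `align` call on toy data (the column names and values are made up). It also shows a variant that is not from the thread: `join='left'` with `fill_value=0`, which keeps the training columns fixed instead of dropping training columns that the test set happens to lack.

```python
import pandas as pd

# Toy data: the test set has a category ("purple") the training set never saw,
# and lacks two ("blue", "green") that the training set did see.
train = pd.get_dummies(pd.DataFrame({"color": ["red", "blue", "green"], "size": [1, 2, 3]}))
test = pd.get_dummies(pd.DataFrame({"color": ["red", "purple"], "size": [4, 5]}))

# As in the comment: keep only the columns present in both frames.
train_in, test_in = train.align(test, axis=1, join="inner")

# Variant (not from the thread): keep every training column, fill dummies
# missing from the test set with 0, and drop test-only columns.
train_left, test_left = train.align(test, axis=1, join="left", fill_value=0)
```

With `join='inner'`, `color_blue` and `color_green` disappear from the training frame as well, so a model fitted before alignment would no longer see the columns it expects. The `join='left'` variant avoids that, at the cost of silently discarding test-only categories such as `color_purple`.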

wbgreen0405 · 2020-02-18T18:41:17Z

@WillKoehrsen Thank you, this works.

How do I add the target variable back into the dataframe?
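One common approach, offered as a sketch rather than an answer given in the thread: set the target aside before encoding and aligning, then reattach it afterwards. With `join='inner'`, any column present only in the training frame, including a target column left in place, is dropped by `align`, which is one way the target can go missing. The column name `target` below is an assumption.

```python
import pandas as pd

# Hypothetical data: "target" is an assumed name for the label column.
train = pd.DataFrame({"color": ["red", "blue", "green"], "target": [0, 1, 0]})
test = pd.DataFrame({"color": ["red", "purple"]})

# 1. Remove the target before encoding so it is not treated as a feature
#    (and not dropped by the inner join, since the test set has no target).
train_labels = train.pop("target")

# 2. Encode and align the feature columns only.
train_x = pd.get_dummies(train)
test_x = pd.get_dummies(test)
train_x, test_x = train_x.align(test_x, axis=1, join="inner")

# 3. Reattach the target; assignment matches rows by index,
#    which aligning on axis=1 leaves unchanged.
train_x["target"] = train_labels
```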
