Skip to content
This repository has been archived by the owner on Aug 31, 2021. It is now read-only.

Feedback on tutorial 3 - titanic embarked embedding #90

Closed
mikowals opened this issue Feb 4, 2016 · 1 comment
Closed

Feedback on tutorial 3 - titanic embarked embedding #90

mikowals opened this issue Feb 4, 2016 · 1 comment

Comments

@mikowals
Copy link

mikowals commented Feb 4, 2016

Thanks for the tutorial. It was very easy to follow. Like many might do, I tested my new knowledge by playing with the other categorical variables but this led to a difficult to understand stack trace.

To reproduce, just replace "Embarked" with "Pclass" in the lines where X and `embarked_classes" get assigned. When trying to fit the model this will throw index out of range errors.

I eventually fixed it by increasing n_classes to unique classes plus 1. I think the error stems from the embedding wanting a row for nan (the Embarked column has missing values but Pclass does not) but I could not get my head around the code to confirm the actual source.

I am only mentioning this as it is a beginners tutorial and the most basic fix is a comment in the code to explain that n_classes needs to account for nan (assuming I am understanding the error correctly). I also wonder if n_classes should just get handled inside categorical_variable().

@ilblackdragon
Copy link
Contributor

You are right, nan is considered as another class.
The correct way to do it would be actual to use cat_processor to get the full count - len(cat_processor.vocabularies_[0]) in the vocabulary.

I'll correct the post, thanks for commenting here about this!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants