poor performance on Covertype Data Set #2

Closed
wlattner opened this issue Nov 19, 2014 · 1 comment
Comments

@wlattner
Owner

Classification accuracy on the Covertype Data Set is less than 90% unless max_features is raised above the default of sqrt(# features).

@wlattner
Owner Author

This dataset has many constant features, such as the soil type columns. When only max_features of the n_features are considered at each split, we need to make sure we visit at least one non-constant feature; otherwise too many nodes will be marked as leaf nodes. See 80c2e8f.
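
To illustrate the idea, here is a minimal Python sketch of feature sampling that keeps drawing until it has visited at least one non-constant feature. It is not the code from 80c2e8f; the function names (`is_constant`, `candidate_features`) and the row-list data layout are assumptions made for this example.

```python
import random


def is_constant(column):
    """Return True when every value in the column is identical."""
    first = column[0]
    return all(value == first for value in column)


def candidate_features(rows, max_features, rng):
    """Pick the features to evaluate for a split at one node.

    Features are visited in random order without replacement. The search
    stops after max_features visits only if at least one visited feature
    is non-constant among the node's rows; otherwise it keeps going until
    one is found or every feature has been visited.
    """
    n_features = len(rows[0])
    order = list(range(n_features))
    rng.shuffle(order)

    usable = []
    for visited, j in enumerate(order, start=1):
        column = [row[j] for row in rows]
        if not is_constant(column):
            usable.append(j)
        if visited >= max_features and usable:
            break
    # An empty result means every feature is constant here,
    # so the node really should become a leaf.
    return usable


# Rows reaching a node where only feature 3 still varies.
rows = [
    [0, 0, 0, 1, 0],
    [0, 0, 0, 2, 0],
    [0, 0, 0, 3, 0],
]
print(candidate_features(rows, max_features=2, rng=random.Random(0)))
```

With a hard cap of two features, the draw above would often miss feature 3 and turn the node into a leaf even though a valid split exists. Continuing past `max_features` only when nothing usable has been seen keeps the extra randomness of a small feature subset while avoiding those premature leaves.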
