Normalizer and NaN values. #7

Closed
w32zhong opened this issue Dec 18, 2021 · 1 comment

Comments

@w32zhong
Contributor

For StandardScaler, it looks like NaN values are supported; see class Normalizer:

null_index = np.isnan(X)
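For context, a NaN-aware standardization built around a mask like null_index could look roughly like the sketch below. It is a hypothetical helper (nan_aware_standardize is not part of the project) and assumes X is already a float array: the mask records which entries are missing, the statistics are computed with np.nanmean/np.nanstd so the NaNs are ignored, and NaN propagates through the arithmetic so missing entries stay missing.

```python
import numpy as np

def nan_aware_standardize(X):
    """Hypothetical helper: standardize a float array, ignoring NaN in the statistics."""
    X = np.asarray(X, dtype=float)
    null_index = np.isnan(X)        # True where a value is missing
    mean = np.nanmean(X)            # mean over non-missing entries only
    std = np.nanstd(X)              # population std over non-missing entries only
    scaled = (X - mean) / std       # NaN entries stay NaN after the arithmetic
    return scaled, null_index

scaled, mask = nan_aware_standardize([1.0, np.nan, 3.0])
```

This only works when X actually contains NaN, which is exactly what the preprocessing step below prevents.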

However, during preprocessing, _fill_na() replaces missing values with na_value for non-string columns.
So:

  • for dtype=str, the X values will be strings
  • for dtype=float/int, the missing X values will be na_value

In the first case, np.isnan throws an error because the elements of X are strings.
In the second case, there is no point in normalizing the numbers, since the missing entries have already been replaced by na_value.
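Both cases can be reproduced with plain NumPy. The arrays and the sentinel na_value = 0.0 below are made up for illustration and are not the project's defaults: np.isnan raises TypeError on a string array, and on a numeric array where NaN was already replaced it finds nothing, while the sentinel skews the statistics a normalizer would compute.

```python
import numpy as np

# Case 1 (dtype=str): after filling, X holds strings, and np.isnan rejects them.
X_str = np.array(["1.5", "", "3.0"])
try:
    np.isnan(X_str)
    isnan_raised = False
except TypeError:
    isnan_raised = True  # the isnan ufunc is not defined for string arrays

# Case 2 (numeric dtype): missing entries were replaced by a sentinel na_value.
na_value = 0.0  # hypothetical sentinel, chosen only for illustration
X_num = np.array([10.0, na_value, 30.0])
missing_detected = bool(np.isnan(X_num).any())           # False: the sentinel is not NaN
mean_with_sentinel = X_num.mean()                        # 40/3, pulled down by the sentinel
mean_without_sentinel = X_num[X_num != na_value].mean()  # 20.0
```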

Is this behavior expected or not?

@zhujiem
Contributor

zhujiem commented Dec 24, 2021

The bug has been fixed by removing null_index = np.isnan(X).
NaN values are not allowed for numeric data types; users must set na_value explicitly.
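Under that rule, the workflow amounts to "fill first, then normalize". The sketch below is an assumption about how that looks, not the project's code: fill_numeric_na and standardize are hypothetical names, and the fill value 2.0 stands in for whatever na_value the user picks. The normalizer rejects NaN up front instead of masking it.

```python
import numpy as np

def fill_numeric_na(X, na_value):
    """Hypothetical stand-in for the fill step: replace NaN with a user-chosen na_value."""
    X = np.asarray(X, dtype=float)
    return np.where(np.isnan(X), na_value, X)

def standardize(X):
    """Hypothetical normalizer that assumes numeric input is already NaN-free."""
    X = np.asarray(X, dtype=float)
    if np.isnan(X).any():
        raise ValueError("NaN found in numeric data; set na_value explicitly")
    std = X.std()
    return (X - X.mean()) / (std if std > 0 else 1.0)  # avoid dividing by zero

raw = [1.0, np.nan, 3.0]
filled = fill_numeric_na(raw, na_value=2.0)  # explicit, user-chosen fill value
out = standardize(filled)                    # [-1.22..., 0.0, 1.22...]
```

Failing loudly on NaN keeps the choice of fill value with the user, which matches the maintainer's point that na_value must be set explicitly.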

zhujiem closed this as completed Dec 24, 2021