Feature rejection does not take into account negative correlations #284

Closed
dmfolgado opened this issue Nov 19, 2019 · 0 comments
Labels
bug 🐛 Something isn't working

dmfolgado commented Nov 19, 2019

Describe the bug

The profiler does not seem to take negative correlations into account when removing redundant features. Although pandas-profiling does a good job of detecting positive correlations in the dataset, negative correlations are not detected. I believe the reason is that the condition for removing a variable only checks whether the correlation is above a given positive threshold, so a strongly negative correlation never satisfies it.

To Reproduce

import numpy as np
import pandas as pd
import pandas_profiling

# Generate a dataset where the first and second variables are perfectly
# negatively correlated and the third variable is random noise.
X = np.array([np.arange(100), np.arange(100)[::-1], np.random.randn(100)]).T
df = pd.DataFrame(X, columns=["1", "2", "3"])

profile = df.profile_report()
profile.get_rejected_variables()  # [] (no variable is rejected, despite "1" and "2" being redundant)
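
The suspected cause can be illustrated independently of pandas-profiling. The snippet below is only a sketch under assumptions: `rejected_by_correlation`, its `use_abs` flag and the 0.9 threshold are hypothetical names and values chosen for illustration, not the library's actual implementation. It compares a rejection rule on the signed correlation against the same rule on the absolute correlation.

```python
import numpy as np
import pandas as pd


def rejected_by_correlation(df, threshold=0.9, use_abs=True):
    """Hypothetical helper (not part of pandas-profiling).

    A column is rejected when its Pearson correlation with an earlier,
    non-rejected column exceeds `threshold`. With use_abs=False the
    signed value is compared, which mimics the suspected behaviour.
    """
    corr = df.corr(method="pearson")
    if use_abs:
        corr = corr.abs()

    rejected = []
    cols = list(corr.columns)
    for i, col in enumerate(cols):
        for other in cols[:i]:
            if other not in rejected and corr.loc[col, other] > threshold:
                rejected.append(col)
                break
    return rejected


# Same shape of data as in the report, with a fixed seed for repeatability.
rng = np.random.default_rng(0)
X = np.array([np.arange(100), np.arange(100)[::-1], rng.standard_normal(100)]).T
df = pd.DataFrame(X, columns=["1", "2", "3"])

print(rejected_by_correlation(df, use_abs=False))  # signed comparison
print(rejected_by_correlation(df, use_abs=True))   # absolute comparison
```

With the signed comparison the correlation of -1 between "1" and "2" never exceeds the positive threshold, so nothing is rejected; taking the absolute value first flags "2" as redundant.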