Problem
The CollinearityThreshold class in our codebase is intended to remove collinear features from datasets. However, it also drops features that do not exceed the specified collinearity threshold, which can discard important data. For example, the 'age' column of the Titanic dataset used in the examples is removed even though its association values are below the set threshold.
Expected Behavior
The class should only remove features that are collinear above the specified threshold. Features with association values below this threshold should be retained in the dataset.
Current Behavior
The class removes features that do not exceed the collinearity threshold. The problem occurs in the recursive elimination step, which keeps dropping features even after no remaining feature exceeds the threshold.
Steps to Reproduce
1. Initialize the CollinearityThreshold with a specific threshold.
2. Fit the selector to a dataset.
3. Observe that features with association values below the threshold are also being removed.
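To make step 3 easier to check, one option is to list every feature that has at least one off-diagonal association above the threshold: a correct selector should only ever drop features from that set. This is a minimal sketch, assuming the association matrix is a square pandas DataFrame with matching index and columns, and assuming magnitudes (absolute values) are compared against the threshold. The function name `features_above_threshold` and the toy matrix are hypothetical; they are not taken from the library or from the Titanic data.

```python
import numpy as np
import pandas as pd


def features_above_threshold(association_matrix: pd.DataFrame, threshold: float) -> set:
    """Hypothetical diagnostic: features with any off-diagonal association above `threshold`.

    A feature outside the returned set should never be dropped by a correct
    collinearity filter.
    """
    # Assumption: the selector compares magnitudes, so take absolute values.
    assoc = association_matrix.abs()
    # Mask the diagonal with NaN: every feature is perfectly associated with itself.
    off_diag = assoc.mask(np.eye(len(assoc), dtype=bool))
    # NaN > threshold evaluates to False, so masked cells never count.
    flags = (off_diag > threshold).any(axis=0)
    return set(flags[flags].index)


# Toy (invented) association matrix: "a" and "b" are strongly associated,
# while "age" is only weakly associated with both.
assoc = pd.DataFrame(
    [[1.0, 0.9, 0.2],
     [0.9, 1.0, 0.3],
     [0.2, 0.3, 1.0]],
    index=["a", "b", "age"],
    columns=["a", "b", "age"],
)
print(sorted(features_above_threshold(assoc, threshold=0.8)))  # ['a', 'b']
```

If the fitted selector drops a feature that is not in this set (here, "age"), the bug described above has been reproduced.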
Suggested Fix
Modify the _recursive_collinear_elimination function so that it removes only those features that exceed the specified collinearity threshold. The proposed change adds a condition that breaks the while loop once no remaining feature exceeds the threshold, which prevents the unnecessary removal of features.
```python
def _recursive_collinear_elimination(association_matrix, threshold):
    dum = association_matrix.copy()
    most_collinear_features = []

    while True:
        most_collinear_feature, to_drop = _most_collinear(dum, threshold)

        # Break if no more features to drop
        if not to_drop:
            break

        if most_collinear_feature not in most_collinear_features:
            most_collinear_features.append(most_collinear_feature)
            dum = dum.drop(columns=most_collinear_feature, index=most_collinear_feature)

    return most_collinear_features
```