fix: minor basic stats quality fixes #2521
Codecov Report: All modified and coverable lines are covered by tests ✅
... and 2 files with indirect coverage changes
Requires a `black` fix; otherwise good to go.
@@ -157,7 +157,7 @@ def fit(self, data, sample_weight=None, queue=None):
 data_table, weights_table = to_table(data, sample_weight, queue=queue)

 dtype = data_table.dtype
-raw_result = raw_result = self._compute_raw(
+raw_result = self._compute_raw(
yikes
@@ -48,7 +48,7 @@ def generate_data(par, size, seed=777):

 params_spmd = {"ns": 19, "nf": 31}

-data, weights = generate_data(params_spmd, size)
+data, weights = generate_data(params_spmd, rank)
Shouldn't this be made to generate different data for each rank?
Yes - that was the original mistake here. `size` is the same for every rank, so the same data is generated. `rank` is different on every rank, so different data is generated.
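To illustrate the point about `size` versus `rank`, here is a minimal sketch that simulates the ranks in a plain loop rather than using MPI. The helpers `generate_block` and `blocks_for_all_ranks` are hypothetical and are not the test's `generate_data`; only the seed `777` and the shape (`ns=19`, `nf=31`) are taken from the diff above. It assumes the varying value ends up in the random seed.

```python
import random


def generate_block(n_samples, n_features, seed):
    """Return an n_samples x n_features block of pseudo-random floats."""
    rng = random.Random(seed)  # independent generator, no global state
    return [[rng.random() for _ in range(n_features)] for _ in range(n_samples)]


def blocks_for_all_ranks(size, key, base_seed=777, ns=19, nf=31):
    """Simulate `size` ranks; key(rank, size) picks what varies the seed."""
    return [
        generate_block(ns, nf, base_seed + key(rank, size)) for rank in range(size)
    ]


size = 4  # hypothetical communicator size

# Seeding by size: every rank derives the same seed, hence identical data.
by_size = blocks_for_all_ranks(size, key=lambda rank, size: size)
same_everywhere = all(block == by_size[0] for block in by_size)

# Seeding by rank: each rank derives its own seed, hence distinct data.
by_rank = blocks_for_all_ranks(size, key=lambda rank, size: rank)
all_distinct = all(
    by_rank[i] != by_rank[j] for i in range(size) for j in range(i + 1, size)
)

print(same_everywhere, all_distinct)
```

In a real SPMD test each process would compute only its own block; the loop here just makes the comparison across ranks visible in one process.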
But it's still being generated in a loop where each rank contains the data from the previous one.
Although maybe that would be reflected in the `seed` parameter and this should be tweaked further. The data generation function here is pretty wonky. I'll take a closer look tomorrow.
* fix: minor basic stats quality fixes
* blacked
* vary seed by rank instead of size
Description
A few small corrections
PR completeness and readability
Testing