BUG: Require sample weights to sum to less than 1 when replace = True #61582
base: main
Changes from all commits
6a272c6
6e89593
c265c4b
7e7a73c
dc0aff6
f553ed7
@@ -5815,6 +5815,8 @@ def sample(
If weights do not sum to 1, they will be normalized to sum to 1.
Missing values in the weights column will be treated as zero.
Infinite values not allowed.
When replace = False will not allow ``(n * max(weights) / sum(weights)) > 1``,
Review comment: nit: remove the
in order to avoid biased results. See the Notes below for more details.
random_state : int, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional
If int, array-like, or BitGenerator, seed for random number generator.
If np.random.RandomState or np.random.Generator, use as given.
@@ -5851,6 +5853,10 @@ def sample(
-----
If `frac` > 1, `replacement` should be set to `True`.

When replace = False will not allow ``(n * max(weights) / sum(weights)) > 1``,
Review comment: nit: remove the trailing
since that would cause results to be biased. E.g. sampling 2 items without replacement,
with weights [100, 1, 1] would yield two last items in 1/2 of cases, instead of 1/102
Review comment: Suggested change
Review comment: Can you also add, "This is similar to specifying
Examples
--------
>>> df = pd.DataFrame(
Review comment: The added indentation here looks incorrect.
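For anyone checking the Notes example by hand, here is a minimal sketch of the quantity the new docstring text refers to, `n * max(weights) / sum(weights)`, evaluated for the weights `[100, 1, 1]` with `n = 2`. The helper names `max_weight_ratio` and `inclusion_probabilities` are hypothetical and are not part of pandas. The second helper assumes a simple sequential weighted draw without replacement, which is an illustrative model only and not a statement about how `sample` is implemented.

```python
from itertools import permutations


def max_weight_ratio(weights, n):
    """Return n * max(weights) / sum(weights), the quantity named in the docstring."""
    return n * max(weights) / sum(weights)


def inclusion_probabilities(weights, n):
    """Exact chance that each item appears among n draws.

    Assumed model (illustration only): items are drawn one at a time,
    each with probability proportional to its weight among the items
    not yet drawn. Weights are assumed to be strictly positive.
    """
    probs = [0.0] * len(weights)
    # Enumerate every ordered selection of n distinct items.
    for order in permutations(range(len(weights)), n):
        remaining = sum(weights)
        p = 1.0
        for i in order:
            p *= weights[i] / remaining
            remaining -= weights[i]
        # Every item in this ordered selection is "included".
        for i in order:
            probs[i] += p
    return probs


weights = [100, 1, 1]
ratio = max_weight_ratio(weights, 2)
probs = inclusion_probabilities(weights, 2)

print("ratio exceeds 1:", ratio > 1)
print("inclusion probabilities:", probs)
```

Under this assumed model the ratio is above 1 for `[100, 1, 1]`, and each of the two light items ends up in the sample close to half of the time, far more often than its share of the total weight would suggest. With equal weights such as `[1, 1, 1]` and `n = 2`, the ratio stays below 1.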