reduce the time when calling sampler #29

Merged
merged 1 commit into from May 31, 2021

hwany-j (Contributor) commented May 31, 2021

The current code spends a lot of time in `__init__()` computing the sample weights and the per-label counts.
This becomes a problem with large datasets, so I rewrote this part using pandas.
I checked that with over 5 million samples, creating the sampler now takes less than 1 second. (With the previous code, it took 20 minutes.)
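
For readers following along, here is a minimal sketch of the kind of vectorized weight computation described above. It is not the code from this PR: the function name `compute_sample_weights` and the toy labels are assumptions made for illustration. It only shows how pandas can count labels and assign inverse-frequency weights without a Python-level loop over every sample.

```python
import pandas as pd


def compute_sample_weights(labels):
    """Return one weight per sample, inversely proportional to its label's frequency.

    Hypothetical helper for illustration; not the function used in the PR.
    """
    df = pd.DataFrame({"label": list(labels)})
    # value_counts() tallies how often each label occurs in a single pass.
    label_counts = df["label"].value_counts()
    # Look up each sample's label count, then invert it to get the weight.
    weights = 1.0 / df["label"].map(label_counts)
    return weights.tolist()


# Toy data: three samples of one class, one of another.
weights = compute_sample_weights(["cat", "cat", "cat", "dog"])
```

With weights like these, each class has the same total weight (here 1.0 for "cat" and 1.0 for "dog"), which is what a weighted random sampler typically needs to draw classes at a balanced rate.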

hwany-j (Contributor, Author) commented May 31, 2021

It might be solved at #21

@ufoym ufoym merged commit ad50e22 into ufoym:master May 31, 2021

pherber3 commented Jun 9, 2021

With this change I now get `TypeError: callback_get_label() missing 1 required positional argument: 'idx'` when using a custom `callback_get_label` function. Is there some adjustment to how we should be calling the sampler with custom datasets? The source code still says `callback_get_label` takes two arguments, `dataset` and `index`, but when `callback_get_label` is called it is passed only `dataset`.
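
The mismatch described above can be reproduced without the library. The sketch below rests on an assumption the thread does not confirm: that after this change the callback is called with the dataset alone and is expected to return the labels of all samples at once. `ToyDataset`, `get_label_per_index`, and `get_all_labels` are hypothetical names used only for illustration.

```python
class ToyDataset:
    """Hypothetical dataset holding (feature, label) pairs."""

    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]


def get_label_per_index(dataset, idx):
    """Two-argument style callback: returns the label of one sample."""
    return dataset[idx][1]


def get_all_labels(dataset):
    """One-argument style callback (assumed): returns every label in order."""
    return [dataset[i][1] for i in range(len(dataset))]


dataset = ToyDataset([(0.1, "a"), (0.2, "b"), (0.3, "a")])

# Calling the two-argument callback with only the dataset raises the
# same kind of TypeError reported in the comment above.
try:
    get_label_per_index(dataset)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# The one-argument callback works when called with the dataset alone.
all_labels = get_all_labels(dataset)
```

If the assumption holds, adapting a custom callback means dropping the index parameter and returning a sequence of labels; check the sampler's current source before relying on this.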
