reduce the time when calling sampler #29

hwany-j · 2021-05-31T04:25:58Z

In the current code, init() spends a lot of time computing the sampling weights and counting how many examples carry each label.
This becomes a problem with a large dataset, so I rewrote this part using pandas.
I checked that with over 5 million samples it now takes under 1 second. (With the previous code, creating the sampler took about 20 minutes.)
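The speedup comes from counting labels once, vectorized, instead of per example in a Python loop. A minimal sketch of that idea follows. It assumes the labels are already available as a list and that each example's weight is the inverse frequency of its label (1 / count). The function name compute_sample_weights is hypothetical and is not the PR's actual code.

```python
import pandas as pd


def compute_sample_weights(labels):
    """Hypothetical helper: one weight per example, equal to 1 / (count of its label)."""
    df = pd.DataFrame({"label": labels})
    # value_counts() tallies every label in a single vectorized pass
    # instead of looping over the dataset in Python.
    label_to_count = df["label"].value_counts()
    # map() looks up each example's label count, again without a Python loop.
    weights = 1.0 / df["label"].map(label_to_count)
    return weights.to_list()


weights = compute_sample_weights([0, 0, 0, 1])
# weights: 1/3 for each of the three label-0 examples, 1.0 for the single label-1 example
```

Rare labels get larger weights, so a weighted draw over indices picks them about as often as common ones. Only the weight computation is sketched here, not the draw itself.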

hwany-j · 2021-05-31T04:26:22Z

This might also resolve #21.

pherber3 · 2021-06-09T16:30:56Z

With this change I now get TypeError: callback_get_label() missing 1 required positional argument: 'idx' when using a custom callback_get_label function. Is there some adjustment to how we should call the sampler with custom datasets? The source code still says callback_get_label takes two arguments, a dataset and an index, but when callback_get_label is actually called, only the dataset is passed.
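One way to adapt a custom dataset, assuming the sampler now calls callback_get_label(dataset) once and expects one label per example, in index order, rather than calling it once per index. MyDataset and adapt_per_index_callback are hypothetical names used only for illustration.

```python
class MyDataset:
    """Hypothetical custom dataset that stores samples and labels side by side."""

    def __init__(self, samples, labels):
        self.samples = samples
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.samples[idx], self.labels[idx]


def callback_get_label(dataset):
    # New-style callback (assumed): takes only the dataset and returns
    # the labels for every example in one call, in index order.
    return list(dataset.labels)


def adapt_per_index_callback(old_callback):
    """Hypothetical adapter: wraps an old callback(dataset, idx) into callback(dataset)."""

    def new_callback(dataset):
        # Still loops in Python over every index, so it is as slow as the old path.
        return [old_callback(dataset, i) for i in range(len(dataset))]

    return new_callback
```

The callback would then be passed to the sampler the same way as before, through its callback_get_label argument. The adapter keeps an existing per-index callback working, but because it loops over every index it gives up most of the speedup from this PR. Returning the stored labels directly is preferable when the dataset keeps them.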

reduce the time when calling sampler

3c85296

ufoym merged commit ad50e22 into ufoym:master May 31, 2021
