Is there a rule of thumb for NUM_BOOTSTRAP? #13

Closed
echatzikyriakidis opened this issue Mar 28, 2023 · 2 comments

echatzikyriakidis commented Mar 28, 2023

Hi @avsolatorio,

In my experiments I use the default value (500) for the number of bootstrap rounds when estimating the sensitivity threshold. I see in the implementation that this process is very CPU-bound and uses multiple cores when available.

My environment has 8 CPU cores, and on large tables the estimation usually takes 1-2 hours before training starts. During all this time the GPU in my runtime is idle, waiting for the sensitivity threshold estimation to complete. (Also, Colab sometimes disconnects the runtime because it notices that mainly the CPU is being used.)

I know that setting this to a smaller value makes it run faster, but I wonder whether there is a rule of thumb or whether it is just a matter of trial and error. I understand that it is important to estimate this threshold correctly, since it is used for early stopping during training.

Thanks!

avsolatorio commented Apr 28, 2023

Hello @echatzikyriakidis, 100 can be a reasonable trade-off. Keep in mind, though, that a higher number of bootstrap rounds helps produce a more stable threshold.
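
For concreteness, a minimal sketch of that trade-off (assuming the `realtabformer` package layout and that `fit` exposes the bootstrap round count as `num_bootstrap`, as the default of 500 mentioned above suggests):

```python
# Sketch only: trading threshold stability for a shorter pre-training wait.
# Assumes fit() accepts num_bootstrap (default 500, as discussed above).
import pandas as pd
from realtabformer import REaLTabFormer

df = pd.read_csv("train.csv")  # placeholder path for your training table

rtf_model = REaLTabFormer(model_type="tabular")

# 100 bootstrap rounds as a reasonable trade-off; higher values give a
# more stable sensitivity threshold at the cost of more CPU time.
rtf_model.fit(df, num_bootstrap=100)
```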

One potential solution is to allow precomputation of the sensitivity threshold outside the fit function. When fitting on the data, one could then point to a file containing the pre-computed value. It must, however, first check that the parameters used in the pre-computation are consistent with the parameters passed to the fit function.

With this implemented, you could perform the pre-computation on an instance without an accelerator, save the result, and then switch to a Colab instance with a GPU.
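
A rough sketch of that workflow (the save/load helpers and the parameter set below are illustrative only and not part of the current library API; they stand in for whatever the feature in #16 ends up exposing):

```python
# Hypothetical sketch of the proposed precompute-then-reuse workflow.
import json
from pathlib import Path

THRESHOLD_FILE = Path("sensitivity_threshold.json")

def save_threshold(threshold: float, params: dict, path: Path = THRESHOLD_FILE) -> None:
    # Store the value together with the parameters used to compute it,
    # so consistency can be verified before reuse.
    path.write_text(json.dumps({"threshold": threshold, "params": params}))

def load_threshold(params: dict, path: Path = THRESHOLD_FILE) -> float:
    payload = json.loads(path.read_text())
    if payload["params"] != params:
        raise ValueError("Pre-computed threshold used different parameters than fit().")
    return payload["threshold"]

# On the CPU-only instance: run the bootstrap estimation, then save, e.g.
#   save_threshold(estimated_threshold, {"num_bootstrap": 500})
# On the GPU instance: validate and reuse instead of re-estimating, e.g.
#   threshold = load_threshold({"num_bootstrap": 500})
```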

If you're open to contributing to this feature, that would be very welcome! See: #16

echatzikyriakidis (Author) commented

Hi @avsolatorio!

I have managed to get around the Colab disconnect problem by buying Colab Pro+, which never disconnects.
