
use batched processing instead of processing by instance #80

Closed
dkrako opened this issue Mar 16, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@dkrako
Collaborator

dkrako commented Mar 16, 2022

Currently all of the metrics are more or less structured by the following scheme:

x: array
y: array
a: array

for x_instance, y_instance, a_instance in zip(x, y, a):
    for perturbation_step in range(perturbation_steps):
        x_perturbed = perturb_instance(x_instance, a_instance, perturbation_step)
        y_perturbed = model(x_perturbed)
        score = calculate_score_for_instance(y_instance, y_perturbed)

The choice of perturb_instance arguments is just for simplicity; the actual code is of course more complex than presented.
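
To make the per-instance scheme above concrete, here is a minimal runnable sketch. The `model` (a toy feature-averaging function) and `perturb_instance` (masking the most attributed features) are illustrative assumptions, not the actual Quantus implementation:

```python
import numpy as np

# Hypothetical stand-in for a real model: averages the input features.
def model(x):
    return x.mean(axis=-1)

# Hypothetical perturbation: mask the (step + 1) most attributed features.
def perturb_instance(x_instance, a_instance, step):
    x_perturbed = x_instance.copy()
    top = np.argsort(-a_instance)[: step + 1]  # most attributed feature indices
    x_perturbed[top] = 0.0
    return x_perturbed

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))   # 8 instances, 16 features
y = model(x)
a = rng.random(size=(8, 16))   # attributions, same shape as x
perturbation_steps = 4

scores = np.empty((8, perturbation_steps))
for i, (x_i, y_i, a_i) in enumerate(zip(x, y, a)):
    for step in range(perturbation_steps):
        x_p = perturb_instance(x_i, a_i, step)
        y_p = model(x_p[None, :])[0]   # one model call per instance per step
        scores[i, step] = y_i - y_p    # e.g. drop in prediction as the score
```

Note that the model is invoked 8 × 4 = 32 times here, once per instance and step, which is exactly the overhead the batched variant below avoids.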

But this kind of implementation doesn't exploit the performance benefits of batched model prediction and vectorized numpy functions.
Instead, we could speed up computations by an order of magnitude with the following approach:

x: array
y: array
a: array
batch_size: int

generator = BatchGenerator(x, y, a, batch_size)
for x_batch, y_batch, a_batch in generator:
    for perturbation_step in range(perturbation_steps):
        x_batch_perturbed = perturb_batch(x_batch, a_batch, perturbation_step)
        y_batch_perturbed = model(x_batch_perturbed)
        score = calculate_score_for_batch(y_batch, y_batch_perturbed)

Some of the perturb_batch functions may still need an inner for-loop, but others can certainly be computed on the whole batch at once.
Depending on the dataset size and model complexity, this should lead to significant improvements in performance.
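
A minimal sketch of what this could look like, using the same toy model and masking perturbation as assumptions; `batch_generator` is a plain generator function standing in for the proposed BatchGenerator, and none of these names are the actual Quantus API:

```python
import numpy as np

def batch_generator(x, y, a, batch_size):
    """Yield aligned (x, y, a) slices of at most batch_size instances."""
    for start in range(0, len(x), batch_size):
        end = start + batch_size
        yield x[start:end], y[start:end], a[start:end]

# Hypothetical stand-in model, as in the per-instance sketch.
def model(x):
    return x.mean(axis=-1)

# Vectorized perturbation: mask the (step + 1) most attributed features of
# every instance in the batch at once, with no inner Python loop.
def perturb_batch(x_batch, a_batch, step):
    x_perturbed = x_batch.copy()
    top = np.argsort(-a_batch, axis=-1)[:, : step + 1]
    np.put_along_axis(x_perturbed, top, 0.0, axis=-1)
    return x_perturbed

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
y = model(x)
a = rng.random(size=(8, 16))
perturbation_steps = 4

scores = []
for x_b, y_b, a_b in batch_generator(x, y, a, batch_size=4):
    for step in range(perturbation_steps):
        x_p = perturb_batch(x_b, a_b, step)
        y_p = model(x_p)               # one model call per batch per step
        scores.append(y_b - y_p)
```

With a batch size of 4 this needs 2 × 4 = 8 model calls instead of 32, and the masking itself runs as a single vectorized argsort/put_along_axis over each batch.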

@annahedstroem annahedstroem linked a pull request Apr 7, 2022 that will close this issue
@annahedstroem annahedstroem added the enhancement New feature or request label Apr 16, 2022
@dkrako dkrako mentioned this issue Oct 12, 2022
6 tasks
@annahedstroem
Member

Solved in batched processing.
