Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slowness in pop() due to names() #83

Closed
wlandau opened this issue May 25, 2023 · 3 comments
Closed

Slowness in pop() due to names() #83

wlandau opened this issue May 25, 2023 · 3 comments
Assignees

Comments

@wlandau
Copy link
Owner

wlandau commented May 25, 2023

Related to #81. As of c41e4a0, crew uses environments for controller$queue and controller$list to store mirai tasks. This makes controller$push() and controller$collect() much faster, but controller$pop() is very slow.

library(crew)
controller <- crew_controller_local(workers = 4L)
controller$start()
controller$launch(n = 4L)
index <- 0L
n_tasks <- 6000L
Sys.sleep(5)
for (task in seq_len(n_tasks)) {
  index <- index + 1L
  controller$push(
    name = as.character(index),
    command = index,
    data = list(index = index),
    scale = FALSE,
    seed = 0L
  )
}
Sys.sleep(5)
controller$collect()
px <- proffer::pprof(
  while (length(controller$results) > 0L) {
    controller$pop()
  }
)
controller$terminate()

Screenshot 2023-05-25 at 4 13 11 PM

When names() is called, I am simply trying to extract an element of controller$results without knowing its name in advance. I cannot call controller$results[[1L]] because the object is an environment.

crew/R/crew_controller.R

Lines 447 to 450 in c41e4a0

ids <- names(self$results)
if (length(ids) > 0L) {
id <- ids[[1L]]
task <- self$results[[id]]

Last I checked there were packages like datastructures and hashmap which implement proper data structures, but these and others like it seem to be removed from CRAN. But that is the sort of thing I would like to do.

But maybe there is a way I can sneakily implement controller$results as a linked list / stack. For each task I push to controller$results, I can give it the ID of the task to pop after itself, and I can keep track of the current head of the stack with a single character string.

@wlandau wlandau self-assigned this May 25, 2023
@wlandau
Copy link
Owner Author

wlandau commented May 25, 2023

It's low overhead, but maybe unsatisfying because controller$queue would actually be a stack.

@wlandau
Copy link
Owner Author

wlandau commented May 25, 2023

No, only controller$results is actually a stack, which is fine. Once the tasks finish, collection should be quick anyway. If it's first-in/first-out for tasks known to already be resolved, maybe that's not so bad.

@wlandau
Copy link
Owner Author

wlandau commented Jun 20, 2023

As of 5c9fbe2, this structure is first-in/first-out, like a queue and not a stack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant