Callback API #427

Closed
hadley opened this Issue Jun 8, 2016 · 2 comments

hadley commented Jun 8, 2016

Mostly a placeholder for now, but for the next release of readr we need to make sure we have a callback interface, so that we have a way to handle files that are bigger than memory.

Rough sketch of the interface after discussions with @jcheng5:

read_csv_chunked <- function(..., callback) {
  callback <- as_chunk_callback(callback)

  callback$begin()
  on.exit(callback$finally(), add = TRUE)
  pos <- 1

  while (callback$continue() && has_more_rows()) {
    data <- read_more_rows()
    callback$receive(data, pos)
    pos <- pos + nrow(data)
  }

  return(callback$result())
}

as_chunk_callback <- function(x) UseMethod("as_chunk_callback")
as_chunk_callback.function <- function(x) {
  SideEffectChunkCallback$new(x)
}
as_chunk_callback.R6ClassGenerator <- function(x) {
  as_chunk_callback(x$new())
}
as_chunk_callback.ChunkCallback <- function(x) {
  x
}

# This would be used if the result should be thrown away.
# `ChunkCallback` is the base class; it is not defined in this sketch.
SideEffectChunkCallback <- R6::R6Class("SideEffectChunkCallback", inherit = ChunkCallback,
  private = list(
    callbackFunc = NULL,
    cancel = FALSE
  ),
  public = list(
    initialize = function(callbackFunc) {
      check_callback_fun(callbackFunc)
      private$callbackFunc <- callbackFunc
    },
    receive = function(data, index) {
      result <- private$callbackFunc(data, index)
      private$cancel <- identical(result, FALSE)
    },
    continue = function() {
      !private$cancel
    }
  )
)

# Used if the result of each chunk should be combined
# at the end
DataFrameCallback <- R6::R6Class("DataFrameCallback", inherit = ChunkCallback,
  private = list(
    callbackFunc = NULL,
    results = list()
  ),
  public = list(
    initialize = function(callbackFunc) {
      private$callbackFunc <- callbackFunc
    },
    receive = function(data, index) {
      result <- private$callbackFunc(data, index)
      private$results <- c(private$results, list(result))
    },
    # read_csv_chunked() returns callback$result(), so combine the chunks here
    result = function() {
      dplyr::bind_rows(private$results)
    }
  )
)
check_callback_fun <- function(x) {
  n_args <- length(formals(x))
  if (n_args < 2) {
    stop("`callbackFunc` must have two or more arguments", call. = FALSE)
  }
}
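
To make the protocol above concrete, here is a minimal, self-contained sketch of how the pieces might fit together. It is only an illustration under stated assumptions, not the readr implementation: the `ChunkCallback` base class with no-op defaults, the `CollectCallback` class, and the `read_df_chunked()` driver (which walks an in-memory data frame instead of a file) are all hypothetical names, and base `rbind()` stands in for `dplyr::bind_rows()` to avoid the extra dependency.

```r
library(R6)

# Hypothetical base class: every hook has a do-nothing default,
# so subclasses only override what they need.
ChunkCallback <- R6Class("ChunkCallback",
  public = list(
    begin = function() invisible(NULL),
    receive = function(data, index) invisible(NULL),
    continue = function() TRUE,
    result = function() invisible(NULL),
    finally = function() invisible(NULL)
  )
)

# Hypothetical callback that keeps each chunk's result and
# row-binds them when result() is called.
CollectCallback <- R6Class("CollectCallback",
  inherit = ChunkCallback,
  private = list(callbackFunc = NULL, results = list()),
  public = list(
    initialize = function(callbackFunc) {
      private$callbackFunc <- callbackFunc
    },
    receive = function(data, index) {
      chunk_result <- private$callbackFunc(data, index)
      private$results <- c(private$results, list(chunk_result))
    },
    result = function() do.call(rbind, private$results)
  )
)

# Stand-in for the reader: same loop as the sketch above, but the
# "file" is a data frame that is sliced chunk_size rows at a time.
read_df_chunked <- function(df, callback, chunk_size = 2) {
  callback$begin()
  on.exit(callback$finally(), add = TRUE)
  pos <- 1
  while (callback$continue() && pos <= nrow(df)) {
    end <- min(pos + chunk_size - 1, nrow(df))
    data <- df[pos:end, , drop = FALSE]
    callback$receive(data, pos)
    pos <- pos + nrow(data)
  }
  callback$result()
}

# Summarise each chunk: starting row, number of rows, sum of x.
df <- data.frame(x = 1:5)
summary_cb <- CollectCallback$new(function(data, index) {
  data.frame(start = index, n = nrow(data), total = sum(data$x))
})
out <- read_df_chunked(df, summary_cb)
```

With five rows and a chunk size of two, `out` should contain one row per chunk (three in total), with `start` giving the position of the first row of each chunk. Only one chunk is held by the driver at a time, which is the point of the callback design for files that do not fit in memory.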

hadley modified the milestone: 0.3.0 Jul 13, 2016

hadley added the feature label Jul 13, 2016

jimhester added a commit to jimhester/readr that referenced this issue Jul 22, 2016

Implementation of chunked reading
For both lines and delimiters

Fixes tidyverse#427

vspinu commented Dec 18, 2016

Wasn't this one "fixed" by now?

jimhester commented Jan 23, 2017

I think now that #498 is merged this is done and can be closed.

jimhester closed this Jan 23, 2017

lock bot locked and limited conversation to collaborators Sep 24, 2018