Insert join: where the `by` variables match, take all row data from the newer data frame and use it to replace the corresponding rows in the older data frame; rows of the newer data frame with no match are appended. This is useful for a subset-mutate-update type workflow.
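A minimal base-R sketch of the insert join, assuming both data frames share the same columns. `insert_join()` is a hypothetical helper name, not part of dplyr or loopr. It follows the anti-join, bind-rows, arrange sequence: drop old rows whose key appears in the new data, stack the new rows, then sort by the key.

```r
# Hypothetical helper (not part of dplyr or loopr): insert join in base R.
# Assumes `new` has every column that `old` has.
insert_join <- function(old, new, by) {
  # One key string per row, built from the `by` columns
  key <- function(df) do.call(paste, c(unname(as.list(df[by])), sep = "\r"))
  # Anti-join: keep only old rows whose key is absent from new
  kept <- old[!key(old) %in% key(new), , drop = FALSE]
  # Append every new row (columns taken in old's order)
  out <- rbind(kept, new[names(old)])
  # Arrange by the key columns and reset row names
  out <- out[do.call(order, unname(as.list(out[by]))), , drop = FALSE]
  rownames(out) <- NULL
  out
}

old <- data.frame(id = 1:3, x = c(10, 20, 30))
new <- data.frame(id = c(2L, 4L), x = c(200, 400))
res <- insert_join(old, new, by = "id")
# id = 1, 2, 3, 4; x = 10, 200, 30, 400
```

Row 2 is replaced wholesale and row 4 is appended; row 3 is untouched because it has no counterpart in `new`.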
Update join: a regular full join, but when the two data frames share non-key variable names, move all non-NA values from the column in the newer data frame into the older data frame. This is useful for a summarize-mutate-update type workflow.
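A minimal base-R sketch of the update join, assuming `suffix` does not collide with any existing column name. `update_join()` is a hypothetical helper name, not part of dplyr or loopr. It renames the newer copies of shared columns, does a full outer join with `merge(all = TRUE)`, then lets each newer non-NA value overwrite the older one.

```r
# Hypothetical helper (not part of dplyr or loopr): update join in base R.
update_join <- function(old, new, by, suffix = ".new") {
  # Non-key columns present in both inputs
  common <- setdiff(intersect(names(old), names(new)), by)
  # Rename the newer copies so merge() keeps both versions side by side
  names(new)[match(common, names(new))] <- paste0(common, suffix)
  # Full outer join on the key columns
  out <- merge(old, new, by = by, all = TRUE)
  for (col in common) {
    newer <- out[[paste0(col, suffix)]]
    keep <- !is.na(newer)
    # Newer non-NA values overwrite the older ones; NAs leave them alone
    out[[col]][keep] <- newer[keep]
    out[[paste0(col, suffix)]] <- NULL
  }
  out
}

old <- data.frame(id = 1:3, x = c(1, 2, 3), y = c("a", "b", "c"))
new <- data.frame(id = 2:4, x = c(20, NA, 40))
res <- update_join(old, new, by = "id")
# id = 1, 2, 3, 4; x = 1, 20, 3, 40; y = "a", "b", "c", NA
```

The NA for `id == 3` in `new` does not wipe out the existing value, which is what distinguishes this from simply preferring the newer column.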
I've implemented this (likely very badly) in the loopr CRAN package. I've basically given up on the stacking system there as an overcomplication, but I think the joins are broadly applicable. See code below.
```r
#' Amend variables with new information
#'
#' Replace values in one set of columns with the non-NA values from another matching set
#' @importFrom magrittr %>%
#' @export
#'
#' @param data A data frame
#' @param originalNames A vector of column names with out-of-date information
#' @param amendNames A vector of column names with amended information. They will be removed at the end of processing.
#' @return An amended \code{\link{tbl_df}}
amendColumns = function(data, originalNames, amendNames) {
  # one row per (original, amend) column pair
  dataNames = dplyr::data_frame(originalNames, amendNames) %>%
    dplyr::mutate(index = seq_along(originalNames))
  # build calls using ifelse: keep the amended value unless it is NA
  calls = plyr::dlply(
    .data = dataNames,
    .fun = function(row)
      lazyeval::lazy(ifelse(is.na(amendNames), originalNames, amendNames)) %>%
        lazyeval::interp(amendNames = as.name(row$amendNames),
                         originalNames = as.name(row$originalNames)),
    .variables = "index") %>%
    setNames(originalNames)
  # overwrite the original columns, then drop the amend columns
  data %>%
    dplyr::mutate_(.dots = calls) %>%
    dplyr::select_(.dots = sprintf("-`%s`", amendNames))
}
```
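For comparison, a dependency-free sketch of what `amendColumns()` computes, assuming the two name vectors are paired by position. `amend_columns_base()` is a hypothetical name. It swaps `ifelse()` for subset assignment, which keeps the original column's class (`ifelse()` drops attributes such as the `Date` class).

```r
# Hypothetical base-R counterpart of amendColumns(); not part of loopr.
amend_columns_base <- function(data, originalNames, amendNames) {
  stopifnot(length(originalNames) == length(amendNames))
  for (i in seq_along(originalNames)) {
    amended <- data[[amendNames[i]]]
    keep <- !is.na(amended)
    # Copy only the non-NA amended values into the original column
    data[[originalNames[i]]][keep] <- amended[keep]
  }
  # Drop the amend columns, as amendColumns() does
  data[setdiff(names(data), amendNames)]
}

df <- data.frame(a = c(1, 2, 3), a_fix = c(NA, 20, NA), b = c("x", "y", "z"))
res <- amend_columns_base(df, "a", "a_fix")
# a = 1, 20, 3; b unchanged; a_fix removed
```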
```r
#' Amend a dataframe with new information
#'
#' \code{\link{full_join}} two dataframes. If there are matching columns,
#' amend each \code{data} column with the corresponding \code{amendData} column using \code{\link{amendColumns}}.
#'
#' @importFrom magrittr %>%
#' @export
#'
#' @param data A data frame
#' @param amendData A data frame
#' @param by A quoted vector of column names to join by. If set to NULL or unspecified, will default to the grouping columns in data
#' @param suffix A suffix used internally. No existing column names should use this suffix.
#' @return An amended \code{\link{tbl_df}}
amend = function(data, amendData, by = NULL, suffix = "toFix") {
  # default by variables from groups
  if (is.null(by)) by = data %>%
      dplyr::groups() %>%
      lapply(deparse) %>%
      unlist %>%
      as.vector
  if (is.null(by)) stop("Defaulted to merging by data grouping variables. However, no grouping variables found")
  # figure out which columns need to be merged
  commonNames = intersect(names(data), names(amendData)) %>%
    setdiff(by)
  if (length(commonNames) != 0) message("Amending columns: ", paste(commonNames, collapse = ", "))
  # if no columns need to be merged, a simple full join
  if (length(commonNames) == 0) dplyr::full_join(data, amendData, by = by) else {
    # else update columns then join
    toFix = paste0(commonNames, suffix)
    if (any(toFix %in% c(names(data), names(amendData)))) stop("suffix conflict. Please choose another suffix.")
    names(toFix) = commonNames
    byLiteral = by %>% sprintf("`%s`", .)
    amendData %>%
      plyr::rename(toFix) %>%
      dplyr::full_join(data, by = by) %>%
      amendColumns(commonNames, unname(toFix)) %>%
      dplyr::arrange_(.dots = byLiteral)
  }
}
```
```r
#' Insert new information into a dataframe.
#'
#' \code{\link{anti_join}} data with insertData, then \code{\link{bind_rows}} of insertData, then arrange by \code{by} variables.
#' @importFrom magrittr %>%
#' @export
#'
#' @param data A data frame
#' @param insertData A data frame
#' @param by A quoted vector of column names to join by.
#' @return An inserted \code{\link{tbl_df}}
insert = function(data, insertData, by) {
  data %>%
    # drop the rows that insertData will replace
    dplyr::anti_join(insertData, by = by) %>%
    dplyr::bind_rows(insertData) %>%
    dplyr::arrange_(.dots = sprintf("`%s`", by))
}
```