Explicit NULL cells disappear when writing #203

py9mrg · 2020-12-10T13:15:23Z

Recently I have been trying to write some results to existing Google Sheet files and have found two quirks. I think probably by design but I thought I'd raise them as potential features / options to existing functions that can make life a bit easier for the user.

First: writing NULL columns as empty columns, not skipping them.

Let's say I have a tibble of list-columns something like:

tmp <- tibble(A = list(NULL, NULL, NULL), B = list(1, 2, 3))
tmp
# A tibble: 3 x 2
  A      B        
  <list> <list>   
1 <NULL> <dbl [1]>
2 <NULL> <dbl [1]>
3 <NULL> <dbl [1]>

When I write this to a sheet using (as far as I can tell) any of the writing functions, the NULL column just gets skipped rather than a blank column being written. This can mean columns get written in the wrong place - e.g. in the above example column B would be written in column A of the Google Sheet because column A would be ignored rather than a blank column being written. Yet a 3 x 2 sheet will result - just with the B column in the A position and a blank column at the end. If that makes sense?! So it's like write_sheet knows about the NULL column and makes space for it, but then doesn't actually write it.

Often I am reading in data with some empty columns (which get read in as a NULL column), doing some wrangling, and then writing to a new and/or existing sheet. It would be nice if the "round-trip" was symmetrical in the sense that empty columns get read in as NULL and NULL columns get written as empty columns. Currently I have to force coercion to character when reading with col_types = 'c' and then use mutate(across(everything(), as.list)) to coerce everything back into list columns - then the NULL columns become character columns of NA, which get written as blank values as intended.

The text was updated successfully, but these errors were encountered:

jennybc · 2020-12-12T19:47:44Z

Will you separate the second point into a second issue? That makes things easier to discuss and fits better with our workflows.

The "disappearing" cells are happening when we convert to JSON and presumably we could account for explicit NULLs. But list-columns aren't really what googlesheets4 expects to put into and read out of spreadsheet cells.

A more typical way to represent an empty column in R is with NAs:

library(tidyverse)
library(googlesheets4)
library(googledrive)

(tmp <- tibble(A = list(NULL, NULL, NULL), B = list(1, 2, 3)))
#> # A tibble: 3 x 2
#>   A      B        
#>   <list> <list>   
#> 1 <NULL> <dbl [1]>
#> 2 <NULL> <dbl [1]>
#> 3 <NULL> <dbl [1]>

ss <- write_sheet(tmp)
#> Creating new Sheet: "perfect-robberfly"
read_sheet(ss)
#> Reading from "perfect-robberfly"
#> Range "tmp"
#> # A tibble: 3 x 2
#>       A B    
#>   <dbl> <lgl>
#> 1     1 NA   
#> 2     2 NA   
#> 3     3 NA

(foo <- tibble(A = rep.int(NA, 3), B = list(1, 2, 3)))
#> # A tibble: 3 x 2
#>   A     B        
#>   <lgl> <list>   
#> 1 NA    <dbl [1]>
#> 2 NA    <dbl [1]>
#> 3 NA    <dbl [1]>

ss2 <- write_sheet(foo)
#> Creating new Sheet: "sure-creature"
read_sheet(ss2)
#> Reading from "sure-creature"
#> Range "foo"
#> # A tibble: 3 x 2
#>   A         B
#>   <lgl> <dbl>
#> 1 NA        1
#> 2 NA        2
#> 3 NA        3

drive_rm(ss, ss2)
#> Files deleted:
#>   * perfect-robberfly: 1XbWkRgStAQpXya0eteAQOchJ-YkpNJsbu93fn_PUDaY
#>   * sure-creature: 1E57-iRrOZLdoPYPHX7MTPXEy432q82WyzFHNz_1stms

^{Created on 2020-12-12 by the reprex package (v0.3.0.9001)}

jennybc · 2020-12-12T19:51:56Z

This is a more compelling example in terms of considering this a bug:

library(tidyverse)
library(googlesheets4)
library(googledrive)
# hidden auth chunk here

(tmp <- tibble(A = list(NULL, "HI", NULL), B = list(1, 2, 3)))
#> # A tibble: 3 x 2
#>   A         B        
#>   <list>    <list>   
#> 1 <NULL>    <dbl [1]>
#> 2 <chr [1]> <dbl [1]>
#> 3 <NULL>    <dbl [1]>

ss <- write_sheet(tmp)
#> Creating new Sheet: "aware-lion"
read_sheet(ss)
#> Reading from "aware-lion"
#> Range "tmp"
#> # A tibble: 3 x 2
#>   A             B
#>   <list>    <dbl>
#> 1 <dbl [1]>    NA
#> 2 <chr [1]>     2
#> 3 <dbl [1]>    NA

drive_rm(ss)
#> Files deleted:
#>   * aware-lion: 1zhjqwcF5T68Tze4WcUyAorHgPIXQBY47TUgT0IekZqk

^{Created on 2020-12-12 by the reprex package (v0.3.0.9001)}

py9mrg · 2020-12-14T14:44:13Z

Will you separate the second point into a second issue? That makes things easier to discuss and fits better with our workflows.

Yes, no problem - I should have thought of that myself, sorry.

A more typical way to represent an empty column in R is with NAs:

Absolutely, and when I write sheets that I create myself in R first then I never use NULL but always NA. The slight "quirk" for me comes when reading in sheets from Google that other people have made, which contain empty cells. read_sheet coerces empty cells to NULL, not NA, so I either have to (a) manually correct all NULLs to NA after reading, or (b) coerce to character when reading in using col_types = 'c' and then (if I want to make use of the advice you kindly gave me in #169) I have to coerce back to list columns before using write_sheet.

Neither are a major problem, it's just a little bit more of a fiddle than is ideal and might be nicer if, when reading in empty cells, they are coerced to NA not NULL. I guess really the problem is the reading not the writing! If empty cells can be coerced to NA instead of NULL then I don't have any issues, but I guess it couldn't hurt to make it so that NULLs are coerced to NA when writing as well as a double lock. Depending how much hassle it is to implement, of course.

jennybc · 2020-12-28T17:20:02Z

library(googledrive)
library(googlesheets4)
library(tidyverse)

# hidden auth chunk right here

(dat <- tibble(
  x = 1:3,
  chr = c("hi", NA, "bye"),
  mixed = list(NA, 1.5, "foo"),
  z = letters[1:3]
))
#> # A tibble: 3 x 4
#>       x chr   mixed     z    
#>   <int> <chr> <list>    <chr>
#> 1     1 hi    <lgl [1]> a    
#> 2     2 <NA>  <dbl [1]> b    
#> 3     3 bye   <chr [1]> c

ss <- write_sheet(dat)
#> Creating new Sheet: "semiironic-monkfish"

read_sheet(ss)
#> Reading from "semiironic-monkfish"
#> Range "dat"
#> # A tibble: 3 x 4
#>       x chr   mixed     z    
#>   <dbl> <chr> <list>    <chr>
#> 1     1 hi    <NULL>    a    
#> 2     2 <NA>  <dbl [1]> b    
#> 3     3 bye   <chr [1]> c

drive_rm(ss)
#> Files deleted:
#>   * semiironic-monkfish: 1dQ--QBgYrdtCmW5qc70jVO-JGzE8BU3zPAQOMUdxXTc

I think this must illustrate what you're talking about.

An empty cell comes in as NA versus NULL depending on context. In a column that is guessed as a regular atomic type, such as character, blank cells are NA. But if the column appears to be of mixed type and guessing leads to a list-column, blank cells that start as a length 1 logical NA are read as NULL. I don't love that asymmetry, although I'm not sure it's avoidable.

(But yes you may need to specify column type for some of these Sheets controlled by others, in general.)

py9mrg · 2021-01-05T10:21:13Z

Yes, that's exactly right - thanks for digging a bit more to clarify the situations when it does / doesn't happen. That helps a lot to understand. I'll probably just carry on using col_types = 'c' when reading in that case. I mean, it's often required anyway for other reasons due to all the, let's say - inconsistent - googlesheet things people can do when they're not worried about using anything other than GS! If it's not avoidable maybe a note in the documentation is enough to close the issue. Oh an happy new year.

jennybc · 2021-01-05T16:52:15Z

googlesheets4 will gain better column type specification machinery at some point, in which case there would presumably be a way to to specify a default column type (perhaps character, for you?) and/or a way to say that you never want to guess "list-column".

Closes #203

JerePlum99 · 2023-05-30T15:59:59Z

@jennybc I came across this older issue while trying to understand the behavior of NULL values input passed into range_write.

I'm still trying to fully understand the difference between batchUpdate.updateCellsRequest vs spreadsheets.values.batchUpdate , but as range_write uses batchUpdate.updateCellsRequest is there no way to emulate the skip behavior documented for spreadsheets.values.batchUpdate. Based on that documentation, I would expect there to be a way to skip updating a cell by passing in a NULL value, rather than overwriting it as an empty cell.

See the below references, as writing a selective range and "writing a selective range" / leaving an existing column in the middle of a range exactly the same is the behavior I'm trying to achieve:

Happy to try creating a reprex if useful.

jennybc · 2023-05-30T21:19:20Z

Yes, it would be best to open a new issue with a reprex.

py9mrg changed the title ~~Potential feature request(s) - writing NULL columns & sheet_append without affecting format~~ Potential feature request(s) - writing NULL columns & sheet_append without changing formatting Dec 10, 2020

jennybc changed the title ~~Potential feature request(s) - writing NULL columns & sheet_append without changing formatting~~ Explicit NULL cells disappear when writing Dec 12, 2020

jennybc added the bug 🐛 an unexpected problem or unintended behavior label Dec 12, 2020

jennybc closed this as completed in 4a8c8be Jul 15, 2021

jennybc added a commit that referenced this issue Jul 15, 2021

NULL needs one more layer of wrapping

ef06901

Closes #203

JerePlum99 mentioned this issue Aug 13, 2023

Bug: Use of Null values does not accurately "skip" cells within range_write() #295

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Explicit NULL cells disappear when writing #203

Explicit NULL cells disappear when writing #203

py9mrg commented Dec 10, 2020 •

edited

Loading

jennybc commented Dec 12, 2020 •

edited

Loading

jennybc commented Dec 12, 2020

py9mrg commented Dec 14, 2020

jennybc commented Dec 28, 2020 •

edited

Loading

py9mrg commented Jan 5, 2021 •

edited

Loading

jennybc commented Jan 5, 2021

JerePlum99 commented May 30, 2023

jennybc commented May 30, 2023

Explicit NULL cells disappear when writing #203

Explicit NULL cells disappear when writing #203

Comments

py9mrg commented Dec 10, 2020 • edited Loading

jennybc commented Dec 12, 2020 • edited Loading

jennybc commented Dec 12, 2020

py9mrg commented Dec 14, 2020

jennybc commented Dec 28, 2020 • edited Loading

py9mrg commented Jan 5, 2021 • edited Loading

jennybc commented Jan 5, 2021

JerePlum99 commented May 30, 2023

jennybc commented May 30, 2023

py9mrg commented Dec 10, 2020 •

edited

Loading

jennybc commented Dec 12, 2020 •

edited

Loading

jennybc commented Dec 28, 2020 •

edited

Loading

py9mrg commented Jan 5, 2021 •

edited

Loading