Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicit NULL cells disappear when writing #203

Closed
py9mrg opened this issue Dec 10, 2020 · 8 comments
Closed

Explicit NULL cells disappear when writing #203

py9mrg opened this issue Dec 10, 2020 · 8 comments
Labels
bug 🐛 an unexpected problem or unintended behavior

Comments

@py9mrg
Copy link

py9mrg commented Dec 10, 2020

Recently I have been trying to write some results to existing Google Sheet files and have found two quirks. I think probably by design but I thought I'd raise them as potential features / options to existing functions that can make life a bit easier for the user.

First: writing NULL columns as empty columns, not skipping them.

Let's say I have a tibble of list-columns something like:

tmp <- tibble(A = list(NULL, NULL, NULL), B = list(1, 2, 3))
tmp
# A tibble: 3 x 2
  A      B        
  <list> <list>   
1 <NULL> <dbl [1]>
2 <NULL> <dbl [1]>
3 <NULL> <dbl [1]>

When I write this to a sheet using (as far as I can tell) any of the writing functions, the NULL column just gets skipped rather than a blank column being written. This can mean columns get written in the wrong place - e.g. in the above example column B would be written in column A of the Google Sheet because column A would be ignored rather than a blank column being written. Yet a 3 x 2 sheet will result - just with the B column in the A position and a blank column at the end. If that makes sense?! So it's like write_sheet knows about the NULL column and makes space for it, but then doesn't actually write it.

Often I am reading in data with some empty columns (which get read in as a NULL column), doing some wrangling, and then writing to a new and/or existing sheet. It would be nice if the "round-trip" was symmetrical in the sense that empty columns get read in as NULL and NULL columns get written as empty columns. Currently I have to force coercion to character when reading with col_types = 'c' and then use mutate(across(everything(), as.list)) to coerce everything back into list columns - then the NULL columns become character columns of NA, which get written as blank values as intended.

@py9mrg py9mrg changed the title Potential feature request(s) - writing NULL columns & sheet_append without affecting format Potential feature request(s) - writing NULL columns & sheet_append without changing formatting Dec 10, 2020
@jennybc
Copy link
Member

jennybc commented Dec 12, 2020

Will you separate the second point into a second issue? That makes things easier to discuss and fits better with our workflows.

The "disappearing" cells are happening when we convert to JSON and presumably we could account for explicit NULLs. But list-columns aren't really what googlesheets4 expects to put into and read out of spreadsheet cells.

A more typical way to represent an empty column in R is with NAs:

library(tidyverse)
library(googlesheets4)
library(googledrive)
(tmp <- tibble(A = list(NULL, NULL, NULL), B = list(1, 2, 3)))
#> # A tibble: 3 x 2
#>   A      B        
#>   <list> <list>   
#> 1 <NULL> <dbl [1]>
#> 2 <NULL> <dbl [1]>
#> 3 <NULL> <dbl [1]>

ss <- write_sheet(tmp)
#> Creating new Sheet: "perfect-robberfly"
read_sheet(ss)
#> Reading from "perfect-robberfly"
#> Range "tmp"
#> # A tibble: 3 x 2
#>       A B    
#>   <dbl> <lgl>
#> 1     1 NA   
#> 2     2 NA   
#> 3     3 NA

(foo <- tibble(A = rep.int(NA, 3), B = list(1, 2, 3)))
#> # A tibble: 3 x 2
#>   A     B        
#>   <lgl> <list>   
#> 1 NA    <dbl [1]>
#> 2 NA    <dbl [1]>
#> 3 NA    <dbl [1]>

ss2 <- write_sheet(foo)
#> Creating new Sheet: "sure-creature"
read_sheet(ss2)
#> Reading from "sure-creature"
#> Range "foo"
#> # A tibble: 3 x 2
#>   A         B
#>   <lgl> <dbl>
#> 1 NA        1
#> 2 NA        2
#> 3 NA        3

drive_rm(ss, ss2)
#> Files deleted:
#>   * perfect-robberfly: 1XbWkRgStAQpXya0eteAQOchJ-YkpNJsbu93fn_PUDaY
#>   * sure-creature: 1E57-iRrOZLdoPYPHX7MTPXEy432q82WyzFHNz_1stms

Created on 2020-12-12 by the reprex package (v0.3.0.9001)

@jennybc
Copy link
Member

jennybc commented Dec 12, 2020

This is a more compelling example in terms of considering this a bug:

library(tidyverse)
library(googlesheets4)
library(googledrive)
# hidden auth chunk here
(tmp <- tibble(A = list(NULL, "HI", NULL), B = list(1, 2, 3)))
#> # A tibble: 3 x 2
#>   A         B        
#>   <list>    <list>   
#> 1 <NULL>    <dbl [1]>
#> 2 <chr [1]> <dbl [1]>
#> 3 <NULL>    <dbl [1]>

ss <- write_sheet(tmp)
#> Creating new Sheet: "aware-lion"
read_sheet(ss)
#> Reading from "aware-lion"
#> Range "tmp"
#> # A tibble: 3 x 2
#>   A             B
#>   <list>    <dbl>
#> 1 <dbl [1]>    NA
#> 2 <chr [1]>     2
#> 3 <dbl [1]>    NA

drive_rm(ss)
#> Files deleted:
#>   * aware-lion: 1zhjqwcF5T68Tze4WcUyAorHgPIXQBY47TUgT0IekZqk

Created on 2020-12-12 by the reprex package (v0.3.0.9001)

@jennybc jennybc changed the title Potential feature request(s) - writing NULL columns & sheet_append without changing formatting Explicit NULL cells disappear when writing Dec 12, 2020
@jennybc jennybc added the bug 🐛 an unexpected problem or unintended behavior label Dec 12, 2020
@py9mrg
Copy link
Author

py9mrg commented Dec 14, 2020

Will you separate the second point into a second issue? That makes things easier to discuss and fits better with our workflows.

Yes, no problem - I should have thought of that myself, sorry.

A more typical way to represent an empty column in R is with NAs:

Absolutely, and when I write sheets that I create myself in R first then I never use NULL but always NA. The slight "quirk" for me comes when reading in sheets from Google that other people have made, which contain empty cells. read_sheet coerces empty cells to NULL, not NA, so I either have to (a) manually correct all NULLs to NA after reading, or (b) coerce to character when reading in using col_types = 'c' and then (if I want to make use of the advice you kindly gave me in #169) I have to coerce back to list columns before using write_sheet.

Neither are a major problem, it's just a little bit more of a fiddle than is ideal and might be nicer if, when reading in empty cells, they are coerced to NA not NULL. I guess really the problem is the reading not the writing! If empty cells can be coerced to NA instead of NULL then I don't have any issues, but I guess it couldn't hurt to make it so that NULLs are coerced to NA when writing as well as a double lock. Depending how much hassle it is to implement, of course.

@jennybc
Copy link
Member

jennybc commented Dec 28, 2020

library(googledrive)
library(googlesheets4)
library(tidyverse)

# hidden auth chunk right here

(dat <- tibble(
  x = 1:3,
  chr = c("hi", NA, "bye"),
  mixed = list(NA, 1.5, "foo"),
  z = letters[1:3]
))
#> # A tibble: 3 x 4
#>       x chr   mixed     z    
#>   <int> <chr> <list>    <chr>
#> 1     1 hi    <lgl [1]> a    
#> 2     2 <NA>  <dbl [1]> b    
#> 3     3 bye   <chr [1]> c

ss <- write_sheet(dat)
#> Creating new Sheet: "semiironic-monkfish"

read_sheet(ss)
#> Reading from "semiironic-monkfish"
#> Range "dat"
#> # A tibble: 3 x 4
#>       x chr   mixed     z    
#>   <dbl> <chr> <list>    <chr>
#> 1     1 hi    <NULL>    a    
#> 2     2 <NA>  <dbl [1]> b    
#> 3     3 bye   <chr [1]> c

drive_rm(ss)
#> Files deleted:
#>   * semiironic-monkfish: 1dQ--QBgYrdtCmW5qc70jVO-JGzE8BU3zPAQOMUdxXTc

I think this must illustrate what you're talking about.

An empty cell comes in as NA versus NULL depending on context. In a column that is guessed as a regular atomic type, such as character, blank cells are NA. But if the column appears to be of mixed type and guessing leads to a list-column, blank cells that start as a length 1 logical NA are read as NULL. I don't love that asymmetry, although I'm not sure it's avoidable.

(But yes you may need to specify column type for some of these Sheets controlled by others, in general.)

@py9mrg
Copy link
Author

py9mrg commented Jan 5, 2021

Yes, that's exactly right - thanks for digging a bit more to clarify the situations when it does / doesn't happen. That helps a lot to understand. I'll probably just carry on using col_types = 'c' when reading in that case. I mean, it's often required anyway for other reasons due to all the, let's say - inconsistent - googlesheet things people can do when they're not worried about using anything other than GS! If it's not avoidable maybe a note in the documentation is enough to close the issue. Oh an happy new year.

@jennybc
Copy link
Member

jennybc commented Jan 5, 2021

googlesheets4 will gain better column type specification machinery at some point, in which case there would presumably be a way to to specify a default column type (perhaps character, for you?) and/or a way to say that you never want to guess "list-column".

jennybc added a commit that referenced this issue Jul 15, 2021
@JerePlum99
Copy link

@jennybc I came across this older issue while trying to understand the behavior of NULL values input passed into range_write.

I'm still trying to fully understand the difference between batchUpdate.updateCellsRequest vs spreadsheets.values.batchUpdate , but as range_write uses batchUpdate.updateCellsRequest is there no way to emulate the skip behavior documented for spreadsheets.values.batchUpdate. Based on that documentation, I would expect there to be a way to skip updating a cell by passing in a NULL value, rather than overwriting it as an empty cell.

See the below references, as writing a selective range and "writing a selective range" / leaving an existing column in the middle of a range exactly the same is the behavior I'm trying to achieve:

Happy to try creating a reprex if useful.

@jennybc
Copy link
Member

jennybc commented May 30, 2023

Yes, it would be best to open a new issue with a reprex.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐛 an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

3 participants