Writing and importing an ssn object alters variable names #2

Closed
matthewrfuller opened this issue Nov 9, 2023 · 3 comments
Labels: question (Further information is requested)

@matthewrfuller

Hello! I've noticed that writing an ssn object using ssn_write() and then importing it using ssn_import() results in modified variable names. Below is a minimal reproducible example using the SSN2 package's Middle Fork ssn object.

```r
library(SSN2)

# Copy the MiddleFork04.ssn example data shipped with SSN2 to tempdir()
copy_lsn_to_temp()

temp_path <- paste0(tempdir(), "/MiddleFork04.ssn")
mf04p <- ssn_import(
  path = temp_path,
  predpts = c("pred1km", "CapeHorn", "Knapp"),
  overwrite = TRUE
)

# Write the ssn object to a new .ssn directory, then import it again
ssn_write(mf04p, path = paste0(getwd(), "/mf04p_out.ssn"), overwrite = TRUE)

mf04p_in <- ssn_import(
  path = paste0(getwd(), "/mf04p_out.ssn"),
  predpts = c("pred1km", "CapeHorn", "Knapp")
)

summary(mf04p)    # variable names in the original ssn object
summary(mf04p_in) # modified variable names after writing and re-importing
```

The summaries of the two ssn objects show how the variable names changed from the original mf04p ssn object to the written/imported mf04p_in ssn object. It doesn't appear to be simply a character-length issue for shapefile .dbf fields when writing to a new .ssn. Writing/importing also appears to add a new 'ntgmtry' field to each obs/preds entry in the ssn object that duplicates the 'netgeometry' field.
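The name 'ntgmtry', and the fact that names already under 10 characters also changed, look consistent with base R's `abbreviate()` called with `minlength = 7`, which drops lowercase vowels from the right until a name is at most seven characters. This is our inference from the observed output, not something confirmed in this thread; the sketch below only shows that `abbreviate()` reproduces the observed name and leaves a short name such as `pid` alone.

```r
# Assumption: the renamed fields look like base R abbreviate() output with
# minlength = 7. This only reproduces the observed name; it does not show
# where the abbreviation actually happens during ssn_write().
observed <- c("netgeometry", "pid")

# abbreviate() removes lowercase vowels from the right (never the first
# character) until each name is at most minlength characters; names that are
# already short enough are returned unchanged.
abbreviated <- abbreviate(observed, minlength = 7, named = FALSE)
print(abbreviated)
```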

pet221 self-assigned this Nov 10, 2023
pet221 added the question (Further information is requested) label Nov 10, 2023

@pet221 (Collaborator) commented Nov 10, 2023

@matthewrfuller Thanks for finding and reporting this issue so quickly. It looks like st_write() is abbreviating the column names because the "netgeometry" column name has more than 10 characters, which is the maximum allowed for a field name in a dbf file. Interestingly, it also abbreviates column names that are within the character limit. I've updated ssn_write(), ssn_subset() and ssn_split_predpts() to remove netgeometry before writing to shapefile, which addresses the issue within SSN2. @michaeldumelle, we need to decide whether this fix is sufficient or whether we should shorten the netgeometry column name to conform to that limit: netgeom? n_geometry?
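A quick way to spot names that would trip the dbf limit before writing is to compare `nchar()` of each column name against 10. The helper `long_dbf_names()` below is hypothetical (not part of SSN2), and the toy data frame only mimics a few column names mentioned in this thread with made-up values.

```r
# Hypothetical helper, not part of SSN2: list column names that exceed the
# 10-character limit on dBase (.dbf) field names used by shapefiles.
long_dbf_names <- function(df, max_chars = 10) {
  nms <- names(df)
  nms[nchar(nms) > max_chars]
}

# Toy data frame standing in for an ssn object's obs table (made-up values).
obs <- data.frame(
  pid = 1:2,
  CDRAINAG = c(10, 20),
  netgeometry = c("a", "b")
)

long_dbf_names(obs)
#> [1] "netgeometry"
```

With SSN2 loaded, the same check could be run on `ssn_get_data(mf04p, "obs")` (and each prediction set) before calling `ssn_write()`.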

michaeldumelle added a commit that referenced this issue Nov 16, 2023
…ceeding the 10 character limit for column/field names while writing to shapefiles (#2)
@michaeldumelle (Collaborator)

Thanks @matthewrfuller. We changed the name of netgeometry to netgeom to avoid exceeding the 10-character limit for column/field names when writing to shapefiles. This fix is available now in the development version (remotes::install_github("USEPA/SSN2", ref = "develop")) and will be included in the next CRAN release.

@matthewrfuller (Author)

Excellent! Thanks for implementing this change in SSN2 so quickly!

I'm wondering if a note in the ssn_write() documentation should advise users to keep column/field names to 10 characters or fewer in both the observation and prediction data frames if they want the names preserved through the entire write/import process. Otherwise, if even one field exceeds the 10-character limit when the SSN object is written, the ESRI Shapefile driver used by st_write() will abbreviate all column/field names with 8 or more characters. Here's a reprex that demonstrates this behavior after adding a column with an 11-character name (DRAINAGEKM2) to the observations of the mf04p ssn object from the Introduction vignette.

```r
remotes::install_github("USEPA/SSN2", ref = "develop")
library(SSN2)

copy_lsn_to_temp()

temp_path <- paste0(tempdir(), "/MiddleFork04.ssn")
mf04p <- ssn_import(
  path = temp_path,
  predpts = c("pred1km", "CapeHorn", "Knapp"),
  overwrite = TRUE
)

obs_df <- ssn_get_data(mf04p, "obs") |>
  dplyr::mutate(DRAINAGEKM2 = CDRAINAG) |> # add an 11-character field/column
  # move netgeom, DRAINAGEKM2 and geometry to the end for later comparison
  # (select(everything(), ...) would keep them in their original positions)
  dplyr::relocate(netgeom, DRAINAGEKM2, geometry, .after = dplyr::last_col())

mf04p_mod <- ssn_put_data(obs_df, mf04p, "obs")

ssn_write(mf04p_mod, path = paste0(getwd(), "/mf04p_out.ssn"), overwrite = TRUE)

mf04p_in <- ssn_import(
  path = paste0(getwd(), "/mf04p_out.ssn"),
  predpts = c("pred1km", "CapeHorn", "Knapp")
)

# Compare column names (and their lengths) before writing and after re-importing
data.frame(
  mf04p_mod = names(mf04p_mod$obs),
  mf04p_in = names(mf04p_in$obs)
) |>
  dplyr::mutate(
    nchar_mod = nchar(mf04p_mod),
    nchar_in = nchar(mf04p_in)
  )
```
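Until such a note exists, one way to keep control over the names is to shorten long columns yourself, with an explicit old-to-new map, before calling ssn_write(). The helper `shorten_for_dbf()` and the replacement name `DRAINKM2` below are hypothetical, chosen only for illustration. The sketch has only been exercised on a plain data.frame; it would need checking on the sf object returned by `ssn_get_data()` (followed by `ssn_put_data()`) before relying on it.

```r
# Hypothetical helper, not part of SSN2: rename columns using an explicit
# old -> new map, then stop if any name still exceeds the dbf field limit
# or if the renaming created duplicate names.
shorten_for_dbf <- function(df, map, max_chars = 10) {
  idx <- match(names(map), names(df))
  if (anyNA(idx)) {
    stop("Unknown column(s): ", paste(names(map)[is.na(idx)], collapse = ", "))
  }
  names(df)[idx] <- unname(map)
  too_long <- names(df)[nchar(names(df)) > max_chars]
  if (length(too_long) > 0) {
    stop("Names over ", max_chars, " characters: ",
         paste(too_long, collapse = ", "))
  }
  if (anyDuplicated(names(df))) {
    stop("Renaming produced duplicate column names")
  }
  df
}

# Toy data mimicking the reprex columns (made-up values).
obs <- data.frame(CDRAINAG = c(10, 20), DRAINAGEKM2 = c(10, 20))
obs_short <- shorten_for_dbf(obs, c(DRAINAGEKM2 = "DRAINKM2"))
names(obs_short)
#> [1] "CDRAINAG" "DRAINKM2"
```

Failing loudly, rather than abbreviating automatically, keeps the choice of short names with the user, so the names seen after ssn_import() are the ones they picked.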
