Saving and loading from RDS/RDA causes error #181
Comments
In the meantime, I wrote the following wrapper functions to get around the problem.
The reason for checking for a list is that saving a list of xml objects results in the same problem. Using lists of xml objects is a very common pattern and should be supported. One cannot just call […]
I ran into yet more problems. The error also crashes RStudio; see the bug report on the RStudio forum. To make things more complicated, I found that the xml2 package, which rvest wraps, has a function […]. What a frustrating issue.
I'm running into the same problems.
I bit the bullet and now keep folders around with the […]
I saved a list of xml nodes, and when I tried to load it back into RStudio I got an invalid-pointer error. Is there any way to load it back without scraping the websites again? You give a workaround for storing a list properly, but I would like to avoid re-scraping.
The XML is held in memory behind external pointers, which R does not store in RDS files, so you cannot simply save the R object. The easiest workaround is to use […]
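The point about external pointers can be checked directly. The snippet below is only a sketch: it uses xml2 (which rvest wraps) on a small inline document, and it peeks at the internal layout of the object, which is an implementation detail that may differ between xml2 versions.

```r
library(xml2)

# Small inline document so the example needs no network access.
page <- read_html("<html><body><p>hello</p></body></html>")

# Drop the S3 class and look at the storage type of each component.
# This inspects xml2 internals, so treat the layout as an assumption.
component_types <- vapply(unclass(page), typeof, character(1))
print(component_types)

# dput() shows only pointer placeholders rather than the HTML itself,
# which is why the serialized form is so short.
dput(page)
```

If the components are reported as external pointers, the parsed document itself lives outside R's own data structures, and only the handles end up in an RDS file.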
Yes, but it means I have to re-scrape the pages again :( Thanks for the answer; from now on I'll be much more careful with tasks like this.
Often you don't want to scrape a large site more than once. In that case, it is useful to save the scraped content to a file. The obvious choice is the RDS format, since it should reproduce objects exactly. However, after saving an object to disk and loading it back, the two objects are not identical, and using the loaded object crashes R and RStudio. The same error happens with the RDA format, and it happens whether one uses the base functions or the readr functions.
I was unable to find any other way to serialize complex R objects. I tried feather, jsonlite and base::dput. From the output of dput, it seems that the error may be related to rvest not using copy semantics. The dput output is very short and contains references to pointers, which I guess are RAM locations.

One person on Stack Overflow found a workaround by explicitly converting the object to a character string. Perhaps a special function for writing and reading rvest objects could be introduced. One could then add a detector for rvest objects to the readr package, so that write_rds etc. work using this workaround.
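The character-conversion workaround, wrapped into a pair of write/read helpers, might look roughly like the sketch below. The names write_html_rds and read_html_rds are hypothetical and exist in neither rvest nor readr. The sketch assumes the objects are whole documents returned by read_html (or a plain list of them) and uses xml2 directly, since rvest re-exports read_html from it.

```r
library(xml2)

# Hypothetical helper: store the HTML source as text, not the pointer-based object.
write_html_rds <- function(x, path) {
  if (inherits(x, "xml_document")) {
    out <- as.character(x)          # serialize one document to a string
  } else if (is.list(x)) {
    out <- lapply(x, as.character)  # serialize each document in a list
  } else {
    stop("expected an xml_document or a list of xml_document objects")
  }
  saveRDS(out, path)
  invisible(path)
}

# Hypothetical helper: re-parse the stored text into fresh documents.
read_html_rds <- function(path) {
  x <- readRDS(path)
  if (is.list(x)) lapply(x, read_html) else read_html(x)
}

# Usage with two small inline pages standing in for scraped content.
pages <- list(
  read_html("<html><body><p>one</p></body></html>"),
  read_html("<html><body><p>two</p></body></html>")
)
rds_path <- tempfile(fileext = ".rds")
write_html_rds(pages, rds_path)

restored <- read_html_rds(rds_path)
first_text <- xml_text(xml_find_first(restored[[1]], "//p"))
second_text <- xml_text(xml_find_first(restored[[2]], "//p"))
```

The trade-off is that every document is parsed again on load, and the restored objects are new parses of the same markup rather than identical copies of the originals.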