Update economics data #2962

Merged
merged 5 commits into from
Mar 6, 2019

@hadley
Member

hadley commented Oct 25, 2018

  • Define col_types for read_csv
  • Pin time series to fix end date
  • Use usethis::use_data

Also ungroups economics_long. The ungrouping was already present in the code, but not in the saved data set.
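
A minimal sketch of the steps listed in the description above, assuming a made-up inline CSV in place of the real FRED download. The names `csv_text`, `pce_pinned`, and `pce_long` are hypothetical, and the rescaling used for `value01` is an assumption for illustration, not something taken from this PR:

```r
library(readr)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical stand-in for one downloaded series (values are made up).
csv_text <- I("DATE,VALUE\n2015-03-01,1.5\n2015-04-01,1.6\n2015-05-01,1.7\n")

# Explicit col_types: parsing no longer depends on readr's type guessing.
pce <- read_csv(
  csv_text,
  col_types = cols(DATE = col_date(format = ""), VALUE = col_double())
)

# Pin the end date so that re-running the script later yields the same rows.
end_date <- as.Date("2015-04-01")
pce_pinned <- pce %>%
  rename(date = DATE, pce = VALUE) %>%
  filter(date <= end_date)

# Group for a per-variable computation (assumed rescaling to [0, 1]),
# then ungroup so the saved object is a plain tibble.
pce_long <- pce_pinned %>%
  mutate(variable = "pce") %>%
  group_by(variable) %>%
  mutate(value01 = (pce - min(pce)) / (max(pce) - min(pce))) %>%
  ungroup()

# Inside the package project one would then save the object, e.g.:
# usethis::use_data(pce_long, overwrite = TRUE)
```

Pinning the end date and ungroup()-ing before saving are what keep the stored `.rda` stable across re-runs and free of grouping metadata.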

@yutannihilation
Member

Currently, economics and economics_long do not seem to be heavily used by CRAN packages, so this seems OK to merge:

https://github.com/search?q=org%3Acran+ggplot2+economics+extension%3AR&type=Code
https://github.com/search?q=org%3Acran+ggplot2+economics_long+extension%3AR&type=Code

Still, I'm not sure when it is OK to update an existing data set. One option might be to rename the data so that users notice the change when they hit the error "data set 'economics' not found". (I wish there were a .Deprecated() for data.)

@hadley
Member Author

hadley commented Nov 30, 2018

I think the changes are sufficiently small that this will not cause problems.

@yutannihilation
Member

yutannihilation commented Dec 28, 2018

Here's another reason to update or replace the existing economics_long: it is stored in the old grouped_df format, so we'll see the warning below after the next release of dplyr.

library(ggplot2)

ggplot(economics_long, aes(date, value01, colour = variable)) +
  geom_line()
#> Warning in grouped_indices_grouped_df_impl(.data): Detecting old grouped_df
#> format, replacing `vars` attribute by `groups`

Created on 2018-12-28 by the reprex package (v0.2.1)

This doesn't seem urgent. Fortunately, economics_long is not used in tests, so this doesn't break CRAN checks. After updating or replacing it, we'll need to require dplyr (>= 0.8.0). I feel it's better to wait a while until users get familiar with the new dplyr.

@yutannihilation
Member

@clauswilke As you commented on #3146, what do you think about this pull request? Do you think we have a chance to update the economics data?

@clauswilke
Member

I'm less concerned here. I think the main issue is the grouping, and in the unlikely case that somebody currently depends on the preexisting grouping they can easily add it. With the diamonds dataset, many people would suddenly have found their figures change appearance, and that would have been confusing, and also more difficult to fix.
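
For anyone who does depend on the old grouping, re-adding it might look like the sketch below. The toy tibble `long` stands in for economics_long (values are made up), and grouping by `variable` is an assumption based on the plots in this thread:

```r
library(dplyr, warn.conflicts = FALSE)

# Toy stand-in for the ungrouped economics_long (values are made up).
long <- tibble(
  variable = c("pce", "pce", "psavert"),
  value = c(1, 2, 3)
)

# Restore the grouping explicitly where downstream code relies on it.
regrouped <- long %>% group_by(variable)
group_vars(regrouped)
```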

@clauswilke
Member

Oh, and I think the warning about old groupings is awkward and should be fixed one way or another by the next release.

@yutannihilation
Member

Hmm, I was mainly worried about subtle changes in the values, which might surprise somebody when the results of some calculation (mean, p-value, or whatever) no longer match examples in books or on websites. But in terms of visual appearance, it seems OK.

library(ggplot2)

tmp <- tempfile(fileext = ".rda")

download.file("https://raw.githubusercontent.com/tidyverse/ggplot2/master/data/economics_long.rda", destfile = tmp)
economics_long_old <- local({load(tmp); economics_long})

ggplot(economics_long_old, aes(date, value01, colour = variable)) +
  geom_line()
#> Warning: Detecting old grouped_df format, replacing `vars` attribute by
#> `groups`

download.file("https://raw.githubusercontent.com/tidyverse/ggplot2/0639906ebe36dec7694ca9e905f9fa7f903671a9/data/economics_long.rda", destfile = tmp)
economics_long_new <- local({load(tmp); economics_long})

ggplot(economics_long_new, aes(date, value01, colour = variable)) +
  geom_line()

Created on 2019-02-17 by the reprex package (v0.2.1)
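
Beyond eyeballing the plots, the value changes could also be quantified per variable. The sketch below is only an illustration: `old` and `new` are toy data frames with made-up numbers, not the real economics_long files, and `max_diff` is a hypothetical name:

```r
# Toy stand-ins for the old and new long-format data (values are made up).
old <- data.frame(
  date = as.Date(c("2015-03-01", "2015-04-01", "2015-03-01", "2015-04-01")),
  variable = c("pce", "pce", "psavert", "psavert"),
  value = c(12000, 12100, 5.0, 5.2)
)
new <- old
new$value[new$variable == "psavert"] <- c(7.4, 7.6)

# Align rows on date and variable, then take the largest absolute
# difference within each variable.
both <- merge(old, new, by = c("date", "variable"), suffixes = c("_old", "_new"))
max_diff <- tapply(abs(both$value_new - both$value_old), both$variable, max)
max_diff
```

A variable whose maximum difference is zero is unchanged; any non-zero entry points at a series worth checking against its source.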

@yutannihilation
Member

Maybe I worried too much. Considering we need to fix the grouping anyway, I now feel this needs to be merged. Thanks Claus for the comment.

@clauswilke
Member

It looks like the psavert curve has changed while the others have remained the same. Do we know why this one has changed? Was the source data revised? I would recommend downloading the CSV file manually and storing it in the data-raw folder rather than downloading it on the fly. This also future-proofs us if the URL ever stops working.

@yutannihilation
Member

yutannihilation commented Feb 17, 2019

Yeah, I was wondering about that part... (probably what's changed was unemployed?) Will check.

@yutannihilation
Member

yutannihilation commented Feb 17, 2019

probably what's changed was unemployed?

Sorry, you are right. psavert is different. In the plots below, the red line is the current data and the black line is the new data:

# Download from http://research.stlouisfed.org

library(readr)
library(dplyr, warn.conflicts = FALSE)
library(purrr)
library(tidyr)
library(ggplot2)

series <- c("PCE", "POP", "PSAVERT", "UEMPMED", "UNEMPLOY")
url <- paste0("http://research.stlouisfed.org/fred2/series/", series, "/downloaddata/", series, ".csv")

fields <- map(url, read_csv,
  col_types = cols(
    DATE = col_date(format = ""),
    VALUE = col_double()
  )
)
economics_new <- fields %>%
  map2(tolower(series), function(x, series) setNames(x, c("date", series))) %>%
  reduce(inner_join, by = "date") %>%
  filter(date <= as.Date("2015-04-01"))

data("economics", package = "ggplot2")

for (s in tolower(series)) {
  p <- ggplot(mapping = aes(date, !!sym(s))) +
    geom_line(data = economics, colour = "red") +
    geom_line(data = economics_new, colour = alpha("black", 0.6)) +
    ggtitle(s)
  
  plot(p)
}

Created on 2019-02-17 by the reprex package (v0.2.1)

@yutannihilation
Member

Actually, psavert (personal saving rate) has been revised several times, as this FAQ explains:

Over the decades, revisions to the personal saving rate have reflected many improvements to estimation methods, including the treatment of government employee retirement plans (1999 CR), revised estimates of income misreporting and new information on employee cafeteria plans (2009 CR), and the adoption of accrual-based accounting methods for defined benefit pension plans (2013 CR) as well as the incorporation of regularly scheduled source data
(https://www.bea.gov/help/faq/512)

@clauswilke
Member

Yes, so I think the right strategy is to update now but save the original data file into the repo, so that there won't be any further data updates.
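
One way this snapshot strategy could look in a data-raw script is sketched below. The helper `read_snapshot()` is hypothetical, and a temporary directory with made-up values stands in for the real data-raw folder. The point is that the script only ever reads committed files and fails loudly instead of re-downloading:

```r
# Read one series from a local snapshot; never touch the network.
read_snapshot <- function(series, dir) {
  path <- file.path(dir, paste0(series, ".csv"))
  if (!file.exists(path)) {
    stop("No local snapshot for ", series, " in ", dir)
  }
  out <- read.csv(path, colClasses = c("Date", "numeric"))
  setNames(out, c("date", tolower(series)))
}

# Demonstration: a temporary directory stands in for data-raw/,
# and the CSV content is made up.
snapshot_dir <- tempfile("data-raw-")
dir.create(snapshot_dir)
writeLines(
  c("DATE,VALUE", "2015-03-01,5.0", "2015-04-01,5.2"),
  file.path(snapshot_dir, "PSAVERT.csv")
)

psavert <- read_snapshot("PSAVERT", snapshot_dir)
```

With the CSV files committed, later revisions at the source no longer alter the packaged data unless someone deliberately replaces the snapshot.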

@yutannihilation
Member

Agreed.

@yutannihilation
Member

@hadley Could you update this when you have time, following Claus's suggestion quoted below? Or, if you are busy, I can take this over.

download the csv file manually and store it in the data-raw folder rather than downloading it on the fly.

@hadley
Member Author

hadley commented Feb 22, 2019

@yutannihilation it would be wonderful if you could take this over for me. You should be able to push into this PR if you get the dev version of usethis then do pr_fetch(2962) to get this PR on to your computer, and then pr_push() to push your changes back here.

@yutannihilation
Member

@hadley I didn't know about these functions, thanks! I'll do it.

@yutannihilation
Member

It seems the data was revised again after this PR was created.

9b74fa2#diff-7f60de5e0b49fef3880b5838e2c50479

@yutannihilation
Member

@hadley
Sorry for bothering you. You wrote

This leads to minor changes in the computation of PCE

but it seems the changes are in all columns. Could you confirm the changes (see #2962 (comment)) are still acceptable?

@hadley
Member Author

hadley commented Mar 5, 2019

Yeah, it's fine.

@yutannihilation
Member

Thanks! Then I'm merging this.

@yutannihilation yutannihilation deleted the economics branch June 18, 2019 11:42
@lock

lock bot commented Dec 15, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Dec 15, 2019

3 participants