expand functions added #21

Closed
wants to merge 1 commit into from
Quiri commented Aug 18, 2014

Hello Hadley,

I'm a data analyst from Germany and love working with your packages, especially dplyr. While using the group_by %>% summarize chain, I discovered that groups are only created for values that exist in the data. If I group by date, for example, and there is no data for a specific date, that date does not appear in the grouped result, so I get a "hole". What I would like to have instead is:

Date        n()
2014-01-01  5
2014-01-02  0    (instead of no row at all)
2014-01-03  7

and so on...

I didn't find a solution, so I built a function for it based on expand.grid(). I thought it would be a nice addition to tidyr, so take a look at it; maybe you will like it (or at least the idea of expanding data).

You can expand from min:max and by unique ids (which is convenient if you grouped by many variables, like date and gender), and you can select as many expandable columns as you wish.
It works great with the dplyr pipe and I use it a lot.
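
To illustrate the idea described above, here is a minimal base R sketch of filling such holes with expand.grid(). It is not the code from this pull request; the data frame `obs`, its columns, and its values are made up for the example. It builds every combination of day (from min to max) and gender, joins the observed counts onto that grid, and replaces the missing counts with 0.

```r
# Hypothetical toy data: counts per date and gender.
# 2014-01-02 is missing entirely, and "m" is missing on 2014-01-03.
obs <- data.frame(
  date   = as.Date(c("2014-01-01", "2014-01-01", "2014-01-03")),
  gender = c("f", "m", "f"),
  n      = c(3L, 2L, 7L),
  stringsAsFactors = FALSE
)

# Full grid: every day from min to max date, crossed with every observed gender.
grid <- expand.grid(
  date   = seq(min(obs$date), max(obs$date), by = "day"),
  gender = unique(obs$gender),
  KEEP.OUT.ATTRS   = FALSE,
  stringsAsFactors = FALSE
)

# Left join the observations onto the grid; combinations without data get NA.
filled <- merge(grid, obs, by = c("date", "gender"), all.x = TRUE)

# Treat "no data" as a count of zero.
filled$n[is.na(filled$n)] <- 0L

print(filled)
```

The date column is expanded over its min:max range, while gender is expanded over its unique values, which mirrors the two modes mentioned above.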

Feel free to contact me.


Quiri commented Aug 19, 2014

Here is an example of how expand could be used in a dplyr chain.
Assume we look at payment data and want a table with the total number of payers up to each day.

firstpay <- data %>%
  group_by(UserId) %>%
  filter(paytime == min(paytime)) %>%             # keep only the first payment per user
  group_by(date) %>%
  summarize(first_time_payer = n()) %>%           # number of first-time payers per day
  ungroup() %>%                                   # ungroup before applying cumsum
  expand("date") %>%                              # expand by date, since some days have no first-time payers
  mutate(total_payer = cumsum(first_time_payer))  # cumulative number of payers at each date
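
For readers who want to run a chain like this today, below is a hedged sketch that swaps the proposed expand("date") step for tidyr's complete(), which exists in current tidyr releases. The thread does not say whether that is what the referenced commit introduced, so treat the substitution as an assumption. The `payments` table and its values are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical payment log: user 1 pays twice, nobody pays for the first time on 2014-01-02.
payments <- tibble(
  UserId  = c(1, 1, 2, 3),
  paytime = as.POSIXct(
    c("2014-01-01 10:00:00", "2014-01-02 09:00:00",
      "2014-01-01 12:00:00", "2014-01-03 08:00:00"),
    tz = "UTC"
  )
) %>%
  mutate(date = as.Date(paytime))

firstpay <- payments %>%
  group_by(UserId) %>%
  filter(paytime == min(paytime)) %>%          # keep only the first payment per user
  ungroup() %>%
  count(date, name = "first_time_payer") %>%   # first-time payers per day
  complete(                                    # add the days that have no first-time payers
    date = seq(min(date), max(date), by = "day"),
    fill = list(first_time_payer = 0L)
  ) %>%
  mutate(total_payer = cumsum(first_time_payer))  # running total of payers

print(firstpay)
```

Without the complete() step, 2014-01-02 would be absent from the result and the running total would have no row for that day.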

matthieugomez mentioned this pull request Sep 27, 2014
hadley closed this in c6c8112 Oct 7, 2014

hadley commented Oct 7, 2014

Thanks! I liked the idea, but ended up going in a somewhat different direction for the implementation.


Quiri commented Oct 9, 2014

Awesome! That's actually what I was hoping for ;)
