expand functions added #21

Closed
Quiri wants to merge 1 commit

@Quiri
commented Aug 18, 2014

Hello Hadley,

I'm a data analyst from Germany and love working with your packages, especially dplyr. While using the group_by %>% summarize chain, I noticed that groups are only created for values that exist in the data. If I group by date, for example, and there is no data for a specific date, that date does not appear in the grouped result, which leaves a "hole". I would like to have

Date n()
2014-01-01 5
2014-01-02 0 instead of no data at all
2014-01-03 7

and so on...

I didn't find a solution, so I built a function for it based on expand.grid(). I thought it would be a nice addition to tidyr, so take a look at it; maybe you'll like it (or at least the idea of expanding data).

You can expand from min:max and by unique ids (which is convenient if you grouped by many variables, like date and gender), and you can select as many expandable columns as you wish.
It works great with the dplyr pipe and I use it a lot.
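
The idea can be sketched in base R. This is a minimal illustration, not the code from this pull request: the toy data frame `counts` and its column names are assumptions made for the example. It builds every combination of the full min:max date range and the unique gender values with expand.grid(), then merges the existing counts back in and fills the missing ones with 0.

```r
# Toy summary table with a "hole": no rows for 2014-01-02, no "m" row for 2014-01-03.
# (Hypothetical data, for illustration only.)
counts <- data.frame(
  date   = as.Date(c("2014-01-01", "2014-01-01", "2014-01-03")),
  gender = c("f", "m", "f"),
  n      = c(5, 2, 7),
  stringsAsFactors = FALSE
)

# All combinations: every day from min to max, crossed with every observed gender.
grid <- expand.grid(
  date   = seq(min(counts$date), max(counts$date), by = "day"),
  gender = unique(counts$gender),
  stringsAsFactors = FALSE
)

# Left join the counts onto the full grid; combinations without data get NA.
filled <- merge(grid, counts, by = c("date", "gender"), all.x = TRUE)

# Treat missing combinations as a count of zero.
filled$n[is.na(filled$n)] <- 0
filled <- filled[order(filled$date, filled$gender), ]
```

Here `filled` has one row per date and gender, including the combinations that were absent from `counts`.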

Feel free to contact me.

Kirill
@Quiri

Author

commented Aug 19, 2014

Here is an example of how expand could be used in a dplyr chain.
Assume we are looking at payment data and want a table with the total number of payers up to each day.

firstpay <- data %>%
  group_by(UserId) %>%
  filter(paytime == min(paytime)) %>%             # keep each user's first payment
  group_by(date) %>%
  summarize(first_time_payer = n()) %>%           # count first-time payers per day
  ungroup() %>%                                   # drop the grouping before cumsum()
  expand("date") %>%                              # expand by date, since some days have no first-time payers
  mutate(total_payer = cumsum(first_time_payer))  # running total of payers for every date
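
For comparison, the same result can be sketched in base R without the proposed function. This is only an illustration under assumptions: the toy `data` frame is hypothetical, and the gaps are filled by counting over a factor whose levels cover every day in the range, which is a different technique from the expand() in this pull request.

```r
# Hypothetical payment log: user 1 pays twice, so the payment on 2014-01-02
# is not a first payment and that day has no first-time payers.
data <- data.frame(
  UserId  = c(1, 1, 2, 3),
  paytime = as.POSIXct(
    c("2014-01-01 10:00:00", "2014-01-02 09:00:00",
      "2014-01-01 12:00:00", "2014-01-03 08:00:00"),
    tz = "UTC"
  )
)

# Keep the earliest payment per user.
data  <- data[order(data$paytime), ]
first <- data[!duplicated(data$UserId), ]
first$date <- as.Date(first$paytime, tz = "UTC")

# Every day in the observed range, including days with no first-time payers.
all_days <- seq(min(first$date), max(first$date), by = "day")

# Counting over a factor with all days as levels yields 0 for the empty days.
per_day <- table(factor(format(first$date), levels = format(all_days)))

firstpay <- data.frame(
  date             = all_days,
  first_time_payer = as.integer(per_day)
)
firstpay$total_payer <- cumsum(firstpay$first_time_payer)
```

Because the empty day is present with a count of 0, the running total has a value for every date.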

@matthieugomez referenced this pull request Sep 27, 2014

@hadley hadley closed this in c6c8112 Oct 7, 2014

@hadley

Member

commented Oct 7, 2014

Thanks! I liked the idea, but ended up going in a somewhat different direction for the implementation.

@Quiri

Author

commented Oct 9, 2014

Awesome! That's actually what I was hoping for ;)
