Closed
Description
Consider the following data frame (mydf):
       uid term_number order_id
68 1001190           0  1985608
69 1001190           0  2052320
70 1001190           0  2089064
71 1001190           1  2125056
72 1001190           2  2275108
73 1001190           2  2296768
74 1001190           2  2343148
75 1001190           3  2474898
76 1001190           4  2676880
77 1001190           5  2718370
78 1001190           6       NA
79 1001190           7  3109466
80 1001190           7  3132486
mydf %.%
group_by(uid, term_number) %.%
summarize(n_distinct(order_id))
Source: local data frame [8 x 3]
Groups: uid
      uid term_number n_distinct(order_id)
1 1001190           0                    3
2 1001190           1                    1
3 1001190           2                    3
4 1001190           3                    1
5 1001190           4                    1
6 1001190           5                    1
7 1001190           6                    1
8 1001190           7                    2
This reports 1 distinct order for term 6, where it should be 0. The reason is that n_distinct does not ignore NA values: it counts NA as a distinct value of its own. I guess that makes sense in some cases, so ideally there would be a flag to ignore NAs.
PS: Most databases ignore NULLs by default when counting distinct values. R's n_distinct does not.
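In the meantime, one possible workaround is to drop the NAs before counting. The sketch below uses only base R and rebuilds the data from the table above. The helper name n_distinct_no_na is hypothetical (it is not part of dplyr), and the data frame is retyped by hand from the printed output, so treat it as an illustration rather than a definitive fix.

```r
# Rebuild the example data from the issue (values copied from the table above).
mydf <- data.frame(
  uid = 1001190,
  term_number = c(0, 0, 0, 1, 2, 2, 2, 3, 4, 5, 6, 7, 7),
  order_id = c(1985608, 2052320, 2089064, 2125056, 2275108, 2296768,
               2343148, 2474898, 2676880, 2718370, NA, 3109466, 3132486)
)

# Hypothetical helper (not a dplyr function): count distinct values,
# ignoring NA, similar to what SQL's COUNT(DISTINCT ...) does with NULL.
n_distinct_no_na <- function(x) {
  length(unique(x[!is.na(x)]))
}

# Distinct non-NA order ids per term, using base R grouping.
counts <- tapply(mydf$order_id, mydf$term_number, n_distinct_no_na)
print(counts)  # term 6 now counts 0 instead of 1
```

The same helper could presumably be used inside the summarize() call in place of n_distinct(order_id), which keeps the grouped pipeline unchanged.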
Thanks,