Make efficient C++ versions of window functions #133

hadley · 2013-11-25T17:47:01Z

Loosely grouped below


romainfrancois · 2014-04-04T11:56:54Z

I've added some code for first (last will follow). I don't have R's full argument matching at my disposal, so I handle the arguments this way:

The first argument is always taken to be the variable. I only check that it is either unnamed or called 'x'.
All other arguments must be named. I don't do positional matching for them, but partial matching of names is allowed.
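The two rules above can be sketched roughly as follows. This is only an illustration, not the code in dplyr: the `Arg` struct, `match_formal` and `can_handle_hybrid` are hypothetical names, and arguments are modelled as plain strings instead of R objects. Partial matching is approximated as "exact match, otherwise a unique prefix of a known argument name".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one call argument. `name` is empty when the
// argument was supplied without a name.
struct Arg {
    std::string name;
    std::string expr;
};

// Returns the index of the formal that `name` matches, either exactly or as
// a unique prefix. Returns -1 when nothing matches or the prefix is ambiguous.
int match_formal(const std::string& name,
                 const std::vector<std::string>& formals) {
    assert(!name.empty());
    int partial = -1;
    bool ambiguous = false;
    for (std::size_t i = 0; i < formals.size(); ++i) {
        if (formals[i] == name) return static_cast<int>(i);  // exact match wins
        if (formals[i].compare(0, name.size(), name) == 0) {
            if (partial != -1) ambiguous = true;
            partial = static_cast<int>(i);
        }
    }
    return ambiguous ? -1 : partial;
}

// True when the call follows the rules described above: the first argument
// is unnamed or called "x", and every other argument is named and matches a
// known formal. Anything else would be left to regular R evaluation.
bool can_handle_hybrid(const std::vector<Arg>& args,
                       const std::vector<std::string>& formals) {
    if (args.empty()) return false;
    if (!args[0].name.empty() && args[0].name != "x") return false;
    for (std::size_t i = 1; i < args.size(); ++i) {
        if (args[i].name.empty()) return false;  // no positional matching
        if (match_formal(args[i].name, formals) == -1) return false;
    }
    return true;
}
```

With formals `order_by` and `default`, a call shaped like first(x, order = y) would be accepted under this sketch, while first(x, y) would not.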

That's a bit more code than I anticipated for the implementation of something conceptually simple. The advantage over calling the R version of first is that the data is never materialised. For example:

> df <- data.frame( x = 1:16, g = rep(1:4, each = 4), y = 16:1 )
> df
    x g  y
1   1 1 16
2   2 1 15
3   3 1 14
4   4 1 13
5   5 2 12
6   6 2 11
7   7 2 10
8   8 2  9
9   9 3  8
10 10 3  7
11 11 3  6
12 12 3  5
13 13 4  4
14 14 4  3
15 15 4  2
16 16 4  1
> df %.% group_by(g) %.% summarise( first_x = first(x) )
Source: local data frame [4 x 2]

  g first_x
1 1       1
2 2       5
3 3       9
4 4      13
> df %.% group_by(g) %.% summarise( first_x = first(x, order_by = y) )
Source: local data frame [4 x 2]

  g first_x
1 1       4
2 2       8
3 3      12
4 4      16

In each case, I don't have to materialise the 4 vectors (one per group) to pass them to the R function first. In the easy case, first(x), I just pick the first element of the data as virtually indexed by the group's indices.

For the case first(x, order_by = y), I loop over the y variable to find the smallest value, but at no point do I materialise either x or y.
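A minimal sketch of that idea, assuming plain `std::vector` columns and a vector of row positions per group. The names `GroupIndices`, `first_in_group` and `first_ordered_by` are made up for illustration and are not the classes used in dplyr. Both functions read the full columns through the group's indices, so no per-group copy of x or y is created.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one group's row positions in the full column.
typedef std::vector<std::size_t> GroupIndices;

// first(x): pick the element at the group's first row position.
template <typename T>
T first_in_group(const std::vector<T>& x, const GroupIndices& idx) {
    assert(!idx.empty());
    return x[idx[0]];
}

// first(x, order_by = y): scan y through the indices, remember the row
// holding the smallest y, then read x at that row.
template <typename T, typename U>
T first_ordered_by(const std::vector<T>& x, const std::vector<U>& y,
                   const GroupIndices& idx) {
    assert(!idx.empty());
    assert(x.size() == y.size());
    std::size_t best = idx[0];
    for (std::size_t k = 1; k < idx.size(); ++k) {
        if (y[idx[k]] < y[best]) best = idx[k];
    }
    return x[best];
}
```

Applied to the data frame above (x = 1:16, y = 16:1), the first group's zero-based row positions are 0 to 3, so `first_in_group` gives 1 and `first_ordered_by` gives 4, matching the printed results.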

last should be straightforward, and nth should not be too hard.

hadley · 2014-04-07T14:04:35Z

Nice, thanks Romain.

romainfrancois · 2014-10-03T21:01:18Z

The code for lead and lag was already there but not enabled. I've fixed it up a bit and enabled it, although for now it does not support the full set of arguments of the R counterparts. They only handle the two-argument form, i.e. they don't handle default or order_by. If the call has more than two arguments, it just falls back to R evaluation.

I think I can handle default easily.

Then for order_by I can borrow some code from first ...

I'll handle the case order_by = symbol hybridly as it's the main use case I guess, and then anything else will fall back to R as it's more complicated to handle.
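For illustration, a grouped lag and lead with a default value could look roughly like this. It is a sketch under assumptions, not the dplyr implementation: `GroupIndices`, `lag_in_group` and `lead_in_group` are hypothetical names, columns are plain `std::vector`s, and an explicit `def` value stands in for R's NA. The order_by case is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one group's row positions in the full column.
typedef std::vector<std::size_t> GroupIndices;

// lag(x, n) within one group: row k of the group receives the value from
// row k - n of the same group, or `def` when there is no such row.
template <typename T>
void lag_in_group(const std::vector<T>& x, const GroupIndices& idx,
                  std::size_t n, const T& def, std::vector<T>& out) {
    assert(out.size() == x.size());
    for (std::size_t k = 0; k < idx.size(); ++k) {
        out[idx[k]] = (k < n) ? def : x[idx[k - n]];
    }
}

// lead(x, n) within one group: row k receives the value from row k + n.
template <typename T>
void lead_in_group(const std::vector<T>& x, const GroupIndices& idx,
                   std::size_t n, const T& def, std::vector<T>& out) {
    assert(out.size() == x.size());
    for (std::size_t k = 0; k < idx.size(); ++k) {
        out[idx[k]] = (k + n < idx.size()) ? x[idx[k + n]] : def;
    }
}
```

Each group writes only to its own row positions in `out`, so calling the function once per group fills the whole result column without copying any group's data out of x.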

hadley · 2015-10-22T21:23:29Z

Is this finished?

ghost assigned romainfrancois Nov 25, 2013

romainfrancois added a commit that referenced this issue Nov 27, 2013

preliminary code (not yet included) for row_number. #133

154d4db

romainfrancois mentioned this issue Dec 23, 2013

lead/lag don't preserve factors #166

Closed

romainfrancois added a commit that referenced this issue Dec 23, 2013

initial lead implementation. #133

2d50cf6

romainfrancois added a commit that referenced this issue Dec 24, 2013

hybrid impl of cumsum, plus generic impl of Mutater using CRTP. #133

79688cb

romainfrancois added a commit that referenced this issue Dec 24, 2013

hybrid impl of cumsum, plus generic impl of Mutater using CRTP. #133

a9a4258

romainfrancois added a commit that referenced this issue Dec 24, 2013

cummax. #133

802071d

hadley modified the milestones: 0.3, v0.2 Mar 17, 2014

romainfrancois added a commit that referenced this issue Apr 4, 2014

last(.) and last(., order_by = . ). #133

9c504f1

romainfrancois added a commit that referenced this issue Apr 4, 2014

added hybrid nth using c++ nth_element. #133

4d2f1d3

romainfrancois added a commit that referenced this issue Apr 4, 2014

percent_rank. #133

27c0a6c

romainfrancois added a commit that referenced this issue Apr 4, 2014

ntile. #133

2577602

romainfrancois added a commit that referenced this issue Apr 4, 2014

adapt Rank_Impl with pre_increment so that we can have cume_dist. #133

6100ce4

hadley modified the milestones: 0.3.1, 0.3 Sep 11, 2014

romainfrancois added a commit that referenced this issue Oct 3, 2014

enable simple version (one arg only) of lag. #133

ede309b

romainfrancois added a commit that referenced this issue Oct 3, 2014

additional tests for lead and lag #133

246bab7

hadley modified the milestones: 0.4, 0.3.1 Oct 30, 2014

acthomasca mentioned this issue Jun 13, 2015

segfaulting problem on Ubuntu Linux, again #952

Closed

hadley closed this as completed Mar 1, 2016

lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018
