## Slicing with pandas, gurobipy.tuplelist, and O(n) slicing

First, run a small Python script that sets up the performance comparisons.

In [1]:
%run prep_for_different_slicings.py
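All timings below come from `%timeit`. In the IPython version used here, it runs a statement for some number of loops, repeats that 3 times, and reports the fastest per-loop time. Newer IPython versions report a mean and standard deviation instead. A rough standard-library equivalent is sketched below. The helper name `best_of_3` is hypothetical, and this is not the contents of `prep_for_different_slicings.py`.

```python
import timeit

def best_of_3(func, *args, loops=100):
    """Mimic the old IPython %timeit report: run func(*args) `loops` times,
    repeat that 3 times, and return the fastest per-call time in seconds."""
    totals = timeit.repeat(lambda: func(*args), number=loops, repeat=3)
    return min(totals) / loops

if __name__ == "__main__":
    # Time a trivial call just to show the report format.
    per_call = best_of_3(sum, range(1000), loops=1000)
    print(f"1000 loops, best of 3: {per_call * 1e6:.2f} us per loop")
```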

The slicing will be over small, medium, and large tables.

In [2]:
[len(td.childTable) for td in (smallTd, medTd, bigTd)]

[1200, 31800, 270000]

We will run three series of three tests each, one series per table size.

Each series times (1) slicing with `.sloc` and `pandas`, (2) slicing with `gurobipy.tuplelist`, and (3) naive O(n) slicing.
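To make the difference between O(n) slicing and indexed slicing concrete, here is a minimal pure-Python sketch. The classes `NaiveTupleList` and `IndexedTupleList` are hypothetical stand-ins, not gurobipy's implementation. They only borrow the idea of a `select` method in which `"*"` matches any value. The naive version scans every tuple on each call. The indexed version builds a hash index for each wildcard layout on first use, so later calls cost an average-case dictionary lookup plus the size of the result. A hash index is one common way to avoid linear scans; gurobipy's internals may differ.

```python
from collections import defaultdict

WILDCARD = "*"

class NaiveTupleList:
    """Hypothetical O(n) slicer: every select() scans all tuples."""
    def __init__(self, tuples):
        self._tuples = list(tuples)

    def select(self, *pattern):
        return [t for t in self._tuples
                if all(p == WILDCARD or p == v for p, v in zip(pattern, t))]

class IndexedTupleList:
    """Hypothetical indexed slicer: one hash index per wildcard layout."""
    def __init__(self, tuples):
        self._tuples = list(tuples)
        self._indexes = {}  # fixed positions -> {fixed values: [tuples]}

    def select(self, *pattern):
        fixed = tuple(i for i, p in enumerate(pattern) if p != WILDCARD)
        if fixed not in self._indexes:
            index = defaultdict(list)
            for t in self._tuples:  # one O(n) pass per layout, then reused
                index[tuple(t[i] for i in fixed)].append(t)
            self._indexes[fixed] = index
        key = tuple(pattern[i] for i in fixed)
        return list(self._indexes[fixed].get(key, []))

rows = [(p, w, c) for p in ("a", "b") for w in ("x", "y") for c in (1, 2)]
naive, indexed = NaiveTupleList(rows), IndexedTupleList(rows)
print(naive.select("a", "*", 2))    # scans all 8 tuples
print(indexed.select("a", "*", 2))  # builds the index for positions (0, 2), then looks up
```

Both return the same slice. The difference is that the naive version pays for a full scan on every call, which is what makes O(n) slicing slow when it sits inside a loop over many slices.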

First, with a small table (1,200 rows), `pandas` slicing (1.6 ms) is only about 3.7X faster than O(n) slicing (5.85 ms), while `tuplelist` slicing (5.17 µs) is roughly 300X faster than `pandas`.

In [3]:
%timeit checkChildDfLen(smallChildDf, *smallChk)

1000 loops, best of 3: 1.6 ms per loop


In [4]:
%timeit checkTupleListLen(smallSmartTupleList, *smallChk)

100000 loops, best of 3: 5.17 us per loop


In [5]:
%timeit checkTupleListLen(smallDumbTupleList, *smallChk)

100 loops, best of 3: 5.85 ms per loop


Next, with a table of 31,800 rows, `pandas` slicing is now ~90X faster than O(n) slicing (2.04 ms vs. 183 ms), although `tuplelist` is still the fastest by far (5.43 µs).

In [6]:
%timeit checkChildDfLen(medChildDf, *medChk)

1000 loops, best of 3: 2.04 ms per loop


In [7]:
%timeit checkTupleListLen(medSmartTupleList, *medChk)

100000 loops, best of 3: 5.43 us per loop


In [8]:
%timeit checkTupleListLen(medDumbTupleList, *medChk)

1 loops, best of 3: 183 ms per loop


Finally, with a table of 270,000 rows, `pandas` slicing is ~340X faster than O(n) slicing (4.49 ms vs. 1.53 s). That is the same order of magnitude as the ~800X improvement `tuplelist` (5.51 µs) shows over `.sloc`.

In [9]:
%timeit checkChildDfLen(bigChildDf, *bigChk)

100 loops, best of 3: 4.49 ms per loop


In [10]:
%timeit checkTupleListLen(bigSmartTupleList, *bigChk)

100000 loops, best of 3: 5.51 us per loop


In [11]:
%timeit checkTupleListLen(bigDumbTupleList, *bigChk)

1 loops, best of 3: 1.53 s per loop


Bottom line? `pandas` isn't really designed with "iterating over indices and slicing" in mind, so it isn't the fastest way to write this sort of code. However, `pandas` also doesn't implement naive O(n) slicing: its `.sloc` time barely grows with table size, while the O(n) time grows roughly in proportion to it.
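The scaling argument can be checked with a little arithmetic on the `len` timings reported above. The values are copied from the `%timeit` output and nothing is re-measured. The sketch compares how much each approach slows down going from the small table to the big one, against how much the table itself grows.

```python
rows = {"small": 1_200, "med": 31_800, "big": 270_000}

# Per-call times in seconds, copied from the %timeit output above.
pandas_sloc = {"small": 1.6e-3, "med": 2.04e-3, "big": 4.49e-3}
tuplelist = {"small": 5.17e-6, "med": 5.43e-6, "big": 5.51e-6}
naive = {"small": 5.85e-3, "med": 183e-3, "big": 1.53}

def growth(times):
    """Ratio of big-table time to small-table time."""
    return times["big"] / times["small"]

print(f"table grows  {rows['big'] / rows['small']:.0f}x")
print(f"O(n) grows   {growth(naive):.0f}x")
print(f"pandas grows {growth(pandas_sloc):.1f}x")
print(f"tuplelist    {growth(tuplelist):.2f}x")
```

The O(n) time tracks the table size almost exactly, which is what a linear scan predicts. The `pandas` time grows by less than 3X over a 225X larger table, so it cannot be scanning every row on each slice.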

In most cases, the `.sloc` approach to slicing will be fast enough. In general, as long as you use subroutines with good big-O behavior, solving a MIP or LP model will take longer than formulating it. However, in cases where slicing is the bottleneck, a `tuplelist` can be used, or the model-building code can be refactored to be more pandonic (idiomatic, vectorized `pandas`).
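One way to read "refactor the model-building code" is to stop slicing inside the loop altogether. Instead, group the rows by the slicing key in a single pass, then look each group up as needed; in `pandas` this is roughly the job of `DataFrame.groupby`. The sketch below shows the same idea with the standard library only. The data and the helper `group_by_prefix` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

def group_by_prefix(rows, width):
    """Group (key, value) pairs by the first `width` key fields in one O(n) pass."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key[:width]].append(value)
    return groups

# Hypothetical data: 3-field keys mapped to a numeric quantity.
rows = [((p, q, r), p * 100 + q * 10 + r)
        for p in range(3) for q in range(3) for r in range(3)]

# Slow pattern: one O(n) scan per slice, so O(n * number_of_slices) overall.
slow = {(p, q): sum(v for k, v in rows if k[:2] == (p, q))
        for p in range(3) for q in range(3)}

# Refactored pattern: one O(n) grouping pass, then an average O(1) lookup per slice.
groups = group_by_prefix(rows, 2)
fast = {pq: sum(vals) for pq, vals in groups.items()}

assert slow == fast
```

The refactored version drops slices that have no rows. When every slice must appear, look each one up with `groups.get(pq, [])`.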

#### Addendum

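There was a request to time `sum` as well as `len`, so here are the results on the large table. Summing narrows the gap considerably: `tuplelist` (3.37 ms) is now only about 1.4X faster than `.sloc` (4.71 ms), while O(n) slicing stays at roughly 1.5 s.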
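A `sum` check differs from a `len` check because it must touch each matched row's data, not just count the matches. The extra `bigTd` argument to `checkTupleListSum` suggests the `tuplelist` version looks that data up in the `TicDat` object, though the notebook does not show its code. The sketch below uses a hypothetical child table keyed by `(parent, child)` with an invented `quantity` field. It shows why `sum` may cost more than `len` even with an index: the index makes finding the slice cheap, but summing still does work proportional to the slice size.

```python
from collections import defaultdict

# Hypothetical child table: (parent, child) primary key -> row data.
child_table = {(p, c): {"quantity": float(10 * p + c)}
               for p in range(3) for c in range(4)}

# Build a parent -> [primary keys] index once, as an indexed slicer would.
keys_by_parent = defaultdict(list)
for parent, child in child_table:
    keys_by_parent[parent].append((parent, child))

def slice_len(parent):
    """Number of child rows for this parent (the `len` check)."""
    return len(keys_by_parent.get(parent, []))

def slice_sum(parent):
    """Total quantity over this parent's rows (the `sum` check).
    Unlike slice_len, this reads every matched row's data."""
    return sum(child_table[k]["quantity"] for k in keys_by_parent.get(parent, []))
```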

In [12]:
%timeit checkChildDfSum(bigChildDf, *bigChk)

100 loops, best of 3: 4.71 ms per loop


In [13]:
%timeit checkTupleListSum(bigSmartTupleList, bigTd, *bigChk)

100 loops, best of 3: 3.37 ms per loop


In [14]:
%timeit checkTupleListSum(bigDumbTupleList, bigTd, *bigChk)

1 loops, best of 3: 1.51 s per loop
