## Partition a pandas DataFrame based on a sentinel value

First, create some sample data that simulates what is retrieved from the database.

In [18]:
import random, datetime
import pandas

d = {'timestamp': [], 'motor_rpm': []}
now = datetime.datetime.utcnow()
for i in range(1000):
    rpm = random.randrange(100,1500, 10)
    d['timestamp'].append(now)
    d['motor_rpm'].append(rpm)
    now += datetime.timedelta(minutes=1)
    
# Insert 5 "events" where motor rpm goes to zero
# An "event" is where we see 1 or more contiguous readings of zero rpm.
# i.e., if the motor stops - it could be down for a minute or several minutes
zero_rpm_indices = [3,4,
                   299,
                   466, 467, 468, 469,
                   700, 701,
                   950, 951, 952, 953, 954]
for idx in zero_rpm_indices:
    d['motor_rpm'][idx] = 0

df = pandas.DataFrame.from_dict(d)
df

,timestamp,motor_rpm
0,2021-03-03 00:55:33.906756,930
1,2021-03-03 00:56:33.906756,260
2,2021-03-03 00:57:33.906756,900
3,2021-03-03 00:58:33.906756,0
4,2021-03-03 00:59:33.906756,0
...,...,...
995,2021-03-03 17:30:33.906756,1380
996,2021-03-03 17:31:33.906756,990
997,2021-03-03 17:32:33.906756,890
998,2021-03-03 17:33:33.906756,1290


## Partitioning

Now that we have a reasonable facsimile of the sensor data, the problem is to identify the partition boundaries.

The 0 values in `motor_rpm` are our sentinels, but note that we may have a contiguous series of the sentinel value.

I arbitrarily define a partition as:
  1. Starting at the first non-zero RPM value following a zero RPM record. If the first record has a non-zero RPM, it also begins a partition.
  2. Ending at the last zero RPM value before a non-zero RPM value. In all cases, the last record ends the last partition.
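
To make the two rules concrete, here is a minimal sketch that applies them to a handful of made-up readings. The `readings` list and the names `toy_partitions` and `start` are illustrative assumptions, not part of the notebook's data, and the plain loop is only meant to spell out the definition, not to replace the pandas steps that follow.

```python
# Hypothetical toy readings: two zero-rpm events (positions 2-3 and position 6)
readings = [500, 700, 0, 0, 900, 800, 0, 600]

toy_partitions = []
start = 0  # rule 1: the first record begins a partition
for i, rpm in enumerate(readings):
    is_last = i == len(readings) - 1
    # rule 2: a zero reading followed by a non-zero reading ends a partition
    ends_event = rpm == 0 and not is_last and readings[i + 1] != 0
    if ends_event or is_last:
        toy_partitions.append((start, i))
        start = i + 1  # rule 1: the next record begins the next partition

print(toy_partitions)  # [(0, 3), (4, 6), (7, 7)]
```

Each tuple is an inclusive (start, stop) pair, which is the same shape the steps below produce for the full data set.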
  
 

In [20]:
# Step 1: Find the zero-rpm events
zero_rpm = df.loc[df['motor_rpm'] == 0]
zero_rpm

,timestamp,motor_rpm
3,2021-03-03 00:58:33.906756,0
4,2021-03-03 00:59:33.906756,0
299,2021-03-03 05:54:33.906756,0
466,2021-03-03 08:41:33.906756,0
467,2021-03-03 08:42:33.906756,0
468,2021-03-03 08:43:33.906756,0
469,2021-03-03 08:44:33.906756,0
700,2021-03-03 12:35:33.906756,0
701,2021-03-03 12:36:33.906756,0
950,2021-03-03 16:45:33.906756,0


In [22]:
# Step 2: Create a map of indexes to zero-rpm records
# i.e., we'll see what partition each zero-rpm event 'belongs' to.
m = zero_rpm.index.to_series().diff().ne(1).cumsum()
m

3      1
4      1
299    2
466    3
467    3
468    3
469    3
700    4
701    4
950    5
951    5
952    5
953    5
954    5
dtype: int64
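
The `diff().ne(1).cumsum()` chain in Step 2 is compact, so it may help to see each stage separately. The sketch below uses a short, made-up series of index labels (`idx` is an assumption for illustration, not the notebook's `zero_rpm.index`) and names each intermediate result.

```python
import pandas

# Hypothetical index labels of zero-rpm rows: runs 3-4, 299, and 466-468
idx = pandas.Series([3, 4, 299, 466, 467, 468])

# Distance to the previous label; the first element has no predecessor (NaN)
gap = idx.diff()

# True wherever the label is not exactly one more than the previous label,
# i.e. wherever a new run of contiguous zero readings begins (NaN != 1 is True)
new_run = gap.ne(1)

# A running count of run starts gives every row in the same run the same id
run_id = new_run.cumsum()

print(run_id.tolist())  # [1, 1, 2, 3, 3, 3]
```

The ids carry no meaning beyond grouping: rows that share an id belong to the same zero-rpm event.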

In [24]:
# Step 3: Walk the map elements and use them to define partition (start, stop) tuples
partitions = []
current_event = 1
current_start_idx = 0
previous_event_idx = 0
for idx, event in m.items():
    if event != current_event:
        partitions.append((current_start_idx, previous_event_idx))
        current_start_idx = previous_event_idx + 1
        current_event = event
    previous_event_idx = idx
    
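# The last event falls outside the loop, so close its partition here
partitions.append((current_start_idx, previous_event_idx))

# Finally, all records at the end won't have a power event to stop at,
# so they form one trailing partition that ends at the last record
if previous_event_idx < len(df) - 1:
    partitions.append((previous_event_idx + 1, len(df) - 1))

partitions


[(0, 4), (5, 299), (300, 469), (470, 701), (702, 954), (955, 999)]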
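
Once inclusive (start, stop) tuples like these are available, one possible way to turn them into separate frames is positional slicing with `iloc`. This is a sketch under stated assumptions: `small` and `bounds` are hypothetical stand-ins for `df` and `partitions`, and the frame is assumed to have the default integer index, so labels and positions coincide.

```python
import pandas

# Hypothetical small frame standing in for df (default 0..n-1 index assumed)
small = pandas.DataFrame({"motor_rpm": [500, 700, 0, 0, 900, 800, 0, 600]})

# Inclusive (start, stop) tuples in the same format as `partitions`
bounds = [(0, 3), (4, 6), (7, 7)]

# iloc slicing excludes the end position, so add 1 to keep the stop row
frames = [small.iloc[start:stop + 1] for start, stop in bounds]

for frame in frames:
    print(frame["motor_rpm"].tolist())
```

If the frame had a non-default index (for example, after filtering or sorting), the tuples would need to be translated to positions first, or label-based `loc` slicing, which includes the end label, could be used instead.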