# Race Logs

This notebook is used to view, update, and manage the set of race/sail instrument logs collected on Peer Gynt using the Raspberry Pi.

Information about the race logs is collected into a Pandas DataFrame (which is much like a database table).  Why not use a database?  Because by staying with Pandas we can share "know-how" and some tools.

## Goals

- View the list of all logs and inspect for accuracy.
    - Correct errors
- Update and add a new log
- Add metadata (like race start/end)
- View race data.

## TODO

- Add additional metadata for each race.  Some extracted automatically?
   - Tenet: this data should be user entered, not automatic.  Automatic goes in a separate table?
   - Conditions.  Crew.  Settings for rig.
   - Speed. Quality of maneuvers.
- How to edit more complex and longer text fields.
- How to handle multiple races in one log?
   - Split into different files?
- Make it faster to show the race track.
- What if I want to permanently delete a log?  
- How can I tell if the log is a duplicate?


In [59]:
%matplotlib notebook

import os
import itertools as it
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import qgrid

# These are libraries written for RaceAnalysis
import global_variables
G = global_variables.init_seattle()
import race_logs
import process as p
import analysis as a
import chart as c
import utils

In [60]:
# Info about the race logs is stored in a DataFrame.
log_info = race_logs.read_log_info()

# The data in this table can be edited using a QGrid control.  Click on a column header to sort.  Click again 
# to sort in the opposite order.  Double click on a cell to edit.
w = qgrid.show_grid(log_info, show_toolbar=True)
w


QgridWidget(grid_options={'fullWidthRows': True, 'syncColumnCellResize': True, 'forceFitColumns': True, 'defau…

In [61]:
# If you update the table shown above, this saves the changes (they are not saved by default).

if False:  # False for now.
    log_info = w.get_changed_df()
    race_logs.save_updated_log_info(log_info)

In [62]:
# Check to see if there are newly collected logs.  The log files will exist, but there will be 
# no corresponding rows.

missing_files = race_logs.find_new_logs(log_info)
print("Here is a list of files that are missing from the log_info table.")
missing_files

Here is a list of files that are missing from the log_info table.


['2020-04-08_10:49.pd.gz']
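
The check above amounts to comparing the files on disk with the `file` column of the table. The following is only a sketch of that idea, not the actual `race_logs.find_new_logs` code: the function name `find_unlisted_logs` and its parameters are hypothetical, and it assumes every log ends in `.pd.gz`.

```python
import tempfile
from pathlib import Path


def find_unlisted_logs(log_dir, known_files, suffix=".pd.gz"):
    """Return the names of log files in log_dir that are not in known_files."""
    known = set(known_files)
    on_disk = (p.name for p in Path(log_dir).iterdir() if p.name.endswith(suffix))
    return sorted(name for name in on_disk if name not in known)


# Small demonstration in a temporary directory (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.pd.gz", "b.pd.gz", "notes.txt"):
        (Path(tmp) / name).touch()
    new_logs = find_unlisted_logs(tmp, known_files=["a.pd.gz"])

print(new_logs)
```

With the table loaded earlier, the second argument would be `log_info.file`.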

In [13]:
import importlib
importlib.reload(race_logs)

<module 'race_logs' from '/Users/viola/GDriveBV/Sailboat/Code/Python/sailing/race_logs.py'>

In [63]:
# Load each of the new log files.

if len(missing_files) > 0:
    # Load these new log files
    new_dfs = []
    for file in missing_files:
        print(f"Loading {file}")
        ndf = race_logs.read_log_file(file, discard_columns=True, skip_dock_only=False, trim=True, cutoff=0.3)
        ndf.filename = file
        new_dfs.append(ndf)

    # As a convenience, combine the new logs into one large DataFrame.
    bdf = pd.concat(new_dfs, sort=True, ignore_index=True)

Loading 2020-04-08_10:49.pd.gz
Session from 2020-04-08 17:49:21.050000, 154618 rows, 4.294722222222222 hours.


In [64]:
# Display each new race log on a map, to jog your memory.

if len(missing_files) > 0:

    # Create a chart that can contain all the tracks.
    chart = c.plot_chart(bdf)

    # Plot each in a different color.
    for df, color in zip(new_dfs, it.cycle("red green blue brown grey".split())):
        print(f"Displaying in {df.filename} in {color}")
        c.draw_track(df, chart, color=color)


<IPython.core.display.Javascript object>

Displaying in 2020-04-08_10:49.pd.gz in red


## Trimming the data 

The logs run from the time we power up until we shut down, and this typically includes 30-90 minutes at the dock (or more).

The UI below (which is somewhat unreliable right now) can be used to find the trim points.

On the left are two "sliders" (primitive, I know).  The first sets the beginning of the data to show, the second the end.  When you are done, the results are stored in `ch.begin` and `ch.end`.

Note: for some reason the UI sometimes freezes.  If it does, just re-run the command.

In [66]:
df = new_dfs[0]
print(f"Displaying in {df.filename}")
ch = c.plot_track(df)

Displaying in 2020-04-08_10:49.pd.gz


<IPython.core.display.Javascript object>

In [67]:
ch.begin, ch.end

(40409, 130577)
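
Once the trim points are known, applying them is a positional slice. This is a minimal sketch, not the library's own trimming code: `trim_log` is a hypothetical helper, and it assumes that `begin` and `end` are row positions, that `end` is exclusive, and that `end == -1` (the default seen in the table) means "through the last row".

```python
import pandas as pd


def trim_log(df, begin=0, end=-1):
    """Return the rows from position begin up to (not including) end.

    Assumption: end == -1 means 'keep everything through the last row'.
    """
    stop = len(df) if end == -1 else end
    return df.iloc[begin:stop].reset_index(drop=True)


# Demonstration on a tiny stand-in for a race log.
demo = pd.DataFrame({"row": range(10)})
trimmed = trim_log(demo, begin=3, end=7)
untrimmed = trim_log(demo)
print(len(trimmed), len(untrimmed))
```

For the log above the call would be `trim_log(df, ch.begin, ch.end)`.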

In [68]:
# Choose which of the new files to add to the table.  By default, add all of them.

files_to_add = missing_files
# files_to_add = missing_files[-1:]  # ...or only the last one
files_to_add

['2020-04-08_10:49.pd.gz']

In [69]:
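# Add a row for each new file, then display the newly modified table so missing info can be filled in.
new_rows = []

if len(files_to_add) > 0:
    for file in files_to_add:
        print(f"Adding {file}")
        new_rows.append(race_logs.loginfo_new_row(file))

    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
    # This assumes loginfo_new_row returns a dict or a Series.
    new_log_info = pd.concat([log_info, pd.DataFrame(new_rows)], ignore_index=True)

    # The data in this table can be edited using a QGrid control.  Double click on a cell to edit.
    w = qgrid.show_grid(new_log_info, show_toolbar=True)
    display(w)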

Adding 2020-04-08_10:49.pd.gz


QgridWidget(grid_options={'fullWidthRows': True, 'syncColumnCellResize': True, 'forceFitColumns': True, 'defau…
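
Judging from the table, each new row starts with the file name, a timestamp taken from that name, and the default trim points `0` and `-1`. The sketch below shows one way such a row could be built. It is not the real `race_logs.loginfo_new_row`: the function name `new_row_from_filename`, the empty-string defaults, and the assumption that file names are in Seattle local time are all illustrative.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Assumption: log files are named in Seattle local time.
LOCAL_TZ = ZoneInfo("America/Los_Angeles")


def new_row_from_filename(filename):
    """Build a default table row for a log named like '2020-04-08_10:49.pd.gz'."""
    stem = filename.removesuffix(".pd.gz")
    started = datetime.strptime(stem, "%Y-%m-%d_%H:%M").replace(tzinfo=LOCAL_TZ)
    return {
        "file": filename,
        "race": "",          # filled in by hand later
        "begin": 0,          # default: no trimming at the start
        "end": -1,           # default: no trimming at the end
        "datetime": started,
        "description": "",   # filled in by hand later
    }


row = new_row_from_filename("2020-04-08_10:49.pd.gz")
print(row["datetime"].isoformat())
```

Attaching the time zone lets daylight saving time be handled automatically, which matches the `-07:00` and `-08:00` offsets that appear in the `datetime` column.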

In [71]:
# Save the new table, including any edits made in the grid above.
if len(files_to_add) > 0:

    log_info = w.get_changed_df()
    race_logs.save_updated_log_info(log_info)


Backed up log info data to Data/Backup/log_info.pd_00
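
The message above suggests that the previous table is copied to a numbered backup before it is overwritten. How `save_updated_log_info` actually does this is not shown here; the following is just one way to sketch a numbered backup, with a hypothetical helper `backup_file` and made-up paths.

```python
import shutil
import tempfile
from pathlib import Path


def backup_file(path, backup_dir):
    """Copy path into backup_dir under the next unused numbered name."""
    path, backup_dir = Path(path), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    n = 0
    # Try name_00, name_01, ... until one is free.
    while (target := backup_dir / f"{path.name}_{n:02d}").exists():
        n += 1
    shutil.copy2(path, target)
    return target


# Demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "log_info.pd"
    table.write_text("first version")
    first = backup_file(table, Path(tmp) / "Backup").name
    table.write_text("second version")
    second = backup_file(table, Path(tmp) / "Backup").name

print(first, second)
```

Never reusing a backup name means an accidental bad save cannot destroy an earlier good copy.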


In [72]:
log_info

| | file | race | begin | end | datetime | description |
|---|---|---|---|---|---|---|
| 0 | 2019-10-04_16:43.pd.gz | | 0 | -1 | 2019-10-04 16:43:00-07:00 | Sail to Everett for Foulweather Bluff Race. |
| 1 | 2019-10-05_09:18.pd.gz | Foulweather Bluff Race | 36051 | 258396 | 2019-10-05 09:18:00-07:00 | Foulweather Bluff Race |
| 2 | 2019-10-11_17:30.pd.gz | | 0 | -1 | 2019-10-11 17:30:00-07:00 | |
| 3 | 2019-10-11_17:38.pd.gz | | 0 | -1 | 2019-10-11 17:38:00-07:00 | Short practice, upwind tacks and downwind jibes. |
| 4 | 2019-10-12_09:45.pd.gz | CYC PSSC Day 1 | 19081 | 233893 | 2019-10-12 09:45:00-07:00 | CYC PSSC Day 1 |
| 5 | 2019-10-18_13:51.pd.gz | | 0 | -1 | 2019-10-18 13:51:00-07:00 | Short, at dock. |
| 6 | 2019-10-19_09:45.pd.gz | STYC Fall Regatta | 0 | -1 | 2019-10-19 09:45:00-07:00 | STYC Fall Regatta. |
| 7 | 2019-10-26_09:40.pd.gz | Grand Prix Saturday | 40503 | 87408 | 2019-10-26 09:40:00-07:00 | Grand Prix Saturday. |
| 8 | 2019-10-26_12:35.pd.gz | | 0 | -1 | 2019-10-26 12:35:00-07:00 | Short, at dock. |
| 9 | 2019-11-07_12:46.pd.gz | | 0 | -1 | 2019-11-07 12:46:00-08:00 | Short, at dock. |


## Quick Visualization Interface

Below we have added a bit of additional functionality to the qgrid interface: when you select a row, that race track is shown automatically.

Note that it takes a second (or two) between selecting a row and the display.  It's one of the few things that is a bit slow.

In [58]:
# Create a callback that is invoked whenever a row is selected.
def show_helper(row_num):
    "Display the race track from the ROW_NUM row, by position."
    file = log_info.iloc[row_num].file
    print(f"displaying file: {file}")
    df = race_logs.read_log_file(file, discard_columns=True, skip_dock_only=False, trim=True, cutoff=0.3)
    chart = c.plot_chart(df, fig_or_num=fig)
    c.draw_track(df, chart, color='red')

def show(event, _widget):
    # qgrid passes an event dict; event['new'] lists the newly selected row numbers.
    selected = event['new']
    if not selected:  # Nothing is selected (e.g. the selection was cleared).
        return
    show_helper(selected[0])  # Show the first of the selected rows.

fig = plt.figure()
w = qgrid.show_grid(log_info, show_toolbar=True)
display(w)

# Display one of the races.
show_helper(0)

# Bind the callback
w.on('selection_changed', show)


<IPython.core.display.Javascript object>

QgridWidget(grid_options={'fullWidthRows': True, 'syncColumnCellResize': True, 'forceFitColumns': True, 'defau…

displaying file: 2019-10-04_16:43.pd.gz
Session from 2019-10-04 23:43:44.050000, 78843 rows, 2.19 hours.
displaying file: 2020-04-04_10:03.pd.gz
Session from 2020-04-04 17:03:21.050000, 142349 rows, 3.953888888888889 hours.
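
If the delay comes mostly from reading and decompressing the log file on every selection (an assumption, it has not been profiled here), caching the loaded result would make a second visit to the same row fast. The sketch below shows the idea with `functools.lru_cache`. `slow_load` is a stand-in for the real `race_logs.read_log_file` call, and `cached_load` is a hypothetical wrapper.

```python
from functools import lru_cache

load_count = 0


def slow_load(file):
    """Stand-in for an expensive read; counts how often it really runs."""
    global load_count
    load_count += 1
    return f"contents of {file}"


@lru_cache(maxsize=8)  # keep at most the 8 most recently used logs in memory
def cached_load(file):
    return slow_load(file)


cached_load("a.pd.gz")
cached_load("a.pd.gz")  # served from the cache, slow_load is not called again
cached_load("b.pd.gz")
print(load_count)
```

One caveat: the cache hands back the same object each time, so a cached DataFrame should be treated as read-only (or copied before modifying it).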
