# Regatta Analysis

Welcome to the Regatta Analysis Web "Book".  The goal of this project is to capture data from a sailboat during a race or practice, so that we can go back later and understand what happened.  In particular:

- What were the conditions?
- How did they change during the race?
- How fast were we sailing?
- What angles were we sailing?
- Could we have gone faster?  How?
- Were our maneuvers efficient?  Were our tacks too fast, too slow, or just right?
- Could we have made better strategic decisions?  E.g., based on current and conditions.
- Could we have made better tactical decisions?  E.g., based on wind shifts.
- Were weather and current predictions accurate?

One of my sailing heroes is [Arvel Gentry](http://www.gentrysailing.com/) (along with [Paul Cayard](https://en.wikipedia.org/wiki/Paul_Cayard) and [Frank Bethwaite](https://en.wikipedia.org/wiki/Frank_Bethwaite)).
Arvel was an aerodynamics engineer and avid racer in San Diego (he seems to have had a big impact on the North Sails leaders as well).  Arvel may have been the first to collect quantitative performance data from his boat during races, back in 1974!  See his article [Are You at Optimum Trim?](http://www.gentrysailing.com/pdf-theory/Are-You-at-Optimum-Trim.pdf).

![title](Data/Images/gentry_data_recorder.png)

Using this primitive device, Arvel recorded boat speed and apparent wind speed.  By adding notes during a practice run, he also collected apparent wind angle and other conditions.

**Our goals are the same: record data so that we can better understand what we did, and how we can do better.**

## Table of Contents

The content in this project is split across multiple Jupyter notebooks, with associated Python libraries as well.  Each notebook introduces a single concept that is valuable in the analysis of race data.

- This notebook gives a general overview of the data we collect and provides some examples of how that data can be viewed.

- [Race Logs](Race_Logs.ipynb) Describes our framework for organizing information about the logs captured on multiple days during multiple Regattas.  I also use this notebook to keep the table of info up to date.

- [Capturing Data from the Boat Using Canboat](Canboat_Datacapture.ipynb) Discusses how data is captured, transferred, processed, and then loaded into Python/Pandas.

- [Boat Instruments](Boat_Instruments.ipynb) 

- And many more.

## Python, Jupyter, Pandas, and Regatta Analysis

This "book" is written using Python, [Jupyter notebooks](https://jupyter.org/), and [Pandas](https://pandas.pydata.org/).

- Python is a powerful programming language that is also easy to use.  It is great for data analysis and visualization.  It has tremendous online support and a huge set of useful libraries.  (Note: all programmers have their favorite languages, but Python is a super safe compromise.  No one wastes their time by learning Python!)

- A Jupyter notebook is a live web page that includes running Python code and supports data analysis with visualization.
   - To quote: The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

- Pandas is a Python library that includes great tools for data analysis (though I find its design undisciplined).
   - To quote: **pandas** is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

The central object in Pandas is the DataFrame.  It's a table of data, with rows and columns.  After we are done with a bunch of massaging, the data from the boat will end up as a DataFrame.

### Caveats

Jupyter is not the greatest way to make an interactive website.  It creates websites, sure, but there are specific interactive and dynamic JavaScript tools that might be better.  Why Jupyter?  Because most, if not all, of the code that is used to display and manipulate the data is right there in front of you.  Fancier, more responsive sites require a lot of invisible programming (in JavaScript, etc.) that would make it harder to customize and explore.  *Jupyter gives you some ability to explore, but it also gives you the ability to generate and modify.*

## Some Examples

An example is worth a thousand words.  Below is the type of data that we hope to get from the boat (though it is simplified from real data).

In [1]:
# Import some Python libraries.  This will become familiar, but for now just assume it's necessary.
%matplotlib notebook

import os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import qgrid

# These are libraries written for RaceAnalysis
import global_variables
G = global_variables.init_seattle()
import race_logs
import process as p
import analysis as a
import chart as c

In [2]:
# Read some example data and load it into a DataFrame
df = pd.read_pickle(os.path.join(G.DATA_DIRECTORY, "basic_example.pd"))
# Display the DataFrame
df

| | row_times | latitude | longitude | awa | aws | hdg | spd | sog | cog |
|---|---|---|---|---|---|---|---|---|---|
| 1005 | 10:25:08 AM | 47.683 | -122.405 | -85.837 | 0.854 | 279.04 | 1.437 | 1.551 | 290.598 |
| 1006 | 10:25:09 AM | 47.683 | -122.405 | -86.364 | 0.892 | 279.065 | 1.438 | 1.555 | 290.655 |
| 1007 | 10:25:09 AM | 47.683 | -122.405 | -86.861 | 0.927 | 279.087 | 1.44 | 1.558 | 290.731 |
| 1008 | 10:25:09 AM | 47.683 | -122.405 | -87.321 | 0.958 | 279.105 | 1.441 | 1.561 | 290.825 |
| 1009 | 10:25:09 AM | 47.683 | -122.405 | -87.741 | 0.986 | 279.121 | 1.442 | 1.565 | 290.934 |
| 1010 | 10:25:09 AM | 47.683 | -122.405 | -88.115 | 1.01 | 279.134 | 1.444 | 1.568 | 291.056 |
| 1011 | 10:25:09 AM | 47.683 | -122.405 | -88.439 | 1.028 | 279.143 | 1.445 | 1.572 | 291.188 |
| 1012 | 10:25:09 AM | 47.683 | -122.405 | -88.707 | 1.041 | 279.151 | 1.447 | 1.575 | 291.327 |
| 1013 | 10:25:09 AM | 47.683 | -122.405 | -88.915 | 1.049 | 279.155 | 1.449 | 1.579 | 291.471 |
| 1014 | 10:25:09 AM | 47.683 | -122.405 | -89.057 | 1.051 | 279.157 | 1.451 | 1.582 | 291.618 |


## Each Row Tells the Story

In the table above, each row contains the "current" value for each instrument.  Each row describes the instantaneous state of the boat, which removes one of the more complex issues in the analysis of boat data: on the boat, each instrument is separate and sends out updates at a frequent, but not synchronous, rate.

In other words, boat speed (SPD) is measured with a paddle wheel in the hull, and the values are sent asynchronously from the apparent wind angle (AWA) which is measured with a wind vane at the mast head.  Some instruments send rapid updates and others infrequent updates.  The onboard GPS sends full updates once per second (with GPS time and number of satellites, etc) and rapid updates 10x a second (only containing lat/lon).

The data processing pipeline will reorganize this asynchronous data into a single table, which is much more easily interpreted and analyzed.
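
As a rough illustration of what "reorganizing asynchronous data into a single table" can mean, here is a minimal plain-Python sketch that carries the last known reading of each instrument forward onto a regular time grid. This is an assumption about one simple way to do it, not a description of the actual pipeline (which may interpolate or smooth instead); the function names and the sample readings are hypothetical.

```python
from bisect import bisect_right


def latest_value(samples, t):
    """Return the most recent value at or before time t, or None if there is none yet.

    samples: list of (time_seconds, value) tuples sorted by time.
    """
    times = [s[0] for s in samples]
    i = bisect_right(times, t)
    return samples[i - 1][1] if i > 0 else None


def align_streams(streams, start, stop, step):
    """Build one row per tick, holding the latest value of every instrument."""
    rows = []
    n_ticks = int(round((stop - start) / step)) + 1
    for k in range(n_ticks):
        t = round(start + k * step, 6)  # avoid float drift such as 0.30000000000000004
        row = {"t": t}
        for name, samples in streams.items():
            row[name] = latest_value(samples, t)
        rows.append(row)
    return rows


# Hypothetical readings as (seconds, value); each instrument has its own timing.
streams = {
    "spd": [(0.00, 1.43), (0.25, 1.44), (0.50, 1.45)],
    "awa": [(0.10, -85.8), (0.40, -86.9)],
}

rows = align_streams(streams, start=0.0, stop=0.5, step=0.1)
for row in rows:
    print(row)
```

Each printed row has a value for every instrument (or `None` before the first reading arrives), which is the same shape as the table shown earlier.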

## Glossary

There are some (mostly) standard names for instruments on the boat.  Here is a quick glossary that may be helpful if these are unfamiliar.

- AWA: apparent wind angle, the angle of the wind blowing at the top of the mast (fast but noisy)
- AWS: apparent wind speed, the speed at the mast head (fast and noisy)
- SPD: boat speed **through the water** measured with the paddle wheel speedo in the hull (fast and noisy)
- HDG: compass heading (on PG this is **magnetic north, not true north**, though easily corrected using magnetic variation/declination).
- TWS: true wind speed, the speed of the wind over the ground (computed from the above quantities using the "wind triangle").
- TWD: true wind direction, the angle of the wind blowing over the ground (see "wind triangle").
- TWA: true wind angle, the angle of the wind over the ground reported relative to the orientation of the boat (see "wind triangle").
- COG and SOG: course and speed over ground from the GPS (these are relative to true north not magnetic on PG).
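
The "wind triangle" mentioned above can be sketched in a few lines. This is a simplified version under stated assumptions: it ignores leeway and current (so the result is really the wind relative to the water), it assumes negative angles mean wind from the port side, and it treats variation as positive to the east. The function name `true_wind` is hypothetical, and the values computed in the project's own pipeline may differ, for example because of smoothing.

```python
import math


def true_wind(awa_deg, aws, spd, hdg_mag_deg, variation_deg):
    """Solve the wind triangle, ignoring leeway and current.

    awa_deg: apparent wind angle in degrees (assumed negative to port).
    aws, spd: apparent wind speed and boat speed, in the same units.
    hdg_mag_deg: magnetic heading; variation_deg: east-positive variation.
    Returns (twa_deg, tws, twd_deg), with twd relative to true north.
    """
    awa = math.radians(awa_deg)
    # Apparent wind in boat coordinates: x toward the bow, y to starboard.
    ax = aws * math.cos(awa)
    ay = aws * math.sin(awa)
    # The boat's forward motion adds a headwind of size spd; remove it.
    tx = ax - spd
    ty = ay
    tws = math.hypot(tx, ty)
    twa_deg = math.degrees(math.atan2(ty, tx))
    # Correct the magnetic heading to true, then rotate TWA into a compass direction.
    hdg_true = (hdg_mag_deg + variation_deg) % 360.0
    twd_deg = (hdg_true + twa_deg) % 360.0
    return twa_deg, tws, twd_deg


# A 3-4-5 triangle: 3 units of boat speed, 5 units of apparent wind.
twa, tws, twd = true_wind(math.degrees(math.atan2(4, 3)), 5.0, 3.0, 0.0, 15.0)
print(round(twa, 3), round(tws, 3), round(twd, 3))
```

When the boat is stopped (`spd` of zero) the apparent and true wind are the same, which is a handy sanity check for any implementation.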

## Loading an Entire Race

The DataFrame above is super brief, and it shows just a few rows and a subset of the columns.  Below we will load an entire day on the water.

In [3]:
# Info about all race logs are stored in a DataFrame.
log_info = race_logs.read_log_info()

# The data in this table can be edited using a QGrid control.  Click on the column header to sort.  Click again
# to sort in a different order.  Double click on a cell to edit.
w = qgrid.show_grid(log_info, show_toolbar=True)
display(w)

QgridWidget(grid_options={'fullWidthRows': True, 'syncColumnCellResize': True, 'forceFitColumns': True, 'defau…

In [4]:
# We can use fancy Pandas techniques to find one of the logs

# Does the filename start with the date we are looking for?
match = log_info.file.str.startswith("2019-11-16")

# This returns a boolean Series, with one True/False value per log
print(list(match))

[False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False]


In [5]:
# Grab the first matching example
example = log_info[match].iloc[0]
example

file              2019-11-16_10:09.pd.gz
race                         Snowbird #1
begin                              41076
end                               111668
datetime       2019-11-16 10:09:00-08:00
description                 Snowbird #1.
Name: 7, dtype: object

In [6]:
import importlib
importlib.reload(c)

<module 'chart' from '/Users/viola/GDriveBV/Sailboat/Code/Python/sailing/chart.py'>

In [7]:
df = race_logs.read_log_file(example.file, discard_columns=True, skip_dock_only=False, trim=True, 
                            cutoff=0.3)

# Trim off the uninteresting pre/post race bits
df = df.loc[example.begin : example.end]

# Draw the track on a map
chart = c.plot_chart(df)
c.draw_track(df, chart, color='green')

Session from 2019-11-16 18:09:15.020000, 128865 rows, 3.5797222222222222 hours.



In [8]:
# Display a bit of the table (note, the notebook will only show a few of the rows and 
# columns, notice the "..." which appear)
df

| | variation | rudder | rhdg | raws | rawa | turn_rate | rsog | row_seconds | latitude | longitude | ... | awa | aws | twa | tws | twd | spd | hdg | sog | cog | row_times |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 41076 | 15.200 | 22.800 | 237.200 | 3.870 | -58.200 | 5.993 | 2.020 | 4107.600 | 47.685 | -122.409 | ... | -51.144 | 3.907 | -73.430 | 2.825 | 177.599 | 1.912 | 237.041 | 1.983 | 232.006 | 2019-11-16 11:17:43.646937376-08:00 |
| 41077 | 15.200 | 22.700 | 237.500 | 3.870 | -58.200 | 6.174 | 2.020 | 4107.700 | 47.685 | -122.409 | ... | -51.449 | 3.902 | -74.091 | 2.826 | 177.560 | 1.911 | 237.673 | 1.987 | 232.600 | 2019-11-16 11:17:43.746757444-08:00 |
| 41078 | 15.200 | 22.600 | 238.100 | 3.870 | -57.400 | 6.229 | 2.020 | 4107.800 | 47.685 | -122.409 | ... | -51.747 | 3.896 | -74.751 | 2.827 | 177.521 | 1.910 | 238.305 | 1.991 | 233.215 | 2019-11-16 11:17:43.846577512-08:00 |
| 41079 | 15.500 | 22.600 | 239.000 | 3.870 | -52.900 | 6.114 | 2.020 | 4107.900 | 47.685 | -122.409 | ... | -52.044 | 3.890 | -75.411 | 2.828 | 177.484 | 1.908 | 238.938 | 1.995 | 233.849 | 2019-11-16 11:17:43.946397580-08:00 |
| 41080 | 15.200 | 22.700 | 239.600 | 3.870 | -52.900 | 6.053 | 2.010 | 4108.000 | 47.685 | -122.409 | ... | -52.341 | 3.885 | -76.072 | 2.829 | 177.448 | 1.905 | 239.571 | 1.999 | 234.498 | 2019-11-16 11:17:44.046217648-08:00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 111664 | 15.200 | -24.700 | 134.200 | 5.350 | 18.800 | -3.444 | 0.990 | 11166.400 | 47.684 | -122.410 | ... | 18.582 | 5.337 | 35.720 | 3.973 | 186.039 | 1.044 | 134.291 | 0.944 | 163.555 | 2019-11-16 13:15:22.328251076-08:00 |
| 111665 | 15.200 | -24.800 | 133.700 | 5.350 | 20.700 | -3.384 | 0.970 | 11166.500 | 47.684 | -122.410 | ... | 18.368 | 5.313 | 36.044 | 3.975 | 185.986 | 1.043 | 133.915 | 0.942 | 163.042 | 2019-11-16 13:15:22.428372444-08:00 |
| 111666 | 15.200 | -25.000 | 133.500 | 5.350 | 21.400 | -3.444 | 0.950 | 11166.600 | 47.684 | -122.410 | ... | 18.134 | 5.288 | 36.365 | 3.976 | 185.931 | 1.043 | 133.538 | 0.940 | 162.507 | 2019-11-16 13:15:22.528493812-08:00 |
| 111667 | 15.200 | -25.200 | 133.000 | 5.350 | 17.400 | -3.444 | 0.950 | 11166.700 | 47.684 | -122.410 | ... | 17.892 | 5.264 | 36.683 | 3.977 | 185.874 | 1.043 | 133.163 | 0.938 | 161.953 | 2019-11-16 13:15:22.628615180-08:00 |


In [9]:
# Display the full list of columns
df.columns

Index(['variation', 'rudder', 'rhdg', 'raws', 'rawa', 'turn_rate', 'rsog',
       'row_seconds', 'latitude', 'longitude', 'altitude',
       'geoidal_separation', 'zg100_pitch', 'zg100_roll', 'zeus_cog',
       'zeus_sog', 'zeus_altitude', 'zeus_gnss_type', 'rspd', 'depth', 'rtws',
       'rtwa', 'rtwd', 'rcog', 'timestamp', 'awa', 'aws', 'twa', 'tws', 'twd',
       'spd', 'hdg', 'sog', 'cog', 'row_times'],
      dtype='object')

In [10]:
# We'll store information about the meanings of these columns in a DataFrame!
column_df = pd.read_pickle(os.path.join(G.DATA_DIRECTORY, "column_info.pd"))

# And display in an editable grid.  Be sure to scroll around.
grid = qgrid.show_grid(column_df, show_toolbar=True)
grid


QgridWidget(grid_options={'fullWidthRows': True, 'syncColumnCellResize': True, 'forceFitColumns': True, 'defau…

In [11]:
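# If you do update the table shown above, then this will save the changes (which are not saved by default).
# Change False to True to run it.

if False:
    # Use the column-info widget (grid), not the race-log widget (w) from earlier
    new_df = grid.get_changed_df()
    new_df.to_pickle(os.path.join(G.DATA_DIRECTORY, "column_info.pd"))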

In [12]:
# As in the initial example, we can focus on the critical columns.
good_cols = "row_times latitude longitude awa aws hdg spd sog cog".split()
# Note, this split biz is just a way for me to quickly type a long list without all the
# punctuation.  Rather than ['a', 'b', 'c'] I type "a b c".split()
print(good_cols)
df[good_cols]

['row_times', 'latitude', 'longitude', 'awa', 'aws', 'hdg', 'spd', 'sog', 'cog']


| | row_times | latitude | longitude | awa | aws | hdg | spd | sog | cog |
|---|---|---|---|---|---|---|---|---|---|
| 41076 | 2019-11-16 11:17:43.646937376-08:00 | 47.685 | -122.409 | -51.144 | 3.907 | 237.041 | 1.912 | 1.983 | 232.006 |
| 41077 | 2019-11-16 11:17:43.746757444-08:00 | 47.685 | -122.409 | -51.449 | 3.902 | 237.673 | 1.911 | 1.987 | 232.600 |
| 41078 | 2019-11-16 11:17:43.846577512-08:00 | 47.685 | -122.409 | -51.747 | 3.896 | 238.305 | 1.910 | 1.991 | 233.215 |
| 41079 | 2019-11-16 11:17:43.946397580-08:00 | 47.685 | -122.409 | -52.044 | 3.890 | 238.938 | 1.908 | 1.995 | 233.849 |
| 41080 | 2019-11-16 11:17:44.046217648-08:00 | 47.685 | -122.409 | -52.341 | 3.885 | 239.571 | 1.905 | 1.999 | 234.498 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 111664 | 2019-11-16 13:15:22.328251076-08:00 | 47.684 | -122.410 | 18.582 | 5.337 | 134.291 | 1.044 | 0.944 | 163.555 |
| 111665 | 2019-11-16 13:15:22.428372444-08:00 | 47.684 | -122.410 | 18.368 | 5.313 | 133.915 | 1.043 | 0.942 | 163.042 |
| 111666 | 2019-11-16 13:15:22.528493812-08:00 | 47.684 | -122.410 | 18.134 | 5.288 | 133.538 | 1.043 | 0.940 | 162.507 |
| 111667 | 2019-11-16 13:15:22.628615180-08:00 | 47.684 | -122.410 | 17.892 | 5.264 | 133.163 | 1.043 | 0.938 | 161.953 |


In [13]:
# We can graph values versus time

# Note that distances are stored in METERS and speeds in METERS PER SECOND.
plt.figure()
df.spd.plot()


<matplotlib.axes._subplots.AxesSubplot at 0x1c376eba20>
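
Since the comment in the cell above notes that speeds are stored in meters per second, while sailors usually think in knots, a small conversion helper can be handy. The sketch below uses only the definition of the nautical mile (1852 meters); the function names are hypothetical and are not part of the project's libraries.

```python
METERS_PER_NAUTICAL_MILE = 1852.0  # exact, by international definition
SECONDS_PER_HOUR = 3600.0


def mps_to_knots(speed_mps):
    """Convert meters per second to knots (nautical miles per hour)."""
    return speed_mps * SECONDS_PER_HOUR / METERS_PER_NAUTICAL_MILE


def knots_to_mps(speed_knots):
    """Convert knots to meters per second."""
    return speed_knots * METERS_PER_NAUTICAL_MILE / SECONDS_PER_HOUR


# The first boat speed in the race table above, 1.912 m/s, is roughly 3.7 knots.
print(round(mps_to_knots(1.912), 2))
```

The same arithmetic works elementwise on a Pandas column, for example `df.spd * 3600 / 1852`.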

In [14]:
# Or we can plot quantities and compare them

c.quick_plot(df.index, (df.spd, df.aws), ["spd", "aws"])


## Conclusions

In 2020, we have many more automated tools than Arvel Gentry did in 1974.  Our goals remain the same: understand conditions and learn how to sail better.