<br>
# MAIN REPOS
---
<p>
- [Global Fishing Watch Main Github Repo](https://github.com/GlobalFishingWatch)
- [Global Fishing Watch Data Blog](http://globalfishingwatch.io/)
- [DataKind SG Github](https://github.com/DataKind-SG/vessel-scoring)
- [GFW Beta Map](http://globalfishingwatch.org/map/)

<p>
- [Data Repo](https://github.com/GlobalFishingWatch/training-data) (too large to download; get a copy on a thumb drive from the team leads)
- [Vessel Model Repo](https://github.com/GlobalFishingWatch/vessel-scoring) (clone this one).

- Install `rolling_measures` ([repo](https://github.com/GlobalFishingWatch/rolling_measures)): `pip install rolling_measures`

<br>
# [Vessel Identity](http://globalfishingwatch.io/vessels.html)
---
<p>
- AIS messages include a field **"shiptype"**, which is a two-digit number corresponding to the vessel’s activity.
- The full list of these possible activities is listed on [Marine Traffic](https://help.marinetraffic.com/hc/en-us/articles/205579997-What-is-the-significance-of-the-AIS-SHIPTYPE-number-).

#### 1. Likely Fishing
Vessels that self-report as fishing are called **"likely fishing"** vessels.

#### 2. Known Fishing
To identify fishing vessels, GFW also matches MMSI numbers to vessel registries, such as the [European Union’s](http://ec.europa.eu/fisheries/fleet/index.cfm) vessel registry or the [Consolidated List of Authorized Vessels](http://www.tuna-org.org/vesselpos.htm). Many of these vessels also self-report as fishing; GFW calls these vessels **"known fishing"** vessels.


<br>
GFW has published its lists of known and likely fishing vessels [here](https://github.com/GlobalFishingWatch/treniformis/tree/0.3/treniformis/_assets/GFW/FISHING_MMSI/KNOWN_AND_LIKELY).
A vessel is included in the list if it meets the following criteria:
- It broadcast at least 500 positions in a given year while moving faster than 0.1 knots (this filters out vessels with very little activity).
- It broadcast that it was a fishing vessel 99 percent of the time (likely fishing vessels), or it was matched with one of the fishing vessel registries (known fishing vessels).
- It is not one of the MMSI numbers known not to be fishing vessels, such as some helicopters that use AIS and self-report as fishing because they work with fishing vessels.
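
A minimal sketch of these selection rules, assuming per-vessel-year records that already hold the number of positions broadcast above 0.1 knots, the fraction of messages that self-report a fishing shiptype, and a registry-match flag. The `VesselYear` record and the `select_fishing_mmsis` function are hypothetical illustrations, not part of treniformis.

```python
from dataclasses import dataclass


@dataclass
class VesselYear:
    """Hypothetical per-vessel, per-year summary of AIS activity."""
    mmsi: int
    moving_positions: int             # positions broadcast at speed > 0.1 knots
    fishing_shiptype_fraction: float  # share of messages self-reporting as fishing
    in_registry: bool                 # matched to a fishing-vessel registry


def select_fishing_mmsis(vessels, known_non_fishing=frozenset()):
    """Return MMSIs that pass the activity, identity and exclusion rules."""
    selected = set()
    for v in vessels:
        if v.mmsi in known_non_fishing:   # e.g. helicopters self-reporting as fishing
            continue
        if v.moving_positions < 500:      # too little activity in this year
            continue
        likely = v.fishing_shiptype_fraction >= 0.99
        if likely or v.in_registry:       # likely fishing or known fishing
            selected.add(v.mmsi)
    return selected
```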

A blog post about the list can be found [here](http://globalfishingwatch.io/vessel_activity/2016/12/22/New-Vessel-Lists.html).

#### 3. Suspected Fishing
Vessels that exhibit fishing behavior (according to the model predictions) but are neither listed in registries nor self-report as fishing are called **"suspected fishing"** vessels.
- GFW initially used a [Logistic Regression Model](https://github.com/GlobalFishingWatch/vessel-scoring).
- GFW now uses a [Neural Net model (CNN)](https://github.com/GlobalFishingWatch/vessel-classification/tree/master/classification).
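
The three categories can be summarized as a precedence rule. This sketch reads "known fishing" as "matched to a fishing registry", as in the vessel-list criteria above; the function and argument names are hypothetical.

```python
def fishing_category(in_registry, self_reports_fishing, predicted_fishing):
    """Map a vessel's evidence to a fishing category (sketch of the definitions above)."""
    if in_registry:
        return "known fishing"       # matched to a fishing-vessel registry
    if self_reports_fishing:
        return "likely fishing"      # AIS shiptype says fishing
    if predicted_fishing:
        return "suspected fishing"   # only the model thinks it fishes
    return None                      # no evidence of fishing
```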


### Longliners
<a href="http://tuna.greenpeace.org/assets/uploads/MK1A0129-cover.jpg"><img src = "http://tuna.greenpeace.org/assets/uploads/MK1A0129-cover.jpg" width="400" align="left" display="block" ></a>
<a href="http://www.greenpeace.org/international/community_images/84/2284/90963_146929.jpg"><img src = "http://www.greenpeace.org/international/community_images/84/2284/90963_146929.jpg" width="400" align="left"></a>

### Trawlers
<a href="http://www.animalsaustralia.org/images/features/1000-trawler-greenpeace.jpg?1"><img src="http://www.animalsaustralia.org/images/features/1000-trawler-greenpeace.jpg?1" width="400" align="left" ></a>
<a href="https://storage.googleapis.com/gpuk-static/legacy/beam_trawler_graphic_430.jpg"><img src="https://storage.googleapis.com/gpuk-static/legacy/beam_trawler_graphic_430.jpg" width="400" align="left" ></a>

### Purse Seiners

<a href="http://www.alaskafishingjobsnetwork.com/wp-content/uploads/2008/12/purseseineralaska300.jpg"><img src="http://www.alaskafishingjobsnetwork.com/wp-content/uploads/2008/12/purseseineralaska300.jpg" width="400" align="left" ></a>

<a href="https://greenfishbluefish.files.wordpress.com/2013/03/purse-seine.jpg">
<img src="https://greenfishbluefish.files.wordpress.com/2013/03/purse-seine.jpg" width="500" align="left" ></a>

### Others
<hr>

<a href="https://greenfishbluefish.files.wordpress.com/2013/03/pole-and-line.jpg"><img src="https://greenfishbluefish.files.wordpress.com/2013/03/pole-and-line.jpg" width="400" align="left" ></a>
<a href="https://greenfishbluefish.files.wordpress.com/2013/03/tuna-troll.jpg">
<img src="https://greenfishbluefish.files.wordpress.com/2013/03/tuna-troll.jpg" width="500" align="left" ></a>

<br>
# [Architecture](http://globalfishingwatch.io/architecture.html)
---

<img src = "./images/Architecture-overview.svg">

0. [AIS Data Feed]: Contains positional and identity information for vessels. Because AIS information is broadcast voluntarily, it may be missing or incorrect.
1. [Vessel Registry Matching](https://github.com/GlobalFishingWatch/identity-matching): Matches the MMSI (the station identity broadcast in AIS) with vessel identities in fishing registries.
2. [Vessel Classification Crowd Sourcing](https://github.com/GlobalFishingWatch/pybossa-vessel-identification)
3. [Vessel List](https://github.com/GlobalFishingWatch/treniformis): Maintained list of Vessel IDs for modeling.
4. [Hand-Labeled AIS Data](https://github.com/GlobalFishingWatch/training-data)
5. [Distance Rasters](https://github.com/GlobalFishingWatch/ancillary-gis-data)
6. **[Fishing Detection Algorithm](https://github.com/GlobalFishingWatch/vessel-scoring/)**: Regression model trained on a hand-labeled dataset of tracks
7. [Vessel Classification Algorithm](https://github.com/GlobalFishingWatch/vessel-classification)
8. [Clustering Tileset Generation](https://github.com/SkyTruth/benthos-pipeline): Visualize the tracks with their fishing detections by generating a tileset
9. [API server & search](https://github.com/GlobalFishingWatch/pelagos-api)
10. [Visualization Front End](https://github.com/GlobalFishingWatch/pelagos-client)
11. [Fishing Raster]

<br>
# [Fishing Detection Models](https://github.com/GlobalFishingWatch/vessel-scoring)
---
The models compute the probability that a vessel is fishing based on its AIS track data.
The combined fishing score of all vessels is used to estimate the [fishing activity worldwide](http://globalfishingwatch.io/effort.html).


### Definition of Fishing
- The period when a vessel has fishing gear in the water.
- The time that a vessel spends away from shore in which it is not transiting to and from the fishing grounds.

For trawlers and longliners, these two definitions give similar results, but the same is not true for purse seiners.


### Data Details
The models are trained on data hand-labeled as fishing or non-fishing by **Kristina Boerder** at Dalhousie University. The Dalhousie training data consists of hand-classified AIS data for **29 unique vessels**, with complete tracks classified over long periods. These vessels are divided among the **different gear types** as shown in the table below. There are also **118 longliner vessels with GFW analyst classification**, for which shorter track segments are classified for each vessel.

In addition, data from two vessels performing slow transits is added to the training data to help the model learn to avoid classifying these transits as fishing.

|Gear Type     |Dalhousie Vessels|Dalhousie Data Points|GFW Analyst Vessels|GFW Analyst Data Points|
|--------------|-----------------|---------------------|-------------------|-----------------------|
|Longliner     |16               |569,504              |118                |324,166                |
|Trawler       |6                |828,162              |                   |                       |
|Purse Seine   |7                |398,897              |                   |                       |
|Slow Transits |                 |                     |2                  |9,038                  |
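
A quick arithmetic check on the table: the per-gear Dalhousie vessel counts should add up to the 29 vessels mentioned above, and the column totals give the overall size of each labeled source. The dictionary simply transcribes the table; `column_total` is a hypothetical helper.

```python
# Transcription of the training-data table; empty cells are simply omitted.
TABLE = {
    "Longliner":     {"dalhousie_vessels": 16, "dalhousie_points": 569_504,
                      "gfw_vessels": 118, "gfw_points": 324_166},
    "Trawler":       {"dalhousie_vessels": 6, "dalhousie_points": 828_162},
    "Purse Seine":   {"dalhousie_vessels": 7, "dalhousie_points": 398_897},
    "Slow Transits": {"gfw_vessels": 2, "gfw_points": 9_038},
}


def column_total(table, column):
    """Sum one column, treating missing cells as zero."""
    return sum(row.get(column, 0) for row in table.values())


for col in ("dalhousie_vessels", "dalhousie_points", "gfw_vessels", "gfw_points"):
    print(col, column_total(TABLE, col))
```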

<br>
### 1. [Heuristic Model 1.0](http://globalfishingwatch.io/fishing__heuristic_1_0.html)
The first model was developed after observing that fishing behavior correlates with several of the values present in AIS messages. In particular, the likelihood that a vessel is fishing tends to <u>increase with the standard deviation of the speed and course</u> but <u>decrease with mean speed</u>. These features, calculated over a one-hour window, were used to develop the heuristic model.
The model performs reasonably well for <u>trawlers and longliners</u> but poorly for <u>purse seiners</u>.

$$
fishing\_score = \frac{2}{3}\left(\sigma_{s_m} + \sigma_{c_m} + \overline{s_m}\right) 
$$

$$
\begin{align}
s_m & \equiv 1.0 - \min\left(1, speed\,/\,17\right)\\
c_m & \equiv course\,/\,360 \\
\sigma_x & \equiv \text{standard deviation of } x \\
\overline{x} & \equiv \text{mean of } x
\end{align}
$$

For the *heuristic model*, the means and standard deviations are computed over a one-hour window.
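
A minimal NumPy sketch of the score above. It assumes a trailing one-hour window (each point uses the points with timestamps in (t - 3600 s, t], including duplicates of t), population standard deviations, and timestamps sorted ascending. The window alignment and the name `heuristic_fishing_score` are assumptions for illustration; the linked `legacy_heuristic_model.py` is the reference implementation.

```python
import numpy as np


def heuristic_fishing_score(timestamps, speed, course, window=3600):
    """Heuristic 1.0 score per AIS point: 2/3 * (sd(s_m) + sd(c_m) + mean(s_m))."""
    t = np.asarray(timestamps, dtype=float)
    s_m = 1.0 - np.minimum(1.0, np.asarray(speed, dtype=float) / 17.0)  # normalized speed
    c_m = np.asarray(course, dtype=float) / 360.0                       # normalized course
    scores = np.empty(len(t))
    for i, ti in enumerate(t):
        # assumed trailing window: timestamps in (ti - window, ti]
        start = np.searchsorted(t, ti - window, side="right")
        end = np.searchsorted(t, ti, side="right")
        s_win, c_win = s_m[start:end], c_m[start:end]
        scores[i] = (2.0 / 3.0) * (s_win.std() + c_win.std() + s_win.mean())
    return scores
```

A stationary vessel (speed 0, constant course) scores 2/3, while a vessel steaming at 17 knots or more on a constant course scores 0, matching the intuition that slow, erratic movement looks like fishing.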

Implementation: [legacy_heuristic_model.py](https://github.com/GlobalFishingWatch/vessel-scoring/blob/release-1.0/vessel_scoring/legacy_heuristic_model.py)

Normalization and calculation: [add_measures.py](https://github.com/GlobalFishingWatch/vessel-scoring/blob/release-1.0/vessel_scoring/add_measures.py)

### 2. [Logistic Regression Model 1.1 (multi-window model)](http://globalfishingwatch.io/fishing__logistic_1_1.html)
These are logistic regression models that use the same features as the heuristic model and are trained on the hand-labeled dataset.

* **Generic Model**
    <p>The first of the logistic models, referred to as the generic model, is the model currently in use: a logistic model with a 12-hour time window and a feature order of 6. A single model is trained for all gear types.</p><br>

* **Multi-Window Model**
    <p>The multi-window model is a logistic model similar to the generic model, except that it uses multiple time windows ranging in duration from one-half to twenty-four hours. Using multiple window sizes both provides a richer feature set and avoids the need to optimize over window size. Whether it is currently daylight appears to be a very useful feature for predicting purse seine fishing.</p><br>

* **Multi-Window, Gear-Type-Specific Model**
    <p>The multi-window gear-type-specific models, which are on the verge of being deployed, are a set of models, each the same as the multi-window model but trained only on vessels of a specific gear type. Models have currently been trained for longliners, trawlers and purse seiners.</p><br>
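
A rough sketch of the generic model's shape in plain NumPy. Two points here are assumptions rather than details taken from vessel-scoring: "feature order of 6" is read as expanding each feature into its powers 1 through 6, and the model is fit with simple batch gradient descent on the log-loss instead of whatever solver the repository uses. The column names follow the `*.measures.npz` fields listed further down; all function names are hypothetical.

```python
import numpy as np


def window_features(data, windows=(43200,)):
    """Stack speed-SD, course-SD and mean-speed measures for each window (12 h by default).

    `data` can be a dict of 1-D arrays or a NumPy structured array.
    """
    columns = []
    for w in windows:
        for template in ("measure_speedstddev_{}", "measure_coursestddev_{}",
                         "measure_speedavg_{}"):
            columns.append(np.asarray(data[template.format(w)], dtype=float))
    return np.column_stack(columns)


def poly_expand(features, order=6):
    """Assumed reading of 'feature order': stack x, x**2, ..., x**order for every column."""
    return np.hstack([features ** k for k in range(1, order + 1)])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.5, n_iter=3000):
    """Batch gradient descent on the mean logistic loss; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        error = sigmoid(X @ w + b) - y   # d(log-loss)/d(logit) per sample
        w -= lr * (X.T @ error) / len(y)
        b -= lr * error.mean()
    return w, b


def predict_proba(X, w, b):
    """Probability of fishing for each row of X."""
    return sigmoid(X @ w + b)
```

Passing several windows, e.g. `window_features(data, windows=(1800, 3600, 10800, 21600, 43200, 86400))`, gives a multi-window feature set; the gear-type-specific variant would fit one such model per gear type.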

### 3. [Neural Net Model 1.0](https://github.com/GlobalFishingWatch/vessel-classification)
GFW has three CNNs in production, as well as several experimental networks. One network predicts vessel class (longliner, cargo, sailing, etc.), the second predicts vessel length, and the third predicts whether a vessel is fishing at a given time point.


<br>
# File Descriptions in the training-data/data Folder
---

* **"mmsis.csv"** : list of vessels identified by MMSI, with their labels, dimensions, data sources, and whether time-range labels exist for them.
    * Column Names: mmsi, is_fishing, label, sublabel, length, tonnage, list_sources, with_ranges
    
    
* **"/time-ranges/*.csv"** : time ranges (start and end times) for individual vessels, each labeled with its "is_fishing" class. Files are grouped by source and gear type.
    * Column Names: mmsi, start_time, end_time, is_fishing
    
    
* **"/tracks/*.csv"** : one file per vessel, named with its MMSI number, containing the vessel's AIS positions: timestamps, distance from shore and port, speed, course, and lat/lon.
    * Column Names: mmsi, timestamp, distance_from_shore, distance_from_port, speed, course, lat, lon
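
The time-range files label spans of a track rather than individual points. The pandas sketch below turns ranges into per-point `is_fishing` labels. It assumes `timestamp`, `start_time` and `end_time` share one unit (for example Unix seconds), treats each range as half-open `[start_time, end_time)`, and marks uncovered points with -1 (the value seen in the sample output at the end of this page). The function name is hypothetical, and this is not necessarily how `merge_ais_and_ranges.py` does it.

```python
import pandas as pd


def label_track_points(tracks, ranges, unlabeled=-1.0):
    """Copy `tracks` and set is_fishing from the time range containing each point."""
    labeled = tracks.copy()
    labeled["is_fishing"] = unlabeled
    for row in ranges.itertuples(index=False):
        inside = (
            (labeled["mmsi"] == row.mmsi)
            & (labeled["timestamp"] >= row.start_time)
            & (labeled["timestamp"] < row.end_time)   # assumed half-open range
        )
        labeled.loc[inside, "is_fishing"] = float(row.is_fishing)
    return labeled
```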


<br>
### Key Field Classes
__label__: Cargo/Tanker, Drifting longlines, Fixed gear, Passenger, Pole and line, Purse seines, Reefer, Seismic vessel, Squid, Trawlers, Trollers, Tug/Pilot/Supply, (blank)

<br>
# Data Prep
---

1. Run "prepare.sh" in the "training-data" folder.

    1. calls **"merge_ais_and_ranges.py"**
        * merges all compressed numpy files from /time-ranges and /tracks and stores them in /merged, one file per data source (named like the files in /time-ranges)
        * *column names: mmsi, timestamp, distance_from_shore, distance_from_port, speed, course, lat, lon, is_fishing*
    2. calls **"split_by_class.py"**
        * splits the merged data from the first step by "label" (gear type) and saves each split under the source name with the class name appended
        * *column names: mmsi, timestamp, distance_from_shore, distance_from_port, speed, course, lat, lon, is_fishing*
2. Run "prepare2.sh" in the top level of the "vessel-scoring" folder.

   - calls **"scripts/add_measures.py"** -> **"vessel_scoring.utils.numpy_to_messages()"** -> **"vessel_scoring.add_measures.AddMeasures()"** -> **"vessel_scoring.utils.messages_to_numpy()"**
   - this creates 83 new feature columns for the kristina_trawl, kristina_ps, kristina_longliner, false_positives and classified-filtered datasets
   - (see the column list below for all names and some of the calculation methods)

3. Go to /vessel_scoring/data.py and change all "classification" to "is_fishing". 
4. ~~Go to /vessel_scoring/utils.py and change "classification" to "is_fishing.  
(download new utils.py from anonymous-training-data branch?)~~  
Go to [https://github.com/GlobalFishingWatch/vessel-scoring/tree/anonymous-training-data](https://github.com/GlobalFishingWatch/vessel-scoring/tree/anonymous-training-data) and download the new versions of:
    - vessel-scoring/notebooks/Models-Description.ipynb
    - vessel-scoring/vessel_scoring/data.py
    - vessel-scoring/vessel_scoring/utils.py

5. ~~Go to /vessel-scoring/notebooks/Model-Descriptions.ipynb and change "slow-transits.measures.npz" to "false_positives.measures.npz"~~  
Run the new Model_Descriptions.ipynb
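
Conceptually, the split-by-class step in `prepare.sh` groups merged points by the gear-type label of their MMSI. A hedged sketch follows, assuming a structured array with an `mmsi` field, an MMSI-to-label mapping built from `mmsis.csv`, an `"Unknown"` fallback label, and output files named `<source>_<label>.npz` with spaces and slashes replaced by underscores (file names such as `alex_crowd_sourced_Drifting_longlines.npz` suggest this pattern). The helper names are hypothetical, not the functions in `split_by_class.py`.

```python
import os

import numpy as np


def split_by_class(points, mmsi_to_label, default_label="Unknown"):
    """Group a structured array of AIS points by the gear-type label of each MMSI."""
    labels = np.array([mmsi_to_label.get(m, default_label) for m in points["mmsi"]])
    return {str(label): points[labels == label] for label in np.unique(labels)}


def save_splits(splits, source, out_dir):
    """Write each split to <out_dir>/<source>_<label>.npz under the key 'x'."""
    paths = []
    for label, subset in splits.items():
        safe_label = label.replace("/", "_").replace(" ", "_")  # file-name safe
        path = os.path.join(out_dir, f"{source}_{safe_label}.npz")
        np.savez_compressed(path, x=subset)
        paths.append(path)
    return paths
```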



<br>
<hr>
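Peek at the data by running:
```python
import numpy as np

# replace whichevernumpyfile with the .npz file you want to inspect
x = np.load('datasets/whichevernumpyfile.npz')['x']
print(x[0:2])      # first two records
print(x[0].dtype)  # field names and types of the structured array
```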

## `*.measures.npz` column names
---

#### Existing Fields
- 'mmsi',
- 'timestamp',
- 'distance_from_shore',
- 'distance_from_port',
- **'speed'**,
- **'course'**,
- 'lat',
- 'lon',
- 'is_fishing',

#### Derived Fields

Windowed measures (counts, averages and standard deviations) are computed over 0.25, 0.5, 1, 3, 6, 12 and 24 hour windows,
<br>equivalent to 900, 1800, 3600, 10800, 21600, 43200 and 86400 seconds.

<br>
<u>**`# ----- base measures ----- #`**</u>
<br>
- 'measure_course' => course / 360.0
- 'measure_sin_course' => numpy.sin(numpy.radians(course)) / numpy.sqrt(2)
- 'measure_cos_course' => numpy.cos(numpy.radians(course)) / numpy.sqrt(2)
- 'measure_speed' => 1.0 - min(1.0, speed / 17.0)
- 'measure_distance_from_port' => min(1.0, distance_from_port / 30.0)
<br>
- 'measure_daylight' => whether it is daytime or dark at the point's timestamp and position (the sun's position depends on the date, time of day, latitude and longitude; see https://en.wikipedia.org/wiki/Position_of_the_Sun)  

<u>not in dataset</u>:
* 'measure_heading' => heading / 360.0
* 'measure_turn' => min(1.0, abs(turn) / 126.0)  

<br>
<u>**`# ----- entry counts in last x seconds ----- #`**</u>
<br>
Number of entries (AIS points) within the time window of the last x seconds

- 'measure_count_10800',
- 'measure_count_1800',
- 'measure_count_21600',
- 'measure_count_3600',
- 'measure_count_43200',
- 'measure_count_86400',
- 'measure_count_900',
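
The windowed fields in this section (counts, averages, standard deviations) can all be computed from a trailing window per point. The sketch below uses prefix sums and assumes timestamps sorted ascending, a window of (t - x, t] that includes duplicate timestamps, and population standard deviation. Under these assumptions it reproduces the `measure_count_900` values in the sample output at the end of this page. The `_log` variants are left out because their exact transform is not documented here, and the function names are hypothetical.

```python
import numpy as np


def trailing_window_bounds(timestamps, window):
    """Index range [start, end) of points with timestamps in (t - window, t]."""
    t = np.asarray(timestamps)
    start = np.searchsorted(t, t - window, side="right")
    end = np.searchsorted(t, t, side="right")
    return start, end


def rolling_count_mean_std(timestamps, values, window):
    """Per-point count, mean and population std of `values` over the trailing window."""
    start, end = trailing_window_bounds(timestamps, window)
    v = np.asarray(values, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(v)))        # prefix sums of v
    csum2 = np.concatenate(([0.0], np.cumsum(v * v)))   # prefix sums of v^2
    count = (end - start).astype(float)                 # always >= 1
    mean = (csum[end] - csum[start]) / count
    var = (csum2[end] - csum2[start]) / count - mean ** 2
    return count, mean, np.sqrt(np.maximum(var, 0.0))   # clamp rounding noise
```

Prefix sums keep each window O(1) after sorting; for very long tracks a numerically safer variance (e.g. per-window `np.std`) may be preferable.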

<br>
<u>**`# ----- Measure_Course in last x seconds ----- #`**</u>
<br>
Average measure_course for time window of x seconds

- 'measure_courseavg_10800',
- 'measure_courseavg_1800',
- 'measure_courseavg_21600',
- 'measure_courseavg_3600',
- 'measure_courseavg_43200',
- 'measure_courseavg_86400',
- 'measure_courseavg_900',

<br>
<u>**`# ----- Measure_Course SD in last x seconds ----- #`**</u>
<br>
Standard deviation of measure_course over the last x seconds, plus a log-scaled `_log` variant

- 'measure_coursestddev_10800',
- 'measure_coursestddev_10800_log',
- 'measure_coursestddev_1800',
- 'measure_coursestddev_1800_log',
- 'measure_coursestddev_21600',
- 'measure_coursestddev_21600_log',
- 'measure_coursestddev_3600',
- 'measure_coursestddev_3600_log',
- 'measure_coursestddev_43200',
- 'measure_coursestddev_43200_log',
- 'measure_coursestddev_86400',
- 'measure_coursestddev_86400_log',
- 'measure_coursestddev_900',
- 'measure_coursestddev_900_log',

<br>
<u>**`# ----- Daylight Boolean in last x seconds ----- #`**</u>
<br>
Average of the daylight flag (i.e. the fraction of points in daylight) over the last x seconds

- 'measure_daylightavg_10800',
- 'measure_daylightavg_1800',
- 'measure_daylightavg_21600',
- 'measure_daylightavg_3600',
- 'measure_daylightavg_43200',
- 'measure_daylightavg_86400',
- 'measure_daylightavg_900',

<br>
<u>**`# ----- Average Positions and Lat Long in last x seconds ----- #`**</u>
<br>
Average latitude and longitude over the last x seconds; `measure_pos_*` is the standard deviation of position (lat/lon) over the same window

- 'measure_latavg_10800',
- 'measure_latavg_1800',
- 'measure_latavg_21600',
- 'measure_latavg_3600',
- 'measure_latavg_43200',
- 'measure_latavg_86400',
- 'measure_latavg_900',
- 'measure_lonavg_10800',
- 'measure_lonavg_1800',
- 'measure_lonavg_21600',
- 'measure_lonavg_3600',
- 'measure_lonavg_43200',
- 'measure_lonavg_86400',
- 'measure_lonavg_900',
- 'measure_pos_10800',
- 'measure_pos_1800',
- 'measure_pos_21600',
- 'measure_pos_3600',
- 'measure_pos_43200',
- 'measure_pos_86400',
- 'measure_pos_900',

<br>
<u>**`# ----- Average Speeds and Speed SD in last x seconds ----- #`**</u>
<br>
Average speed and speed standard deviation (with `_log` variants) over the last x seconds

- 'measure_speedavg_10800',
- 'measure_speedavg_1800',
- 'measure_speedavg_21600',
- 'measure_speedavg_3600',
- 'measure_speedavg_43200',
- 'measure_speedavg_86400',
- 'measure_speedavg_900',
- 'measure_speedstddev_10800',
- 'measure_speedstddev_10800_log',
- 'measure_speedstddev_1800',
- 'measure_speedstddev_1800_log',
- 'measure_speedstddev_21600',
- 'measure_speedstddev_21600_log',
- 'measure_speedstddev_3600',
- 'measure_speedstddev_3600_log',
- 'measure_speedstddev_43200',
- 'measure_speedstddev_43200_log',
- 'measure_speedstddev_86400',
- 'measure_speedstddev_86400_log',
- 'measure_speedstddev_900',
- 'measure_speedstddev_900_log',

(See sample.csv for sample data from one vessel.)


```python
import numpy as np
#x = np.load('../training-data/data/merged/alex_crowd_sourced_Drifting_longlines.npz')['x']
#x = np.load('datasets/kristina_longliner.measures.npz')['x']

x = np.load('../training-data/data/labeled/kristina_longliner_Unknown.measures.labels.npz')['x']

x[0]['mmsi']
# 12639560807591.0

len(x)
# 2554222

len(x[x['mmsi'] == 12639560807591.0])
# 27312

import pandas as pd

df = pd.DataFrame(x[x['mmsi'] == 12639560807591.0])

df.head()
# Unnamed: 0,mmsi,timestamp,distance_from_shore,distance_from_port,speed,course,lat,lon,is_fishing
# 0,12639560000000.0,1327137000.0,232994.28125,311748.65625,8.2,230.5,14.865583,-26.853662,-1.0
# 1,12639560000000.0,1327137000.0,232994.28125,311748.65625,8.2,230.5,14.865583,-26.853662,-1.0
# 2,12639560000000.0,1327137000.0,233994.265625,312410.34375,7.3,238.399994,14.86387,-26.8568,-1.0
# 3,12639560000000.0,1327137000.0,233994.265625,312410.34375,7.3,238.399994,14.86387,-26.8568,-1.0
# 4,12639560000000.0,1327137000.0,233994.265625,312410.34375,6.8,238.899994,14.861551,-26.860649,-1.0

df['timestamp'].dtype
# dtype('int32')

df['timestamp'] = df['timestamp'].astype(int)

df.sort_values(['timestamp'])[['timestamp', 'measure_count_900']]
# Unnamed: 0,timestamp,measure_count_900
# 0,1327107704,2.0
# 1,1327107704,2.0
# 2,1327107805,4.0
# 3,1327107805,4.0
# 4,1327107934,6.0
# 5,1327107934,6.0
# 6,1327114481,2.0
# 7,1327114481,2.0
# 8,1327114541,4.0
# 9,1327114541,4.0

df.shape
# (27312, 9)

df.to_csv('sampleWithoutMeasures.csv', index=False)

import datetime
# fromtimestamp() converts to the local time zone of the machine running it;
# the output below corresponds to UTC+8 (the same instant is 14:46:21 UTC)
print(
    datetime.datetime.fromtimestamp(
        int("1331563581")
    ).strftime('%Y-%m-%d %H:%M:%S')
)
# 2012-03-12 22:46:21
```
