# Challenge 2 - Getting started <img align="right" src="../Supplementary_data/EY_logo.png" style="margin:0px 50px">

Welcome to the 2021 Better Working World Data Challenge! 

Prior to running this notebook, make sure you have:
* **Registered** for "Challenge 2: Fire behavior" on the [EY Data Science Platform](https://datascience.ey.com/).
* **Completed** "Challenge 1: Fire mapping" to a reasonable accuracy before starting on Challenge 2. You will use your process/model from Challenge 1 to create more annotated data for Challenge 2.


### Context 

Airborne infrared linescan images are currently considered one of the best sources of information about fire intensity and location. However, there are times when it is not possible to acquire infrared linescan imagery, for example due to resource constraints or unsafe conditions. An alternative source of images for fire mapping is satellite imagery. The availability and resolution of satellite imagery have increased substantially in recent years, making it possible to monitor bushfires from space. Satellite imagery is not always available, but it is a valuable complement to other information sources.

While the number of satellite passes per day is continuing to increase, especially with commercial operators expanding their service offerings, there are still long periods when there is no coverage. For times when neither linescan nor satellite data are available, it is possible to extrapolate from previous observations to forecast the current location of the fire. It can also be useful for firefighting teams to forecast future locations of fire based on current observations.


### Your task

<img src="resources/animated_timeseries.gif" align="right" width=300px style="margin:0px 40px 40px 40px">

The training dataset you worked with in Challenge 1 contains 129 linescan images, plus an additional 5 linescan images that were used for testing. These images were captured over seven different fire events. For each of the fire events, a narrative sequence of images can be produced from a combination of linescan and satellite images. In two of the seven fire events, linescan images have been withheld at key time points. Your task is to produce a map of the fire at those time points.

As in Challenge 1, the `test.csv` file (loaded later in this notebook from `resources/challenge2_test.csv`) contains the details of the pixels that you must forecast using the narrative series of each fire event.

All linescan and satellite images are served via the Open Data Cube Python library.

To forecast the fire spread over time, you may need to use information such as:
- terrain, vegetation type and vegetation condition prior to the fire (available from satellite images), for example the NDVI product as a proxy for fuel loading or land use type (see resources in the 02_Real_world_examples folder)
- linescan and satellite images taken during the fire

Note that the timeseries animation shown here was created in the [Linescan loading examples](../04_EY_challenge2/Linescan_loading_examples.ipynb) notebook.
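
The NDVI mentioned in the list above is computed per pixel as (NIR - Red) / (NIR + Red). The following is a minimal sketch on toy arrays, not the challenge's own method: the function name `ndvi` is illustrative, and the band names of the satellite product you load are not shown here, so check them before passing real data.

```python
import numpy as np

def ndvi(nir, red):
    """Normalised Difference Vegetation Index, (NIR - Red) / (NIR + Red).

    Pixels where NIR + Red is zero are returned as NaN rather than
    raising a divide-by-zero warning.
    """
    nir = np.asarray(nir, dtype="float64")
    red = np.asarray(red, dtype="float64")
    total = nir + red
    out = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Toy reflectance values (made up for illustration)
toy_nir = np.array([[0.5, 0.3], [0.0, 0.4]])
toy_red = np.array([[0.1, 0.3], [0.0, 0.2]])
toy_ndvi = ndvi(toy_nir, toy_red)
```

Higher values generally indicate denser green vegetation, which is why NDVI from before the fire is suggested as a rough proxy for fuel loading.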

In [1]:
%matplotlib inline
import sys
import numpy as np
import pandas as pd
import geopandas as gpd

from odc.ui import show_datasets
from datacube import Datacube
from datacube.testutils.io import native_geobox

import ipyleaflet as L



In [2]:
dc = Datacube()

In [3]:
linescan_datasets = dc.find_datasets(product='linescan')
linescan_datasets = sorted(linescan_datasets, key = lambda ds: (ds.center_time, ds.id))

In [4]:
vector_file = '../03_EY_challenge1/resources/fire_boundaries.shp'
gdf = gpd.read_file(vector_file)
# gdf.head(1).T

## Exploring fire events

The `train.csv` file (loaded below from `resources/challenge2_train.csv`) lists all the linescans that are available, including an "event" column showing which fire event each one belongs to.

In [5]:
train = pd.read_csv('resources/challenge2_train.csv')
train.head(3)

Unnamed: 0,id,label,dateTimeLocal,dateTimeUTC,event
0,0,ROSEDALE_P1_201901041439_MGA94_55,4/01/2019 14:39,4/01/2019 3:39,Rosedale
1,1,ROSEDALE_1_P1_201901041446_MGA94_55,4/01/2019 14:46,4/01/2019 3:46,Rosedale
2,2,ROSEDALE_3_P1_201901041501_MGA94_55,4/01/2019 15:01,4/01/2019 4:01,Rosedale


Note that two linescans cover two fire events that were burning simultaneously, Macalister91 and Macalister97, and are labelled "Macalister91 & Macalister97". There are also some additional linescans marked "Other" that are not part of the seven main fire events.

In [6]:
train.event.value_counts()

Macalister91                   30
Walhalla                       24
Macalister97                   23
Tambo76                        20
Latrobe86                      15
Other                          10
Yarra51                         7
Rosedale                        3
Macalister91 & Macalister97     2
Name: event, dtype: int64
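
Because the two shared linescans carry a combined label, a filter such as `event == 'Macalister91'` would skip them. The sketch below shows one possible way to count them under both events. It runs on a toy DataFrame with hypothetical labels and assumes the combined value always uses `' & '` as the separator.

```python
import pandas as pd

# Toy stand-in for the training table; the labels are hypothetical
toy = pd.DataFrame({
    "label": ["scan_a", "scan_b", "scan_c"],
    "event": ["Macalister91", "Macalister97", "Macalister91 & Macalister97"],
})

# Split the combined value into a list, then give each event its own row
per_event = toy.assign(event=toy["event"].str.split(" & ")).explode("event")
counts = per_event["event"].value_counts()
```

With this layout, `scan_c` appears once under each of the two events, so a per-event filter picks it up in both cases.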

Using this file, we can group the linescans into discrete events. First, we'll join the event-to-linescan mapping from `train.csv` onto our list of available linescan datasets, by creating a new "event" property on each item in the list.

In [7]:
for ls in linescan_datasets:
    ls.event = train.loc[train.label==ls.metadata_doc["label"], 'event'].values[0]
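
The loop above raises an `IndexError` if a dataset's label is missing from the table, because `.values[0]` is then applied to an empty array. A more defensive variant, sketched here on stand-in objects rather than real datacube datasets, builds a label-to-event dictionary once and falls back to a placeholder. The `'unknown'` fallback and the `NOT_IN_TRAIN` label are assumptions for illustration.

```python
from types import SimpleNamespace
import pandas as pd

# Toy stand-ins: one label taken from the table shown earlier, one missing
toy_train = pd.DataFrame({
    "label": ["ROSEDALE_P1_201901041439_MGA94_55"],
    "event": ["Rosedale"],
})
toy_datasets = [
    SimpleNamespace(metadata_doc={"label": "ROSEDALE_P1_201901041439_MGA94_55"}),
    SimpleNamespace(metadata_doc={"label": "NOT_IN_TRAIN"}),  # hypothetical
]

# Build the lookup once instead of filtering the DataFrame per dataset
event_by_label = dict(zip(toy_train["label"], toy_train["event"]))

for ls in toy_datasets:
    ls.event = event_by_label.get(ls.metadata_doc["label"], "unknown")

events = [ls.event for ls in toy_datasets]
```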

Now let's explore just the Yarra51 event. The cell below uses a list comprehension to return a subset of the `linescan_datasets` list, and the following cell prints the "label" property of each linescan in the subset.

In [8]:
Yarra51_ls = [ls for ls in linescan_datasets if ls.event == 'Yarra51']

In [9]:
for ls in Yarra51_ls:
    print(ls.metadata_doc['label'])

YARRA51_620_P1_201903051812_MGA94_55
YARRA51_622_P1_201903051841_MGA94_55
YARRA51_633_P1_201903061644_MGA94_55
YARRA51_704_P1_201903091659_MGA94_55
YARRA51_726_P1_201903100129_MGA94_55
YARRA51_794_P1_201903151412_MGA94_55
YARRA51_809_P1_201903161558_MGA94_55


We'll also filter the polygon dataset to get a quick look at what ground truth annotations are available for this fire event. Comparing the output below with the labels above shows that no polygons are provided for "YARRA51_622_P1_201903051841_MGA94_55". You are encouraged to use your solution from Challenge 1 to fill in the blanks where no ground truth annotation is available.

In [10]:
Yarra51_gdf = gdf.loc[gdf.event == 'Yarra51']
Yarra51_gdf.SourceName.unique()

array(['yarra51 620 p1_201903051812_mga94_55.jpg',
       'yarra51 633 p1_201903061644_mga94_55.jpg',
       'yarra51 704 p1_201903091659_mga94_55.jpg',
       'yarra51 726 p1_201903100129_mga94_55.jpg',
       'yarra51 794 p1_201903151412_mga94_55.jpg',
       'ObservationsAreaEditing_20190312_1700',
       'yarra51 809 p1_201903161558_mga94_55.jpg'], dtype=object)
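
The comparison between linescan labels and polygon sources can also be done programmatically. The sketch below copies the two lists printed above and assumes the only differences between a label and its `SourceName` are letter case, spaces in place of underscores, and the `.jpg` suffix. The helper name `normalise` is illustrative.

```python
labels = [
    "YARRA51_620_P1_201903051812_MGA94_55",
    "YARRA51_622_P1_201903051841_MGA94_55",
    "YARRA51_633_P1_201903061644_MGA94_55",
    "YARRA51_704_P1_201903091659_MGA94_55",
    "YARRA51_726_P1_201903100129_MGA94_55",
    "YARRA51_794_P1_201903151412_MGA94_55",
    "YARRA51_809_P1_201903161558_MGA94_55",
]
source_names = [
    "yarra51 620 p1_201903051812_mga94_55.jpg",
    "yarra51 633 p1_201903061644_mga94_55.jpg",
    "yarra51 704 p1_201903091659_mga94_55.jpg",
    "yarra51 726 p1_201903100129_mga94_55.jpg",
    "yarra51 794 p1_201903151412_mga94_55.jpg",
    "ObservationsAreaEditing_20190312_1700",
    "yarra51 809 p1_201903161558_mga94_55.jpg",
]

def normalise(name):
    """Lower-case, swap spaces for underscores, and drop a trailing .jpg."""
    name = name.lower().replace(" ", "_")
    if name.endswith(".jpg"):
        name = name[: -len(".jpg")]
    return name

annotated = {normalise(s) for s in source_names}
missing = [label for label in labels if normalise(label) not in annotated]
```

Here `missing` contains only the 622 linescan. In the notebook itself the two lists would come from `Yarra51_ls` and `Yarra51_gdf.SourceName.unique()`.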

Let's see what this looks like on a map, alongside the available polygons. Note we will also need to change the CRS of the polygons to epsg:4326 to display them on the map.

In [93]:
m = show_datasets(Yarra51_ls)

# Reproject for display. to_crs returns a new GeoDataFrame, so nothing is
# assigned into a slice of gdf (which triggers pandas' SettingWithCopyWarning).
Yarra51_gdf = Yarra51_gdf.to_crs('epsg:4326')

layer_gdf = L.GeoData(geo_dataframe=Yarra51_gdf, name = 'polygons',
                      style={'color': 'black', 'fillColor': '#3366cc', 'opacity':0.05,
                             'weight':1.9, 'fillOpacity':0.6}
                     )

m.add_layer(layer = layer_gdf)


In [56]:
m

Map(center=[-37.58633193103924, 145.88252890518146], controls=(ZoomControl(options=['position', 'zoom_in_text'…

In [92]:
# for ls in Yarra51_ls:
#     ng = native_geobox(ls)
#     ls_data = dc.load(product='linescan',
#                       id=ls.metadata_doc['id'],
#                       output_crs = ls.metadata_doc['crs'],
#                       resolution = ng.resolution)
#     ls_data['linescan'].squeeze().plot.imshow(cmap='inferno', robust=False, size=8)

Using the functions in the [Linescan loading examples](../04_EY_challenge2/Linescan_loading_examples.ipynb) notebook you can also find satellite data relevant to the fire event. This could be used to supplement time periods between linescans, or to provide information about the conditions of the fire area prior to the fire. You can also use the functions in that notebook to find a common resolution and extent between linescans, and to create animations of the event.

### Forecast fire progression

Your task is now to forecast the progression of each fire event. You could develop a method using one fire or set of training images, and then validate it using other fire events/sets of training images.

For example, in a basic solution you could find the edge of the fire front at a given time, observe how far it travels between two adjacent images, and then adjust this distance based on the time elapsed to a third image. You could also adjust the speed based on local ground conditions such as vegetation and land use type.
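
The basic approach described above can be sketched as a constant-rate extrapolation along a single direction. This is an illustration under strong assumptions, not a recommended model: the front distances are invented numbers, the function names are hypothetical, and a real fire front rarely advances at a constant rate. The timestamps are the 12-digit local times embedded in the linescan labels.

```python
import re
from datetime import datetime

def label_time(label):
    """Parse the 12-digit local timestamp (YYYYMMDDhhmm) from a linescan label."""
    match = re.search(r"_(\d{12})_", label)
    return datetime.strptime(match.group(1), "%Y%m%d%H%M")

def extrapolate_front(d1, d2, t1, t2, t3):
    """Linearly extrapolate a front distance (metres) from times t1, t2 to t3."""
    rate = (d2 - d1) / (t2 - t1).total_seconds()  # metres per second
    return d2 + rate * (t3 - t2).total_seconds()

t1 = label_time("ROSEDALE_P1_201901041439_MGA94_55")
t2 = label_time("ROSEDALE_3_P1_201901041501_MGA94_55")
t3 = datetime(2019, 1, 4, 15, 40)  # local time of the Rosedale test rows shown below

# Invented distances of the front from a fixed origin, measured along one transect
d1, d2 = 1000.0, 1220.0
predicted = extrapolate_front(d1, d2, t1, t2, t3)
```

With these invented inputs the front moves 220 m in 22 minutes, so 39 minutes later the extrapolated distance is about 1610 m. Repeating this along many transects around the fire perimeter, and scaling the rate by vegetation or land use, gives a simple spatial forecast.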

For a more sophisticated solution you could implement a neural network to do next-frame prediction for each 'frame' of the fire event, based on a string of previous frames.
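
Whatever model is used for next-frame prediction, the training data needs to be arranged as (previous frames, next frame) pairs. The sketch below shows one way to build such pairs with NumPy. It assumes the frames have already been resampled to a common grid, and the name `make_windows` is illustrative. Linescans are irregularly spaced in time, so the elapsed time between frames may need to be supplied to the model as an additional input.

```python
import numpy as np

def make_windows(frames, n_prev):
    """Split a (T, H, W) stack into inputs (N, n_prev, H, W) and targets (N, H, W)."""
    frames = np.asarray(frames)
    if len(frames) <= n_prev:
        raise ValueError("need more frames than n_prev")
    inputs = np.stack([frames[i:i + n_prev] for i in range(len(frames) - n_prev)])
    targets = frames[n_prev:]
    return inputs, targets

# Toy stack of six 4x4 frames with made-up values
toy_frames = np.arange(6 * 4 * 4, dtype="float32").reshape(6, 4, 4)
inputs, targets = make_windows(toy_frames, n_prev=3)
```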

### Making a submission
The `test.csv` file provides a list of 5000 coordinates that require classification at five additional fire observations where linescans have not been provided. Note that the coordinates are given in epsg:28355 (GDA94 / MGA zone 55, the "MGA94_55" suffix in the linescan labels), not the epsg:4326 used for the map above. Follow the same process described in the Challenge 1 example notebook to create a submission.

Of the five linescans from which the test coordinates were selected, three are from the Tambo76 event and two are from the Rosedale event. In both cases, training images from the start of the fire event, captured before the linescans selected for testing, have been provided: 3 from the Rosedale event and 20 from the Tambo76 event.

In [62]:
test = pd.read_csv('resources/challenge2_test.csv')
test.head(3)

Unnamed: 0,id,event,x,y,dateTimeLocal,dateTimeUTC,target
0,0,Rosedale,491391,5769660,4/01/2019 15:40,4/01/2019 4:40,
1,1,Rosedale,486132,5764884,4/01/2019 15:40,4/01/2019 4:40,
2,2,Rosedale,484371,5776757,4/01/2019 15:40,4/01/2019 4:40,


In [67]:
test.event.value_counts()/1000

Tambo76     3.0
Rosedale    2.0
Name: event, dtype: float64
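
To fill the `target` column, each test coordinate has to be looked up in your forecast. The sketch below is one possible approach and does not reproduce the Challenge 1 process. It assumes the forecast is a boolean fire mask on a north-up grid in the same CRS as the test coordinates, described by its top-left corner and pixel size. All names and numbers here are hypothetical.

```python
import numpy as np

def sample_mask(mask, x_min, y_max, res, xs, ys):
    """Return 1 where (x, y) falls on a fire pixel, else 0.

    mask  : 2-D boolean array, row 0 at the northern edge
    x_min : x coordinate of the western edge of the grid
    y_max : y coordinate of the northern edge of the grid
    res   : pixel size in the units of the CRS (metres for epsg:28355)
    Points outside the grid are treated as not burning.
    """
    cols = np.floor((np.asarray(xs, dtype="float64") - x_min) / res).astype(int)
    rows = np.floor((y_max - np.asarray(ys, dtype="float64")) / res).astype(int)
    inside = (rows >= 0) & (rows < mask.shape[0]) & (cols >= 0) & (cols < mask.shape[1])
    out = np.zeros(len(cols), dtype=int)
    out[inside] = mask[rows[inside], cols[inside]].astype(int)
    return out

# Toy 4x4 grid with 10 m pixels and a single burning pixel
toy_mask = np.zeros((4, 4), dtype=bool)
toy_mask[1, 2] = True
toy_targets = sample_mask(toy_mask, x_min=0.0, y_max=40.0, res=10.0,
                          xs=[25, 5, 100], ys=[25, 5, 100])
```

In the notebook, `xs` and `ys` would be `test.x` and `test.y` for the rows of one timestamp, and the result would be written back into `test.target` for those rows.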

***
## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Australia data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please review the FAQ section and support options on the [EY Data Science Platform](https://datascience.ey.com/).