---
# Checking SkyTruth data


### This notebook checks how reliable the SkyTruth gas flaring dataset is. In this notebook we will:

+ Spatially filter a subset of the SkyTruth coordinate locations for an arbitrary region (North Dakota in this case)
+ Use the Planet API to grab imagery from this subset
+ Explore those images to see how accurate the coordinates were.
    + Find outlier imagery

### Next, we will
+ Train a CNN to identify well pads in the Bakken formation region of North Dakota

---


#### Download the dataset

In [79]:
DATAURL='https://storage.cloud.google.com/viirs.skytruth.org/viirs/data/csv/viirs-prerun21-clustered-0.5NM-30days-3dtc.zip'
DATAPATH='data/SkyTruthDataset.csv'

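# This cell does not download anything: the zip archive at DATAURL must already be
# downloaded and unzipped into data/. Check that the extracted CSV is there.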
!du -h $DATAPATH

114M	data/SkyTruthDataset.csv
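The cell above only checks that the CSV is already in `data/`. Below is a minimal sketch of the download and unzip step using only the standard library. It assumes the archive at `DATAURL` can be fetched without a browser sign-in (if the link asks for a Google login, download it manually instead) and that the archive contains one CSV file. The helpers `download_file` and `extract_csv` and the local path `data/skytruth.zip` are hypothetical names, not part of the original notebook.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def download_file(url, dest):
    """Stream `url` to the local path `dest`, creating parent folders as needed."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, 'wb') as out:
        shutil.copyfileobj(response, out)
    return dest


def extract_csv(zip_path, csv_path):
    """Copy the first .csv member of the archive at `zip_path` to `csv_path`."""
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        members = [name for name in archive.namelist() if name.lower().endswith('.csv')]
        if not members:
            raise ValueError(f'no CSV file found in {zip_path}')
        with archive.open(members[0]) as src, open(csv_path, 'wb') as out:
            shutil.copyfileobj(src, out)
    return csv_path


# Hypothetical usage (needs network access):
# zip_file = download_file(DATAURL, 'data/skytruth.zip')
# extract_csv(zip_file, DATAPATH)
```

Extracting by copying the first `.csv` member, rather than calling `extractall`, keeps the output at exactly `DATAPATH` regardless of the folder layout inside the archive.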


#### Read the data into a pandas dataframe

In [80]:
import pandas as pd
fields = ['datetime', 'longitude', 'latitude']
raw_data = pd.read_csv(DATAPATH, skipinitialspace=True, usecols=fields)

# Check the first few rows
raw_data.head()

                    datetime   longitude   latitude
0  2012-03-04 00:52:23+00:00    5.965468  31.695882
1  2012-03-04 00:52:30+00:00   20.960135  28.901543
2  2012-03-04 00:53:13+00:00    7.658133  28.576355
3  2012-03-04 04:30:19+00:00  -40.682323 -22.912826
4  2012-03-04 10:50:24+00:00 -148.522641  70.327999
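`read_csv` leaves the `datetime` column as plain strings. Because we later want imagery taken at or soon after each detection, it can help to parse these into timezone-aware timestamps. A minimal sketch, assuming every row has a value and uses the ISO-like format shown in the output; some SkyTruth timestamps carry fractional seconds (for example `2012-03-08 08:49:31.500000+00:00`), which `datetime.fromisoformat` accepts alongside rows without them. `parse_detection_times` is a hypothetical helper name.

```python
from datetime import datetime

import pandas as pd


def parse_detection_times(times):
    """Parse strings like '2012-03-04 00:52:23+00:00' into UTC pandas timestamps.

    datetime.fromisoformat handles rows with and without fractional seconds.
    Assumes there are no missing values in `times`.
    """
    parsed = pd.Series(times).map(datetime.fromisoformat)
    return pd.to_datetime(parsed, utc=True)


sample = ['2012-03-04 00:52:23+00:00', '2012-03-08 08:49:31.500000+00:00']
print(parse_detection_times(sample))
```

Applied to the notebook data this would be `raw_data['datetime'] = parse_detection_times(raw_data['datetime'])`; passing `parse_dates` to `read_csv` is another option worth trying.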


---
## Identify an area of interest (AOI)
The SkyTruth dataset includes gas flaring locations from all around the world. We're going to focus on North Dakota, where most of the Bakken formation lies. Gas flaring is common there because there is not enough pipeline infrastructure to capture the gas, so most of it gets burned instead. We should therefore expect a large number of data points in that area.

We used the browser-based tool __[geojson.io](http://geojson.io/#id=gist:tjdahlke/665ab9d496645cb409f953f67ac006d2&map=7/47.521/-100.283)__ to draw the polygon for our area of interest and saved it as a GeoJSON file. Click __[here](http://geojson.io/#id=gist:tjdahlke/665ab9d496645cb409f953f67ac006d2&map=7/47.521/-100.283)__ to open the tool and define your own area of interest.


In [39]:
import json

# Load GeoJSON file containing AOI geometries
with open('data/map.geojson') as f:
    AOI = json.load(f)
print(AOI)

{'type': 'FeatureCollection', 'features': [{'type': 'Feature', 'properties': {}, 'geometry': {'type': 'Polygon', 'coordinates': [[[-97.1630859375, 49.05227025601607], [-104.12841796875, 49.03786794532644], [-104.1064453125, 45.93587062119052], [-96.43798828125, 45.9511496866914], [-97.1630859375, 49.05227025601607]]]}}]}
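Testing every row against a polygon is the slow part of the next step. A cheap first pass is to compute the AOI's bounding box straight from the GeoJSON and discard points outside it with vectorized pandas comparisons. The box is a superset of the polygon (the polygon's edges are not perfectly axis-aligned), so an exact polygon test is still needed afterwards. This is a minimal sketch assuming only `Polygon` geometries; `geojson_bounds` and `bbox_prefilter` are hypothetical helper names.

```python
import pandas as pd


def geojson_bounds(feature_collection):
    """Return (min_lon, min_lat, max_lon, max_lat) over every Polygon in the collection."""
    lons, lats = [], []
    for feature in feature_collection['features']:
        geometry = feature['geometry']
        if geometry['type'] != 'Polygon':
            raise ValueError('this sketch only handles Polygon geometries')
        for ring in geometry['coordinates']:
            for position in ring:
                # GeoJSON positions are [longitude, latitude, (optional altitude)]
                lons.append(position[0])
                lats.append(position[1])
    return min(lons), min(lats), max(lons), max(lats)


def bbox_prefilter(df, bounds):
    """Keep rows whose longitude/latitude fall inside the box (edges included)."""
    min_lon, min_lat, max_lon, max_lat = bounds
    in_lon = df['longitude'].between(min_lon, max_lon)
    in_lat = df['latitude'].between(min_lat, max_lat)
    return df[in_lon & in_lat]


# Hypothetical usage: shrink the dataset before the exact polygon test
# candidates = bbox_prefilter(raw_data, geojson_bounds(AOI))
```

Running the per-point loop on `candidates` instead of `raw_data` gives the same matches while checking far fewer rows.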


---
Now we check each point in the SkyTruth dataset and keep the ones that lie within the area of interest (AOI) we defined above. The North Dakota polygon is a very simple geometry, but this method is more flexible: the GeoJSON file can hold several polygons, and a point is kept if any of them contains it. Scanning the data point by point can take a few minutes, so the loop stops once it has collected a little over 100 points.

In [81]:
from shapely.geometry import shape, Point

# Build the AOI polygons once instead of once per data row
polygons = [shape(feature['geometry']) for feature in AOI['features']]

# Collect [datetime, longitude, latitude] rows for points inside the AOI
AOIpoints = []

# Loop over the data and grab rows that correspond to points in our AOI
for index, row in raw_data.iterrows():
    lat = row['latitude']
    lon = row['longitude']
    # Shapely uses (x, y) order, i.e. (longitude, latitude)
    point = Point(lon, lat)

    # Keep the point if any AOI polygon contains it
    if any(polygon.contains(point) for polygon in polygons):
        # Same column order as raw_data: datetime, longitude, latitude
        AOIpoints.append([row['datetime'], lon, lat])

    # Stop early: we only need a small subset (a little over 100 points)
    if len(AOIpoints) > 100:
        break

# Convert the list to a new dataframe with the same columns as raw_data
nd_data = pd.DataFrame(AOIpoints, columns=raw_data.columns)
print(len(nd_data), 'points found in the AOI')
nd_data.head()

101 points found in the AOI
                           datetime   longitude   latitude
0         2012-03-08 07:59:10+00:00 -103.036546  48.068685
1         2012-03-08 07:59:16+00:00 -103.399127  47.847802
2         2012-03-08 08:49:27+00:00 -102.940086  47.991513
3  2012-03-08 08:49:31.500000+00:00 -102.816032  47.704601
4         2012-03-08 09:39:37+00:00 -102.612302  48.193891
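The loop above visits rows one at a time with `iterrows` and stops after roughly 100 matches. To screen the whole dataset, a vectorized test is much faster. The sketch below swaps shapely for a hand-written NumPy version of the classic even-odd ray-casting (PNPOLY) test; it is an alternative technique, not the notebook's method. Assumptions: only `Polygon` geometries, longitude/latitude treated as planar coordinates, and inner rings (holes) handled by XOR-ing the rings together. Points lying exactly on an edge may be classified differently from shapely's `contains`, which excludes boundary points. `points_in_ring` and `points_in_aoi` are hypothetical names; recent shapely releases (2.0+) also offer a vectorized `shapely.contains_xy`.

```python
import numpy as np


def points_in_ring(lon, lat, ring):
    """Even-odd ray-casting test of many points against one polygon ring.

    ring: sequence of [lon, lat] vertices (a closing vertex equal to the first is fine).
    Returns a boolean array; points exactly on an edge may go either way.
    """
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    xs = np.array([p[0] for p in ring], dtype=float)
    ys = np.array([p[1] for p in ring], dtype=float)
    inside = np.zeros(lon.shape, dtype=bool)
    j = len(xs) - 1
    for i in range(len(xs)):
        # Does edge j -> i straddle the horizontal line through each point?
        crosses = (ys[i] > lat) != (ys[j] > lat)
        with np.errstate(divide='ignore', invalid='ignore'):
            # Longitude where the edge meets that line (only meaningful where crosses is True)
            x_cross = xs[j] + (lat - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j])
            # A ray cast toward +longitude hits this edge: flip the inside flag
            inside ^= crosses & (lon < x_cross)
        j = i
    return inside


def points_in_aoi(lon, lat, feature_collection):
    """True where a point falls inside any Polygon feature of the collection."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    inside = np.zeros(lon.shape, dtype=bool)
    for feature in feature_collection['features']:
        in_feature = np.zeros(lon.shape, dtype=bool)
        # Even-odd rule across rings: a hole cancels out the outer ring
        for ring in feature['geometry']['coordinates']:
            in_feature ^= points_in_ring(lon, lat, ring)
        inside |= in_feature
    return inside


# Hypothetical usage on the full dataset:
# mask = points_in_aoi(raw_data['longitude'].to_numpy(), raw_data['latitude'].to_numpy(), AOI)
# nd_data = raw_data[mask]
```

Each polygon edge is processed once for all points at the same time, so the cost grows with (number of edges) × (number of points) but runs as NumPy array operations instead of a Python loop over rows.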

---
## Image selection and retrieval
Now that we have spatially filtered the coordinates down to our AOI, we can use the Planet API to select which imagery to retrieve. Ideally, we want an image of each location taken at (or soon after) the time that it was identified as a gas flaring location.
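One way to express "this location, at or soon after this time" is a search filter for Planet's Data API. The sketch below only builds the JSON body for a single detection; it assumes the Data API filter format as we understand it (a `GeometryFilter` on `geometry` and a `DateRangeFilter` on `acquired`, combined with an `AndFilter`). The 7-day window and the `PSScene` item type are hypothetical defaults; detections as old as 2012 may need a different item type, so check coverage first. `planet_search_request` and `to_rfc3339_utc` are hypothetical helper names.

```python
import json
from datetime import datetime, timedelta, timezone


def to_rfc3339_utc(moment):
    """Format a datetime as an RFC 3339 UTC string such as 2012-03-08T07:59:10Z."""
    if moment.tzinfo is None:
        # Assumption: naive timestamps are already in UTC
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.astimezone(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')


def planet_search_request(lon, lat, detected_at, window_days=7, item_types=('PSScene',)):
    """Build a search body for scenes covering (lon, lat) acquired between the
    detection time and `window_days` days later."""
    if isinstance(detected_at, str):
        detected_at = datetime.fromisoformat(detected_at)
    end = detected_at + timedelta(days=window_days)
    geometry_filter = {
        'type': 'GeometryFilter',
        'field_name': 'geometry',
        'config': {'type': 'Point', 'coordinates': [lon, lat]},  # GeoJSON order
    }
    date_filter = {
        'type': 'DateRangeFilter',
        'field_name': 'acquired',
        'config': {'gte': to_rfc3339_utc(detected_at), 'lte': to_rfc3339_utc(end)},
    }
    return {
        'item_types': list(item_types),
        'filter': {'type': 'AndFilter', 'config': [geometry_filter, date_filter]},
    }


# Example for one detection in the North Dakota subset
body = planet_search_request(-103.036546, 48.068685, '2012-03-08 07:59:10+00:00')
print(json.dumps(body, indent=2))
```

The body would then be POSTed to the Data API quick-search endpoint (`https://api.planet.com/data/v1/quick-search`) with your API key; that request is not shown here. Dropping fractional seconds from the start time can only widen the window slightly, never exclude a matching scene.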