# Planet data dump

**Goal: download all available clear-sky PlanetScope 3 m four-band analytic surface reflectance imagery for every NEON site from 2016 until the present (September 2020).**

## Pre-requisites

Install the geospatial dependencies needed to work with `.geojson` and `.tif` files and with Google Earth Engine (GEE), plus the `snakemake` workflow manager.

## Directory Structure

To organize the project, we put our files in a set of folders. These folders are listed in `.gitignore` so that they are not added to the GitHub repository.

```
/NEON_Downloads
├──/aop_sites
│   ├──
│   ├──
│   └──
├──/planet_order
│   ├──
│   ├──
│   ├──
│   └──
├──/porder_snakemake
│
└──
```
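
The layout above could be created with a short script along these lines. This is a minimal sketch: the folder names come from the tree above, while the helper name `make_project_dirs` and the choice to list each folder in a `.gitignore` at the project root are assumptions made for illustration.

```python
from pathlib import Path

# Folder names taken from the tree above
SUBDIRS = ["aop_sites", "planet_order", "porder_snakemake"]


def make_project_dirs(root):
    """Create the project folders and list them in root/.gitignore (hypothetical helper)."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    gitignore = root / ".gitignore"
    # Keep any entries that are already in .gitignore
    entries = gitignore.read_text().splitlines() if gitignore.exists() else []
    for name in SUBDIRS:
        (root / name).mkdir(exist_ok=True)
        entry = name + "/"
        if entry not in entries:
            entries.append(entry)
    gitignore.write_text("\n".join(entries) + "\n")
    return [root / name for name in SUBDIRS]
```

Calling `make_project_dirs("NEON_Downloads")` twice is safe, because existing folders and existing `.gitignore` entries are left alone.
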
### NEON site boundaries

I converted the NEON `.shp` files (from https://www.neonscience.org/data/about-data/spatial-data-maps) for the Terrestrial Sampling Boundaries, AOP Flight Areas, and TOS Sampling Locations to `.geojson`.

In [None]:
# set present working directory


In [2]:
import geopandas
site = geopandas.read_file('terrestrial_boundaries.geojson')
aop = geopandas.read_file('AOP_flightboxes.geojson')
tos = geopandas.read_file('TOS_v4.geojson')

Print the first rows of each file to make sure that the fields are complete.

In [3]:
print(aop.head())

  domain domainName                      siteName siteID     siteType  \
0    D01  Northeast  Bartlett Experimental Forest   BART  Relocatable   
1    D01  Northeast                Harvard Forest   HARV         Core   
2    D01  Northeast                Harvard Forest   HARV         Core   
3    D01  Northeast               Lower Hop Brook   HOPB         Core   
4    D19      Taiga                         Healy   HEAL  Relocatable   

    sampleType  priority  version         flightbxID  \
0  Terrestrial         1        1  D01_BART_R1_P1_v1   
1  Terrestrial         1        1  D01_HARV_C1_P1_v1   
2  Terrestrial         3        1  D01_HARV_C1_P3_v1   
3      Aquatic         2        1  D01_HOPB_C1_P2_v1   
4  Terrestrial         1        1  D19_HEAL_R3_P1_v1   

                                            geometry  
0  MULTIPOLYGON (((-71.33426 43.99197, -71.33423 ...  
1  MULTIPOLYGON (((-72.14819 42.57510, -72.14776 ...  
2  MULTIPOLYGON (((-72.10812 42.43653, -72.14788 ...  
3  M

In [4]:
print(site.head())

  domainNumb    domainName                 siteType  \
0        D01     Northeast         Core Terrestrial   
1        D02  Mid-Atlantic  Relocatable Terrestrial   
2        D02  Mid-Atlantic  Relocatable Terrestrial   
3        D03     Southeast  Relocatable Terrestrial   
4        D03     Southeast         Core Terrestrial   

                                    siteName siteID  \
0                             Harvard Forest   HARV   
1                   Blandy Experimental Farm   BLAN   
2  Smithsonian Environmental Research Center   SERC   
3                 Disney Wilderness Preserve   DSNY   
4          Ordway Swisher Biological Station   OSBS   

                           siteHost    areaKm2         acres  \
0          Harvard University, LTER  11.737025   2900.270496   
1            University of Virginia   2.694233    665.756840   
2           Smithsonian Institution   1.578849    390.140625   
3            The Nature Conservancy  48.504342  11985.635953   
4  University of F

In [5]:
print(tos.head())

        country state   county     domain domanID  \
0  unitedStates    NH  Carroll  Northeast     D01   
1  unitedStates    NH  Carroll  Northeast     D01   
2  unitedStates    NH  Carroll  Northeast     D01   
3  unitedStates    NH  Carroll  Northeast     D01   
4  unitedStates    NH  Carroll  Northeast     D01   

                        siteNam siteID      plotTyp   subtype subSpec  ...  \
0  Bartlett Experimental Forest   BART  distributed  basePlot    None  ...   
1  Bartlett Experimental Forest   BART  distributed  basePlot    None  ...   
2  Bartlett Experimental Forest   BART  distributed  basePlot    None  ...   
3  Bartlett Experimental Forest   BART  distributed  basePlot    None  ...   
4  Bartlett Experimental Forest   BART  distributed  basePlot    None  ...   

           nlcdCls  slTypOr     cordSrc        date  fltrdPs  plotPdp plotHdp  \
0  deciduousForest     None  GeoXH 6000  20140902.0    301.0      5.8     3.0   
1      mixedForest     None  GeoXH 6000  20131015.
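
Printing `head()` only shows the first few rows. To check every feature, a small helper can count missing property values per field. This is a minimal sketch using only the standard library; the function name `count_missing_properties` is hypothetical, and it assumes each file is a GeoJSON FeatureCollection like the ones read above.

```python
import json
from collections import Counter


def count_missing_properties(geojson):
    """Count, per field name, the features whose property is absent or null."""
    features = geojson.get("features", [])
    # Collect every field name that appears on at least one feature
    fields = set()
    for feature in features:
        fields.update((feature.get("properties") or {}).keys())
    missing = Counter()
    for feature in features:
        props = feature.get("properties") or {}
        for field in fields:
            if props.get(field) is None:
                missing[field] += 1
    return dict(missing)


# Usage (file name from the cells above):
# with open("AOP_flightboxes.geojson") as f:
#     print(count_missing_properties(json.load(f)))
```

An empty result means every feature carries a non-null value for every field.
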

We need to separate out the geometry of each NEON flight box using its `flightbxID` and save it as its own `<flightbxID>.geojson` file.

In [6]:
print(aop["flightbxID"][98]+'.geojson')

D14_SYCA_A1_P1_v2.geojson


### Create individual `.geojson` for each Flight Box.
This script splits each site by flight box and priority area and saves the pieces in a new folder called `aop_sites/`.

These low-point-count polygons are useful for querying the Planet API and will make downloading the Planet assets (images) easier.

In [12]:
import copy
import json
import os

# Import AOP flightboxes filename
JSON_FILENAME = 'AOP_flightboxes.geojson'

# Create directory
dirName = 'aop_sites'
try:
    # Create target Directory
    os.mkdir(dirName)
    print("Directory " , dirName ,  " Created ") 
except FileExistsError:
    print("Directory " , dirName ,  " already exists")

# Split AOP flightboxes GeoJSON into individual files    
def _get_sites_data(features: list) -> dict:
    ret_list = {}
    for feature in features:
        site_id = feature['properties']['flightbxID']
        if site_id not in ret_list:
            ret_list[site_id] = [feature]
        else:
            ret_list[site_id].append(feature)
    return ret_list

with open(JSON_FILENAME, 'r') as in_file:
    geo_data = json.load(in_file)

json_preamble = {}
for key, value in geo_data.items():
    if key not in ['features']:
        json_preamble[key] = value

site_list = _get_sites_data(geo_data['features'])

for site, site_features in site_list.items():
    out_filename = 'aop_sites/'+ site + '.geojson'
    site_json = copy.deepcopy(json_preamble)
    site_json['features'] = site_features
    with open(out_filename, 'w') as out_file:
        json.dump(site_json, out_file)

Directory  aop_sites  already exists


The `porder` JSON reader does not accept the extra level of nesting (four brackets, `[[[[` and `]]]]`) that the `MultiPolygon` geometry has in the new site `.geojson` files, so we need to remove one level of square brackets.
Here I use a simple `sed` command to go through every `.geojson` file in the directory and reduce the four brackets to three. I also change the geometry type from `MultiPolygon` to a simple `Polygon`. This only produces a valid `Polygon` when the `MultiPolygon` has a single part.

In [13]:
%%bash
cd aop_sites && sed -i 's/\[\[\[\[/[[[/g' * && sed -i 's/\]\]\]\]/]]]/g' * && sed -i 's/MultiPolygon/Polygon/g' *
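
The `sed` edit is purely textual, so it cannot tell a single-part `MultiPolygon` from one with several parts. As an alternative (not the method used in this notebook), the same conversion could be done on the parsed JSON. This is a minimal sketch; the function names `multipolygon_to_polygon` and `simplify_file` are hypothetical, and it deliberately leaves multi-part geometries untouched.

```python
import json


def multipolygon_to_polygon(geometry):
    """Return a Polygon for a single-part MultiPolygon; return anything else unchanged."""
    if not geometry:
        return geometry
    if geometry.get("type") == "MultiPolygon" and len(geometry["coordinates"]) == 1:
        # Dropping the outer list removes one level of bracket nesting
        return {"type": "Polygon", "coordinates": geometry["coordinates"][0]}
    return geometry


def simplify_file(path):
    """Rewrite one GeoJSON file in place, converting single-part MultiPolygons."""
    with open(path) as f:
        data = json.load(f)
    for feature in data.get("features", []):
        feature["geometry"] = multipolygon_to_polygon(feature.get("geometry"))
    with open(path, "w") as f:
        json.dump(data, f)


# Usage (folder name from the cells above):
# import glob
# for path in glob.glob("aop_sites/*.geojson"):
#     simplify_file(path)
```
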

Try clicking on each `.geojson` file to check that it displays in JupyterLab.

To do: build a script (maybe with a workflow manager) that takes each NEON site and clips the Planet imagery to the respective AOP extent.

Estimate the total area (km²) of Planet assets across `n` days over every NEON AOP boundary using `porder`.
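
As a rough first pass at this estimate, the area of each flight box can be computed from its GeoJSON ring and multiplied by the number of scenes returned for it. The sketch below does not call `porder`; it uses a common spherical approximation for the area of a longitude/latitude ring. The helper name `ring_area_km2` and the Earth radius constant are choices made here for illustration.

```python
import math

EARTH_RADIUS_M = 6378137.0  # assumed spherical radius (WGS84 equatorial value)


def ring_area_km2(ring):
    """Approximate area in km² of a closed [lon, lat] ring on a sphere."""
    if len(ring) < 4:
        return 0.0  # a closed ring needs at least four positions
    pts = ring[:-1]  # drop the repeated closing position
    n = len(pts)
    total = 0.0
    for i in range(n):
        lon_prev = pts[i - 1][0]
        lon_next = pts[(i + 1) % n][0]
        lat = pts[i][1]
        total += math.radians(lon_next - lon_prev) * math.sin(math.radians(lat))
    return abs(total) * EARTH_RADIUS_M ** 2 / 2.0 / 1e6


# Example: a 1 degree by 1 degree box at the equator
box = [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]
box_area = ring_area_km2(box)
```

Multiplying a flight box area by its scene count gives an upper bound on the quota needed, since many scenes only partly cover a flight box.
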



# (To do) Snakemake workflow 

Use [Snakemake](https://snakemake.readthedocs.io/en/stable/index.html) to create a workflow that downloads the Planet data by site and prepares it for ingestion into GEE.

```
conda install -c conda-forge mamba
mamba create -c bioconda -c conda-forge -n snakemake snakemake-minimal
```

`porder` is a command-line tool that is run from Bash. First, check which Python executable the notebook is using:

In [None]:
import sys
print(sys.executable)

# Ordering Planet assets with `porder` from CLI

Samapriya's `porder`, hosted on [GitHub](https://github.com/tyson-swetnam/porder), has [detailed instructions](https://tyson-swetnam.github.io/porder/) for downloading data from Planet.

### Log into Planet

I have an account with [Planet.com](https://planet.com). In the terminal, I run `planet init` and enter my email address and password. This creates a hidden file in my home folder, `~/.planet.json`.
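
Before running any `porder` commands, it can help to confirm that the credentials file is in place. This is a minimal sketch: the helper name `has_planet_key` is hypothetical, and it assumes that `~/.planet.json` is a JSON object holding the API key under a `"key"` entry.

```python
import json
from pathlib import Path


def has_planet_key(path=None):
    """Return True if the credentials file exists and holds a non-empty "key" entry."""
    # Assumption: the file written by `planet init` looks like {"key": "..."}
    path = Path(path) if path else Path.home() / ".planet.json"
    try:
        data = json.loads(path.read_text())
    except (OSError, ValueError):
        return False
    return isinstance(data, dict) and bool(data.get("key"))


# Usage:
# if not has_planet_key():
#     print("Run `planet init` in a terminal first")
```
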

**Check Quota**

Before you start work on this notebook, check your quota (in km²) to see whether you can download your imagery.

In [1]:
%%bash
# `planet_quota()` is not defined by the `planet` package; `porder quota` prints the account quota
porder quota

In [1]:
%%bash
porder idlist --input "/workspace/NEON_Downloads/aop_sites/D16_WREF_C1_P1_v2.geojson" --start "2017-01-01" --end "2020-08-31" --item "PSScene4Band" --asset "analytic_sr,udm2" --outfile "/workspace/NEON_Downloads/planet_order/D16_WREF_C1_P1_v2_2016_2020.csv" --filters range:clear_percent:90:100 --number 1000



Running search for a maximum of: 1000 assets


ERROR 1: PROJ: proj_create_from_database: Open of /opt/conda/share/proj failed
Traceback (most recent call last):
  File "/opt/conda/bin/porder", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/porder/porder.py", line 841, in main
    func(args)
  File "/opt/conda/lib/python3.7/site-packages/porder/porder.py", line 284, in idlist_from_parser
    filters=args.filters,
  File "/opt/conda/lib/python3.7/site-packages/porder/geojson2id.py", line 253, in idl
    res = client.quick_search(req)
  File "/opt/conda/lib/python3.7/site-packages/planet/api/client.py", line 168, in quick_search
    body_type=models.Items, data=body, method='POST')).get_body()
  File "/opt/conda/lib/python3.7/site-packages/planet/api/models.py", line 45, in get_body
    resp = self._dispatcher._dispatch(self.request)
  File "/opt/conda/lib/python3.7/site-packages/planet/api/dispatch.py", line 155, in _dispatch
    return _do_request(self.session, request)
  File "/opt/conda/lib

CalledProcessError: Command 'b'porder idlist --input "/workspace/NEON_Downloads/aop_sites/D16_WREF_C1_P1_v2.geojson" --start "2017-01-01" --end "2020-08-31" --item "PSScene4Band" --asset "analytic_sr,udm2" --outfile "/workspace/NEON_Downloads/planet_order/D16_WREF_C1_P1_v2_2016_2020.csv" --filters range:clear_percent:90:100 --number 1000\n'' returned non-zero exit status 1.
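
Once the single-site call above runs cleanly, the same command could be generated for every flight box in `aop_sites/`. This is a minimal sketch that only builds the argument list; the flags and values are copied from the cell above, while the helper name `build_idlist_command` and the output naming scheme (site ID plus start and end years) are assumptions.

```python
from pathlib import Path


def build_idlist_command(geojson_path, out_dir, start, end, number=1000):
    """Build the `porder idlist` argument list for one flight box (hypothetical helper)."""
    site = Path(geojson_path).stem
    # Assumed naming scheme: <flightbxID>_<start year>_<end year>.csv
    outfile = Path(out_dir) / f"{site}_{start[:4]}_{end[:4]}.csv"
    return [
        "porder", "idlist",
        "--input", str(geojson_path),
        "--start", start,
        "--end", end,
        "--item", "PSScene4Band",
        "--asset", "analytic_sr,udm2",
        "--outfile", str(outfile),
        "--filters", "range:clear_percent:90:100",
        "--number", str(number),
    ]


# Usage (folder names from the directory structure above):
# import subprocess
# for path in sorted(Path("aop_sites").glob("*.geojson")):
#     cmd = build_idlist_command(path, "planet_order", "2017-01-01", "2020-08-31")
#     subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with the file paths.
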