### MultiPoint imagery search
This script is a variant of the single-point imagery search script, and the logic is the same. The main difference is that each section has been wrapped in a function, so that the whole chain can be run for each AOI from a single 'Main' function (see penultimate cell).

### Library installation and script setup
This cell only needs to be run once. It sets up the environment for the rest of the analysis.

In [1]:
# Run one time only: install the less common libraries, then import everything.
# pip.main() was removed in pip 10, so call pip through the current interpreter instead.
import subprocess
import sys
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'geopandas'])

import pandas as pd
import geopandas as gpd
import shapely
from shapely.wkt import loads
from shapely.geometry import box, MultiPolygon, Point
import time
import json
import os
from gbdxtools import Interface
from gbdxtools.task import env
from gbdxtools import CatalogImage
gbdx = Interface()
%matplotlib inline





### Set UTM zone of points and number of images to return per AOI

In [2]:
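# Projected CRS used to buffer the points in metres; edit for the UTM zone under analysis.
# EPSG:32630 is WGS 84 / UTM zone 30N (6°W to 0°). The sample points below lie near 2°E, i.e. in zone 31N (EPSG:32631).
target_crs = {'init': 'epsg:32630'}
keep = 10  # number of top-ranked images to return per AOI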
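To pick `target_crs`, the UTM zone can be derived from a point's longitude: zones are 6° wide, numbered eastwards from 180°W, and the WGS 84 / UTM codes are EPSG:326NN north of the equator and EPSG:327NN south of it. A minimal sketch, assuming a hypothetical helper `utm_epsg` that ignores the special-case zones around Norway and Svalbard:

```python
import math

def utm_epsg(lon, lat):
    """Return the WGS 84 / UTM EPSG code for a lon/lat point (hypothetical helper).

    Ignores the special-case zones around Norway and Svalbard.
    """
    # Zones are 6 degrees wide, numbered 1..60 eastwards from 180 degrees W
    zone = int(math.floor((lon + 180.0) / 6.0)) + 1
    zone = min(max(zone, 1), 60)  # keep lon == 180 in zone 60
    # 326xx for the northern hemisphere, 327xx for the southern
    return (32600 if lat >= 0 else 32700) + zone

# The first sample AOI lies near 2.09 E, 48.93 N
target_epsg = utm_epsg(2.09, 48.93)
print(target_epsg)  # 32631, i.e. UTM zone 31N
```

The result can then be used in the notebook's own dictionary form, e.g. `{'init': 'epsg:%d' % target_epsg}`.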

### Define AOI and bounding box of AOI from shapefile
The user needs their Areas of Interest ("AOIs") defined as points in a shapefile, uploaded to the Jupyter server (all of its component files, e.g. .shp, .shx, .dbf and .prj) before the next cell is run.

The file must contain an additional numeric field, 'buffer_m', giving the distance in metres around each point to be treated as the AOI when searching for matching images. Because the buffer uses a square cap, each AOI is a square of side 2 × buffer_m centred on its point.
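Because the points are buffered in projected (metre) coordinates with a square cap, each AOI starts out as an axis-aligned square around its point. A plain-Python sketch of that geometry, assuming projected x/y coordinates in metres (the helper `square_aoi` and the sample coordinates are hypothetical; the notebook itself uses Shapely's `buffer(..., cap_style=3)`):

```python
def square_aoi(x, y, buffer_m):
    """Bounds (minx, miny, maxx, maxy) of a square-capped buffer around a projected point."""
    # A square cap extends buffer_m in each axis direction from the point
    return (x - buffer_m, y - buffer_m, x + buffer_m, y + buffer_m)

# Hypothetical UTM coordinates (metres) and a 50 m buffer
minx, miny, maxx, maxy = square_aoi(500000.0, 5420000.0, 50.0)
side = maxx - minx
area = side * (maxy - miny)
print(side, area)  # 100.0 10000.0
```

After reprojection to WGS84 the square is generally no longer axis-aligned in longitude/latitude, which is why the notebook also keeps a bounding box for each AOI.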

In [3]:
# Set file path of the points shapefile
pointsfile = r'testpoints.shp'

# Import points shapefile
points_list = gpd.read_file(pointsfile)

# Ensure starting from WGS84
crs_WGS84 = {'init': 'epsg:4326'}
points_list = points_list.to_crs(crs_WGS84)

# Keep the original WGS84 point geometries, one per AOI, for the output table
pointz = list(points_list['geometry'])

# Project to UTM so that buffer distances are in metres
points_list = points_list.to_crs(target_crs)

# Buffer each point by its 'buffer_m' value; cap_style=3 gives a square AOI
points_list['geometry'] = points_list.apply(lambda x: x['geometry'].buffer(x['buffer_m'], cap_style=3), axis=1)

# Reproject the square AOIs back to WGS84
points_list = points_list.to_crs(crs_WGS84)

# Build the lists of AOI polygons and their bounding boxes [minx, miny, maxx, maxy]
AOI_list = []
bboxx_list = []

for AOI in points_list['geometry']:

    # add AOI to list of AOIs
    AOI_list.append(AOI)

    # bboxx: the axis-aligned box that always contains the AOI
    bboxx = list(AOI.bounds)
    bboxx_list.append(bboxx)

    print('AOI: ||  %s \n' % AOI)
    print('bboxx: ||  %s \n' % bboxx)


AOI: ||  POLYGON ((2.090166076195241 48.92845819728436, 2.09014781267055 48.92827899312151, 2.089875843823989 48.92829102684448, 2.089894106383141 48.92847023108281, 2.090166076195241 48.92845819728436)) 

bboxx: ||  [2.0898758438239895, 48.92827899312151, 2.090166076195241, 48.92847023108281]

AOI: ||  POLYGON ((2.624045470027603 48.9642972456376, 2.623924280841608 48.96322296744199, 2.622292752617023 48.96330275068213, 2.622413907098056 48.96437703187857, 2.624045470027603 48.9642972456376)) 

bboxx: ||  [2.622292752617023, 48.963222967441986, 2.624045470027603, 48.96437703187857]

AOI: ||  POLYGON ((2.146019524092651 49.19339185079222, 2.145870438145632 49.19195840849964, 2.143683332427004 49.19205609610155, 2.143832355691839 49.19348954330255, 2.146019524092651 49.19339185079222)) 

bboxx: ||  [2.1436833324270044, 49.191958408499644, 2.146019524092651, 49.19348954330255]

AOI: ||  POLYGON ((2.411548991524209 48.72734574654375, 2.411414059772862 48.7260919206131, 2.409518895721493 

### Define Categorical Screening Parameters

In [4]:
cutoff_cloud_cover = 10   # images with cloud cover (%) above this threshold discarded
cutoff_overlap = 0        # images covering less than this % of the AOI discarded [N.B.: keep small if AOI is large]
cutoff_date_upper = '28-Feb-18'  # images acquired after this date discarded (dd-mmm-yy)
cutoff_date_lower = '1-Sep-15'   # images acquired before this date discarded (dd-mmm-yy)
cutoff_nadir = 15         # images with off-nadir angle (degrees) above this threshold discarded
cutoff_pan_res = 1        # images with pan resolution coarser than this (metres) discarded
accepted_bands = ['PAN_MS1', 'PAN_MS1_MS2', 'PAN']  # images with any other band combination discarded
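The cutoff dates (and the optimal date in the next cell) use a day-month-year form with an abbreviated English month and a two-digit year. The notebook hands these strings to pandas, which parses them leniently; a stricter standard-library sketch makes the interpretation explicit. The helper name `parse_ddmmmyy` is hypothetical, and `%b` assumes an English locale:

```python
from datetime import datetime, timezone

def parse_ddmmmyy(text):
    """Parse e.g. '7-feb-17' as midnight UTC on 7 February 2017."""
    # %d accepts 1 or 2 digits, %b matches month abbreviations case-insensitively,
    # and %y maps 00-68 to 2000-2068 (69-99 to 1969-1999)
    return datetime.strptime(text, '%d-%b-%y').replace(tzinfo=timezone.utc)

print(parse_ddmmmyy('1-Sep-15').date())   # 2015-09-01
print(parse_ddmmmyy('28-Feb-18').date())  # 2018-02-28
```

Attaching UTC matters because the catalog timestamps end in 'Z', so comparisons should be between timezone-aware values.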

### Define Continuous Image Ranking Preferences

In [5]:
optimal = {
    'optimal_date' : '7-feb-17', # Optimal date (enter as dd-mmm-yy)
    'optimal_pan_res' : 0.4, # Optimal pan resolution, metres
    'optimal_nadir' : 0 # optimal off-nadir angle, degrees (0 = vertical)
}

# Define continuous image ranking preference weights. These must sum to 1.
# If contemporaneous scenes matter most, raise the 'date' weight at the expense of the other categories.

pref_weights = {
    'cloud_cover': 0.2,
    'overlap':0.2,
    'date': 0.2,
    'nadir': 0.2,
    'resolution': 0.2
    }
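Since the weights are meant to sum to 1, a quick check before running the search catches typos. A sketch, assuming a hypothetical helper `check_weights` and a small floating-point tolerance (sums such as five lots of 0.2 are not guaranteed to be exactly 1.0 in binary floating point):

```python
import math

def check_weights(weights, tol=1e-9):
    """Raise ValueError unless the preference weights sum to 1 within tol."""
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError('pref_weights sum to %s, not 1' % total)
    return total

pref_weights = {'cloud_cover': 0.2, 'overlap': 0.2, 'date': 0.2, 'nadir': 0.2, 'resolution': 0.2}
check_weights(pref_weights)  # passes: the five weights sum to 1 within tolerance
```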

### Define Catalog Search Function

In [6]:
#--------------------------------------------------------------------------------------
#Perform Initial Vector Search: AOI Vector Against Image Catalog for All Potential Images
#--------------------------------------------------------------------------------------

# Function queries the DigitalGlobe vector catalog with the footprint of the chosen AOI (as WKT)
# and returns the matching records, each of which at least partially covers the AOI.
# Main calls it with count=1000 and cloud_cover=25, i.e. up to 1000 records with cloud cover below 25%.

def SearchUnordered(aoi_wkt, _type, cloud_cover, count):
    query = "item_type:{} AND item_type:DigitalGlobeAcquisition".format(_type)
    query += " AND attributes.cloudCover_int:<{}".format(cloud_cover)
    # Search with the AOI passed in by the caller (not a global)
    return gbdx.vectors.query(aoi_wkt, query, count=count)
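The catalog query is a text query string (it appears to follow Lucene/Elasticsearch query-string syntax). Building it in a standalone function makes the exact string easy to inspect before any request is sent; the name `build_query` is hypothetical, and only the string is built here:

```python
def build_query(item_type, max_cloud_cover):
    """Build the vector-catalog query string used by SearchUnordered."""
    query = "item_type:{} AND item_type:DigitalGlobeAcquisition".format(item_type)
    # Strict upper bound on the integer cloud-cover attribute
    query += " AND attributes.cloudCover_int:<{}".format(max_cloud_cover)
    return query

print(build_query('DigitalGlobeAcquisition', 25))
# item_type:DigitalGlobeAcquisition AND item_type:DigitalGlobeAcquisition AND attributes.cloudCover_int:<25
```

With the arguments Main passes, the two item_type clauses are identical, so the first is redundant for that call.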

### Define Function for Generating .csv of all Matching Scenes 

In [7]:
#--------------------------------------------------------------------------------------
### Generate CSV of images that match criteria
#--------------------------------------------------------------------------------------

# Function builds a dataframe (later written to .csv) of the key metadata for each image, used for further processing.
# This is the definitive database for the given AOI: no filtering has yet taken place.
# Key calculated statistics include: the part of the AOI the scene does not cover (AA),
# the part of the AOI the scene does cover (BB), the fraction of the AOI covered (frac),
# and a binary flag for whether the image is already in IDAHO.
# N.B.: AOI is read from the global variable set by the driver loop in the final cell.

def ImageList(ids):

    # Define counters
    not_in_idaho = 0    # number of non-IDAHO images
    scenes = []         # list containing metadata dictionaries of all scenes in our AOI

    # Toggle for plotting thumbnails of IDAHO images to screen
    download_thumbnails = 0

    # Loop over catalog IDs
    for i in ids:

        # Fetch metadata dictionary for each catalog ID in ids list
        r = gbdx.catalog.get(i)
        props = r['properties']

        # Check location of ID - is it in IDAHO?
        location = gbdx.catalog.get_data_location(i)

        # IDAHO flag: 1 if already delivered to IDAHO, else 0
        if location == 'not_delivered':
            not_in_idaho += 1
            idaho = 0
        else:
            idaho = 1

            # Plot image if it is in IDAHO and the toggle is on (bbox of the current AOI)
            if download_thumbnails == 1:
                image = CatalogImage(i, band_type="MS", bbox=list(AOI.bounds))
                image.plot(w=10, h=10)

        # Print key image variables to the console
        print('ID: %s, Timestamp: %s, Cloud Cover: %s, Image bands: %s, IDAHO: %s' % (
            i, props['timestamp'], props['cloudCover'], props['imageBands'], idaho))

        # Calculate the percentage overlap with our AOI:
        # load the WKT representation of the scene footprint as a Shapely object
        shapely_footprint = shapely.wkt.loads(props['footprintWkt'])

        # AA: the part of the AOI not covered by the scene footprint
        AA = AOI.difference(shapely_footprint)

        # frac: fraction, between 0 and 1, of the AOI that the scene covers
        frac = 1 - (AA.area / AOI.area)

        # BB: proxy for the useful area. If the scene contains the whole AOI, BB = AOI;
        # otherwise it is the intersection of the scene footprint and the AOI
        BB = AOI
        if frac < 1:
            BB = AOI.intersection(shapely_footprint)

        # AA is set to blank if the scene contains 100% of the AOI
        if frac == 1:
            AA = ""

        # Append key metadata for the current scene to 'scenes' as a dictionary; each becomes a dataframe row.
        # Values read from props come from DigitalGlobe's metadata dictionary.
        scenes.append({
            'ID': i,
            'TimeStamp': props['timestamp'],
            'CloudCover': props['cloudCover'],
            'ImageBands': props['imageBands'],
            'On_IDAHO': idaho,
            'browseURL': props['browseURL'],
            'Overlap_%': frac * 100,
            'PanResolution': props['panResolution'],
            'MultiResolution': props['multiResolution'],
            'OffNadirAngle': props['offNadirAngle'],
            'Sensor': props['sensorPlatformName'],
            'Full_scene_WKT': props['footprintWkt'],
            'missing_area_WKT': AA,
            'useful_area_WKT': BB
            })

    # Summary statistics: totals for images in IDAHO and currently unavailable images
    print('Number of catalog IDs not available in IDAHO: %s' % not_in_idaho)
    print('Number of catalog IDs available in IDAHO: %s' % (len(ids) - not_in_idaho))

    # Define column order for dataframe of search results
    cols = ['ID','Sensor','ImageBands','TimeStamp','CloudCover','Overlap_%','PanResolution','MultiResolution','OffNadirAngle','On_IDAHO','browseURL','Full_scene_WKT','useful_area_WKT','missing_area_WKT']

    # Generate pandas dataframe from results
    out = pd.DataFrame(scenes, columns=cols)

    return out
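The overlap statistic is frac = 1 - area(AA) / area(AOI), where AA is the part of the AOI outside the scene footprint; equivalently, frac = area(BB) / area(AOI). A plain-Python sketch of that arithmetic for the special case of axis-aligned rectangles (the notebook uses arbitrary Shapely polygons; `rect_area` and `rect_overlap_frac` are hypothetical helpers):

```python
def rect_area(r):
    """Area of a (minx, miny, maxx, maxy) rectangle; empty rectangles have area 0."""
    minx, miny, maxx, maxy = r
    return max(maxx - minx, 0.0) * max(maxy - miny, 0.0)

def rect_overlap_frac(aoi, footprint):
    """Fraction of the AOI rectangle covered by the footprint rectangle."""
    inter = (max(aoi[0], footprint[0]), max(aoi[1], footprint[1]),
             min(aoi[2], footprint[2]), min(aoi[3], footprint[3]))
    covered = rect_area(inter)            # area of BB (AOI intersected with footprint)
    missing = rect_area(aoi) - covered    # area of AA (AOI outside footprint)
    return 1 - missing / rect_area(aoi)

aoi = (0.0, 0.0, 10.0, 10.0)
print(rect_overlap_frac(aoi, (5.0, -5.0, 20.0, 20.0)))   # 0.5: right half of the AOI covered
print(rect_overlap_frac(aoi, (-1.0, -1.0, 11.0, 11.0)))  # 1.0: footprint contains the AOI
```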

### Define Categorical Screening Function for Removing Disqualified Images

In [8]:
#--------------------------------------------------------------------------------------
### Categorical Screening Function
#--------------------------------------------------------------------------------------

# Performs pandas .loc operations to remove scenes which don't meet the criteria specified by the 'cutoff'
# series of variables.

def CategoricalScreen(out):

    # Convert the ISO 8601 'TimeStamp' strings (UTC, 'Z' suffix) to timezone-aware pandas datetimes
    out['TS'] = pd.to_datetime(out['TimeStamp'], utc=True)

    # Add separate date and time columns for easy interpretation
    string = out['TimeStamp'].str.split('T')
    out['Date'] = string.str.get(0)
    out['Time'] = string.str.get(1)

    # Categorical screen: remove disqualified images. The cutoff dates are localized to UTC so they
    # can be compared with the timezone-aware 'TS' column. The result is stored as 'out_1stcut'.
    out_1stcut = out.loc[(out['CloudCover'] <= cutoff_cloud_cover) &
                         (out['Overlap_%'] >= cutoff_overlap) &
                         (out['TS'] > pd.Timestamp(cutoff_date_lower, tz='UTC')) &
                         (out['TS'] < pd.Timestamp(cutoff_date_upper, tz='UTC')) &
                         (out['ImageBands'].isin(accepted_bands)) &
                         (out['OffNadirAngle'] <= cutoff_nadir) &
                         (out['PanResolution'] <= cutoff_pan_res)
                        ]

    # Print to console the number of images remaining after the categorical screen
    print('\nNumber of results remaining: %s' % len(out_1stcut.index))

    return out_1stcut
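The screen is a conjunction of per-scene tests: a scene survives only if every condition holds. The same logic on plain dictionaries, as a sketch under stated assumptions: the helper `passes_screen` is hypothetical, the thresholds copy the screening-parameter cell, and the nadir and resolution values of the sample scenes are made up for illustration.

```python
from datetime import datetime, timezone

CUTOFFS = {
    'cloud_cover': 10, 'overlap': 0, 'nadir': 15, 'pan_res': 1,
    'date_lower': datetime(2015, 9, 1, tzinfo=timezone.utc),
    'date_upper': datetime(2018, 2, 28, tzinfo=timezone.utc),
    'bands': {'PAN_MS1', 'PAN_MS1_MS2', 'PAN'},
}

def passes_screen(scene, c=CUTOFFS):
    """True if a scene dict meets every categorical cutoff."""
    # Catalog timestamps look like '2015-12-31T11:03:31.627Z' (UTC)
    ts = datetime.strptime(scene['TimeStamp'], '%Y-%m-%dT%H:%M:%S.%fZ').replace(tzinfo=timezone.utc)
    return (scene['CloudCover'] <= c['cloud_cover']
            and scene['Overlap_%'] >= c['overlap']
            and c['date_lower'] < ts < c['date_upper']
            and scene['ImageBands'] in c['bands']
            and scene['OffNadirAngle'] <= c['nadir']
            and scene['PanResolution'] <= c['pan_res'])

scenes = [
    # Hypothetical scene values for illustration
    {'TimeStamp': '2015-12-31T11:03:31.627Z', 'CloudCover': 0, 'Overlap_%': 100.0,
     'ImageBands': 'PAN_MS1_MS2', 'OffNadirAngle': 12.0, 'PanResolution': 0.5},
    {'TimeStamp': '2013-12-03T11:07:29.484Z', 'CloudCover': 17, 'Overlap_%': 100.0,
     'ImageBands': 'PAN_MS1', 'OffNadirAngle': 20.0, 'PanResolution': 0.5},
]
kept = [s for s in scenes if passes_screen(s)]
print(len(kept))  # 1: the second scene fails on cloud cover, date and nadir
```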

### Define Function for Continuous Image Quality Ranking

In [9]:
#--------------------------------------------------------------------------------------
### Image Ranking Function
#--------------------------------------------------------------------------------------
# For the remaining images that met minimum quality thresholds, rank them best to worst using a points-based system.
# Images accrue points for:
# - every % of cloud cover (1 point)
# - every % of the AOI not covered (1 point)
# - every week away from the optimal date (1 point)
# - every degree away from the optimal off-nadir angle (1 point)
# - every cm of pan resolution worse than the optimal resolution (1 point)
# Fewer points = better scene.

# User preferences are defined in the 'pref_weights' dictionary.
# Ensure all weights sum to 1.

def ImageRank(out_1stcut, optimal, pref_weights):
    # Work on a copy so the caller's dataframe (a .loc selection) is not modified in place
    out_1stcut = out_1stcut.copy()

    # Optimal date as a UTC-aware timestamp, to match the 'TS' column
    optimal_date = pd.to_datetime(optimal['optimal_date'], utc=True)

    # each 1% of cloud cover = 1 point
    out_1stcut['points_CC'] = out_1stcut['CloudCover']

    # each 1% of overlap missed = 1 point
    out_1stcut['points_Overlap'] = 100 - out_1stcut['Overlap_%']

    # each week away from the optimal date = 1 point
    out_1stcut['points_Date'] = (out_1stcut['TS'] - optimal_date).abs().dt.total_seconds() / (60 * 60 * 24 * 7)

    # each degree off the optimal nadir angle = 1 point
    out_1stcut['points_Nadir'] = (out_1stcut['OffNadirAngle'] - optimal['optimal_nadir']).abs()

    # each cm of resolution worse than the optimal resolution = 1 point (better than optimal scores 0)
    out_1stcut['points_Res'] = (out_1stcut['PanResolution'] - optimal['optimal_pan_res']).clip(lower=0) * 100

    # Ranking algorithm: weight the point components above by the preference weighting dictionary
    def Ranker(out_1stcut, pref_weights):
        a = out_1stcut['points_CC'] * pref_weights['cloud_cover']
        b = out_1stcut['points_Overlap'] * pref_weights['overlap']
        c = out_1stcut['points_Date'] * pref_weights['date']
        d = out_1stcut['points_Nadir'] * pref_weights['nadir']
        e = out_1stcut['points_Res'] * pref_weights['resolution']

        # Score is the linear sum of the weighted points. More points = worse fit to criteria
        return a + b + c + d + e

    # Add new column - RankResult - with the total weighted points accrued by the scene
    out_1stcut['RankResult'] = Ranker(out_1stcut, pref_weights)

    # Sort ascending by RankResult (best scene first) and number the preference order from 1
    out_2ndcut = out_1stcut.sort_values(by='RankResult', ascending=True).reset_index(drop=True)
    out_2ndcut['Pref_order'] = out_2ndcut.index + 1

    return out_2ndcut
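As a worked example of the points system, take a scene with 5% cloud cover, full AOI overlap, acquired 14 days from the optimal date, 10° off nadir and with 0.5 m pan resolution (all hypothetical values). It accrues 5 + 0 + 2 + 10 + 10 = 27 raw points, so with the equal 0.2 weights its RankResult is 5.4. The same arithmetic in plain Python, with a hypothetical `rank_scene` helper that mirrors the pandas column logic:

```python
def rank_scene(scene, optimal, weights):
    """Weighted points total for one scene; lower is better."""
    points = {
        'cloud_cover': scene['cloud_cover'],                       # 1 point per % cloud
        'overlap': 100 - scene['overlap_pct'],                     # 1 point per % of AOI missed
        'date': abs(scene['days_from_optimal']) / 7.0,             # 1 point per week
        'nadir': abs(scene['nadir'] - optimal['optimal_nadir']),   # 1 point per degree
        # 1 point per cm worse than optimal; better-than-optimal resolution scores 0
        'resolution': max(scene['pan_res'] - optimal['optimal_pan_res'], 0) * 100,
    }
    return sum(points[k] * weights[k] for k in points)

optimal = {'optimal_pan_res': 0.4, 'optimal_nadir': 0}
weights = {'cloud_cover': 0.2, 'overlap': 0.2, 'date': 0.2, 'nadir': 0.2, 'resolution': 0.2}
scene = {'cloud_cover': 5, 'overlap_pct': 100.0, 'days_from_optimal': 14,
         'nadir': 10.0, 'pan_res': 0.5}
print(round(rank_scene(scene, optimal, weights), 6))  # 5.4
```

Because the weights multiply raw points with different scales (a week, a degree, a centimetre), a weight of 0.2 does not mean each criterion contributes equally in practice.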

### Define function for constructing final output file
Before constructing the scene mosaic, trim each AOI's ranked results to the reporting columns and the top `keep` scenes. The final cell concatenates the results for all AOIs and writes them to 'Output.csv'.

In [16]:
def OutputBuilder(out_2ndcut, keep):
    
    # trim to just the useful columns
    cols = ['Point','AOI_ID','ID','Sensor','ImageBands','Date','Time','CloudCover','Overlap_%','PanResolution','MultiResolution','OffNadirAngle','On_IDAHO','Pref_order','RankResult','points_CC','points_Overlap','points_Date','points_Nadir','points_Res','browseURL','Full_scene_WKT','useful_area_WKT','missing_area_WKT']
    out_2ndcut = out_2ndcut[cols]
    
    # keep the top `keep` images by rank for this AOI (rows are already sorted best first)
    out_2ndcut = out_2ndcut.head(keep)
    
    return out_2ndcut
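OutputBuilder relies on ImageRank having already sorted each AOI's scenes by RankResult, so taking the first `keep` rows yields the best scenes. A plain-Python sketch of the same selection applied across several AOIs at once (the `top_per_aoi` helper and the sample rows are hypothetical; in the notebook the grouping is implicit because Main handles one AOI at a time):

```python
from itertools import groupby

def top_per_aoi(rows, keep):
    """Keep the `keep` lowest-RankResult rows for each AOI_ID, numbering them by Pref_order."""
    rows = sorted(rows, key=lambda r: (r['AOI_ID'], r['RankResult']))
    out = []
    for aoi_id, group in groupby(rows, key=lambda r: r['AOI_ID']):
        for order, row in enumerate(list(group)[:keep], start=1):
            out.append(dict(row, Pref_order=order))
    return out

rows = [
    {'AOI_ID': 1, 'ID': 'a', 'RankResult': 7.0},
    {'AOI_ID': 1, 'ID': 'b', 'RankResult': 3.0},
    {'AOI_ID': 1, 'ID': 'c', 'RankResult': 5.0},
    {'AOI_ID': 2, 'ID': 'd', 'RankResult': 1.0},
]
for r in top_per_aoi(rows, keep=2):
    print(r['AOI_ID'], r['ID'], r['Pref_order'])
# 1 b 1
# 1 c 2
# 2 d 1
```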

### Define Main Function

In [17]:
def Main(AOI, i, keep, pointz): 
    
    # Run search on the Area of Interest (AOI), passed in Well Known Text (WKT) format
    records = SearchUnordered(AOI.wkt, 'DigitalGlobeAcquisition', count=1000, cloud_cover=25)

    # List of all catalog IDs returned by the search
    ids = [r['properties']['attributes']['catalogID'] for r in records]

    # Build the dataframe of matching scenes
    out = ImageList(ids)

    # Apply categorical screen
    out1stcut = CategoricalScreen(out)
    print(len(out1stcut))

    # Rank the remaining images on the continuous criteria
    out2ndcut = ImageRank(out1stcut, optimal, pref_weights)
    print(len(out2ndcut))

    # Add the source point and the 1-based AOI number
    out2ndcut['Point'] = pointz[i-1]
    out2ndcut['AOI_ID'] = i

    # Keep the reporting columns and the top-ranked scenes
    finals = OutputBuilder(out2ndcut, keep)
    print(len(finals))

    # Append results to the global scene_list (created in the final cell)
    scene_list.append(finals)

### Execute Main for all AOIs

In [18]:
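scene_list = []

# i is the 1-based AOI number, in shapefile order.
# ImageList also reads AOI as a global, so this loop must run at the top level of the notebook.
for i, AOI in enumerate(AOI_list, start=1):
    Main(AOI, i, keep, pointz)

output = pd.concat(scene_list)
output.to_csv('Output.csv')
print('\nProcess complete')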

ID: 1050410004368A00, Timestamp: 2013-12-03T11:07:29.484Z, Cloud Cover: 17, Image bands: PAN_MS1, IDAHO: 0
ID: 10100100009EBA00, Timestamp: 2002-06-17T10:56:10.767Z, Cloud Cover: 0, Image bands: PAN_MS1, IDAHO: 0
ID: 10504100043F4600, Timestamp: 2013-12-11T11:03:02.085Z, Cloud Cover: 0, Image bands: PAN_MS1, IDAHO: 1
ID: 10300100505A9700, Timestamp: 2015-12-31T11:03:31.627Z, Cloud Cover: 0, Image bands: PAN_MS1_MS2, IDAHO: 1
ID: 102001000112AD00, Timestamp: 2007-12-01T10:43:08.344Z, Cloud Cover: 15, Image bands: PAN, IDAHO: 0
ID: 1020010009EB6400, Timestamp: 2009-09-07T11:19:26.776Z, Cloud Cover: 23, Image bands: PAN, IDAHO: 0
ID: 1040010008A49800, Timestamp: 2015-03-05T11:03:35.843Z, Cloud Cover: 5, Image bands: PAN_MS1_MS2, IDAHO: 0
ID: 10400100088EF100, Timestamp: 2015-03-05T11:04:08.043Z, Cloud Cover: 8, Image bands: PAN_MS1_MS2, IDAHO: 1
ID: 10100100051A8D00, Timestamp: 2006-07-26T11:14:54.783Z, Cloud Cover: 0, Image bands: PAN_MS1, IDAHO: 0
ID: 103005002FF8B900, Timestamp: 2014-0

ID: 104A010035A10C00, Timestamp: 2017-10-19T11:27:53.000Z, Cloud Cover: 0, Image bands: SWIR, IDAHO: 1
Number of catalog IDs not available in IDAHO: 44
Number of catalog IDs available in IDAHO: 34

Number of results remaining: 4
4
4
4
ID: 1010010000AB3F00, Timestamp: 2002-06-25T10:45:56.725Z, Cloud Cover: 0, Image bands: PAN_MS1, IDAHO: 0
ID: 1040010021C5E100, Timestamp: 2016-09-03T11:05:47.186Z, Cloud Cover: 0, Image bands: PAN_MS1_MS2, IDAHO: 1
ID: 1010010003013200, Timestamp: 2004-06-09T10:48:36.090Z, Cloud Cover: 2, Image bands: PAN_MS1, IDAHO: 0
ID: 1050410004368A00, Timestamp: 2013-12-03T11:07:29.484Z, Cloud Cover: 17, Image bands: PAN_MS1, IDAHO: 0
ID: 10504100043F4600, Timestamp: 2013-12-11T11:03:02.085Z, Cloud Cover: 0, Image bands: PAN_MS1, IDAHO: 1
ID: 1030050045F94B00, Timestamp: 2015-10-03T10:44:25.777Z, Cloud Cover: 17, Image bands: PAN_MS1_MS2, IDAHO: 1
ID: 102001000EB01000, Timestamp: 2010-10-21T11:10:09.974Z, Cloud Cover: 0, Image bands: PAN, IDAHO: 0
ID: 101001000515C

ID: 103001000CC5F700, Timestamp: 2011-08-10T11:13:38.608Z, Cloud Cover: 4, Image bands: PAN_MS1_MS2, IDAHO: 0
ID: 1050410004850200, Timestamp: 2010-07-28T11:02:35.085Z, Cloud Cover: 24, Image bands: PAN_MS1, IDAHO: 0
ID: 10504100047B1200, Timestamp: 2010-08-24T10:47:21.084Z, Cloud Cover: 21, Image bands: PAN_MS1, IDAHO: 0
ID: 10504100047B2100, Timestamp: 2010-08-19T11:03:49.085Z, Cloud Cover: 14, Image bands: PAN_MS1, IDAHO: 0
ID: 103001000493BA00, Timestamp: 2010-04-14T11:13:17.265Z, Cloud Cover: 20, Image bands: PAN_MS1_MS2, IDAHO: 0
ID: 1020010013A18E00, Timestamp: 2011-05-09T11:08:41.532Z, Cloud Cover: 1, Image bands: PAN, IDAHO: 0
ID: 102001002E724400, Timestamp: 2014-05-15T10:45:25.138Z, Cloud Cover: 15, Image bands: PAN, IDAHO: 1
ID: 1050410000AF2E00, Timestamp: 2012-05-14T11:01:21.085Z, Cloud Cover: 0, Image bands: PAN_MS1, IDAHO: 1
ID: 1020010069C6F900, Timestamp: 2017-10-16T13:53:15.000Z, Cloud Cover: 0, Image bands: PAN, IDAHO: 1
ID: 102001002F413900, Timestamp: 2014-05-19T1

ID: 10200100305FA500, Timestamp: 2014-05-19T10:51:40.793Z, Cloud Cover: 0, Image bands: PAN, IDAHO: 1
ID: 1020010010DDDA00, Timestamp: 2010-11-24T11:14:15.929Z, Cloud Cover: 5, Image bands: PAN, IDAHO: 0
ID: 1030010050903000, Timestamp: 2015-12-23T10:58:16.215Z, Cloud Cover: 13, Image bands: PAN_MS1_MS2, IDAHO: 1
ID: 102001000BB9EC00, Timestamp: 2010-03-18T11:10:18.231Z, Cloud Cover: 0, Image bands: PAN, IDAHO: 1
ID: 103001004D4B7800, Timestamp: 2015-12-23T10:59:23.566Z, Cloud Cover: 15, Image bands: PAN_MS1_MS2, IDAHO: 1
ID: 103001000EA44700, Timestamp: 2011-09-28T11:15:43.548Z, Cloud Cover: 0, Image bands: PAN_MS1_MS2, IDAHO: 1
ID: 105041001072F100, Timestamp: 2014-05-04T11:09:41.085Z, Cloud Cover: 0, Image bands: PAN_MS1, IDAHO: 1
ID: 1050410001FEBB00, Timestamp: 2009-03-18T10:58:23.885Z, Cloud Cover: 0, Image bands: PAN_MS1, IDAHO: 0
ID: 102001001636B100, Timestamp: 2011-10-31T10:54:28.806Z, Cloud Cover: 14, Image bands: PAN, IDAHO: 0
ID: 104001001BA7C400, Timestamp: 2016-05-02T11: