# NSIDC DAAC Customize and Access Data Tutorial

This tutorial walks you through accessing NSIDC DAAC data using spatial and temporal filters and requesting customization services, including subsetting, reformatting, and reprojection.


Import packages


In [1]:
import requests, getpass, socket, pycurl, urllib.request, json, zipfile, io

## Create a token

We will generate a token from your Earthdata Login credentials. The token is required to access data, and we will apply it to the queries that follow.

In [2]:
# Earthdata Login credentials

uid = input('Earthdata Login user name: ')
pswd = getpass.getpass('Earthdata Login password: ')


Earthdata Login user name: amy.steiker
Earthdata Login password: ········


In [3]:
# Function used to request token from Common Metadata Repository

def create_token(uid, pswd):

    # Import XML Element Tree
    from xml.etree.ElementTree import Element, SubElement, tostring

    # Find the IP address of this machine
    hostname = socket.gethostname()
    IP = socket.gethostbyname(hostname)

    # Create XML input
    token = Element('token')
    username = SubElement(token, 'username')
    username.text = uid
    password = SubElement(token, 'password')
    password.text = pswd
    client_id = SubElement(token, 'client_id')
    client_id.text = 'NSIDC_client_id'
    user_ip_address = SubElement(token, 'user_ip_address')
    user_ip_address.text = IP

    xml = tostring(token, encoding='unicode', method='xml')

    # Request token from Common Metadata Repository
    headers = {'Content-Type': 'application/xml'}
    response = requests.post('https://api.echo.nasa.gov/echo-rest/tokens', data=xml, headers=headers)
    output = response.text

    # Grab the token string from between the <id> tags
    start = '<id>'
    end = '</id>'
    tokenval = output.split(start)[1].split(end)[0]

    return tokenval
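The function above pulls the token out of the response by splitting on the `<id>` tags, which raises an `IndexError` if the response contains no `<id>` element (for example, when the login fails). As a minimal sketch of a more forgiving alternative, the response can be parsed as XML instead. The helper name `extract_token_id` and the sample response body are made up for illustration; only the presence of an `<id>` element is taken from the code above.

```python
from xml.etree import ElementTree


def extract_token_id(xml_text):
    """Return the text of the first <id> element, or None if there is none."""
    root = ElementTree.fromstring(xml_text)
    # iter() visits the root element and all of its descendants
    id_element = next(root.iter('id'), None)
    return id_element.text if id_element is not None else None


# Hypothetical response body, invented for illustration only
sample_response = '<token><id>EXAMPLE-TOKEN-VALUE</id><username>example_user</username></token>'
print(extract_token_id(sample_response))
```

Returning `None` rather than raising lets the calling cell print a readable message when no token comes back.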

In [4]:
# Run create_token function using Earthdata Login Username (uid) and Password (pswd) as inputs

token = create_token(uid, pswd)
print(token)

5EEEDA06-51CA-77AF-6746-F9E06FEFA9CF


## Select a data set of interest and determine the number and size of granules available within a time range and location.

Let's begin discovering an NSIDC DAAC data set by first inputting the data set of interest and determining the most recent version number. We will also find out how many data granules exist over an area and time of interest. We query the Common Metadata Repository (CMR) for this information.

In [None]:
# Input data set short name (e.g. ATL03) of interest here.

short_name = input('Input short name, e.g. ATL03, here: ')

In [None]:
# For restricted collections
# First determine the latest version number by querying CMR collection metadata.


# cmr_url = 'https://cmr.earthdata.nasa.gov/search/collections.json?short_name=' + short_name + '&token=' + token
# cmeta = requests.get(cmr_url)
 
# with urllib.request.urlopen(cmr_url) as url:
#    cmeta_json = json.loads(url.read().decode())

In [None]:
# Get json response from CMR collection metadata

cmr_url = 'https://cmr.earthdata.nasa.gov/search/collections.json?short_name=' + short_name
cmeta = requests.get(cmr_url)
cmeta.raise_for_status()  # stop here if the query failed
cmeta_json = cmeta.json()

# Find all instances of 'version_id' in metadata and print most recent version number

entry = cmeta_json['feed']['entry']
versions = [e['version_id'] for e in entry]

latest_version = max(versions)
print('The most recent version of', short_name, 'is', latest_version)
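Because the `version_id` values are strings, `max` compares them character by character. That matches numeric order only when every value is zero-padded to the same width. The sketch below shows one possible way to compare versions numerically while still returning the original string. The helper name `latest_version_id` and the sample lists are hypothetical.

```python
def latest_version_id(version_ids):
    """Return the highest version, comparing numerically where possible."""
    numeric = [v for v in version_ids if v.isdigit()]
    if numeric:
        # Compare as integers but return the original (possibly zero-padded) string
        return max(numeric, key=int)
    # Fall back to plain string comparison if no version is purely numeric
    return max(version_ids)


# Hypothetical version lists, invented for illustration
print(latest_version_id(['001', '033', '034']))
print(latest_version_id(['9', '10']))
```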
    

Now that we have the most recent version of this data set, let's determine the number of granules available over our area and time of interest.

In [None]:
#https://cmr.earthdata.nasa.gov/search/granules?

#User input bounding box

#Maybe a way to upload a shapefile and output the coordinates?

#bounding_box = 

#User input temporal range 

#temporal = 

#with urllib.request.urlopen(cmr_url) as url:
#    cmeta_json = json.loads(url.read().decode())

#Print the number of files available and store that number
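The cell above is still a placeholder. As a rough sketch of how it might be filled in, the code below assembles a CMR granule search using `bounding_box` and `temporal` parameters and reads the total match count from the `CMR-Hits` response header. The helper names are hypothetical, the example values are borrowed from the GLAH12 request later in this tutorial, and the parameter and header names reflect my understanding of the CMR search API rather than anything established in this notebook.

```python
import urllib.parse
import urllib.request

CMR_GRANULE_URL = 'https://cmr.earthdata.nasa.gov/search/granules.json'


def build_granule_params(short_name, version, bounding_box, temporal, page_size=10):
    """Assemble CMR granule search parameters for one data set, area, and time range."""
    return {
        'short_name': short_name,
        'version': version,
        # west,south,east,north in decimal degrees
        'bounding_box': bounding_box,
        # start,end as ISO 8601 date-times
        'temporal': temporal,
        'page_size': page_size,
    }


def count_granules(params):
    """Query CMR and return the total number of matching granules."""
    url = CMR_GRANULE_URL + '?' + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        # Assumption: CMR reports the total match count in the CMR-Hits header
        return int(response.headers['CMR-Hits'])


params = build_granule_params(
    'GLAH12', '034',
    '-50.33333,68.56667,-49.33333,69.56667',
    '2009-01-01T00:00:00Z,2009-12-31T23:59:59Z',
)
print(params)
# granule_count = count_granules(params)  # uncomment to send the query to CMR
```

Keeping the parameter assembly separate from the network call makes it easy to inspect the query before sending it.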

We will now query the average size of those granules. 

In [None]:

# print average granule size
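One possible way to fill in this cell is to average a per-granule size field across the granule entries returned by CMR. The sketch below assumes each entry is a dictionary that may carry a `granule_size` value; that field name and the sample entries are assumptions for illustration, not something this notebook has confirmed.

```python
def average_granule_size(entries):
    """Return the mean of the 'granule_size' values, skipping entries without one."""
    sizes = [float(e['granule_size']) for e in entries if 'granule_size' in e]
    return sum(sizes) / len(sizes) if sizes else 0.0


# Hypothetical entries, invented for illustration
sample_entries = [
    {'granule_size': '12.5'},
    {'granule_size': '7.5'},
    {'title': 'entry with no size recorded'},
]
print('Average granule size:', average_granule_size(sample_entries))
```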

Although subsetting, reformatting, or reprojecting can alter the size of the granules, this "native" granule size can still be used to guide us towards the best download method to pursue, which we will come back to later on in this tutorial.

## Determine the subsetting, reformatting, and reprojection services enabled for your data set of interest.

The NSIDC DAAC supports customization services on many of our NASA Earthdata mission collections. Let's find out whether our data set has these services available.

In [None]:
# Query service capability URL

from xml.etree import ElementTree

capability_url = 'https://n5eil02u.ecs.nsidc.org/egi/capabilities/' + short_name + '.' + latest_version + '.xml'

response = requests.get(capability_url)
root = ElementTree.fromstring(response.content)

# Print the attributes of each SubsetAgent element

for agent in root.iter('SubsetAgent'):
    print(agent.attrib)

    
#Print Spatial subsetting available? Yes/no, etc.
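The yes/no summary noted in the comment above could be produced by reading boolean-style attributes from each `SubsetAgent` element. The sketch below runs against a made-up XML snippet: the attribute names (`spatialSubsetting`, `temporalSubsetting`, `reprojection`) and the document layout are assumptions, so compare them with the attributes printed by the previous cell before relying on them.

```python
from xml.etree import ElementTree

# Hypothetical capabilities document, invented for illustration
sample_xml = """
<Capabilities>
  <SubsetAgent id="EXAMPLE" spatialSubsetting="true" temporalSubsetting="false"/>
</Capabilities>
"""


def summarize_services(xml_text, attributes):
    """Map each attribute name to 'Yes' if any SubsetAgent sets it to true, else 'No'."""
    root = ElementTree.fromstring(xml_text)
    agents = list(root.iter('SubsetAgent'))
    return {
        # A missing attribute is treated the same as "false"
        name: 'Yes' if any(a.attrib.get(name, 'false').lower() == 'true' for a in agents) else 'No'
        for name in attributes
    }


summary = summarize_services(sample_xml, ['spatialSubsetting', 'temporalSubsetting', 'reprojection'])
for name, answer in summary.items():
    print(name, 'available?', answer)
```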


In [None]:

# Report whether any customization services exist for this data set

agents = list(root.iter('SubsetAgent'))

if not agents:
    print('No customization services exist for', short_name, 'version', latest_version)
else:
    for agent in agents:
        print(agent.attrib)

If the data set supports the services we just queried, let's explore the specific options available and select the ones we want to request.

In [None]:

from xml.etree import ElementTree

response = requests.get(capability_url)
root = ElementTree.fromstring(response.content)
for child in root.iter('Format'):
    print(child.attrib['value'])

    
#User input: Do you want reformatting? If so, select __ 
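The user-input step noted above could check the typed format against the values printed from the capabilities document before building a request. Here is a minimal sketch; the helper name `choose_format` is hypothetical, and the list of formats is a stand-in (only `TABULAR_ASCII` appears elsewhere in this tutorial).

```python
def choose_format(requested, available):
    """Return the available format matching `requested`, ignoring case, or None."""
    lookup = {name.upper(): name for name in available}
    return lookup.get(requested.strip().upper())


# Stand-in for the values printed from the capabilities document
available_formats = ['TABULAR_ASCII', 'EXAMPLE_FORMAT']

print(choose_format('tabular_ascii', available_formats))
print(choose_format('GeoTIFF', available_formats))
```

In the notebook, `requested` would come from an `input()` prompt, and a `None` result would mean asking again.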

## Choose request method

There are two main access methods that the NSIDC DAAC supports through our Application Programming Interface (API).

The first is synchronous: The data request is processed on the fly. Upon completion, data are downloaded directly to this directory as a single zip file. 

The second method is asynchronous: The data request is processed at NSIDC and sent to you via email. The email contains zip file(s) and a link to the order information, along with individual output download links.
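The choice between the two methods can be expressed as a simple rule: use the synchronous method when the number of granules is within the synchronous limit, and fall back to the asynchronous method otherwise. The sketch below illustrates that rule only. The function name and the example limit of 100 are hypothetical, and the real limit has to be determined for your data set.

```python
def choose_request_method(granule_count, max_sync_granules):
    """Pick a request method based on how many granules the query matched."""
    if granule_count <= max_sync_granules:
        return 'synchronous'
    return 'asynchronous'


# Hypothetical numbers, for illustration only
print(choose_request_method(25, 100))
print(choose_request_method(250, 100))
```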

In [None]:
# Before we download, we need to determine the max number of sync granules

# 


# Download data test (still in progress)


# base URL: https://n5eil02u.ecs.nsidc.org/egi/request?
# short_name=GLAH12
# version=034
# bounding_box=-50.33333,68.56667,-49.33333,69.56667
# bbox=-50.33333,68.56667,-49.33333,69.56667
# time=2009-01-01T00:00:00,2009-12-31T23:59:59
# format=TABULAR_ASCII
# token=TOKEN-FROM-STEP-2

access_url = (
    'https://n5eil02u.ecs.nsidc.org/egi/request?'
    'short_name=GLAH12&'
    'version=034&'
    'bounding_box=-50.33333,68.56667,-49.33333,69.56667&'
    'bbox=-50.33333,68.56667,-49.33333,69.56667&'
    'time=2009-01-01T00:00:00,2009-12-31T23:59:59&'
    'format=TABULAR_ASCII&'
    'token=' + token
)
print(access_url)

In [None]:
# Submit the request, following any redirects
r = requests.get(access_url, allow_redirects=True)
r.raise_for_status()  # stop here if the request failed

# The response is a single zip file; extract it into the current directory
with zipfile.ZipFile(io.BytesIO(r.content)) as z:
    z.extractall()
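If a request fails, the response body may be an error message rather than a zip archive, and `zipfile.ZipFile` will raise `BadZipFile`. One way to guard against that, sketched below, is to test the bytes with `zipfile.is_zipfile` before extracting. The helper name `extract_if_zip` is hypothetical, and the demo builds a small archive in memory instead of contacting the server.

```python
import io
import tempfile
import zipfile


def extract_if_zip(content, target_dir='.'):
    """Extract `content` if it is a zip archive and return its member names, else None."""
    buffer = io.BytesIO(content)
    if not zipfile.is_zipfile(buffer):
        return None
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Build a small archive in memory to stand in for a real response body
demo = io.BytesIO()
with zipfile.ZipFile(demo, 'w') as archive:
    archive.writestr('example.txt', 'hello')

with tempfile.TemporaryDirectory() as tmp:
    print(extract_if_zip(demo.getvalue(), tmp))
    print(extract_if_zip(b'<error>The request could not be processed</error>', tmp))
```

In the notebook, `content` would be `r.content`, and a `None` result is a cue to print `r.text` and inspect the error.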
  
