# Downloading TROPOMI S5P_NO2 files for user-specified time intervals and locations using API calls

**The earthdata library is used to access and query NASA data. For details, see: https://pypi.org/project/earthdata/**

In [1]:
import earthdata
from earthdata import Auth, Store, DataCollections, DataGranules

**Authentication for accessing files from the NASA GES DISC data archive *(done via a .netrc file)***

In [2]:
### First, create a .netrc file in your home directory
    ## cd ~ or cd $HOME
    ## touch .netrc
    ## echo "machine urs.earthdata.nasa.gov login <uid> password <password>" >> .netrc (where <uid> is your user name and <password> is your Earthdata Login password, both without the brackets)
    ## chmod 0600 .netrc (so only you can access it)

auth = Auth()
auth.login(strategy="netrc")
# are we authenticated?
print(auth.authenticated)

You're now authenticated with NASA Earthdata Login
True
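The shell steps for creating `.netrc` can also be sketched in Python with the standard library. This is a minimal sketch, not part of the earthdata library: the helper names `add_netrc_entry` and `has_netrc_entry` are hypothetical, and it assumes the login and password contain no whitespace.

```python
import netrc
import stat
from pathlib import Path

EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def add_netrc_entry(path, login, password, host=EARTHDATA_HOST):
    """Append a 'machine ... login ... password ...' line and restrict permissions."""
    path = Path(path)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"machine {host} login {login} password {password}\n")
    # Same intent as `chmod 0600 .netrc`: read/write for the owner only.
    path.chmod(stat.S_IRUSR | stat.S_IWUSR)


def has_netrc_entry(path, host=EARTHDATA_HOST):
    """Return True if the file exists and holds credentials for the host."""
    path = Path(path)
    if not path.is_file():
        return False
    try:
        return netrc.netrc(str(path)).authenticators(host) is not None
    except netrc.NetrcParseError:
        return False
```

Calling `has_netrc_entry(Path.home() / ".netrc")` before `auth.login(strategy="netrc")` gives a quick check that the entry is in place.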


Note that in most cases applications are authorized when you first access them. If you still have difficulty accessing the data, see "How To Pre-authorize an application": https://wiki.earthdata.nasa.gov/display/EL/How+To+Pre-authorize+an+application

**We can also now search for collections using a pythonic API client for CMR**

In [3]:
###Locate DAAC (in our case it is GES-DAAC to access TROPOMI S5P_L2_NO2_ files)###
#Query = DataCollections().daac("GES-DAAC")

###Find collections in the mentioned DAAC###
#print(f'Collections found: {Query.hits()}')
#collections = Query.fields(['ShortName']).get(10)

###Find the data of interest from collection of science data from GES DAAC###
#collections[:]

## Please enter dates between 2019-08-06 and the present

### Input start time of interest

In [4]:
start_time = input('Enter start time in format: YYYY-MM-DD \n')

Enter start time in format: YYYY-MM-DD 
2020-07-27


### Input end time of interest

In [5]:
end_time = input('Enter end time in format: YYYY-MM-DD \n')

Enter end time in format: YYYY-MM-DD 
2020-07-28
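Because both dates are typed in by hand, it can help to validate them before building the query. The sketch below is one way to do that with the standard library; `parse_day` and `validate_range` are hypothetical helper names, and the earliest allowed date is taken from the note above (2019-08-06).

```python
from datetime import date, datetime

# Earliest date stated in this notebook for the S5P_L2__NO2____HiR product.
EARLIEST = date(2019, 8, 6)


def parse_day(text):
    """Parse a YYYY-MM-DD string; raises ValueError on any other format."""
    return datetime.strptime(text.strip(), "%Y-%m-%d").date()


def validate_range(start_text, end_text, earliest=EARLIEST):
    """Return (start, end) as ISO strings after checking order and lower bound."""
    start = parse_day(start_text)
    end = parse_day(end_text)
    if start < earliest:
        raise ValueError(f"start date {start} is before {earliest}")
    if end < start:
        raise ValueError(f"end date {end} is before start date {start}")
    return start.isoformat(), end.isoformat()
```

For example, `start_time, end_time = validate_range(start_time, end_time)` could be run right after the two input cells.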


**Find the granules of Sentinel-5P TROPOMI Tropospheric NO2 1-Orbit L2 5.5km x 3.5km, V1 (2019-08-06 to 2021-07-01) and V2 (2021-07-01 onward), short name S5P_L2__NO2____HiR, at GES DISC
for the given dates, and access their metadata using the get() method**

In [6]:
### Build the query. TROPOMI data comes as one file per orbit, so the data can be queried temporally ###
### Spatial query: Ottawa ###
### Enter the location coordinates in bounding_box() in this order: 'lower_left_lon', 'lower_left_lat', 'upper_right_lon', 'upper_right_lat'
### The short name for the collection was found from cell[2] ###

from pprint import pprint
Query = DataGranules().short_name('S5P_L2__NO2____HiR').bounding_box(-77.816162,44.474779,-73.937989,45.968509).temporal(start_time,end_time)

###We get all metadata records###
granules = Query.get()

granules

[Collection: {'ShortName': 'S5P_L2__NO2____HiR', 'Version': '1'}
 Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': 20.662, 'Latitude': -66.81}, {'Longitude': 2.24, 'Latitude': -62.111}, {'Longitude': -63.423, 'Latitude': -72.224}, {'Longitude': -76.124, 'Latitude': -80.97}, {'Longitude': 20.662, 'Latitude': -66.81}]}}, {'Boundary': {'Points': [{'Longitude': 3.509, 'Latitude': -62.609}, {'Longitude': -9.488, 'Latitude': -56.192}, {'Longitude': -60.417, 'Latitude': -64.381}, {'Longitude': -64.194, 'Latitude': -73.47}, {'Longitude': 3.509, 'Latitude': -62.609}]}}, {'Boundary': {'Points': [{'Longitude': -8.62, 'Latitude': -56.811}, {'Longitude': -17.758, 'Latitude': -49.407}, {'Longitude': -59.447, 'Latitude': -56.446}, {'Longitude': -60.611, 'Latitude': -65.627}, {'Longitude': -8.62, 'Latitude': -56.811}]}}, {'Boundary': {'Points': [{'Longitude': -17.15, 'Latitude': -50.102}, {'Longitude': -23.807, 'Latitude': -42.12}, {'Long
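The coordinate order for `bounding_box()` is easy to get wrong, so a small check before querying may be useful. This is a sketch only: `check_bounding_box` is a hypothetical helper, and it assumes the box does not cross the antimeridian (lower-left longitude smaller than upper-right longitude).

```python
def check_bounding_box(lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat):
    """Validate the corner order and return the four values as a tuple."""
    for name, lon in (("lower_left_lon", lower_left_lon), ("upper_right_lon", upper_right_lon)):
        if not -180.0 <= lon <= 180.0:
            raise ValueError(f"{name}={lon} is outside [-180, 180]")
    for name, lat in (("lower_left_lat", lower_left_lat), ("upper_right_lat", upper_right_lat)):
        if not -90.0 <= lat <= 90.0:
            raise ValueError(f"{name}={lat} is outside [-90, 90]")
    # Assumption: boxes crossing the antimeridian are not supported by this check.
    if lower_left_lon >= upper_right_lon:
        raise ValueError("lower_left_lon must be smaller than upper_right_lon")
    if lower_left_lat >= upper_right_lat:
        raise ValueError("lower_left_lat must be smaller than upper_right_lat")
    return (lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat)


# The Ottawa box used in the query above.
ottawa_box = check_bounding_box(-77.816162, 44.474779, -73.937989, 45.968509)
```

The validated tuple can then be unpacked into the query, for example `.bounding_box(*ottawa_box)`.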

**Please note that Sentinel-5P TROPOMI Tropospheric NO2 1-Orbit L2 7km x 3.5km V1 (S5P_L2__NO2___) at GES DISC,
covering 2018-04-30 to 2019-08-06, can be accessed in the same way by using the short name 'S5P_L2__NO2___' in the code above**
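Since the two products cover different periods, the short name could be chosen from the start date. The sketch below only encodes the dates quoted in this notebook; `pick_short_name` is a hypothetical helper, and assigning the boundary day 2019-08-06 to the HiR product is an assumption.

```python
from datetime import date

# Date ranges as quoted in this notebook.
LOW_RES_START = date(2018, 4, 30)   # 7km x 3.5km product begins
HIGH_RES_START = date(2019, 8, 6)   # 5.5km x 3.5km (HiR) product begins


def pick_short_name(day):
    """Return the collection short name covering the given date."""
    if day < LOW_RES_START:
        raise ValueError(f"no collection listed here covers {day}")
    if day < HIGH_RES_START:
        return "S5P_L2__NO2___"
    # Assumption: the boundary day itself is served by the HiR collection.
    return "S5P_L2__NO2____HiR"
```

A query spanning both periods would need one `DataGranules()` query per short name.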

In [7]:
## number of S5P_NO2 granules matching both the spatial and temporal query

len(granules)

2

In [8]:
### a check for verification: total number of S5P_NO2 files for the temporal query alone (no spatial query applied)

from pprint import pprint
Query = DataGranules().short_name('S5P_L2__NO2____HiR').temporal(start_time,end_time)

###We get all metadata records###
granules_1 = Query.get()

len(granules_1)

## note that len(granules_1) is much larger than len(granules), so the TROPOMI data is being filtered both spatially and temporally

14

In [9]:
###explore granules metadata###
#[display(granule) for granule in granules]

**Now let's extract the data URLs from the metadata of each granule of interest**

In [10]:
data_links = [granule.data_links() for granule in granules]

print(data_links)
#type(data_links)

[['https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__NO2____HiR.1/2020/209/S5P_OFFL_L2__NO2____20200727T161809_20200727T175938_14447_01_010302_20200802T075957.nc', 's3://gesdisc-cumulus-prod-protected/S5P_TROPOMI_Level2/S5P_L2__NO2____HiR.1/2020/209/S5P_OFFL_L2__NO2____20200727T161809_20200727T175938_14447_01_010302_20200802T075957.nc'], ['https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__NO2____HiR.1/2020/209/S5P_OFFL_L2__NO2____20200727T175938_20200727T194108_14448_01_010302_20200802T081133.nc', 's3://gesdisc-cumulus-prod-protected/S5P_TROPOMI_Level2/S5P_L2__NO2____HiR.1/2020/209/S5P_OFFL_L2__NO2____20200727T175938_20200727T194108_14448_01_010302_20200802T081133.nc']]


**Oops! We are not able to access the NASA GES DISC data archive server to download the files directly using the get() method from the earthdata library, so let's take the long way around (at least for the time being)**

**The 'https' URLs for the GES-DAAC server and the 's3' URLs for AWS are separated...**

In [11]:
## convert the list to a DataFrame .. a long cut again :)
import pandas as pd
df = pd.DataFrame(data_links, columns = ['https' , 's3'])

## select the column of https URLs (a pandas Series)
inprem = df['https']#.values.tolist()
print(inprem)
 
##Sort nc or h5 from list : work in progress
#sort_order = ['nc']
#inprem.sort(key = lambda i: sort_order) # works in python 3
 
##printing result
#print ("The sorted list is : " + str(inprem))

0    https://data.gesdisc.earthdata.nasa.gov/data/S...
1    https://data.gesdisc.earthdata.nasa.gov/data/S...
Name: https, dtype: object
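The https/s3 separation, and the pending filter on file extension, can also be done without pandas by inspecting each URL. This is a sketch under the assumption that `data_links` is a list of lists of URL strings, as printed above; `select_links` is a hypothetical helper name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def select_links(data_links, scheme="https", suffixes=(".nc",)):
    """Flatten the per-granule link lists, keeping URLs by scheme and file extension."""
    selected = []
    for granule_links in data_links:
        for url in granule_links:
            parsed = urlparse(url)
            if parsed.scheme != scheme:
                continue  # e.g. skip s3:// links when https is wanted
            if PurePosixPath(parsed.path).suffix.lower() not in suffixes:
                continue  # e.g. skip anything that is not a .nc file
            selected.append(url)
    return selected
```

Unlike the DataFrame approach, this does not rely on the https link always being first in each pair, and `suffixes=(".nc", ".h5")` would keep both file types.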


***Finally, the https URLs for the GES DISC server are opened in a loop and the spatially and temporally filtered files are downloaded ;)***

In [12]:
## Works: each https URL opens in the browser and the file is saved to the default Downloads folder
#url_list = ['https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__NO2____HiR.1/2021/060/S5P_OFFL_L2__NO2____20210301T232825_20210302T010956_17530_01_010400_20210303T163111.nc', 'https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__NO2____HiR.1/2021/113/S5P_OFFL_L2__NO2____20210423T014106_20210423T032236_18269_01_010400_20210424T183027.nc']

import webbrowser
for url in inprem:
    response = webbrowser.open(url)

***THE CODE ENDS***


**Issues yet to be addressed:**
1) Not able to access the files from the local DAAC server directly using the get() method
(By default the AWS links get accessed and no file is downloaded)

2) The URL-open method is used to download files, but all files are saved in Downloads: how can the directory be changed?

NOT WORKING: the download-by-URL method (using requests) is not working

#import requests
#response = requests.get('https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__NO2____HiR.1/2021/113/S5P_OFFL_L2__NO2____20210423T014106_20210423T032236_18269_01_010400_20210424T183027.nc')
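One possible direction for issue 2 is to stream each file to a chosen directory with `requests`. This is an untested sketch, not a confirmed fix: it assumes a valid `~/.netrc` entry for urs.earthdata.nasa.gov, that the GES DISC application has been authorized for the account, and that `requests` picks up the `.netrc` credentials during the login redirect. `local_path_for` and `download` are hypothetical helper names.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def local_path_for(url, out_dir):
    """Build the local file path from the last component of the URL path."""
    name = PurePosixPath(urlparse(url).path).name
    if not name:
        raise ValueError(f"cannot derive a file name from {url!r}")
    return Path(out_dir) / name


def download(url, out_dir, chunk_size=1024 * 1024):
    """Stream one file into out_dir and return the path written."""
    import requests  # third-party; imported here so local_path_for works without it

    target = local_path_for(url, out_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: with no explicit auth, requests falls back to ~/.netrc credentials.
    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        with target.open("wb") as handle:
            for chunk in response.iter_content(chunk_size=chunk_size):
                handle.write(chunk)
    return target
```

Usage would look like `for url in inprem: download(url, "tropomi_data")`, where `tropomi_data` is a placeholder directory name. If the server returns an HTML login page instead of the file, the authorization assumptions above do not hold for the account.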