In [13]:
# Install dependencies for this notebook
%pip install -q -r "/Users/yohan/Documents/GitHub/OpenGeoHub_2025/Hyperspectral/requirements.txt"


Reason for being yanked: The version has a regression in 'md.update()' in certain conditions
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
numba 0.61.2 requires numpy<2.3,>=1.24, but you have numpy 2.3.0 which is incompatible.
Note: you may need to restart the kernel to use updated packages.


# Accessing Earthdata from NASA

## Summary

In this notebook we learn how to search for and download SPEXone data from NASA through [earthaccess](https://earthaccess.readthedocs.io/en/latest/). For this you need an [Earthdata Login](https://urs.earthdata.nasa.gov/) account. We also look at how the data is structured.

## Learning objectives

- How to find relevant SPEXone data
- How to download the data
- How the data content is structured

## Content

1. [Locating SPEXone data](#Locating-SPEXone-data)
2. [Download SPEXone data](#Download-SPEXone-data)
3. [Content of SPEXone data](#Content-of-SPEXone-data)
4. [Interesting Data](#Interesting-Data)

# Setup
Run the following cell to load the relevant Python packages and authenticate for this earthaccess session:

In [1]:
import earthaccess
import numpy as np
import xarray as xr

auth = earthaccess.login(persist=True)

# Locating SPEXone data

Next we will browse for data of interest using the search_datasets function. The help function provides some additional information. Run the next cell to see the search_datasets documentation, then answer the following questions:

* What does search_datasets return?
* Which parameter can be used to select a specific spot on Earth?
* Which parameter can be used to select specific dates?

In [2]:
help(earthaccess.search_datasets)

Help on function search_datasets in module earthaccess.api:

search_datasets(count: int = -1, **kwargs: Any) -> List[earthaccess.results.DataCollection]
    Search datasets using NASA's CMR.
    
    [https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html)
    
    Parameters:
        count: Number of records to get, -1 = all
        kwargs (Dict):
            arguments to CMR:
    
            * **keyword**: case-insensitive and supports wildcards ? and *
            * **short_name**: e.g. ATL08
            * **doi**: DOI for a dataset
            * **daac**: e.g. NSIDC or PODAAC
            * **provider**: particular to each DAAC, e.g. POCLOUD, LPDAAC etc.
            * **has_granules**: if true, only return collections with granules
            * **temporal**: a tuple representing temporal bounds in the form
              `(date_from, date_to)`
            * **bounding_box**: a tuple representing spatial bound

Note that this docstring does not list all available parameters. The link in the docstring, [https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html), documents all available keywords. Can you find out which parameter we need to provide if we want data from the PACE platform only? And which parameter do we need if we want only data from the SPEXone instrument?

Let's have a look at the data from the PACE platform. Run the following cell to see what data is available:

In [3]:
results = earthaccess.search_datasets(platform="PACE", instrument="SPEXone")
len(results)

14

We get 14 results for the SPEXone instrument on the PACE platform. From the search_datasets docstring we know that these results are of type DataCollection. To see what we can do with a DataCollection, we read the documentation once again:

In [4]:
help(earthaccess.DataCollection)

Help on class DataCollection in module earthaccess.results:

class DataCollection(CustomDict)
 |  DataCollection(collection: Dict[str, Any], fields: Optional[List[str]] = None, cloud_hosted: bool = False)
 |  
 |  Dictionary-like object to represent a data collection from CMR.
 |  
 |  Method resolution order:
 |      DataCollection
 |      CustomDict
 |      builtins.dict
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __repr__(self) -> str
 |      Return repr(self).
 |  
 |  abstract(self) -> str
 |      Placeholder.
 |      
 |      Returns:
 |          The abstract of a collection.
 |  
 |  concept_id(self) -> str
 |      Placeholder.
 |      
 |      Returns:
 |          A collection's `concept_id`.This id is the most relevant search field on granule queries.
 |  
 |  data_type(self) -> str
 |      Placeholder.
 |      
 |      Returns:
 |          The collection data type, i.e. HDF5, CSV etc., if available.
 |  
 |  get_data(self) -> List[str]
 |      Placeholder.

Try out the abstract, summary and version methods for the first element of the results list. What do you find?

In [5]:
results[0].abstract()

"The Spectro-polarimeter for Planetary Exploration one (SPEXone) instrument flying aboard the PACE spacecraft is a multi-angle polarimeter. It measures the intensity, Degree of Linear Polarization (DoLP) and Angle of Linear Polarization (AoLP) of sunlight reflected back from Earth's atmosphere, land surface, and ocean. The focus of the SPEXone development is to achieve a very high accuracy of DoLP measurements, which facilitates accurate characterization of aerosols in the atmosphere."

In [6]:
results[0].summary()

{'short-name': 'PACE_SPEXONE_L0',
 'concept-id': 'C2816780240-OB_CLOUD',
 'version': '1',
 'file-type': "[{'Format': 'binary'}]",
 'get-data': ['https://oceandata.sci.gsfc.nasa.gov/directdataaccess/Level-0/'],
 'cloud-info': {'Region': 'us-west-2',
  'S3CredentialsAPIDocumentationURL': 'https://obdaac-tea.earthdatacloud.nasa.gov/s3credentialsREADME',
  'S3CredentialsAPIEndpoint': 'https://obdaac-tea.earthdatacloud.nasa.gov/s3credentials',
  'S3BucketAndObjectPrefixNames': ['s3://ob-cumulus-prod-public/']}}

In [7]:
results[0].version()

'1'

The summary method returns a dictionary. The code in the cell below prints the short-name of every result in the results list. Can you guess the short-name for raw data?

In [8]:
for result in results:
    print(result.summary()["short-name"])

PACE_SPEXONE_L0
PACE_SPEXONE_L1A_SCI
PACE_SPEXONE_L1B_SCI
PACE_SPEXONE_L1C_SCI
PACE_SPEXONE_L2_AER_RTAPLAND
PACE_SPEXONE_L2_AER_RTAPLAND_NRT
PACE_SPEXONE_L2_AER_RTAPOCEAN
PACE_SPEXONE_L2_AER_RTAPOCEAN_NRT
PACE_SPEXONE_L3M_AER_RTAP
PACE_SPEXONE_L3M_AER_RTAPLAND
PACE_SPEXONE_L3M_AER_RTAPLAND_NRT
PACE_SPEXONE_L3M_AER_RTAPOCEAN
PACE_SPEXONE_L3M_AER_RTAPOCEAN_NRT
PACE_SPEXONE_L3M_AER_RTAP_NRT


PACE is the platform, SPEXONE is the instrument, and the remainder gives the processing level of the data. In particular: 
* L0 is raw data containing CCSDS packets as received from the PACE satellite
* L1A is reconstructed but unprocessed L0-data with time and orbit information organized in a well-defined format
* L1B is calibrated L1A-data on the SPEXone native grid 
* L1C is L1B data regridded to the PACE grid, which is the same for all instruments on PACE 
* L2 data contains geophysical variables derived from and at the same resolution and location as the L1C source data.
* L3 data contains geophysical variables mapped on uniform space-time grid scales, usually with some completeness and consistency.

We will focus on L1C data during this tutorial.
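
The naming scheme above can be sketched as a small parser. This is an illustration only: `parse_short_name` is a hypothetical helper, not part of earthaccess. It assumes that every short name follows the `PLATFORM_INSTRUMENT_LEVEL[_PRODUCT...]` pattern seen in the listing and that a trailing `NRT` marks a near-real-time product.

```python
# Short names copied from the listing above (a subset).
short_names = [
    "PACE_SPEXONE_L0",
    "PACE_SPEXONE_L1C_SCI",
    "PACE_SPEXONE_L2_AER_RTAPLAND_NRT",
    "PACE_SPEXONE_L3M_AER_RTAP",
]


def parse_short_name(short_name):
    """Split a short name into its parts (hypothetical helper, not an earthaccess API)."""
    platform, instrument, level, *rest = short_name.split("_")
    return {
        "platform": platform,
        "instrument": instrument,
        "level": level,
        "product": "_".join(rest),
        # Assumption: a trailing "NRT" marks a near-real-time product.
        "near_real_time": bool(rest) and rest[-1] == "NRT",
    }


parsed = [parse_short_name(name) for name in short_names]

# Keep only the L1C products, the level this tutorial focuses on.
l1c_names = [name for name, parts in zip(short_names, parsed) if parts["level"] == "L1C"]
print(l1c_names)
```

With real search results, the same filter could be applied to `result.summary()["short-name"]` for each entry of the results list.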

# Download SPEXone data

Next we will download a granule that contains cloud, land, and ocean scenes. To select it, use the following criteria:

* short_name
* bounding box
* timespan
* version

In [None]:
from pathlib import Path

bbox = (-119.5, -20.5, 120, -19.5)  # (west, south, east, north) in degrees
temporal = ("2025-06-23", "2025-06-23")  # (start date, end date)
short_name = "PACE_SPEXONE_L1C_SCI"
results = earthaccess.search_data(
    short_name=short_name,
    temporal=temporal,
    bounding_box=bbox,
)

# Let's see how many granules match
print(len(results))

# Download the first granule; download() returns a list of local file paths
l1c_path = earthaccess.download(results[0], "./data")[0]

# Where is the data downloaded?
print(Path(l1c_path).resolve())

10


QUEUEING TASKS | :   0%|          | 0/1 [00:00<?, ?it/s]

PROCESSING TASKS | :   0%|          | 0/1 [00:00<?, ?it/s]

COLLECTING RESULTS | :   0%|          | 0/1 [00:00<?, ?it/s]
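
The order of the bounding box values (west, south, east, north) is easy to get wrong, and a sign error in a longitude can silently turn a small box into one that spans much of the globe. Below is a minimal sanity check, assuming longitudes in [-180, 180] and treating a west value larger than east as a box that crosses the antimeridian. `check_bbox` is a hypothetical helper, not an earthaccess function.

```python
def check_bbox(bbox):
    """Return (width, height) in degrees for a (west, south, east, north) box.

    Hypothetical helper for illustration; raises ValueError on invalid input.
    """
    west, south, east, north = bbox
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must lie in [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    # Assumption: west > east means the box wraps across the antimeridian.
    width = (east - west) % 360
    height = north - south
    return width, height


# The box used in the search cell above.
width, height = check_bbox((-119.5, -20.5, 120, -19.5))
print(f"box spans {width} degrees of longitude and {height} degrees of latitude")
```

Printing the extent before searching makes it easy to confirm that the box covers the area you intended.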

# Content of SPEXone data

The downloaded data can then be inspected or opened with various tools. A useful one that we do not cover in this tutorial, but suggest you check out, is silx.

Alternatively, we can use xarray to load the relevant data into memory:

In [10]:
#paths = earthaccess.open(results)
prod = xr.open_dataset(l1c_path)
prod

You might notice that only the global attributes are loaded, with no variables or even variable names. The file stores its variables in groups, and xr.open_dataset reads a single group per call (the root group by default). Recent xarray releases provide xr.open_datatree to read all groups at once. On older versions you need a different tool (e.g. silx) to see which groups are available, and then you load the groups separately and combine them into one product. For this tutorial we provide the relevant group names and load them one by one.

In [16]:
view = xr.open_dataset(l1c_path, group="sensor_views_bands").squeeze()
geo = xr.open_dataset(l1c_path, group="geolocation_data").set_coords(["longitude", "latitude"])
obs = xr.open_dataset(l1c_path, group="observation_data").squeeze()
dataset = xr.merge((prod, obs, view, geo))
dataset

Take some time to check the attributes of the variables by clicking on the Coordinates/Data variables and subsequently on the attribute symbol of the variables.

Data opened with open_dataset is read-only, so you need not worry about changing the source data. You can change the in-memory data, though.

Finally, you can also download a granule from the Earthdata Search portal: https://search.earthdata.nasa.gov/search?portal=obdaac&q=pace
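
To make the group-combining step and the in-memory behaviour concrete, here is a small sketch on synthetic data. The two datasets stand in for groups of a granule. The variable name `i`, the dimension names, and the sizes are made up for illustration and are not taken from the SPEXone file.

```python
import numpy as np
import xarray as xr

dims = ("bins_along_track", "bins_across_track")

# Stand-in for a geolocation group: promote lat/lon to coordinates.
geo = xr.Dataset(
    {
        "latitude": (dims, np.zeros((3, 2))),
        "longitude": (dims, np.ones((3, 2))),
    }
).set_coords(["longitude", "latitude"])

# Stand-in for an observation group with a length-1 dimension.
obs = xr.Dataset(
    {"i": (dims + ("extra",), np.full((3, 2, 1), 0.5))}
).squeeze()  # squeeze() drops every dimension of length 1

# Combine the groups into one dataset, as in the cell above.
merged = xr.merge((obs, geo))
print(merged)

# This changes only the in-memory copy; nothing is written to any file.
merged["i"] = merged["i"] * 2
```

After the merge, `latitude` and `longitude` are coordinates of `i` because they share its dimensions, which is why the geolocation group is passed through `set_coords` first.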

# Interesting Data

In the coming notebooks we will take a closer look at the data in the L1C and L2 data products. Use the following link to search for some scenes that might be interesting:

https://worldview.earthdata.nasa.gov/?l=Coastlines_15m,OrbitTracks_PACE_Ascending,OCI_PACE_True_Color&lg=true&t=2025-06-22-T09%3A25%3A14Z

The link takes you to Worldview. The visualized data is taken from OCI, which has a wider swath than SPEXone. Hence you should look for interesting scenes close to the center of the swath shown in yellow.

Ideas for interesting scenes: dust storms, volcanoes, forest fires, clouds, water, snow, ...

Once you have found some scenes that interest you, download both the corresponding L1C and L2 granules with your preferred method.

Congratulations! You have successfully gathered SPEXone data :-) We will have a closer look at the data in the following notebooks of the tutorial.