# Welcome to ESS-DIVE's Finding & Accessing Data Jupyter Notebook

This Jupyter Notebook helps data users find and access ESS-DIVE datasets that use the file-level metadata (flmd) and csv reporting formats. It shows how to:

    Use the ESS-DIVE Dataset API to access dataset files
    Use the xml file to explore and access a dataset
    Use the File-level Metadata (flmd) to explore the dataset
    Use Data Dictionaries to understand data content
    Use Sample Metadata to explore datasets with sample-based data
    Import data from csv files into Python pandas DataFrames
    Download files to local storage and log access details

Written By: Leo Herrera

Acknowledgements: This notebook builds from Danielle Christianson's Search & Download notebook.

# Table of Contents
    1. Set Up
    2. Search with Dataset API 
    3. Alternative Searches (Sample Identifiers)
    4. (Optional) Deep Dive API
    5. Subset Search Results (Dataset API)
    6. List the Datasets using Dataset Details Distribution (without flmd)
    7. Inspect Dataset File Contents using File-level Metadata 
    8. Inspect Dataset File Contents using Data Dictionary
    9. Use Sample ID and Metadata Reporting Format
    

# 1. Set up

In [1]:
# This notebook requires Python 3.
# ===================================

import csv
import datetime as dt
import io
import json
import os
import pandas as pd
import requests

from ipywidgets import widgets, interact
from IPython.display import display, display_html
from pathlib import Path
from urllib.request import Request, urlopen, urlretrieve
from zipfile import ZipFile

#===================
from urllib.parse import urlencode

### Configure authentication

1. Go to ESS-DIVE (https://data.ess-dive.lbl.gov/data), login with your ORCID, and copy your authentication token from your account settings page.
2. Run the code block below
3. Enter your authentication token into the widget that appears 
4. Do not rerun the code block below or your token will disappear.
5. Move on to the next block

   _Always re-run this code cell when you update your token. Tokens expire every 24 hours._

In [2]:
my_token = "<put_your_token_here>"

# ===================================
token_text = widgets.Text(my_token, description="Token:")
display(token_text)

Text(value='<put_your_token_here>', description='Token:')

In [3]:
essdive_api_url = 'https://api.ess-dive.lbl.gov'  # https, so the token is not sent in clear text

essdive_direct_url = 'https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/object/'

token = token_text.value
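
A quick check before sending requests can catch an unset token early. The sketch below is one way to do it; `build_auth_headers` is a hypothetical helper, not part of this notebook or the ESS-DIVE API, and it assumes the placeholder string used in the token cell above.

```python
def build_auth_headers(token_value, placeholder="<put_your_token_here>"):
    """Return Authorization headers, or raise if the token was never filled in.

    `placeholder` is assumed to match the default text shown in the token widget.
    """
    cleaned = (token_value or "").strip()
    if not cleaned or cleaned == placeholder:
        raise ValueError("Enter your ESS-DIVE authentication token before continuing.")
    # Same header shape the notebook's request functions use.
    return {"Authorization": f"Bearer {cleaned}"}
```

For example, `requests.get(url, headers=build_auth_headers(token))` would fail fast with a readable message instead of returning an authorization error from the server.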

## Change the local_dir variable to a path on your computer. The notebook downloads files to this directory.

In [35]:
#Change to match your own file path on your computer
local_dir = Path('/Users/YLH/ESS-DIVE')

# ===================================
if local_dir.exists():
    print(f'Success! Local directory {local_dir} configured for downloads')
    print('===================================')
    current_files = [x for x in os.listdir(local_dir) if x != '.DS_Store']
    if current_files:
        print(f'Local directory contains: {current_files}')
    else:
        print(f'Local directory is currently empty.')
else:
    print(f'Cannot find local directory {local_dir}. Please reenter valid directory path.')
    
download_file_log = {}
print('===================================')
print('Downloaded files will be logged in the dictionary object "download_file_log".\n'
      'You can save this dictionary as a file later in the notebook.\n'
      'The filename, file url, and datetime accessed are recorded as a tuple in the "downloaded_files" element.')

Success! Local directory /Users/YLH/ESS-DIVE configured for downloads
Local directory contains: ['Fulton_2024_Water_Column_Respiration_Data_Package.zip', 'explore_rfs.ipynb', 'IGSN_sample_metadata.csv', 'essdive-tutorials', 'NGA103_flmd.csv', 'LMA_leaf_carbon_nitrogen_kougarok_teller_user_guide_20190222.pdf', 'Deep-Dive API.ipynb', 'WHONDRS_YDE22_Data_Package.zip', 'Data_and_scripts_associated_with_a_manuscript.xml', '.ipynb_checkpoints', 'helloworld.py', 'Search_Dataset_API.ipynb']
Downloaded files will be logged in the dictionary object "download_file_log".
You can save this dictionary as a file later in the notebook.
The filename, file url, and datetime accessed are recorded as a tuple in the "downloaded_files" element.
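
The log can be written to disk at any point. A minimal sketch, assuming the log holds only strings, `None`, lists, and tuples (as the printed description suggests); `save_download_log` and the example entry are illustrative names and values, not part of the notebook. JSON has no tuple type, so each logged tuple is stored as a list.

```python
import json
from pathlib import Path


def save_download_log(log_store, log_path):
    """Write the download log dictionary to a JSON file and return the path."""
    log_path = Path(log_path)
    with log_path.open("w", encoding="utf-8") as f:
        json.dump(log_store, f, indent=2)
    return log_path


# Made-up entry for illustration only (hypothetical identifier, file, and URL).
example_log = {
    "doi:10.0000/example": {
        "@id": "doi:10.0000/example",
        "name": "Example dataset",
        "citation": None,
        "downloaded_files": [
            ("example.csv", "https://example.org/example.csv", "2024-01-01T00:00:00"),
        ],
    }
}
```

In this notebook the call would look like `save_download_log(download_file_log, local_dir / "download_file_log.json")`, where the file name is your choice.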


In [36]:
# Run these general functions
# ===================================

def print_dataset_info(d, info_fields=['@id', 'name', 'description', 'citation'], line_space=False):
    """ 
    Display basic dataset info for evaluation 
    """
    for f in info_fields:
        value = d.get(f)
        
        if value is None:
            dataset_value = d.get('dataset')
            if dataset_value:
                value = dataset_value.get(f)
                    
        if value:
            if f in ['flmd_url', 'csv_files']:
                print(f"--- {f}:")
                for filename, url in value.items():
                    print(f"    - {filename}")
                continue
                          
            print(f"--- {f}: {value}")
            if line_space:
                print(" ")


def print_datasets_info(dataset_list, info_fields=['@id', 'name', 'description', 'citation'], line_space=False):
    """ 
    Display basic dataset info for evaluation 
    """
    print(f'=========== Info for {len(dataset_list)} datasets ===========')
    for a_dataset in dataset_list:
        print_dataset_info(a_dataset, info_fields, line_space)
                
        print("----------------------------------------------------------")
        
        
        
def assess_datasets_flmd_dd_csv_files(dataset_details_list):
    """
    Find the datasets with flmd files
    Sort the csv files into flmd files and data files; add both to the dataset details dictionary
    """
    
    flmd_datasets_indices = set()
    flmd_dataset_details = []
    
    for idx, dataset in enumerate(dataset_details_list):
        file_list = dataset.get('distribution')
    
        flmd_url = {}
        csv_files = {}
        for f in file_list:
            encoding_format = f.get('encodingFormat')
            filename = f.get('name')
            url = f.get('contentUrl')
        
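            # Skip entries with missing metadata instead of raising a TypeError on None
            if not encoding_format or not filename or url is None or 'csv' not in encoding_format: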
                continue
        
            if 'flmd' in filename:
                flmd_datasets_indices.add(idx)
                flmd_url.update({filename: url})
        
            else:
                csv_files.update({filename: url})

        dataset.update({
            'flmd_url': flmd_url,
            'csv_files': csv_files
        })
    
        if not flmd_url:      
            dataset_name = dataset.get('name')
            print(f"No flmd found for dataset: {dataset_name}")
        
    print("=====================================")
    
    if len(flmd_datasets_indices) > 0:
        print(f'flmd found in {len(flmd_datasets_indices)} datasets')
        flmd_dataset_details = [dataset_details_list[x] for x in flmd_datasets_indices]
    else:
        print(f'No datasets in the search results have flmds.')
        
    no_flmd_dataset_details = [dataset_detail for idx, dataset_detail in enumerate(dataset_details_list) if idx not in flmd_datasets_indices]
    
    return flmd_dataset_details, no_flmd_dataset_details


def get_dataset_details(dataset_url):
    
    response_status = None
    try:
        dataset_response = requests.get(dataset_url, headers={"Authorization": f"Bearer {token}"})
        response_status = dataset_response.status_code
    except Exception as e:
        # `dataset` is not defined in this function; report the URL that failed instead
        print(f"Request to {dataset_url} did not have a successful return: {e}")
        return None

    # If successful response, add to dataset_store
    if response_status == 200:
            dataset_json = dataset_response.json()['dataset'] 
            print(f"--- Acquired details for {dataset_json.get('name')}")
            return dataset_json
    elif response_status:  
        print(f"Response status {response_status}: {dataset_response.text}")
    else:
        print(f"Response status unavailable. Response cannot be interpreted. Debug required.")
    return None


def get_request(filename, f_url, stream=True):
    """
    Get request for file, and stream the content back
    """

    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0',
               'content-type': 'application/json'}
    try:
        r = requests.get(f_url, headers=headers, verify=True, stream=stream)
        status_code = r.status_code
        if status_code == 200:
            return r
        else:
            print(f"{filename} request returned {status_code}")
            return None
    except Exception as e:
        print(f"{filename} request unsuccessful: {e}")
        return None
    
    
def make_store(file_request, use_idx=True, print_headers=True):
    """
    Read response and make store
    """
    file_store = {}
    csv_reader = csv.DictReader(file_request.iter_lines(decode_unicode=True))

    for idx, row in enumerate(csv_reader):
        if use_idx:
            file_store.update({f'Index {idx}': row})
            continue
        fn = row.get('File_Name')
        file_store.update({fn: row})
    
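    # Use the reader's fieldnames so an empty file does not raise a NameError on `row`
    headers = list(csv_reader.fieldnames or [])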
    if print_headers:
        print(f"File headers: {headers}")
    return headers, file_store


def make_pandas_df(file_url, header_rows=1, print_headers=True):
    """
    Read an online csv file into a pandas DataFrame.
    Designed for ESS-DIVE Sample ID and Metadata RF sample_metadata.csv files.
    header_rows is the number of rows to skip before the column-name row (default 1).
    """
    p_df = pd.read_csv(file_url, skiprows=header_rows)
    
    headers = list(p_df.columns)
    if print_headers:
        print(f"File headers: {headers}")
    return headers, p_df


def inspect_dataset_distribution(dataset_detail, file_type='all'):

    print(dataset_detail.get('name'))
    print('========================================')

    count = 0
    dist = dataset_detail.get('distribution')
    
    for idx, file_info in enumerate(dist):
        fn = file_info.get('name')
        fn_url = file_info.get('contentUrl')
        f_encoding = file_info.get('encodingFormat')
        if file_type != 'all' and file_type not in f_encoding:
            continue
        print(f'Index {idx}: {fn}\n  encoding: {f_encoding}\n  url: {fn_url}')
        count += 1
        
    if count == 0:
        print(f'No files found that match the file_type: "{file_type}" criteria.')
            
            
def retrieve_file_from_essdive(file_url, file_path):
    """ Retrieve the data file 
        file_path includes file name.
    """     
    try:
        urlretrieve(file_url, file_path)
        return True, None
    except Exception as e:
        return False, f'File at url: {file_url} was not saved: {e}'
    

def download_selected_files(dataset_detail, file_indices, file_dir=local_dir, log_store=download_file_log, 
                            is_csv_zipped=False, zip_download=None, zip_member_fn=None):
    dist = dataset_detail.get('distribution')
    ds_id = dataset_detail.get('@id')
    citation = dataset_detail.get('citation')
    ds_name = dataset_detail.get('name')
    
    if log_store is None:
        log_store = {}
    
    log_store.setdefault(ds_id, {'@id': ds_id, 'name': ds_name, 'citation': citation, 'downloaded_files': []})
    ds_file_log = log_store.get(ds_id).get('downloaded_files')
    
    print(f'Saving files in {file_dir}')
    print("-------------------------------------")

    for idx, file_info in enumerate(dist):
        msg = None
        is_downloaded = None
        
        if idx not in file_indices:
            continue
            
        fn = file_info.get('name')
        file_path = file_dir / fn  # honor the file_dir argument
        fn_url = file_info.get('contentUrl')
        
        if not is_csv_zipped:
    
            download_ts = dt.datetime.now().isoformat()
            is_downloaded, msg = retrieve_file_from_essdive(fn_url, file_path)
    
        else:
            if not zip_download or not zip_member_fn:
                print('ZipFile object and zipped member file name are required. Try again.')
                return None
            try:
                zip_download.extract(zip_member_fn, path=file_path)
                if Path.exists(file_path / zip_member_fn):
                    is_downloaded = True
                    download_ts = dt.datetime.now().isoformat()
                else:
                    msg = f'Extraction of {zip_member_fn} from {fn} was not successful.'
            except Exception as e:
                msg = f'ERROR attempting to extract {zip_member_fn} from {fn}: {e}'
        
        if is_downloaded:
            print(f'--- {fn} downloaded')
            ds_file_log.append((fn, fn_url, download_ts))
        else:
            print(msg)
            
    print("-------------------------------------")
    print(f'Remember to cite these files! Dataset DOI {ds_id}')
    return ds_id    


def inspect_zip_file_contents(dataset_detail, file_idx):
    dist = dataset_detail.get('distribution')
    file_info = dist[file_idx]
    
    if not file_info:
        print('File index not found. Please try again.')
        return
    
    fn = file_info.get('name')
    if 'zip' not in file_info.get('encodingFormat'):
        print(f'{fn} is not encoded as a zip file. Please select a different file.')
    
    fn_url = file_info.get('contentUrl')
    resp = urlopen(fn_url)
    
    zip_download = ZipFile(io.BytesIO(resp.read()))
    
    print(f'{fn} contents:')
    print('=================================')
    for idx, file_member in enumerate(zip_download.namelist()):
        print(f'Index {idx}: {file_member}')
        
    return fn, zip_download


def read_zipped_csv(zip_file_obj, csv_file_name, header_rows=1):
    # Read from the ZipFile object passed in, not from a notebook-level global.
    # Pass encoding='windows-1254' to pd.read_csv if a Unicode error appears.
    csv_df = pd.read_csv(zip_file_obj.open(csv_file_name), skiprows=header_rows)
    return csv_df
    
    
print('Functions loaded.')

#-------------------------------------------

def find_common_elements(array1, array2):
    common_elements = []
    for element in array1:
        if element in array2:
            common_elements.append(element)
    return common_elements

def find_unique_elements(array1, array2):
    unique_elements = []
    for element in array1:
        if element not in array2:
            unique_elements.append(element)
    return unique_elements


def in_deep_dive(doi):
    params  = {}
    params['rowStart'] = 1
    params['pageSize'] = 100
    params['doi'] = doi
    
    query_string=urlencode(params)

    r = requests.get(f"https://fusion.ess-dive.lbl.gov/api/v1/deepdive?{query_string}")
    if r.status_code == 200:
        results = r.json()['results']
        if not results:
            return False
        return True
    else:
        print("ERROR")
        print(r.text)
        return None
    
def search_datasets(go):
    header_authorization =  f"bearer {token}"
    response = requests.get(get_packages_response, headers={"Authorization": header_authorization})

    if response.status_code == 200:
        # Success
        global response_json 
        response_json = response.json()
        print("Success! Continue to look at the search results")  
        go = True
        return go
    else:
        # There was an error
        print("There was an error. Stop here and debug the issue. Email ess-dive-support@lbl.gov if you need assistance. \n")
        print(response.text)
        
def view_search_results():
    search_record_total = response_json['total']
    print(f"Datasets found: {search_record_total}")

    if search_record_total > 100:
        print("The search API cannot return more than 100 results at a time. See documentation for how to paginate.")

    global canidate_datasets
    canidate_datasets = response_json['result']

    for idx, dataset in enumerate(canidate_datasets):
        print('________________')
        print(f'Index: {idx}')
        print(dataset.get('dataset').get('name'))
        print(dataset.get('viewUrl'))
        print(dataset.get('dataset').get('datePublished'))

        if in_deep_dive(dataset.get('viewUrl')[35:]):
            print('In the DeepDive!')
            
def construct_query(essdive_api_url, text, keywords=None, providerName=None, creator=None, datePublished=None, rowStart=None, pageSize=None):
    # Default values for rowStart and pageSize if None
    rowStart = 1 if rowStart is None else rowStart
    pageSize = 100 if pageSize is None else pageSize

    # Start constructing the query
    # `text` is appended below only when it is not None, so it is not repeated here
    query = f"{essdive_api_url}/packages?rowStart={rowStart}&pageSize={pageSize}&isPublic=true"

    # Add additional parameters if they are not None
    if text is not None:
        query += f"&text={text}"
    if keywords is not None:
        query += f"&keywords={keywords}"
    if providerName is not None:
        query += f"&providerName={providerName}"
    if creator is not None:
        query += f"&creator={creator}"
    if datePublished is not None:
        query += f"&datePublished={datePublished}"

    return query

def get_string_before_first_slash(input_string):
    # Split the string by '/' and get the first part
    parts = input_string.split('/', 1)
    return parts[0]

def get_string_after_first_slash(s):
    """
    Returns the substring that appears after the first slash in the input string.
    If there is no slash, returns the entire string.
    
    :param s: Input string
    :return: Substring after the first slash or the entire string if no slash is found.
    """
    # Split the string using slash as the delimiter
    parts = s.split('/', 1)  # The '1' argument makes split return two parts at most

    # Check if the string was split into two parts
    if len(parts) > 1:
        return parts[1]  # Return the part after the first slash
    else:
        return s  # Return the original string if there is no slash

# Function to handle button click
def on_button_clicked(b):
    # Using the current value of the number widget as a parameter for the function
    display_file(index_widget.value, df)

def display_file(index, df):
    #get dataset details
    dataset_details_url = f'https://api.ess-dive.lbl.gov/packages/{df["version"][index]}'
    response_status = None
    try:
        dataset_response = requests.get(dataset_details_url, headers={"Authorization": f"Bearer {token}"})
        response_status = dataset_response.status_code
    except Exception as e:
        # `dataset` is not defined in this function; report the URL that failed instead
        print(f"Request to {dataset_details_url} did not have a successful return: {e}")
        return None

    # If successful response, add to dataset_store
    if response_status == 200:
            dataset_json = dataset_response.json()['dataset'] 
            #print(f"--- Acquired details for {dataset_json.get('name')}")
            dataset_details = dataset_json
    elif response_status:  
        print(f"Response status {response_status}: {dataset_response.text}")
        dataset_details = None
    else:
        print(f"Response status unavailable. Response cannot be interpreted. Debug required.")
        dataset_details = None
    
    if not dataset_details:
        print(f"Could not retrieve dataset details from {dataset_details_url}.")
        return None
    
    #inspect dataset
    
    #count = 0
    file_names = []
    dist = dataset_details.get('distribution')
    for idx, file_info in enumerate(dist):
        fn = file_info.get('name')
        file_names.append(fn)

    
    #matches the index with the file you searched for
    file_index = None
    try:
        file_index = file_names.index(get_string_before_first_slash(df['data_file'][index]))
    except ValueError:
        print(f"'{get_string_before_first_slash(df['data_file'][index])}' is not in the datafile list.")
        print("There was an error, move on to the next section.")
        return None  # without a file index there is nothing to open below
    
    # unzip if zip is there 
    zip_file_idx = file_index
    dist = dataset_details.get('distribution')
    file_info = dist[zip_file_idx]
    if not file_info:
        print('File index not found. Please try again.')
        return
    
    fn = file_info.get('name')
    if 'zip' not in file_info.get('encodingFormat'):
        print(f'{fn} is not encoded as a zip file. Please select a different file.')
    
    fn_url = file_info.get('contentUrl')
    resp = urlopen(fn_url)
    
    zip_download = ZipFile(io.BytesIO(resp.read()))
    csv_file_index = zip_download.namelist().index(get_string_after_first_slash(df['data_file'][index]))
    
    pd.set_option('display.max_columns', None)
    pd.set_option('display.max_rows', None)
    
    header_rows = 0

    # ===================================
    csv_file_name = zip_download.namelist()[csv_file_index]
    print(f'Attempting to read: {csv_file_name} from zip file {fn}')
    print('=============================================================================')
    
    metadata_df = pd.read_csv(zip_download.open(csv_file_name),encoding='windows-1254', skiprows=header_rows)

    
    # check if metadata is at the top of the file
    if list(metadata_df.columns.values)[0].startswith("#"):
        header_rows = 1
        print("Additional Information:", list(metadata_df.columns.values)[0][1:])
        metadata_df = pd.read_csv(zip_download.open(csv_file_name),encoding='windows-1254', skiprows=header_rows)

    if metadata_df is not None:
        is_csv_zipped = True
        headers = list(metadata_df.columns)
        display(metadata_df)
    else:
        print('ERROR: Sample metadata file was not successfully loaded.')
        
        

Functions loaded.
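
`view_search_results` passes `viewUrl[35:]` to `in_deep_dive`; the offset 35 is the length of the `https://data.ess-dive.lbl.gov/view/` prefix. If a URL ever has a different prefix, the slice returns the wrong text without any warning. A possible alternative, sketched here with a hypothetical helper name, splits on the `/view/` marker instead:

```python
def doi_from_view_url(view_url, marker="/view/"):
    """Return the identifier that follows '/view/' in a dataset view URL, or None."""
    _, found, identifier = view_url.partition(marker)
    return identifier if found else None


example_url = "https://data.ess-dive.lbl.gov/view/doi:10.15485/1602034"
print(doi_from_view_url(example_url))  # doi:10.15485/1602034
```

For URLs with the standard prefix this gives the same string as the fixed slice, and it returns `None` when the marker is missing.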


# 2. Search using the Dataset API

Use the ESS-DIVE Dataset API to search for datasets of interest.

You can search for datasets using any of the following parameters:

 - Dataset Creator (creator)
 - Date Published (datePublished)
 - Project Name (providerName)
 - Any text (text)
 - Keywords (keywords)
 - Public datasets only (isPublic)

### To search on two values for one parameter, put both in a list. (Not applicable to 'text' and 'datePublished'.)
- (ex. 1) creator = ["Ely", "Barnes"]

### By default the API returns 25 datasets. This code requests up to 100 results, the maximum, as set by pageSize. You may change this number.

### To search with a parameter, replace None with a search term. If you do not want to search with a specific parameter, leave it set to None.
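
Search terms often contain quotes, spaces, or brackets, which need percent-encoding in a URL. The sketch below shows one way to build the `/packages` query so that `None` parameters are dropped and the rest are encoded. `build_packages_query` is a hypothetical helper, not the notebook's `construct_query`, and it handles single values only; how the API expects list values to be encoded is not covered here.

```python
from urllib.parse import urlencode


def build_packages_query(base_url, **params):
    """Build a /packages search URL, skipping any parameter set to None."""
    active = {key: value for key, value in params.items() if value is not None}
    return f"{base_url}/packages?{urlencode(active)}"


url = build_packages_query(
    "https://api.ess-dive.lbl.gov",
    rowStart=1,
    pageSize=100,
    text='"Tree"',   # quoted for an exact match
    creator=None,    # None means "do not filter on this parameter"
    isPublic="true",
)
print(url)
# https://api.ess-dive.lbl.gov/packages?rowStart=1&pageSize=100&text=%22Tree%22&isPublic=true
```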


In [37]:
# Enter search terms
# For an exact match, put the string in quotes, e.g. "\"Soil\"" is an exact match, "Soil" is any match

text= "\"Tree\""
keywords = None
providerName= None
creator= None
rowStart = 1
pageSize = 100
datePublished = "[2022 TO 2024]" # "<[YYYY TO YYYY-MM-DD]>" # Not the same as data coverage
# ===================================
get_packages_response = construct_query(essdive_api_url,text,keywords,providerName,creator,datePublished,rowStart,pageSize)

# Send request to API
go = False
go = search_datasets(go)
if go:
    view_search_results()

Success! Continue to look at the search results
Datasets found: 47
________________
Index: 0
Vegetation classification map and covariates associated with NEON AOP survey, East River, CO 2018, and used in "Falco et al. 2024: EcoImaging: advanced sensing to investigate plant and abiotic hierarchical spatial patterns in mountainous watersheds"
https://data.ess-dive.lbl.gov/view/doi:10.15485/1602034
2024
________________
Index: 1
Models, data, and scripts associated with “Prediction of Distributed River Sediment Respiration Rates using Community-Generated Data and Machine Learning”
https://data.ess-dive.lbl.gov/view/doi:10.15485/2318723
2024
________________
Index: 2
Data and scripts associated with a manuscript investigating dissolved organic matter and microbial community linkages across seven globally distributed rivers
https://data.ess-dive.lbl.gov/view/doi:10.15485/2319037
2024
________________
Index: 3
Machine learning predictions of near-surface permafrost extent at Teller 27, Telle

________________
Index: 37
FATES crown damage simulation outputs 2022
https://data.ess-dive.lbl.gov/view/doi:10.15486/NGT/1871026
2022
________________
Index: 38
Canopy spectra, Feb2017, PA-SLZ: Panama
https://data.ess-dive.lbl.gov/view/doi:10.15486/NGT/1841559
2022
________________
Index: 39
Leaf-to-canopy spectral reflectance photographs, Feb2018, PA-SLZ: Panama
https://data.ess-dive.lbl.gov/view/doi:10.15486/NGT/1842523
2022
________________
Index: 40
Duke Forest FACE (FACTS-I): Meteorological and Soil Data
https://data.ess-dive.lbl.gov/view/doi:10.15485/1895465
2022
________________
Index: 41
Intra-canopy leaf trait variation facilitates high LAI and compensatory growth in a clonal woody encroaching shrub in the tallgrass prairie
https://data.ess-dive.lbl.gov/view/doi:10.15485/1900530
2022
________________
Index: 42
Effects of Vegetation on Fluxes of Nitric Oxide, Nitrogen Dioxide, and Nitrous Oxide in a Mixed Deciduous Forest Clearing
https://data.ess-dive.lbl.gov/view/doi:10.1548

# 3. Alternative Searches (Sample Identifiers)

## Search for Soil
### If your search returns more than 100 results, change the rowStart variable to 101 to view the next 100 datasets. If it returns more than 200, change rowStart to 201 to view the third page.
**(ex. rowStart = 101)**
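
The rowStart values for every page follow directly from the total number of results and the page size. A small sketch, assuming rowStart is 1-based as in the cells of this notebook; `page_row_starts` is a hypothetical helper name.

```python
def page_row_starts(total, page_size=100):
    """Return the rowStart value for each page needed to cover `total` results."""
    return list(range(1, total + 1, page_size))


print(page_row_starts(250))  # [1, 101, 201]
```

Running a search once per value in that list, with the same search terms, would step through all pages of results.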

In [38]:
text= "\"Soil Sample\""
keywords = None
providerName = None #Name of the project 
creator = None
datePublished = None  # "<[YYYY TO YYYY-MM-DD]>" # Not the same as data coverage 
rowStart = 1
pageSize = 100

# ===================================
get_packages_response = construct_query(essdive_api_url,text,keywords,providerName,creator,datePublished,rowStart,pageSize)

go = False
go = search_datasets(go)
if go:
    view_search_results()

Success! Continue to look at the search results
Datasets found: 78
________________
Index: 0
Soil Carbon Dynamics Following Land Use Changes and Conversion to Oil Palm Plantations in Tropical Lowlands Inferred From Radiocarbon
https://data.ess-dive.lbl.gov/view/doi:10.15485/2283436
2024
________________
Index: 1
Organic carbon feedbacks to soil structure in Profundihumic and Haplic Ferralsols in Campinas, Sao Paulo, Brazil
https://data.ess-dive.lbl.gov/view/doi:10.15485/2283432
2024
________________
Index: 2
Stem CO2 Efflux and growth rates in a selectively logged experiment in Central Amazon, 2001-2002
https://data.ess-dive.lbl.gov/view/doi:10.15486/NGT/1767825
2022
________________
Index: 3
Air temperature and relative humidity raw data from June 2016-Jan2018 at the K34 tower in Manaus, Brazil
https://data.ess-dive.lbl.gov/view/doi:10.15486/NGT/1602468
2020
________________
Index: 4
Raw leaf temperature data from May 2017- Aug 2017 at the B34 and K34 towers in Manaus, Brazil
https://

________________
Index: 39
Soil organic matter, tree communities, and fungal communities across mycorrhizal gradients in the Eastern United States
https://data.ess-dive.lbl.gov/view/doi:10.15485/1894515
2022
________________
Index: 40
Leaf temperature raw data, 2015 - 2017, at Manaus, Brazil
https://data.ess-dive.lbl.gov/view/doi:10.15486/NGT/1561988
2021
________________
Index: 41
Volatile monoterpene ‘fingerprints’ of resinous Protium tree species in the Amazon rainforest, Phytochemistry: Data
https://data.ess-dive.lbl.gov/view/doi:10.15486/NGT/1570410
2021
________________
Index: 42
E-Field_Log Metadata, BR-Ma2: Manaus, 2016 - 2017
https://data.ess-dive.lbl.gov/view/doi:10.15486/NGT/1556938
2021
________________
Index: 43
Latex Oxidation Defenses in Muiratinga (Maquira sclerophylla) in Manaus, Brazil
https://data.ess-dive.lbl.gov/view/doi:10.15486/ngt/1570409
2021
________________
Index: 44
Root-soil depth profile in Luquillo Experimental Forest, Puerto Rico, February, 2019
https://

## Search for Soil Using the Samples Reporting Format Keyword

In [40]:
text= "Soil"
keywords = "ESS-DIVE Sample ID and Metadata Reporting Format"
providerName = None #Name of the project 
creator = None
datePublished = None  # "<[YYYY TO YYYY-MM-DD]>" # Not the same as data coverage 
rowStart = 1
pageSize = 100

# ===================================
get_packages_response = construct_query(essdive_api_url,text,keywords,providerName,creator,datePublished,rowStart,pageSize)

go = False
go = search_datasets(go)
if go:
    view_search_results()

Success! Continue to look at the search results
Datasets found: 19
________________
Index: 0
Temporal Study 2021-2022: Sample-Based Surface Water Chemistry and Organic Matter Characterization across Watersheds in the Yakima River Basin, Washington, USA (v2)
https://data.ess-dive.lbl.gov/view/doi:10.15485/1898912
2022
In the DeepDive!
________________
Index: 1
EXCHANGE Campaign 1: A Community-Driven Baseline Characterization of Soils, Sediments, and Water Across Coastal Gradients
https://data.ess-dive.lbl.gov/view/doi:10.15485/1960313
2023
________________
Index: 2
WHONDRS River Corridor Dissolved Oxygen, Temperature, Sediment Aerobic Respiration, Grain Size, and Water Chemistry from Machine-Learning-Informed Sites across the Contiguous United States (v3)
https://data.ess-dive.lbl.gov/view/doi:10.15485/1923689
2023
In the DeepDive!
________________
Index: 3
Spatial Study 2021: Sample-Based Surface Water Chemistry and Organic Matter Characterization across Watersheds in the Yakima River 

## Search for Water Samples

In [16]:
text= "\"Water Sample\""
keywords = None
providerName = None #Name of the project 
creator = None
datePublished = None  # "<[YYYY TO YYYY-MM-DD]>" # Not the same as data coverage 
rowStart = 1
pageSize = 100

#=========================================
get_packages_response = construct_query(essdive_api_url,text,keywords,providerName,creator,datePublished,rowStart,pageSize)

go = False
go = search_datasets(go)
if go:
    view_search_results()

Success! Continue to look at the search results
Datasets found: 76
________________
Index: 0
Temporal Study 2021-2022: Sample-Based Surface Water Chemistry and Organic Matter Characterization across Watersheds in the Yakima River Basin, Washington, USA (v2)
https://data.ess-dive.lbl.gov/view/doi:10.15485/1898912
2022
In the DeepDive!
________________
Index: 1
Temporal Study 2021-2022: Sensor-Based Time Series of Surface Water Temperature, Specific Conductance, Total Dissolved Solids, Turbidity, pH, and Dissolved Oxygen from across Multiple Watersheds in the Yakima River Basin in Washington, USA
https://data.ess-dive.lbl.gov/view/doi:10.15485/1892054
2022
________________
Index: 2
CO2 and CH4 Surface Flux, Soil Profile Concentrations, and Stable Isotope Composition, Utqiagvik (Barrow), Alaska, 2012-2013
https://data.ess-dive.lbl.gov/view/doi:10.5440/1227684
2015
________________
Index: 3
Total Dissolved Nitrogen and Ammonia Data for the East River Watershed, Colorado (2015-2023)
https:/

________________
Index: 37
SPRUCE S1 Bog Porewater, Groundwater, and Stream Chemistry Data: 2011-2013
https://data.ess-dive.lbl.gov/view/doi:10.3334/CDIAC/SPRUCE.018
2016
________________
Index: 38
Walker Branch Watershed: Weekly Stream Water Chemistry
https://data.ess-dive.lbl.gov/view/doi:10.3334/CDIAC/ORNLSFA.009
2016
________________
Index: 39
Transport and humification of dissolved organic matter within a semi-arid floodplain: Dataset
https://data.ess-dive.lbl.gov/view/doi:10.21952/WTR/1505876
2019
________________
Index: 40
Hillslope subsurface flow and transport data for the PLM transect in East River, Colorado
https://data.ess-dive.lbl.gov/view/doi:10.21952/WTR/1506941
2019
________________
Index: 41
Distinct Source Water Chemistry Shapes Contrasting Concentration Discharge Patterns, Water Resources Research: Dataset
https://data.ess-dive.lbl.gov/view/doi:10.21952/WTR/1528928
2019
________________
Index: 42
Predicting sedimentary bedrock subsurface weathering fronts and weather

## Search for Water Samples Using the Samples Reporting Format Keyword

In [52]:
text= "Water"
keywords = "ESS-DIVE Sample ID and Metadata Reporting Format"
providerName = None #Name of the project 
creator = None
datePublished = None  # "<[YYYY TO YYYY-MM-DD]>" # Not the same as data coverage 
rowStart = 1
pageSize = 100

#=========================================
get_packages_response = construct_query(essdive_api_url,text,keywords,providerName,creator,datePublished,rowStart,pageSize)

go = False
go = search_datasets(go)
if go:
    view_search_results()

Success! Continue to look at the search results
Datasets found: 26
________________
Index: 0
Temporal Study 2021-2022: Sample-Based Surface Water Chemistry and Organic Matter Characterization across Watersheds in the Yakima River Basin, Washington, USA (v2)
https://data.ess-dive.lbl.gov/view/doi:10.15485/1898912
2022
In the DeepDive!
________________
Index: 1
Temporal Study 2021-2022: Sensor-Based Time Series of Surface Water Temperature, Specific Conductance, Total Dissolved Solids, Turbidity, pH, and Dissolved Oxygen from across Multiple Watersheds in the Yakima River Basin in Washington, USA
https://data.ess-dive.lbl.gov/view/doi:10.15485/1892054
2022
________________
Index: 2
EXCHANGE Campaign 1: A Community-Driven Baseline Characterization of Soils, Sediments, and Water Across Coastal Gradients
https://data.ess-dive.lbl.gov/view/doi:10.15485/1960313
2023
________________
Index: 3
Geospatial Information, Metadata, and Maps for Global River Corridor Science Focus Area Sites (v3)
htt
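
The `construct_query` helper is defined earlier in the notebook and is not repeated here. As a rough sketch only, the search parameters set in the cell above could be combined into a request URL along these lines. The function name `build_search_url` and the base URL in the example are assumptions made for illustration; parameters left as `None` are simply omitted from the query.

```python
from urllib.parse import urlencode

def build_search_url(base_url, **params):
    """Hypothetical helper: append the non-None search parameters to base_url."""
    query = {key: value for key, value in params.items() if value is not None}
    return f"{base_url}?{urlencode(query)}"

# Assumed base URL, used here for illustration only
example_url = build_search_url(
    "https://api.ess-dive.lbl.gov/packages",
    text="Water",
    keywords="ESS-DIVE Sample ID and Metadata Reporting Format",
    providerName=None,
    creator=None,
    datePublished=None,
    rowStart=1,
    pageSize=100,
)
print(example_url)
```

Dropping the `None` values keeps the query string short and avoids sending the literal text "None" as a filter.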

## Search for Sediment Samples



In [55]:
text= "Sediment Sample" #"\"Sediment\"" is an exact match, "Sediment" is any match
keywords = None
providerName = None #Name of the project 
creator = None
datePublished = None  # "<[YYYY TO YYYY-MM-DD]>" # Not the same as data coverage 
rowStart = 1
pageSize = 100

#=========================================
get_packages_response = construct_query(essdive_api_url,text,keywords,providerName,creator,datePublished,rowStart,pageSize)
go = False
go = search_datasets(go)
if go:
    view_search_results()

Success! Continue to look at the search results
Datasets found: 67
________________
Index: 0
Temporal Study 2021-2022: Sample-Based Surface Water Chemistry and Organic Matter Characterization across Watersheds in the Yakima River Basin, Washington, USA (v2)
https://data.ess-dive.lbl.gov/view/doi:10.15485/1898912
2022
In the DeepDive!
________________
Index: 1
Temporal Study 2021-2022: Sensor-Based Time Series of Surface Water Temperature, Specific Conductance, Total Dissolved Solids, Turbidity, pH, and Dissolved Oxygen from across Multiple Watersheds in the Yakima River Basin in Washington, USA
https://data.ess-dive.lbl.gov/view/doi:10.15485/1892054
2022
________________
Index: 2
EXCHANGE Campaign 1: A Community-Driven Baseline Characterization of Soils, Sediments, and Water Across Coastal Gradients
https://data.ess-dive.lbl.gov/view/doi:10.15485/1960313
2023
________________
Index: 3
WHONDRS River Corridor Dissolved Oxygen, Temperature, Sediment Aerobic Respiration, Grain Size, and Wa

________________
Index: 33
WHONDRS Surface Water Chemistry and Organic Matter Characterization along the St. Lawrence River's Inland to Coastal Gradient, Eastern North America
https://data.ess-dive.lbl.gov/view/doi:10.15485/1898913
2022
In the DeepDive!
________________
Index: 34
ESS-DIVE Reporting Format for Sample-based Water and Soil Chemistry Measurements
https://data.ess-dive.lbl.gov/view/doi:10.15485/1865731
2022
________________
Index: 35
Sample Identifiers and Metadata Reporting Format for Environmental Systems Science
https://data.ess-dive.lbl.gov/view/doi:10.15485/1660470
2020
________________
Index: 36
Distributed hydrological, chemical, and microbiological measurements around Meander A of East River, Colorado.
https://data.ess-dive.lbl.gov/view/doi:10.15485/1507800
2019
________________
Index: 37
Oceanographic Conditions.  2007 - 2040.  North Slope Alaska.
https://data.ess-dive.lbl.gov/view/doi:10.15485/1876201
2022
________________
Index: 38
Chamber Flux and Porewater Conc

## Search for Sediment Samples Using the Samples Reporting Format Keyword

In [39]:
text= "Sediment"
keywords = "ESS-DIVE Sample ID and Metadata Reporting Format"
providerName = None #Name of the project 
creator = None
datePublished = None  # "<[YYYY TO YYYY-MM-DD]>" # Not the same as data coverage 
rowStart = 1
pageSize = 100

#=========================================
get_packages_response = construct_query(essdive_api_url,text,keywords,providerName,creator,datePublished,rowStart,pageSize)

go = False
go = search_datasets(go)
if go:
    view_search_results()

Success! Continue to look at the search results
Datasets found: 23
________________
Index: 0
Temporal Study 2021-2022: Sample-Based Surface Water Chemistry and Organic Matter Characterization across Watersheds in the Yakima River Basin, Washington, USA (v2)
https://data.ess-dive.lbl.gov/view/doi:10.15485/1898912
2022
In the DeepDive!
________________
Index: 1
Temporal Study 2021-2022: Sensor-Based Time Series of Surface Water Temperature, Specific Conductance, Total Dissolved Solids, Turbidity, pH, and Dissolved Oxygen from across Multiple Watersheds in the Yakima River Basin in Washington, USA
https://data.ess-dive.lbl.gov/view/doi:10.15485/1892054
2022
________________
Index: 2
EXCHANGE Campaign 1: A Community-Driven Baseline Characterization of Soils, Sediments, and Water Across Coastal Gradients
https://data.ess-dive.lbl.gov/view/doi:10.15485/1960313
2023
________________
Index: 3
WHONDRS River Corridor Dissolved Oxygen, Temperature, Sediment Aerobic Respiration, Grain Size, and Wa

## More Searches 
### When searching by IGSN, replace the prefix (everything up to and including the '/') with a '*' <br> e.g. "10.58052/IEWFS0001" -> "*IEWFS0001"

In [9]:
# Uncomment the search word you want, or make your own.

#text = "\"Liquid>aqueous\""
#text = "Liquid>aqueous"
#text = "\"Pore water\""
#text = "plant structure"
#text = "\"surface water\""
#text = "Soil conductivity"
#text = "Soil bulk density"
#text = "NPOC (non-purgeable organic carbon)"
#text = "\"temperature point water\""
#text = "*1729719" # 
text = "*IEWDR00OJ" #(Search by IGSN) 

keywords = None
providerName = None #Name of the project 
creator = None
datePublished = None  # "<[YYYY TO YYYY-MM-DD]>" # Not the same as data coverage 
rowStart = 1
pageSize = 100

#=========================================
get_packages_response = construct_query(essdive_api_url,text,keywords,providerName,creator,datePublished,rowStart,pageSize)

go = False
go = search_datasets(go)
if go:
    view_search_results()

Success! Continue to look at the search results
Datasets found: 3
________________
Index: 0
Geospatial Information, Metadata, and Maps for Global River Corridor Science Focus Area Sites (v3)
https://data.ess-dive.lbl.gov/view/doi:10.15485/1971251
2023
In the DeepDive!
________________
Index: 1
WHONDRS Summer 2019 Sampling Campaign: Global River Corridor Sediment FTICR-MS, Dissolved Organic Carbon, Aerobic Respiration, Elemental Composition, Grain Size, Total Nitrogen and Organic Carbon Content, Bacterial Abundance, and Stable Isotopes (v8)
https://data.ess-dive.lbl.gov/view/doi:10.15485/1729719
2020
In the DeepDive!
________________
Index: 2
WHONDRS Summer 2019 Sampling Campaign: Global River Corridor Surface Water FTICR-MS, NPOC, TN, Anions, Stable Isotopes, Bacterial Abundance, and Dissolved Inorganic Carbon (v6)
https://data.ess-dive.lbl.gov/view/doi:10.15485/1603775
2020
In the DeepDive!
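
The IGSN rule described at the start of this section can be written as a small function. This is a minimal sketch; `igsn_to_search_text` is a hypothetical name and is not part of the notebook's own helpers.

```python
def igsn_to_search_text(igsn: str) -> str:
    """Hypothetical helper: replace everything up to the last '/' with '*'."""
    suffix = igsn.strip().rsplit("/", 1)[-1]
    return f"*{suffix}"

text = igsn_to_search_text("10.58052/IEWFS0001")
print(text)
```

An identifier that has no prefix is returned with only the leading `*` added, so the function can be applied to either form.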


In [10]:
#Optional: display entire response
#=======================
#display(response_json)

# 4. (Optional) Deep Dive API 
- <b>Did you notice that some of the search results said "In the DeepDive!" under the publication year?</b>
- You can search within the files of these datasets and visualize them.
1. Copy the DOI of a dataset that showed the "In the DeepDive!" message and paste it into the widget below.
2. Make sure the DOI follows this format: 'doi:10.15485/2204421'
3. Run the following code to see the CSV files within the dataset.
4. Visualize a file with the Display File button, setting the index to match the corresponding file.
5. To visualize a different file, rerun the code so that the previous file visualization disappears.
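
Because the Deep Dive query expects the DOI in the `doi:10.xxxx/yyyy` form, it can help to normalize whatever was pasted into the widget before using it. The sketch below is one possible approach; `normalize_doi` is a hypothetical helper, and the set of prefixes it strips is an assumption.

```python
def normalize_doi(value: str) -> str:
    """Hypothetical helper: return the DOI in the 'doi:10.xxxx/yyyy' form."""
    doi_text = value.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi_text.lower().startswith(prefix):
            doi_text = doi_text[len(prefix):]
            break
    if not doi_text.startswith("10."):
        raise ValueError(f"Not a DOI: {value!r}")
    return f"doi:{doi_text}"

print(normalize_doi(" 10.15485/2204421 "))
```

Raising an error for anything that does not start with `10.` catches an empty widget or a pasted dataset title before a request is sent.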

In [11]:
#Run this code to make widget appear
doi = widgets.Text(
    value='',
    description='DOI:',
    disabled=False
)
display(doi)
# example doi: doi:10.15485/1603775

Text(value='', description='DOI:')

Run this code to see the CSV files within the dataset

In [41]:
# Query by field name
# Case-insensitive search 
params  = {}
params['rowStart'] = 1
params['pageSize'] = 25
if doi.value:
    params['doi'] = doi.value
    print("Searching for: ", doi.value)

query_string=urlencode(params)
#Uncomment to see query string
#print(query_string)


# Creating the number widget
index_widget = widgets.IntText(value=0, description='Index:', disabled=False)
button = widgets.Button(description="Display File")
button.on_click(on_button_clicked)
# Display the button and the number widget next to each other
display(widgets.HBox([button, index_widget]))

r = requests.get(f"https://fusion.ess-dive.lbl.gov/api/v1/deepdive?{query_string}")
if r.status_code == 200:
    # Look at search results
    results = r.json()['results']
    df = pd.read_json(json.dumps(results))
    display(df)

else:
    print("ERROR")
    print(r.text)

Searching for:  doi:10.15485/1603775


HBox(children=(Button(description='Display File', style=ButtonStyle()), IntText(value=0, description='Index:')…

Unnamed: 0,field_name,unit,definition,data_type,total_record_count,missing_values_count,values_summary,doi,version,data_file,data_file_url
0,00691_DIC_mg_per_L_as_C,milligrams_per_liter,Dissolved inorganic carbon reported as carbon....,numeric,93,0,"{'min': 1.3900000000000001, 'max': 83.5}",doi:10.15485/1603775,ess-dive-d3dc26585e68115-20240111T185146496,v6_WHONDRS_S19S_SW.zip/WHONDRS_S19S_SW_DIC.csv,https://fusion.ess-dive.lbl.gov/api/v1/deepdiv...
1,Sample_ID,,Sample ID where the replicates for sediment sa...,text,93,0,"{'unique': ['S19S_0005_ISO-3', 'S19S_0007_ISO-...",doi:10.15485/1603775,ess-dive-d3dc26585e68115-20240111T185146496,v6_WHONDRS_S19S_SW.zip/WHONDRS_S19S_SW_DIC.csv,https://fusion.ess-dive.lbl.gov/api/v1/deepdiv...
2,Study_Code,,Unique code assigned to study.,text,93,0,{'unique': ['WHONDRS_S19S']},doi:10.15485/1603775,ess-dive-d3dc26585e68115-20240111T185146496,v6_WHONDRS_S19S_SW.zip/WHONDRS_S19S_SW_DIC.csv,https://fusion.ess-dive.lbl.gov/api/v1/deepdiv...
3,Sample_ID,,Sample ID where the replicates for sediment sa...,text,265,0,"{'unique': ['S19S_0010_FCW-1', 'S19S_0010_FCW-...",doi:10.15485/1603775,ess-dive-d3dc26585e68115-20240111T185146496,v6_WHONDRS_S19S_SW.zip/WHONDRS_S19S_SW_FlowCyt...,https://fusion.ess-dive.lbl.gov/api/v1/deepdiv...
4,Study_Code,,Unique code assigned to study.,text,265,0,{'unique': ['WHONDRS_S19S']},doi:10.15485/1603775,ess-dive-d3dc26585e68115-20240111T185146496,v6_WHONDRS_S19S_SW.zip/WHONDRS_S19S_SW_FlowCyt...,https://fusion.ess-dive.lbl.gov/api/v1/deepdiv...
5,Total_Bacteria_cells_per_liter,cells_per_liter,Total bacteria.,numeric,265,20,"{'min': 127000000.0, 'max': 47600000000.0}",doi:10.15485/1603775,ess-dive-d3dc26585e68115-20240111T185146496,v6_WHONDRS_S19S_SW.zip/WHONDRS_S19S_SW_FlowCyt...,https://fusion.ess-dive.lbl.gov/api/v1/deepdiv...
6,Total_Heterotrophs_cells_per_liter,cells_per_liter,Total heterotrophic bacteria. Calculated by su...,numeric,265,44,"{'min': 33100000.0, 'max': 45400000000.0}",doi:10.15485/1603775,ess-dive-d3dc26585e68115-20240111T185146496,v6_WHONDRS_S19S_SW.zip/WHONDRS_S19S_SW_FlowCyt...,https://fusion.ess-dive.lbl.gov/api/v1/deepdiv...
7,Total_Photorophs_cells_per_liter,cells_per_liter,Total phototrophic bacteria.,numeric,265,34,"{'min': 18300.0, 'max': 3480000000.0}",doi:10.15485/1603775,ess-dive-d3dc26585e68115-20240111T185146496,v6_WHONDRS_S19S_SW.zip/WHONDRS_S19S_SW_FlowCyt...,https://fusion.ess-dive.lbl.gov/api/v1/deepdiv...
8,methodDescription,,Description of method code.,text,168,0,"{'unique': ['Vial not received.', 'Vial did no...",doi:10.15485/1603775,ess-dive-d3dc26585e68115-20240111T185146496,v6_WHONDRS_S19S_SW.zip/WHONDRS_S19S_SW_Metadat...,https://fusion.ess-dive.lbl.gov/api/v1/deepdiv...
9,methodID,,Alphanumeric ID associated with method code.,text,168,0,"{'unique': ['VM_001', 'VB_000', 'VB_001', 'VB_...",doi:10.15485/1603775,ess-dive-d3dc26585e68115-20240111T185146496,v6_WHONDRS_S19S_SW.zip/WHONDRS_S19S_SW_Metadat...,https://fusion.ess-dive.lbl.gov/api/v1/deepdiv...
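
The record counts in this table make it easy to see which fields are the most incomplete before opening a file. The sketch below rebuilds a few rows from the output above by hand (it does not call the API) and ranks them by the fraction of missing values; the column name `missing_fraction` is introduced here for illustration.

```python
import pandas as pd

# A few rows copied from the Deep Dive output above
fields = pd.DataFrame(
    {
        "field_name": [
            "00691_DIC_mg_per_L_as_C",
            "Total_Bacteria_cells_per_liter",
            "Total_Heterotrophs_cells_per_liter",
            "Total_Photorophs_cells_per_liter",
        ],
        "total_record_count": [93, 265, 265, 265],
        "missing_values_count": [0, 20, 44, 34],
    }
)

# Fraction of records that are missing for each field
fields["missing_fraction"] = (
    fields["missing_values_count"] / fields["total_record_count"]
)
most_missing = fields.sort_values("missing_fraction", ascending=False)
print(most_missing[["field_name", "missing_fraction"]])
```

The same two lines of arithmetic should apply to the full `df` returned by the query, assuming it carries the same column names.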


Attempting to read: WHONDRS_S19S_SW_FlowCytometry.csv from zip file v6_WHONDRS_S19S_SW.zip


Unnamed: 0,Study_Code,Sample_ID,Total_Bacteria_cells_per_liter,Total_Photorophs_cells_per_liter,Total_Heterotrophs_cells_per_liter
0,WHONDRS_S19S,S19S_0010_FCW-1,3730000000,301000000,3430000000
1,WHONDRS_S19S,S19S_0010_FCW-2,4060000000,320000000,3740000000
2,WHONDRS_S19S,S19S_0010_FCW-3,4890000000,340000000,4550000000
3,WHONDRS_S19S,S19S_0023_FCW-1,1840000000,1390000000,445000000
4,WHONDRS_S19S,S19S_0023_FCW-3,1760000000,1240000000,514000000
5,WHONDRS_S19S,S19S_0023_FCW-2,1770000000,1460000000,312000000
6,WHONDRS_S19S,S19S_0067_FCW-2,1670000000,56500000,1610000000
7,WHONDRS_S19S,S19S_0067_FCW-1,1450000000,42600000,1410000000
8,WHONDRS_S19S,S19S_0033_FCW-1,1910000000,1600000,1910000000
9,WHONDRS_S19S,S19S_0033_FCW-3,187000000,1240000,186000000
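
The "Attempting to read ... from zip file" message comes from the button handler defined earlier in the notebook. As an illustration of the general technique (not the notebook's actual handler), a CSV stored inside a zip archive can be loaded into pandas without unpacking the archive to disk. `read_csv_from_zip` is a hypothetical helper, and the tiny archive is built in memory so the example runs without network access.

```python
import io
from zipfile import ZipFile

import pandas as pd

def read_csv_from_zip(zip_bytes: bytes, member_name: str) -> pd.DataFrame:
    """Hypothetical helper: load one CSV member of a zip archive into a dataframe."""
    with ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member_name) as member:
            return pd.read_csv(member)

# Build a small in-memory zip that mimics one row of the file shown above
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr(
        "WHONDRS_S19S_SW_FlowCytometry.csv",
        "Study_Code,Sample_ID,Total_Bacteria_cells_per_liter\n"
        "WHONDRS_S19S,S19S_0010_FCW-1,3730000000\n",
    )

flow_df = read_csv_from_zip(buffer.getvalue(), "WHONDRS_S19S_SW_FlowCytometry.csv")
print(flow_df)
```

With a real download, `zip_bytes` would be the `content` of the HTTP response for the zip file.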


# 5. Subset search results (from the last Dataset API search you ran)
To single out one or more datasets from the search results, note the Index number associated with each dataset and save it to the variable 'recorded_indices'. To select more than one dataset, list the indices in square brackets, separated by commas.
#### e.g. recorded_indices = [2, 35, 30]

In [42]:
#Display results of the most recent search
view_search_results()
print("\nEnd of list")

Datasets found: 19
________________
Index: 0
Temporal Study 2021-2022: Sample-Based Surface Water Chemistry and Organic Matter Characterization across Watersheds in the Yakima River Basin, Washington, USA (v2)
https://data.ess-dive.lbl.gov/view/doi:10.15485/1898912
2022
In the DeepDive!
________________
Index: 1
EXCHANGE Campaign 1: A Community-Driven Baseline Characterization of Soils, Sediments, and Water Across Coastal Gradients
https://data.ess-dive.lbl.gov/view/doi:10.15485/1960313
2023
________________
Index: 2
WHONDRS River Corridor Dissolved Oxygen, Temperature, Sediment Aerobic Respiration, Grain Size, and Water Chemistry from Machine-Learning-Informed Sites across the Contiguous United States (v3)
https://data.ess-dive.lbl.gov/view/doi:10.15485/1923689
2023
In the DeepDive!
________________
Index: 3
Spatial Study 2021: Sample-Based Surface Water Chemistry and Organic Matter Characterization across Watersheds in the Yakima River Basin, Washington, USA (v2)
https://data.ess-div

In [43]:
recorded_indices = [0, 4, 13]

#recorded_indices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


#============================
datasets = [canidate_datasets[x] for x in recorded_indices]

for idx, dataset in enumerate(datasets):
    print(f"{idx}: {dataset.get('dataset').get('name')}")

0: Temporal Study 2021-2022: Sample-Based Surface Water Chemistry and Organic Matter Characterization across Watersheds in the Yakima River Basin, Washington, USA (v2)
1: Schneider Springs Fire Study 2023 for Ecosystem Respiration Rates: Surface Water Chemistry and Hydrologic Sensor Data across the Yakima River Basin, Washington, USA
2: Total metals & anion concentration data; Slate River floodplain, Crested Butte, CO; May 2020-September 2020
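
An index that is larger than the number of search results makes the list comprehension above fail with an `IndexError`. A minimal sketch of a more forgiving selection is shown below; `select_by_index` is a hypothetical helper, and the short list of letters stands in for the search results.

```python
def select_by_index(items, indices):
    """Hypothetical helper: return (selected, skipped) for the requested indices."""
    selected, skipped = [], []
    for index in indices:
        if 0 <= index < len(items):
            selected.append(items[index])
        else:
            skipped.append(index)
    return selected, skipped

selected, skipped = select_by_index(["a", "b", "c"], [0, 2, 35])
print(f"Selected: {selected}; skipped indices: {skipped}")
```

Reporting the skipped indices makes it obvious when `recorded_indices` was left over from an earlier, longer search.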


## Display more information about selected datasets

In [19]:
datasets = [canidate_datasets[x] for x in recorded_indices]

for idx, dataset in enumerate(datasets):
    print('__________________________________________________')
    print(f"{idx}. {dataset.get('dataset').get('name')}")
    print("-------Description--------")
    print(f"{dataset.get('dataset').get('description')}")
    print("-------Citation--------")
    print(f"{dataset.get('citation')}")
    print(dataset.get('viewUrl'))

__________________________________________________
0. Temporal Study 2021-2022: Sample-Based Surface Water Chemistry and Organic Matter Characterization across Watersheds in the Yakima River Basin, Washington, USA (v2)
-------Description--------
This dataset supports a broader study examining the drivers of temporal variability in sediment respiration rates in the Yakima River Basin. The dataset provides geochemistry and organic matter characterization data generated from samples collected at weekly or bi-weekly intervals at six sites across the Yakima River Basin in Washington, USA. Related sensor data can be found at https://data.ess-dive.lbl.gov/datasets/doi:10.15485/1892054. This dataset is comprised of one main data folder containing (1) file-level metadata; (2) data dictionary; (3) field metadata; (4) dissolved inorganic carbon (DIC), dissolved organic carbon (DOC; reported as non-purgeable organic carbon; NPOC), total nitrogen (TN), total suspended solids (TSS), and ions; (5) av

## Get dataset details using the ESS-DIVE Dataset API
### Running the get_dataset_details function extracts even more information from the datasets, including filenames and keywords.

In [20]:
#===========
#Store the dataset details in a list 
dataset_details = []

for dataset in datasets:
    dataset_url = dataset.get('url')
    dataset_detail_json = get_dataset_details(dataset_url)
    if dataset_detail_json:
        dataset_details.append(dataset_detail_json)
        
print("==============================")
print(f"Details acquired for {len(dataset_details)} datasets.")

--- Acquired details for Temporal Study 2021-2022: Sample-Based Surface Water Chemistry and Organic Matter Characterization across Watersheds in the Yakima River Basin, Washington, USA (v2)
--- Acquired details for Dissolved Inorganic Carbon and Dissolved Organic Carbon Data for the East River Watershed, Colorado (2015-2023)
--- Acquired details for In-situ electrochemical and water quality data; Slate River and East River floodplains, Crested Butte, CO; May 2022-September 2022
Details acquired for 3 datasets.


In [80]:
#optional: display dataset information
#(Un)comment options below
#print_datasets_info(dataset_details)

Determine which datasets have flmd (file-level metadata)

In [21]:
#=======================
flmd_datasets, no_flmd_datasets = assess_datasets_flmd_dd_csv_files(dataset_details)

No flmd found for dataset: Temporal Study 2021-2022: Sample-Based Surface Water Chemistry and Organic Matter Characterization across Watersheds in the Yakima River Basin, Washington, USA (v2)
flmd found in 2 datasets
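
The `assess_datasets_flmd_dd_csv_files` function is defined earlier in the notebook. To illustrate the idea only, the sketch below separates datasets by whether a file whose name ends in `flmd.csv` is listed directly in the dataset. It assumes each dataset detail has a `distribution` list of files with a `name` key; both the structure and the function name `split_by_flmd` are assumptions. A dataset whose flmd sits inside a zip file would land in the "without" group under this check.

```python
def split_by_flmd(dataset_details):
    """Hypothetical sketch: separate datasets that list an flmd csv file from those that do not."""
    with_flmd, without_flmd = [], []
    for detail in dataset_details:
        names = [f.get("name", "") for f in detail.get("distribution", [])]
        if any(name.lower().endswith("flmd.csv") for name in names):
            with_flmd.append(detail)
        else:
            without_flmd.append(detail)
    return with_flmd, without_flmd

# Simplified stand-ins for dataset details, using file names shown in this notebook
example_details = [
    {"name": "dataset A", "distribution": [{"name": "SR_ER_2022_flmd.csv"}]},
    {"name": "dataset B", "distribution": [
        {"name": "v2_RC2_TemporalStudy_2021_2022_SampleData.zip"}]},
]
has_flmd, lacks_flmd = split_by_flmd(example_details)
print([d["name"] for d in has_flmd], [d["name"] for d in lacks_flmd])
```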


# 6. List the Datasets using Dataset Details Distribution (without flmd)

#### List the datasets that do not have flmd files.

In [22]:
#=================
for idx, fd in enumerate(no_flmd_datasets):
    print(f"--- Index {idx}: {fd.get('name')}")

--- Index 0: Temporal Study 2021-2022: Sample-Based Surface Water Chemistry and Organic Matter Characterization across Watersheds in the Yakima River Basin, Washington, USA (v2)


Choose dataset to inspect using index above.

In [23]:
ds_idx = 0
file_type = 'all' # 'all' or 'csv' or 'pdf' or 'zip'

#=========================
inspect_dataset_distribution(no_flmd_datasets[ds_idx], file_type)

Temporal Study 2021-2022: Sample-Based Surface Water Chemistry and Organic Matter Characterization across Watersheds in the Yakima River Basin, Washington, USA (v2)
Index 0: Temporal_Study_2021_2022_Sample_Based_Surface.xml
  encoding: https://eml.ecoinformatics.org/eml-2.2.0
  url: https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/object/ess-dive-9be051bcebbb9e6-20240314T204040227
Index 1: v2_RC2_TemporalStudy_2021_2022_SampleData.zip
  encoding: application/zip
  url: https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/object/ess-dive-db3b75c47ffc9e4-20231107T002615378


## Download file(s) to a local directory

In [44]:
file_indices = [1]

#=====================
ds_doi = download_selected_files(no_flmd_datasets[ds_idx], file_indices, local_dir)

Saving files in /Users/YLH/ESS-DIVE
-------------------------------------
--- RC2_TemporalStudy_2021_2022_SensorData.zip downloaded
-------------------------------------
Remember to cite these files! Dataset DOI doi:10.15485/1892054
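
The download helper keeps a log keyed by dataset DOI so that the files can be cited later. A minimal sketch of such a log is shown below. The `files` and `downloaded` keys, the function name `log_download`, and the placeholder dataset name are assumptions for illustration; the notebook's own `download_file_log` may be structured differently.

```python
import datetime as dt

def log_download(log, doi, dataset_name, file_name, when=None):
    """Hypothetical helper: record a downloaded file under its dataset DOI."""
    when = when or dt.datetime.now(dt.timezone.utc)
    entry = log.setdefault(doi, {"@id": doi, "name": dataset_name, "files": []})
    entry["files"].append({"file": file_name, "downloaded": when.isoformat()})
    return entry

example_log = {}
log_download(
    example_log,
    "doi:10.15485/1892054",
    "example dataset name",  # placeholder, not the real title
    "RC2_TemporalStudy_2021_2022_SensorData.zip",
)
print(example_log["doi:10.15485/1892054"]["files"])
```

Storing the access time in UTC keeps entries comparable when the notebook is run on machines in different time zones.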


In [45]:
#Optional: display the download file log for this DOI
#=======================================
print(f'Downloaded file information for {ds_doi}:')
display(download_file_log[ds_doi])

Downloaded file information for doi:10.15485/1892054:


{'@id': 'doi:10.15485/1892054',
 'name': 'Temporal Study 2021-2022: Sensor-Based Time Series of Surface Water Temperature, Specific Conductance, Total Dissolved Solids, Turbidity, pH, and Dissolved Oxygen from across Multiple Watersheds in the Yakima River Basin in Washington, USA',
 'citation': ['Agarwal, D., Cholia, S., Hendrix, V. C., Crystal-Ornelas, R., Snavely, C., Damerow, J., & Varadharajan. (2022). ESS-DIVE Reporting Format for Dataset Metadata. Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE), ESS-DIVE repository. https://doi.org/10.15485/1866026',
  "Damerow J ; Varadharajan C ; Boye K ; Brodie E ; Burrus M ; Chadwick D ; Cholia S ; Crystal-Ornelas R ; Elbashandy H ; Eloy Alves R ; Ely K ; Goldman A ; Hendrix V ; Jones C ; Jones M ; Kakalia Z ; Kemner K ; Kersting A ; Maher K ; Merino N ; O'Brien F ; Perzan Z ; Robles E ; Snavely C ; Sorensen P ; Stegen J ; Weisenhorn P ; Whitenack K ; Zavarin M ; Agarwal D (2020): Sample Identifiers and M

# 7. Inspect dataset contents using File-level Metadata (flmd)

### View flmd datasets

In [24]:
#========================
if not flmd_datasets:
    print('No datasets in the search results have flmds.')
else:
    for idx, fd in enumerate(flmd_datasets):
        print(f"--- Index {idx}: {fd.get('name')}")


--- Index 0: Dissolved Inorganic Carbon and Dissolved Organic Carbon Data for the East River Watershed, Colorado (2015-2023)
--- Index 1: In-situ electrochemical and water quality data; Slate River and East River floodplains, Crested Butte, CO; May 2022-September 2022


### Choose dataset to inspect

In [25]:
ds_idx = 1

#=========================
dataset = flmd_datasets[ds_idx]
print_dataset_info(dataset, info_fields=['@id', 'name', 'flmd_url'], line_space=True)


--- @id: doi:10.15485/1896309
 
--- name: In-situ electrochemical and water quality data; Slate River and East River floodplains, Crested Butte, CO; May 2022-September 2022
 
--- flmd_url:
    - SR_ER_2022_flmd.csv


### Select and read flmd
_If multiple flmd files exist in the dataset, run the cell below as many times as needed changing the index._

In [26]:
flmd_file_idx = 0 

#===================================
flmd_name, flmd_url = list(dataset.get('flmd_url').items())[flmd_file_idx]
print(f"{flmd_name}: {flmd_url}")
print('---------------------------')

flmd_response = get_request(flmd_name, flmd_url)

flmd_headers, flmd_store = make_store(flmd_response)

SR_ER_2022_flmd.csv: https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/object/ess-dive-d7fe54012e8de09-20221031T221100401
---------------------------
File headers: ['ï»¿File_Name', 'File_Description', 'Standard', 'UTC_Offset', 'File_Version', 'Contact', 'Start_Date', 'End_Date', 'Northwest_Latitude_Coordinate', 'Northwest_Longitude_Coordinate', 'Southeast_Latitude_Coordinate', 'Southeast_Longitude_Coordinate', 'Latitude', 'Longitude', 'Missing_Value_Codes', 'Notes', 'Field_Name_Orientation']
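
The first header above reads 'ï»¿File_Name' rather than 'File_Name'. Those three extra characters are what a UTF-8 byte order mark (BOM) looks like when the file is decoded with a single-byte encoding. The sketch below reproduces the effect on a made-up two-line file and shows that decoding with `utf-8-sig` removes it. With `requests`, setting `response.encoding = "utf-8-sig"` before reading `response.text` is one way to apply the same decoding; whether `get_request` and `make_store` leave room for that is not shown here.

```python
import csv
import io

# Made-up flmd content that starts with a UTF-8 byte order mark
raw = b"\xef\xbb\xbfFile_Name,File_Description\nSR_ER_2022_pH_data.csv,pH measurements\n"

# Decoding as latin-1 turns the BOM into three visible characters
headers_with_bom = next(csv.reader(io.StringIO(raw.decode("latin-1"))))

# Decoding as utf-8-sig strips the BOM
headers_clean = next(csv.reader(io.StringIO(raw.decode("utf-8-sig"))))

print(headers_with_bom[0])
print(headers_clean[0])
```

This matters for any lookup by header name: a call such as `.get('File_Name')` returns `None` when the stored key still carries the BOM characters.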


### View dataset files listed in flmd

In [27]:
#File name automatically included. Enter additional flmd fields.

flmd_header_indices = [1, -2]

#==================================
for idx, flmd_info in flmd_store.items():
    print(f"{idx}: {flmd_info.get(flmd_headers[0])}")
    for flmd_idx in flmd_header_indices:
        print(f"-- {flmd_headers[flmd_idx]}: {flmd_info.get(flmd_headers[flmd_idx])}")
    print(f"------------------------")

Index 0: SR_ER_2022_SpCond_data.csv
-- File_Description: 352 specific conductance measurements recorded at 7 locations between May and September of 2022 using a handheld Thermo Sci Orion DuraProbe 4-Electrode conductivity cell.
-- Notes: All measurements were made in the field with fresh groundwater/river sample. Groundwater samples were kept shaded from sun exposure during the extraction process. Samples were disposed of after measurement.
------------------------
Index 1: SR_ER_2022_pH_data.csv
-- File_Description: 352 pH measurements recorded at 7 locations between June and September of 2022 using a handheld Thermo Sci Orion Triode 3-in-1 probe.
-- Notes: All measurements were made in the field with fresh groundwater/river sample. Groundwater samples were kept shaded from sun exposure during the extraction process. Samples were disposed of after measurement.
------------------------
Index 2: SR_ER_2022_DO_data.csv
-- File_Description: 327 dissolved oxygen measurements recorded at 7 

# 8. Inspect Dataset File Contents using Data Dictionary

#### Choose indices of file of interest and its corresponding Data Dictionary file to inspect below.

In [28]:
#Enter data file index
data_file_index = 0

#Enter Data Dictionary file index 
dd_file_index = 10

#===================================
dd_file_name = flmd_store[f"Index {dd_file_index}"].get('File_Name')
data_file_name = flmd_store[f"Index {data_file_index}"].get('File_Name')
print(f'Data File: {data_file_name}\n'
      f'Data Dictionary File: {dd_file_name}')

KeyError: 'Index 10'
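
The `KeyError` above means the flmd store has no entry called 'Index 10'. Instead of entering the Data Dictionary index by hand, the store can be searched by file name. The sketch below assumes the store maps keys such as 'Index 0' to a dictionary of flmd fields, and that data dictionary files end in `dd.csv`; the helper `find_store_keys` and the third entry of the example store are made up for illustration. The `name_field` argument is there because the file-name header may not be exactly 'File_Name' (see the header list printed earlier).

```python
def find_store_keys(store, suffix, name_field="File_Name"):
    """Hypothetical helper: return the store keys whose file name ends with suffix."""
    return [
        key
        for key, info in store.items()
        if str(info.get(name_field) or "").lower().endswith(suffix.lower())
    ]

example_store = {
    "Index 0": {"File_Name": "SR_ER_2022_SpCond_data.csv"},
    "Index 1": {"File_Name": "SR_ER_2022_pH_data.csv"},
    "Index 2": {"File_Name": "SR_ER_2022_dd.csv"},  # made-up entry for illustration
}
dd_keys = find_store_keys(example_store, "dd.csv")
print(dd_keys)
```

An empty result signals that the dataset lists no data dictionary under that naming pattern, which is easier to act on than a `KeyError`.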

#### Inspect data dictionary 

In [29]:
# ==============================
data_files = dataset.get('csv_files')


if dd_file_name not in data_files.keys():
    print(f"Cannot find {dd_file_name} in dataset distribution.")
else:
    dd_url = data_files[dd_file_name]
    print(f"{dd_file_name}")
    print(f"{dd_url}")
    print('-------------------------')
    
    dd_request = get_request(dd_file_name, dd_url)
    dd_headers, dd_store = make_store(dd_request)
    print('-------------------------')

    for idx, dd_info in dd_store.items():
        print(f"{dd_info.get(dd_headers[0])} -- Units: {dd_info.get(dd_headers[1])} -- Desc: {dd_info.get(dd_headers[2])}")

NameError: name 'dd_file_name' is not defined

### Load selected csv data file into pandas dataframe

In [90]:
#===============================
if data_file_name not in data_files.keys():
    print(f"Cannot find {data_file_name} in dataset distribution.")
else:
    data_url = data_files[data_file_name]
    print(f"{data_file_name}")
    print(f"{data_url}")
    print('--------------------')
    
    data_request = get_request(data_file_name, data_url, stream=False)
    
    data_df = pd.read_csv(io.StringIO(data_request.text))
    
    display(data_df)

Cannot find 48Hour_readme.pdf in dataset distribution.
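
The flmd headers listed earlier include a `Missing_Value_Codes` field. When loading a csv file into pandas, those codes can be passed to `pd.read_csv` through `na_values` so that they are treated as missing rather than as real measurements. The sketch below uses a made-up three-row file and assumed codes.

```python
import io

import pandas as pd

# Made-up csv content; -9999 and N/A are assumed missing value codes
csv_text = "Sample_ID,pH\nA-1,7.2\nA-2,-9999\nA-3,N/A\n"
missing_value_codes = ["-9999", "N/A"]

example_df = pd.read_csv(io.StringIO(csv_text), na_values=missing_value_codes)
print(example_df)
print("Missing pH values:", example_df["pH"].isna().sum())
```

Without `na_values`, the `-9999` entry would be read as a number and would distort any summary statistics computed from the column.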


### Download selected CSV data file

In [91]:
#=====================
ds_doi = download_selected_files(dataset, [data_file_index], local_dir)

Saving files in /Users/YLH/ESS-DIVE
-------------------------------------
--- Data_and_scripts_associated_with_a_manuscript.xml downloaded
-------------------------------------
Remember to cite these files! Dataset DOI doi:10.15485/2319037


In [92]:
#Optional: display the download file log for this DOI
#============================
print(f'Downloaded file information for {ds_doi}:')
display(download_file_log[ds_doi])

Downloaded file information for doi:10.15485/2319037:


{'@id': 'doi:10.15485/2319037',
 'name': 'Data and scripts associated with a manuscript investigating dissolved organic matter and microbial community linkages across seven globally distributed rivers',
 'citation': ['Chu, R.K., Goldman, A.E., Brooks, S.C., Danczak, R.E., Garayburu-Caruso, V.A., Graham, E.B., Jones, M., Jones, N., Lin, X., Morad, J.W., Ren, H., Renteria, L., Resch, C.T., Tfaily, M., Tolic, N., Toyoda, J.G., Wells, J.R., Stegen, J.C., 2019. WHONDRS 48 Hour Diel Cycling Study at the East Fork Poplar Creek in Tennessee, USA. https://doi.org/10.15485/1577278',
  'Danczak, R.E., Goldman, A.E., Chu, R.K., Garayburu-Caruso, V.A., Graham, E.B., He, X., Lin, X., Meile, C., Morad, J.W., Ren, H., Renteria, L., Resch, C.T., Rooze, J., Schalles, J., Tfaily, M., Thomle, J., Tolic, N., Toyoda, J.G., Wells, J.R., Stegen, J.C., 2019. WHONDRS 48 Hour Diel Cycling Study at the Altamaha River in Georgia, USA. https://doi.org/10.15485/1577263',
  'Garayburu-Caruso, V.A., Goldman, A.E., Chu

# 9. Use Sample ID and Metadata Reporting Format
The example below starts with a search on the ESS-DIVE main search webpage: https://data.ess-dive.lbl.gov/

The dataset version/identifier of the desired dataset is entered below as the dataset_id.
* Find the dataset version in the upper left corner of the dataset's webpage next to the DOI.
* Or find the dataset identifier as the first field in the General metadata section (below the dataset files).

The search feature of the Dataset API illustrated in Section 2 above can also be used to find a dataset_id of interest. The dataset_id is the last part of the API URL shown in the results.

Example:
For the dataset detail url: https://api.ess-dive.lbl.gov/packages/ess-dive-f0861161a6bd3bf-20231109T125444193, the dataset_id is ess-dive-f0861161a6bd3bf-20231109T125444193.
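
Taking the last part of the URL can be done in code as well. The sketch below uses a hypothetical helper, `dataset_id_from_url`, applied to the example URL above.

```python
def dataset_id_from_url(dataset_detail_url: str) -> str:
    """Hypothetical helper: return the last path segment of a dataset detail URL."""
    return dataset_detail_url.rstrip("/").rsplit("/", 1)[-1]

example_id = dataset_id_from_url(
    "https://api.ess-dive.lbl.gov/packages/ess-dive-f0861161a6bd3bf-20231109T125444193"
)
print(example_id)
```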

## README

The code below performs **minimal** Sample ID and Metadata Reporting Format validation. Not all features may work if files do not adhere to the Reporting Format.

*Note: The column names in sample_metadata.csv are left unvalidated so that files that deviate from the Reporting Format can still be inspected.*

### Enter the dataset ID of interest 

In [30]:
dataset_id = 'ess-dive-2569191b32b447d-20230809T173212651'

# ===================================
# Find dataset identifier from search above or via Search Webpage
dataset_details_url = f'https://api.ess-dive.lbl.gov/packages/{dataset_id}'

dataset_detail = get_dataset_details(dataset_details_url)

--- Acquired details for Biogeochemistry of Pond B (Savannah River Site, South Carolina, USA): Sediment Core, Total extraction data, Pond B Savannah River Site July 2019. Subsurface Biogeochemistry of Actinides SFA


In [31]:
# Assess the dataset for flmd
# ===================================
flmd_datasets, no_flmd_datasets = assess_datasets_flmd_dd_csv_files([dataset_detail])

# Additional setup
# Set the default assumptions
is_csv_zipped = False
metadata_df = None
igsn_col_idx = None

No flmd found for dataset: Biogeochemistry of Pond B (Savannah River Site, South Carolina, USA): Sediment Core, Total extraction data, Pond B Savannah River Site July 2019. Subsurface Biogeochemistry of Actinides SFA
No datasets in the search results have flmds.


## View the dataset csv files
#### Look for the sample metadata file. It is a csv file that should have "sample_metadata" in the filename.

In [32]:
# ===================================
csv_files = dataset_detail.get('csv_files')

if not csv_files:
    print('No csv files. Try Zip File Option below.')
    csv_files = {}

# Keep the file names in a list so a file can be selected by index later
csv_index = []
for idx, (fn, url) in enumerate(csv_files.items()):
    print(f'Index {idx}: {fn}\n{url}')
    csv_index.append(fn)

Index 0: IGSN_sample_metadata.csv
https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/object/ess-dive-2045194f4857d80-20221206T003342857
Index 1: Cores_Alldata_database.csv
https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/object/ess-dive-9c81248f9e0a317-20211201T195517396
Index 2: PoreWater_summary.csv
https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/object/ess-dive-301011474ecdb87-20211201T200126969
Index 3: profiling.csv
https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/object/ess-dive-c7ab4d71dbd599b-20211201T195748332
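
Since the sample metadata file should have "sample_metadata" in its name, the matching index can also be found programmatically. The sketch below copies the file listing above into a dictionary by hand; `find_sample_metadata_files` is a hypothetical helper.

```python
def find_sample_metadata_files(csv_files):
    """Hypothetical helper: return (index, file name) pairs that look like sample metadata."""
    return [
        (idx, fn)
        for idx, fn in enumerate(csv_files)
        if "sample_metadata" in fn.lower()
    ]

# File names and URLs copied from the listing above
example_csv_files = {
    "IGSN_sample_metadata.csv": "https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/object/ess-dive-2045194f4857d80-20221206T003342857",
    "Cores_Alldata_database.csv": "https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/object/ess-dive-9c81248f9e0a317-20211201T195517396",
    "PoreWater_summary.csv": "https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/object/ess-dive-301011474ecdb87-20211201T200126969",
    "profiling.csv": "https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/object/ess-dive-c7ab4d71dbd599b-20211201T195748332",
}
matches = find_sample_metadata_files(example_csv_files)
print(matches)
```

The returned index lines up with the "Index N" labels printed above because both follow the dictionary's insertion order.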


### Select and load the sample metadata csv file

In [33]:
metadata_file_idx = 0

# ===================================
# get file_url
fn = csv_index[metadata_file_idx]
fn_url = csv_files.get(fn)

if not fn_url:
    print('Something is amiss! Could not find file_url. Try again.')
else:
    try:
        headers, metadata_df = make_pandas_df(fn_url, print_headers=False)
        print(f'{fn} was loaded as a pandas dataframe.')
        display(metadata_df)
    except Exception as e:
        print(f'Error while attempting to read {fn_url} into a pandas dataframe. Try again.\nError: {e}')


IGSN_sample_metadata.csv was loaded as a pandas dataframe.


Unnamed: 0,IGSN,Sample Name,Sample Type,Material,Field Name,Description,Age Min,Age Max,Age Unit,collection Method,Collection Method Descr,Size,Size Unit,Geological Age,Geological Unit,Comment,Purpose,Release Date,Latitude Start,Latitude End,Longitude Start,Longitude End,Elevation Start,Elevation End,Elevation Unit,Depth in core (Min),Depth Max,Depth Scale,Nav Type Name,Physiographic Feature,Name of Physiographic Feature,Location Description,Locality,Locality Description,Country,City,Field Program/Cruise,Platform Type,Platform Name,Platform Description,Launch Type Name,Launch Platform Name,Launch Id,Collector/Chief Scientist,Collector/Chief Scientist Detail,collection Start Date,collection End Date,Current Archive,Current Archive Contact,Original Archive,Original Archive Contact,Cur Registrant Name,Org Registrant Name,classification Comment,Is Archived,URL
0,IESRP0044,SRP_Core2_0.25_a,Core,sediment,Lake,Sediment Core,,,,Coring>TriggerWeightCorer,sliced,1,cm,,,,"HF total extraction Pu, Trace, major elements,...",,,33.172798,-81.33986,,75,,meters,0.25,,cm,,Pond,Pond B,,Savannah River Site,Southern USA,United States,Aiken,Biogeochemistry of Actinides SFA,,,,,,,Brian Powell,"Clemson University, Clemson, SC",2019-06-25 00:00:00,,,Fanny Coutelot,,,Fanny Coutelot,Clemson University,,,
1,IESRP0045,SRP_Core2_0.25_b,Core,sediment,Lake,Sediment Core,,,,Coring>TriggerWeightCorer,sliced,1,cm,,,,"HF total extraction Pu, Trace, major elements,...",,,33.172798,-81.33986,,75,,meters,0.25,,cm,,Pond,Pond B,,Savannah River Site,Southern USA,United States,Aiken,Biogeochemistry of Actinides SFA,,,,,,,Brian Powell,"Clemson University, Clemson, SC",2019-06-25 00:00:01,,,Fanny Coutelot,,,Fanny Coutelot,Clemson University,,,
2,IESRP0046,SRP_Core2_0.25_c,Core,sediment,Lake,Sediment Core,,,,Coring>TriggerWeightCorer,sliced,1,cm,,,,"HF total extraction Pu, Trace, major elements,...",,,33.172798,-81.33986,,75,,meters,0.25,,cm,,Pond,Pond B,,Savannah River Site,Southern USA,United States,Aiken,Biogeochemistry of Actinides SFA,,,,,,,Brian Powell,"Clemson University, Clemson, SC",2019-06-25 00:00:02,,,Fanny Coutelot,,,Fanny Coutelot,Clemson University,,,
3,IESRP0047,SRP_Core2_1_a,Core,sediment,Lake,Sediment Core,,,,Coring>TriggerWeightCorer,sliced,1,cm,,,,"HF total extraction Pu, Trace, major elements,...",,,33.172798,-81.33986,,75,,meters,1.0,,cm,,Pond,Pond B,,Savannah River Site,Southern USA,United States,Aiken,Biogeochemistry of Actinides SFA,,,,,,,Brian Powell,"Clemson University, Clemson, SC",2019-06-25 00:00:03,,,Fanny Coutelot,,,Fanny Coutelot,Clemson University,,,
4,IESRP0048,SRP_Core2_1_b,Core,sediment,Lake,Sediment Core,,,,Coring>TriggerWeightCorer,sliced,1,cm,,,,"HF total extraction Pu, Trace, major elements,...",,,33.172798,-81.33986,,75,,meters,1.0,,cm,,Pond,Pond B,,Savannah River Site,Southern USA,United States,Aiken,Biogeochemistry of Actinides SFA,,,,,,,Brian Powell,"Clemson University, Clemson, SC",2019-06-25 00:00:04,,,Fanny Coutelot,,,Fanny Coutelot,Clemson University,,,
5,IESRP0049,SRP_Core2_1_c,Core,sediment,Lake,Sediment Core,,,,Coring>TriggerWeightCorer,sliced,1,cm,,,,"HF total extraction Pu, Trace, major elements,...",,,33.172798,-81.33986,,75,,meters,1.0,,cm,,Pond,Pond B,,Savannah River Site,Southern USA,United States,Aiken,Biogeochemistry of Actinides SFA,,,,,,,Brian Powell,"Clemson University, Clemson, SC",2019-06-25 00:00:05,,,Fanny Coutelot,,,Fanny Coutelot,Clemson University,,,
6,IESRP004A,SRP_Core2_2_a,Core,sediment,Lake,Sediment Core,,,,Coring>TriggerWeightCorer,sliced,1,cm,,,,"HF total extraction Pu, Trace, major elements,...",,,33.172798,-81.33986,,75,,meters,2.0,,cm,,Pond,Pond B,,Savannah River Site,Southern USA,United States,Aiken,Biogeochemistry of Actinides SFA,,,,,,,Brian Powell,"Clemson University, Clemson, SC",2019-06-25 00:00:06,,,Fanny Coutelot,,,Fanny Coutelot,Clemson University,,,
7,IESRP004B,SRP_Core2_2_b,Core,sediment,Lake,Sediment Core,,,,Coring>TriggerWeightCorer,sliced,1,cm,,,,"HF total extraction Pu, Trace, major elements,...",,,33.172798,-81.33986,,75,,meters,2.0,,cm,,Pond,Pond B,,Savannah River Site,Southern USA,United States,Aiken,Biogeochemistry of Actinides SFA,,,,,,,Brian Powell,"Clemson University, Clemson, SC",2019-06-25 00:00:07,,,Fanny Coutelot,,,Fanny Coutelot,Clemson University,,,
8,IESRP004C,SRP_Core2_2_c,Core,sediment,Lake,Sediment Core,,,,Coring>TriggerWeightCorer,sliced,1,cm,,,,"HF total extraction Pu, Trace, major elements,...",,,33.172798,-81.33986,,75,,meters,2.0,,cm,,Pond,Pond B,,Savannah River Site,Southern USA,United States,Aiken,Biogeochemistry of Actinides SFA,,,,,,,Brian Powell,"Clemson University, Clemson, SC",2019-06-25 00:00:08,,,Fanny Coutelot,,,Fanny Coutelot,Clemson University,,,
9,IESRP004D,SRP_Core2_3_a,Core,sediment,Lake,Sediment Core,,,,Coring>TriggerWeightCorer,sliced,1,cm,,,,"HF total extraction Pu, Trace, major elements,...",,,33.172798,-81.33986,,75,,meters,3.0,,cm,,Pond,Pond B,,Savannah River Site,Southern USA,United States,Aiken,Biogeochemistry of Actinides SFA,,,,,,,Brian Powell,"Clemson University, Clemson, SC",2019-06-25 00:00:09,,,Fanny Coutelot,,,Fanny Coutelot,Clemson University,,,
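
Once the sample metadata is loaded, it can help to locate the IGSN column programmatically, since its position differs between datasets. The following is a minimal sketch, assuming `headers` is a plain list of column names; the helper `find_igsn_column` is hypothetical. It matches the name exactly, so related columns such as `Parent_IGSN` are not picked up by mistake.

```python
def find_igsn_column(headers):
    """Return the position of the column named exactly 'IGSN' (case-insensitive), or None."""
    for idx, name in enumerate(headers):
        if str(name).strip().lower() == 'igsn':
            return idx
    return None

# Header subsets copied from the two sample metadata tables shown in this notebook.
pond_b_headers = ['IGSN', 'Sample Name', 'Sample Type', 'Material']
rc2_headers = ['Sample_Name', 'IGSN', 'Parent_IGSN', 'Material']

pond_b_igsn_idx = find_igsn_column(pond_b_headers)
rc2_igsn_idx = find_igsn_column(rc2_headers)
missing_idx = find_igsn_column(['Sample_Name', 'Material'])
print(pond_b_igsn_idx, rc2_igsn_idx, missing_idx)
```

A result of `None` signals that the file has no IGSN column under that name, which is worth checking before relying on `igsn_col_idx`.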


### Zip File Option: Inspect zipped dataset contents
#### Sometimes files of interest are packaged into zip files. This code helps you access those files.

### Inspect all dataset files if sample_metadata.csv is not found


In [34]:
# Run if sample_metadata csv file is not found

# ===================================
inspect_dataset_distribution(dataset_detail, 'all')

Biogeochemistry of Pond B (Savannah River Site, South Carolina, USA): Sediment Core, Total extraction data, Pond B Savannah River Site July 2019. Subsurface Biogeochemistry of Actinides SFA
Index 0: Coutelot_2023_temporalPuCsatPondB.pdf
  encoding: application/pdf
  url: https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/object/ess-dive-786920a896fac91-20221202T193407683
Index 1: IGSN_sample_metadata.xls
  encoding: application/vnd.ms-excel
  url: https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/object/ess-dive-1381a055df201c6-20221206T003337568
Index 2: IGSN_sample_metadata.csv
  encoding: text/csv
  url: https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/object/ess-dive-2045194f4857d80-20221206T003342857
Index 3: Cores_Alldata_database.csv
  encoding: text/csv
  url: https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/object/ess-dive-9c81248f9e0a317-20211201T195517396
Index 4: PoreWater_summary.csv
  encoding: text/csv
  url: https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/object/ess-dive-301011474e

### Select zip file to inspect

In [99]:
# Run if sample_metadata is not found at the top-level of the dataset contents.
zip_file_idx = 1

# ===================================   
fn, zip_download = inspect_zip_file_contents(dataset_detail, zip_file_idx)

v2_RC2_TemporalStudy_2021_2022_SampleData.zip contents:
Index 0: FTICR/FTICR_Data/RC2_0001_ICR-1_p08.xml
Index 1: FTICR/FTICR_Data/RC2_0001_ICR-2_p08.xml
Index 2: FTICR/FTICR_Data/RC2_0001_ICR-3_p08.xml
Index 3: FTICR/FTICR_Data/RC2_0002_ICR-1_p08.xml
Index 4: FTICR/FTICR_Data/RC2_0002_ICR-2_p08.xml
Index 5: FTICR/FTICR_Data/RC2_0002_ICR-3_p08.xml
Index 6: FTICR/FTICR_Data/RC2_0003_ICR-1_p08.xml
Index 7: FTICR/FTICR_Data/RC2_0003_ICR-2_p08.xml
Index 8: FTICR/FTICR_Data/RC2_0003_ICR-3_p08.xml
Index 9: FTICR/FTICR_Data/RC2_0004_ICR-1_p08.xml
Index 10: FTICR/FTICR_Data/RC2_0004_ICR-2_p08.xml
Index 11: FTICR/FTICR_Data/RC2_0004_ICR-3_p08.xml
Index 12: FTICR/FTICR_Data/RC2_0005_ICR-1_p08.xml
Index 13: FTICR/FTICR_Data/RC2_0005_ICR-2_p08.xml
Index 14: FTICR/FTICR_Data/RC2_0005_ICR-3_p08.xml
Index 15: FTICR/FTICR_Data/RC2_0006_ICR-1_p08.xml
Index 16: FTICR/FTICR_Data/RC2_0006_ICR-2_p08.xml
Index 17: FTICR/FTICR_Data/RC2_0006_ICR-3_p08.xml
Index 18: FTICR/FTICR_Data/RC2_0007_ICR-1_p08.xml
Inde
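
Zip archives like this one can hold hundreds of members, which makes it tedious to find the csv file index by reading the full listing. As a rough sketch, the members could be filtered by extension. This assumes `zip_download` behaves like a standard `zipfile.ZipFile`; the helper `list_csv_members` and the in-memory archive contents are hypothetical.

```python
import io
from zipfile import ZipFile

def list_csv_members(zip_file):
    """Return (index, member name) pairs for members whose name ends in .csv."""
    return [(idx, name) for idx, name in enumerate(zip_file.namelist())
            if name.lower().endswith('.csv')]

# Build a tiny in-memory zip as a stand-in for zip_download (hypothetical contents).
buffer = io.BytesIO()
with ZipFile(buffer, 'w') as zf:
    zf.writestr('FTICR/FTICR_Data/example_1.xml', '<xml/>')
    zf.writestr('FTICR/FTICR_Data/example_2.xml', '<xml/>')
    zf.writestr('example_Sample_IGSN-Mapping.csv', 'Sample_Name,IGSN\n')

with ZipFile(buffer) as zf:
    csv_members = list_csv_members(zf)

for idx, name in csv_members:
    print(f'Index {idx}: {name}')
```

The index in each pair is the position in `namelist()`, so it can be used directly as `csv_file_idx`.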

### Select csv file within zip file to inspect

In [100]:
# Run if csv file is zipped up
csv_file_idx = 591

# If needed, adjust the number of header rows. The Sample ID and Metadata reporting format (RF) specifies 1 header row.
header_rows = 1

# ===================================
csv_file_name = zip_download.namelist()[csv_file_idx]
print(f'Attempting to read: {csv_file_name} from zip file {fn}')

metadata_df = read_zipped_csv(zip_download, csv_file_name, header_rows)

if metadata_df is not None:
    is_csv_zipped = True
    headers = list(metadata_df.columns)
    display(metadata_df)
else:
    print('ERROR: Sample metadata file was not successfully loaded.')

Attempting to read: v2_RC2_Sample_IGSN-Mapping.csv from zip file v2_RC2_TemporalStudy_2021_2022_SampleData.zip


Unnamed: 0,Sample_Name,IGSN,Parent_IGSN,Material,Field_Name_Informal_Classification,Collection_Method,Collection_Method_Description,Comment,Latitude,Longitude,Primary_Physiographic_Feature,Locality,Country,Field_Program_Cruise,Collector_Chief_Scientist,Collection_Date,Related_URL,Related_URL_Type,Physiographic_Feature_Name,State_or_Province
0,RC2_0001,10.58052/IEPRS00CY,10.58052/IEPRS0094,Liquid>aqueous,Surface water,Grab,Surface water was either (1) pulled into syrin...,RC2 Temporal Study,46.9778,-121.170,stream,T06,United States,US Department of Energy River Corridor Science...,Stephanie_Fulton; Lupita_Renteria,2021-04-20,https://www.pnnl.gov/projects/river-corridor,regular URL,American River,Washington
1,RC2_0002,10.58052/IEPRS00CZ,10.58052/IEPRS0093,Liquid>aqueous,Surface water,Grab,Surface water was either (1) pulled into syrin...,RC2 Temporal Study,46.9890,-121.099,stream,T05P,United States,US Department of Energy River Corridor Science...,Stephanie_Fulton; Lupita_Renteria,2021-04-20,https://www.pnnl.gov/projects/river-corridor,regular URL,Little Naches River,Washington
2,RC2_0003,10.58052/IEPRS00D0,10.58052/IEPRS0095,Liquid>aqueous,Surface water,Grab,Surface water was either (1) pulled into syrin...,RC2 Temporal Study,46.7241,-120.700,stream,T41,United States,US Department of Energy River Corridor Science...,Stephanie_Fulton; Lupita_Renteria,2021-04-20,https://www.pnnl.gov/projects/river-corridor,regular URL,Naches River,Washington
3,RC2_0004,10.58052/IEPRS00D1,10.58052/IEPRS0090,Liquid>aqueous,Surface water,Grab,Surface water was either (1) pulled into syrin...,RC2 Temporal Study,46.5308,-120.470,stream,T03,United States,US Department of Energy River Corridor Science...,Stephanie_Fulton; Lupita_Renteria,2021-04-21,https://www.pnnl.gov/projects/river-corridor,regular URL,Yakima River,Washington
4,RC2_0005,10.58052/IEPRS00D2,10.58052/IEPRS008Z,Liquid>aqueous,Surface water,Grab,Surface water was either (1) pulled into syrin...,RC2 Temporal Study,46.2321,-120.000,stream,T02,United States,US Department of Energy River Corridor Science...,Stephanie_Fulton; Lupita_Renteria,2021-04-21,https://www.pnnl.gov/projects/river-corridor,regular URL,Yakima River,Washington
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
381,RC2_0207_RNA,10.58052/IEPRS00M2,10.58052/IEPRS0093,Other,Filter,Grab,0.22 micron filter used for collecting filtere...,RC2 Temporal Study,46.9890,-121.099,stream,T05P,United States,US Department of Energy River Corridor Science...,Sophia_McKever; Josh_Torgeson; Erica_Bakker,2022-04-07,https://www.pnnl.gov/projects/river-corridor,regular URL,Little Naches River,Washington
382,RC2_0208_RNA,10.58052/IEPRS00M3,10.58052/IEPRS0096,Other,Filter,Grab,0.22 micron filter used for collecting filtere...,RC2 Temporal Study,46.7290,-120.714,stream,T42,United States,US Department of Energy River Corridor Science...,Sophia_McKever; Josh_Torgeson; Erica_Bakker,2022-04-07,https://www.pnnl.gov/projects/river-corridor,regular URL,Naches River,Washington
383,RC2_0209_RNA,10.58052/IEPRS00M4,10.58052/IEPRS0090,Other,Filter,Grab,0.22 micron filter used for collecting filtere...,RC2 Temporal Study,46.5308,-120.470,stream,T03,United States,US Department of Energy River Corridor Science...,Sophia_McKever; Josh_Torgeson; Erica_Bakker,2022-04-07,https://www.pnnl.gov/projects/river-corridor,regular URL,Yakima River,Washington
384,RC2_0210_RNA,10.58052/IEPRS00M5,10.58052/IEPRS008Z,Other,Filter,Grab,0.22 micron filter used for collecting filtere...,RC2 Temporal Study,46.2321,-120.000,stream,T02,United States,US Department of Energy River Corridor Science...,Sophia_McKever; Josh_Torgeson; Erica_Bakker,2022-04-07,https://www.pnnl.gov/projects/river-corridor,regular URL,Yakima River,Washington
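
If it is unclear how many header rows a zipped csv file has, previewing its first few rows before loading it can help choose `header_rows`. The sketch below uses only the standard library and is not the notebook's `read_zipped_csv`; the helper `preview_zipped_csv` and the tiny in-memory archive are hypothetical, and UTF-8 encoding is an assumption.

```python
import csv
import io
from itertools import islice
from zipfile import ZipFile

def preview_zipped_csv(zip_file, member_name, n_rows=3):
    """Return up to n_rows rows of a csv member as lists of strings."""
    # 'utf-8-sig' also strips a byte order mark if the file starts with one.
    with io.TextIOWrapper(zip_file.open(member_name),
                          encoding='utf-8-sig', newline='') as text:
        return list(islice(csv.reader(text), n_rows))

# Hypothetical archive holding one header row and one data row.
buffer = io.BytesIO()
with ZipFile(buffer, 'w') as zf:
    zf.writestr('example_Sample_IGSN-Mapping.csv',
                'Sample_Name,IGSN\nRC2_0001,10.58052/IEPRS00CY\n')

with ZipFile(buffer) as zf:
    preview = preview_zipped_csv(zf, 'example_Sample_IGSN-Mapping.csv')

for row in preview:
    print(row)
```

Reading only a few rows keeps the preview cheap even for large files, because the member is streamed rather than fully extracted.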


### Download sample_metadata file

In [None]:
# ===================================

if not is_csv_zipped:
    fn = csv_index[metadata_file_idx]
    all_file_idx = None

    for idx, file_info in enumerate(dataset_detail.get('distribution')):
        if file_info.get('name') == fn:
            all_file_idx = idx
            break
    # Compare against None so that a match at index 0 is not treated as "not found".
    if all_file_idx is not None:
        ds_doi = download_selected_files(dataset_detail, [all_file_idx], local_dir)
    else:
        print('Could not find requested file.')
else:
    ds_doi = download_selected_files(dataset_detail, [zip_file_idx], local_dir, is_csv_zipped=is_csv_zipped, 
                                     zip_download=zip_download, zip_member_fn=csv_file_name)
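
For readers curious about what saving a single zipped file and recording the access might involve, here is a small illustrative sketch. It is not the notebook's `download_selected_files`; the helper `extract_member_with_log`, the log record fields, and the temporary directory are all assumptions made for the example.

```python
import datetime as dt
import io
import tempfile
from pathlib import Path
from zipfile import ZipFile

def extract_member_with_log(zip_file, member_name, target_dir):
    """Extract one member into target_dir and return a small log record."""
    saved_path = Path(zip_file.extract(member_name, path=target_dir))
    return {
        'member': member_name,
        'saved_to': str(saved_path),
        'size_bytes': saved_path.stat().st_size,
        'accessed_utc': dt.datetime.now(dt.timezone.utc).isoformat(),
    }

# Hypothetical archive standing in for zip_download.
buffer = io.BytesIO()
with ZipFile(buffer, 'w') as zf:
    zf.writestr('example_Sample_IGSN-Mapping.csv', 'Sample_Name,IGSN\n')

# A temporary directory stands in for local_dir so the example cleans up after itself.
with tempfile.TemporaryDirectory() as tmp_dir, ZipFile(buffer) as zf:
    log_record = extract_member_with_log(zf, 'example_Sample_IGSN-Mapping.csv', tmp_dir)
    file_was_saved = Path(log_record['saved_to']).exists()

print(log_record['member'], log_record['size_bytes'])
```

Extracting only the needed member avoids unpacking the whole archive, and keeping a timestamp with each record makes it easier to cite when the data were accessed.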

In [None]:
# Optional: display the download file log for this DOI
# ===================================
print(f'Downloaded file information for {ds_doi}:')
display(download_file_log[ds_doi])

End of tutorial. If you have any questions, please email ess-dive-support@lbl.gov.