
[Doc]: Create a Jupyter Notebook for converting column-based data to grid-based data #89

Open
tomvothecoder opened this issue Aug 4, 2021 · 19 comments
Assignees
Labels
good-first-issue (Good first issue for new contributors), type: docs (Updates to documentation), type: enhancement (New enhancement request)

Comments

@tomvothecoder
Collaborator

tomvothecoder commented Aug 4, 2021

Create a Jupyter Notebook tutorial that covers how to convert column-based data to grid-based data for use with xCDAT APIs.

@tomvothecoder
Collaborator Author

Hey @pochedls, can you fill out the description surrounding this issue?

@tomvothecoder tomvothecoder added this to the v1.0.0 milestone Aug 4, 2021
@pochedls pochedls changed the title Operates on E3SM native output (column-based data) Spatial averaging operates on E3SM native output (column-based data) Aug 19, 2021
@pochedls
Collaborator

Description: One general goal of xcdat is to operate on CMIP-like output data, but also on native E3SM output. E3SM output is column-based (e.g., time x column) and is not natively saved in a latitude x longitude matrix. To support spatial averaging on E3SM's native grid, we would need to assess and develop functionality to do this (e.g., estimate column boundaries, generate weights, and refactor code to handle averaging of column-based arrays).

@tomvothecoder tomvothecoder removed this from the v1.0.0 milestone Jun 21, 2022
@tomvothecoder tomvothecoder changed the title Spatial averaging operates on E3SM native output (column-based data) [Feature]: Spatial averaging operates on E3SM native output (column-based data) Nov 9, 2022
@nocollier

nocollier commented Apr 28, 2023

Just FYI, you can take column data from the raw E3SM h0/h1 files and build a pandas DataFrame where you set the index equal to 'time', 'lat', and 'lon'. Then pandas has a to_xarray() function that will return a dataset spanning the bounding box of the run (even if it is unstructured), with NaNs filled in where you have no data. If you include areacella/sftlf in the columns, then you have the 2D arrays and can average as usual.

See code in this vicinity:
https://github.com/rubisco-sfa/ILAMB/blob/603ec2f219e9470e4898e0efbf3d8f0346bad8b9/src/ILAMB/e3sm_result.py#L151
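
The approach above can be sketched with synthetic column data (the variable names and values here are made up for illustration; real E3SM h0/h1 files would supply the columns):

```python
import pandas as pd

# Four "columns" of an unstructured grid, as a long table:
# only 2 unique lats x 2 unique lons in this toy example
df = pd.DataFrame({
    "lat": [0.0, 0.0, 10.0, 10.0],
    "lon": [0.0, 10.0, 0.0, 10.0],
    "tas": [280.0, 281.0, 282.0, 283.0],
})

# set_index + to_xarray pivots the long table into a lat x lon Dataset,
# with NaN filled in wherever no column falls in a (lat, lon) cell
ds = df.set_index(["lat", "lon"]).to_xarray()
print(ds["tas"].shape)  # (2, 2)
```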

@tomvothecoder
Collaborator Author

Thank you @nocollier! It is helpful to know that we can prepare an xr.Dataset object for spatial averaging rather than integrating E3SM-specific logic into xcdat.

I'll check in with @pochedls to see if this issue should still be open.

@pochedls
Collaborator

pochedls commented May 9, 2023

Thanks @nocollier!

@tomvothecoder – maybe we could have @chengzhuzhang or someone else on E3SM test this out to see if this approach works with xcdat functions?

@chengzhuzhang
Collaborator

@nocollier thank you for bringing this approach to our attention. The team is excited about your method. I'm trying to adapt your code to test it on E3SM native atmosphere data, but came across an error I can't resolve. If at all possible, would you please take a look at the code and spot any obvious issue? Thanks a lot!

import pandas as pd
import xarray as xr

# E3SM native monthly atmospheric output example data available at https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/zhang40/TREFHT.h0.2000.nc
filename = "/Users/zhang40/Documents/xcdat_test_e3sm/native/TREFHT.h0.2000.nc"
# Uses pandas.to_xarray() to handle possibly unstructured grids
used = ["time_bnds", "lat", "lon", "area", "LANDFRAC", "TREFHT"]
dset = xr.open_dataset(filename)[used]
# Comment out below, data already concatenated.
# dset = xr.concat(dset, dim="time")
rem_attrs = {v: dset[v].attrs for v in used}
time_bnds = dset[used.pop(0)]
series = {"time": pd.Series(dset["time"]).repeat(dset.dims["ncol"])}
series.update({v: dset[v].values.flatten() for v in used})
dset = pd.DataFrame(series).set_index(["time", "lat", "lon"]).to_xarray()
dset["time_bnds"] = time_bnds

I encountered error: ValueError: All arrays must be of the same length
from the line dset = pd.DataFrame(series).set_index(["time", "lat", "lon"]).to_xarray()
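
For reference, one likely cause of this mismatch: after the repeat, series["time"] has ntime × ncol entries, while the flattened lat/lon/area arrays (which vary only along ncol) have just ncol entries. The length mismatch can be reproduced, and worked around by tiling the per-column arrays across time, with synthetic data (np.tile is an assumed fix here, not something verified against the real file):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for column output: 2 time steps x 3 columns
ntime, ncol = 2, 3
time = np.array([0, 1])
lat = np.array([0.0, 10.0, 20.0])  # one value per column
lon = np.array([0.0, 10.0, 20.0])
data = np.arange(ntime * ncol, dtype=float).reshape(ntime, ncol)

# Repeat time per column AND tile the per-column coordinates per time step,
# so every series has the same length (ntime * ncol)
series = {
    "time": np.repeat(time, ncol),
    "lat": np.tile(lat, ntime),
    "lon": np.tile(lon, ntime),
    "TREFHT": data.flatten(),
}
ds = pd.DataFrame(series).set_index(["time", "lat", "lon"]).to_xarray()
print(ds["TREFHT"].shape)  # (2, 3, 3)
```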

@chengzhuzhang
Collaborator

chengzhuzhang commented May 31, 2023

I think the error in the code above is caused by an improperly built series when there is a time dimension, though I haven't figured out how to correct the problem. I adjusted the code to use only one time snapshot, and it does appear that the xcdat spatial averaging operator can work with the data array converted with pandas' to_xarray function; code shown below:

import pandas as pd
import xcdat as xc

filename = "/Users/zhang40/Documents/xcdat_test_e3sm/native/TREFHT.h0.2000.nc"
# Uses pandas.to_xarray() to handle possibly unstructured grids
variables = ["lat", "lon", "area", "LANDFRAC", "TREFHT"]
dset = xc.open_dataset(filename)[variables].isel(time=0)
series = {
    "lat": dset["lat"],
    "lon": dset["lon"],
    "TREFHT": dset["TREFHT"].values.flatten(),
    "area": dset["area"].values.flatten(),
}

dset = pd.DataFrame(series).set_index(["lat", "lon"]).to_xarray()

# Compute global average
global_ave = dset.spatial.average("TREFHT", axis=["X", "Y"], weights=dset["area"])["TREFHT"]
print(global_ave)

# Compute average over the Tropics
dset_tropics = dset.sel(lat = slice(-20, 20))
dset_ave = dset_tropics.spatial.average("TREFHT", axis=["X", "Y"], weights=dset_tropics["area"])["TREFHT"]
print(dset_ave)

# Plot 
dset.TREFHT.plot()

@tomvothecoder tomvothecoder changed the title [Feature]: Spatial averaging operates on E3SM native output (column-based data) [Feature]: Consider either Jupyter Notebook tutorial or API for converting column-based data (e.g,. E3SM raw native) to grid-based data Jun 8, 2023
@tomvothecoder
Collaborator Author

I updated the title of this issue to reflect our proposed paths forward based on our discussion in the 6/7 meeting.

@tomvothecoder tomvothecoder added this to the FY24Q1 (v0.7.0) milestone Sep 27, 2023
@tomvothecoder tomvothecoder changed the title [Feature]: Consider either Jupyter Notebook tutorial or API for converting column-based data (e.g,. E3SM raw native) to grid-based data [Feature]: Create a Jupyter Noteboo for converting column-based data (e.g,. E3SM raw native) to grid-based data to use xCDAT Sep 27, 2023
@tomvothecoder tomvothecoder changed the title [Feature]: Create a Jupyter Noteboo for converting column-based data (e.g,. E3SM raw native) to grid-based data to use xCDAT [Feature]: Create a Jupyter Notebool for converting column-based data to grid-based data Sep 27, 2023
@tomvothecoder tomvothecoder changed the title [Feature]: Create a Jupyter Notebool for converting column-based data to grid-based data [Docs]: Create a Jupyter Notebook for converting column-based data to grid-based data Sep 27, 2023
@tomvothecoder tomvothecoder changed the title [Docs]: Create a Jupyter Notebook for converting column-based data to grid-based data [Doc]: Create a Jupyter Notebook for converting column-based data to grid-based data Sep 27, 2023
@tomvothecoder tomvothecoder added the type: docs Updates to documentation label Sep 27, 2023
@tomvothecoder tomvothecoder removed this from the FY24Q1 (v0.7.0) milestone Sep 27, 2023
@chengzhuzhang
Collaborator

I think I'm not too far away from producing a Jupyter Notebook based on the code snippet here. Is there a recommended place to include this type of auxiliary Jupyter Notebook?

@tomvothecoder
Collaborator Author

tomvothecoder commented Feb 27, 2024

@chengzhuzhang Great! We can store them in a separate notebooks directory at the root of the repo or under docs.

If we place them under docs, I'll update the sphinx config (if needed) to ignore that directory so warning messages aren't raised.

@tomvothecoder
Collaborator Author

Hey @chengzhuzhang, have you found a solution that supports converting unstructured grids to rectilinear grids with more than one timestamp (original solution)?

I'm trying to figure out a workflow where we calculate the spatial average with all timestamps, which we can then pass to xCDAT's temporal APIs for averaging.

@chengzhuzhang
Collaborator

chengzhuzhang commented Apr 17, 2024

@tomvothecoder I spent a couple of hours trying to get the time axis to work when coming up with the example, but no luck. And I'm not confident I can find a solution in the limited time available. One possible alternative: can we test xCDAT's temporal API for averaging first and then show the example of spatial averaging with one time snapshot? This is not perfect, but at least both aspects can be shown...

@pochedls
Collaborator

What about this:

# required packages
import numpy as np
import xarray as xr
from scipy import spatial
import xcdat as xc

def simple_regrid(ds, target_var, nlat, nlon, infill=True):
    """
    simple_regrid(ds, target_var, nlat, nlon, infill=True)

    Nearest neighbor mapping of 2D fields from E3SM column format 
    to a user-defined rectilinear grid. 

    Parameters:
    -----------
    ds (xr.Dataset) : Source dataset to remap
    target_var (str) : Name of variable to remap
    nlat (xr.DataArray) : Target latitude values
    nlon (xr.DataArray) : Target longitude values
    infill (bool) : Flag to infill (with extrapolation) 
                    missing values (default True)

    Returns:
    --------
    xr.Dataset

    Notes:
    ------
    This regridding tool is intended as a simple regridding tool
    and is not fit for most scientific applications, but may be useful
    for data quick-looks.
    """
    dset_out = []
    # loop over time steps and remap one map at a time
    for i in range(len(ds.time)):
        # get data
        lat = ds.lat
        lon = ds.lon
        data = ds[target_var].isel(time=i)
        # target grid
        LON, LAT = np.meshgrid(nlon, nlat)
        shp = LAT.shape
        # Create a nearest-neighbor tree for the grid
        tree = spatial.cKDTree(list(zip(LAT.flat, LON.flat)))
        dis, ind = tree.query(np.array([lat, lon]).T)
        n = tree.n
        X = np.bincount(ind, weights=data,  minlength=n) # Sum of data in each grid box
        cnt = np.bincount(ind, weights=np.ones_like(data), minlength=n) # Total number of matches
        with np.errstate(divide='ignore', invalid='ignore'):
            grid = X / cnt # going to get divide by zero here for grid boxes with no data
        grid = grid.reshape(shp) # reshape to regular grid
        grid = xr.DataArray(data=grid,
                            dims=['lat', 'lon'],
                            coords={'lat': nlat, 'lon': nlon},
                            name=target_var)
        dset_out.append(grid)
    # concatenate time steps and create xr.Dataset
    dset_out = xr.concat(dset_out, dim=ds.time).to_dataset()
    # incorporate bounds from original dataset
    if 'time_bnds' in ds.data_vars:
        dset_out['time_bnds'] = ds.time_bnds
    # add missing bounds
    dset_out = dset_out.bounds.add_missing_bounds(["X", "Y", "T"])
    # infill (if desired)
    if infill:
        dset_out[target_var] = dset_out[target_var].interpolate_na(dim='lon', method='nearest', fill_value='extrapolate')
    return dset_out

Use:

# imports
import numpy as np
import xcdat as xc

# open file
fn = 'TREFHT.h0.2000.nc'
ds = xc.open_dataset(fn)

# define regrid targets
target_var = 'TREFHT'
nlat, _ = xc.create_axis('lat', np.arange(-88.75, 90, 2.5), attrs={'axis': 'Y', 'units': 'degrees_north'}, generate_bounds=False)
nlon, _ = xc.create_axis('lon', np.arange(1.25, 360, 2.5), attrs={'axis': 'X', 'units': 'degrees_east'}, generate_bounds=False)

# call simple regridder
dsr = simple_regrid(ds, target_var, nlat, nlon)

# plot results
dsr['TREFHT'][0].plot()

[Screenshot: map plot of the regridded TREFHT field]

@tomvothecoder
Collaborator Author

Thanks @pochedls, I incorporated your code above in the notebook and it seems to work as intended.

Maybe this grid mapping functionality should be in UXarray rather than xCDAT?

@tomvothecoder
Collaborator Author

tomvothecoder commented Apr 18, 2024

Maybe this grid mapping functionality should be in UXarray rather than xCDAT?

Actually, if users are using E3SM native data with xCDAT then it probably makes sense to have in xCDAT so they can use the spatial averager. We would need to generalize it across different models though.

@pochedls
Collaborator

@tomvothecoder - I think this code likely gives a reasonable result for most applications, but I think it is too simplistic to include in xcdat (unless it is advertised as a "quick look" regridder or something; it would still need to be generalized). I wonder how easily ncremap (or a Python version) could be included in xcdat. There is a lot to unpack. Something to explore...

@chengzhuzhang
Collaborator

chengzhuzhang commented Apr 18, 2024

I think nco, with ncremap as a sub-tool, can be installed with conda (https://nco.sourceforge.net).
For the tutorial, I think we don't need to worry about installation, because we will most likely use the E3SM unified environment, which covers xcdat, nco, uxarray, and other packages potentially useful for the xcdat tutorial.

Regarding how to invoke ncremap in a Python script, here is an example I found: https://github.com/E3SM-Project/e3sm_to_cmip/blob/3b06eee2ab948ea23530c4be0392333ce266b41b/e3sm_to_cmip/mpas.py#L27
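
A minimal sketch along those lines (the file names and the map file are hypothetical, and the -m flag should be double-checked against ncremap --help):

```python
import shutil
import subprocess

def remap_with_ncremap(src, dst, map_file, dry_run=True):
    """Build (and optionally run) an ncremap command to remap src -> dst."""
    # -m passes a precomputed map (weights) file; inputs/outputs are netCDF
    cmd = ["ncremap", "-m", map_file, src, dst]
    if not dry_run and shutil.which("ncremap") is not None:
        # ncremap runs as a subprocess and writes the remapped file to disk
        subprocess.run(cmd, check=True)
    return cmd

cmd = remap_with_ncremap("TREFHT.h0.2000.nc", "TREFHT.h0.2000.rgr.nc", "map.nc")
print(" ".join(cmd))
```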

@pochedls
Collaborator

@chengzhuzhang – I was commenting on the longer term (perhaps a new feature in the future). It would be helpful to be able to go from column data to a grid within Python. I assume nco does this in the shell and outputs to a file (rather than keeping the data in memory). Or does nco have a Python interface to keep the data in memory?

@chengzhuzhang
Collaborator

nco operations are just subprocesses and save intermediate files on disk. For the longer term, we have advocated for supporting mapping to regular lat/lon grids in uxarray; until then, your code here will be very useful. I'm not sure if we want to convert it into a function in the xcdat source.

@tomvothecoder tomvothecoder added the good-first-issue Good first issue for new contributors label Jun 5, 2024
@tomvothecoder tomvothecoder moved this from In Progress to Todo in xCDAT Development Jun 20, 2024
@tomvothecoder tomvothecoder removed this from the FY24Q4 (07/01/24 - 09/30/24) milestone Sep 25, 2024