-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Doc]: Create a Jupyter Notebook for converting column-based data to grid-based data #89
Comments
Hey @pochedls, can you fill out the description surrounding this issue? |
Description: One general goal of xcdat is to operate on CMIP-like output data, but also on native E3SM output. E3SM output is column-based (e.g., time x column) and is not natively saved in a latitude x longitude matrix. To support spatial averaging on E3SM's native grid, we would need to assess and develop functionality to do this (e.g., estimate column boundaries, generate weights, and refactor code to handle averaging of column-based arrays). |
Just FYI, you can take column data from the raw E3SM h0/h1 files, and build a pandas dataframe where you set the index equal to 'time', 'lat', and 'lon'. Then pandas has a See code in this vicinity: |
Thank you @nocollier! It is helpful to know that we can prepare an I'll check in with @pochedls to see if this issue should still be open. |
Thanks @nocollier! @tomvothecoder – maybe we could have @chengzhuzhang or someone else on E3SM test this out to see if this approach works with xcdat functions? |
@nocollier thank you for bringing this approach to our attention. The team is excited about your method. I'm trying to adjust your codes to test on E3SM native atmosphere data, but came across an error I can't resolve. If at all possible would you please help look at the code and spot any obvious issue? Thanks a lot! import pandas as pd
import xarray as xr
# E3SM native monthly atmospheric output example data available at https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/zhang40/TREFHT.h0.2000.nc
filename = "/Users/zhang40/Documents/xcdat_test_e3sm/native/TREFHT.h0.2000.nc"
# Uses pandas.to_xarray() to handle possibly unstructured grids
used = ["time_bnds", "lat", "lon", "area", "LANDFRAC", "TREFHT"]
dset = xr.open_dataset(filename)[used]
# Comment out below, data already concatenated.
# dset = xr.concat(dset, dim="time")
rem_attrs = {v: dset[v].attrs for v in used}
time_bnds = dset[used.pop(0)]
series = {"time": pd.Series(dset["time"]).repeat(dset.dims["ncol"])}
series.update({v: dset[v].values.flatten() for v in used})
dset = pd.DataFrame(series).set_index(["time", "lat", "lon"]).to_xarray()
dset["time_bnds"] = time_bnds I encountered error: |
I think the error from above codes is caused by improper series generated when there is a time dimension. Though I haven't figured out how to correct the problem. I adjust the the codes to only use one time snapshot, and it does appear that the xcdat spatial averaging operator can work with the data array converted with pandas import pandas as pd
import xcdat as xc
filename = "/Users/zhang40/Documents/xcdat_test_e3sm/native/TREFHT.h0.2000.nc"
# Uses pandas.to_xarray() to handle possibly unstructured grids
variables = ["lat", "lon", "area", "LANDFRAC", "TREFHT"]
dset = xc.open_dataset(filename)[variables].isel(time=0)
series = {"lat": dset["lat"],
"lon": dset["lon"],
"TREFHT": dset["TREFHT"].values.flatten(),
"area": dset["area"].values.flatten(),}
dset = pd.DataFrame(series).set_index(["lat", "lon"]).to_xarray()
# Compute global average
global_ave = dset.spatial.average("TREFHT", axis=["X", "Y"], weights=dset["area"])["TREFHT"]
print(global_ave)
# Compute average over the Tropics
dset_tropics = dset.sel(lat = slice(-20, 20))
dset_ave = dset_tropics.spatial.average("TREFHT", axis=["X", "Y"], weights=dset_tropics["area"])["TREFHT"]
print(dset_ave)
# Plot
dset.TREFHT.plot() |
I updated the title of this issue to reflect our proposed paths forward based on our discussion in the 6/7 meeting. |
I think I'm not too far away from producing a Jupyter Notebook based on the code snippet here. Is there a recommended place to include this type of auxiliary Jupyter Notebooks. |
@chengzhuzhang Great! We can store them a separate If we place them under |
Hey @chengzhuzhang, have you found a solution that supports converting unstructured grids to rectilinear grids with more than one timestamp (original solution)? I'm trying to figure out a workflow where we calculate the spatial average with all timestamps, which we can then pass to xCDAT's temporal APIs for averaging. |
@tomvothecoder I spent a couple hours trying to get time axis to work when coming up with the example, but no luck. And I"m not confident if I can have a solution with limited time. One possible alternative, can we test xCDAT's temporal API for averaging first and then show the example of spacial averaging with one time snapshot? This is not perfect but at least both aspects can be shown... |
What about this: # required packages
import numpy as np
import xarray as xr
from scipy import spatial
import xcdat as xc
def simple_regrid(ds, target_var, nlat, nlon, infill=True):
"""
simple_regrid(ds, target_var, nlat, nlon, infill=True)
Nearest neighbor mapping of 2D fields from E3SM column format
to a user-defined rectilinear grid.
Parameters:
-----------
ds (xr.Dataset) : Source dataset to remap
target_var (str) : Name of variable to remap
nlat (xr.DataArray) : Target latitude values
nlon (xr.DataArray) : Target longitude values
infill (bool) : Flag to infill (with extrapolation)
missing values (default True)
Returns:
--------
xr.Dataset
Notes:
------
This regridding tool is intended as a simple regridding tool
and is not fit for most scientific applications, but may be useful
for data quick-looks.
"""
dset_out = []
# loop over time steps and remap one map at a time
for i in range(len(ds.time)):
# get data
lat = ds.lat
lon = ds.lon
data = ds[target_var].isel(time=i)
# target grid
LON, LAT = np.meshgrid(nlon, nlat)
shp = LAT.shape
# Create a nearest-neighbor tree for the grid
tree = spatial.cKDTree(list(zip(LAT.flat, LON.flat)))
dis, ind = tree.query(np.array([lat, lon]).T)
n = tree.n
X = np.bincount(ind, weights=data, minlength=n) # Sum of data in each grid box
cnt = np.bincount(ind, weights=np.ones_like(data), minlength=n) # Total number of matches
with np.errstate(divide='ignore', invalid='ignore'):
grid = X / cnt # going to get divide by zero here for grid boxes with no data
grid = grid.reshape(shp) # reshape to regular grid
grid = xr.DataArray(data=grid,
dims=['lat', 'lon'],
coords={'lat': nlat, 'lon': nlon},
name=target_var)
dset_out.append(grid)
# concatenate time steps and create xr.Dataset
dset_out = xr.concat(dset_out, dim=ds.time).to_dataset()
# incorporate bounds from original dataset
if 'time_bnds' in ds.data_vars:
dset_out['time_bnds'] = ds.time_bnds
# add missing bounds
dset_out = dset_out.bounds.add_missing_bounds(["X", "Y", "T"])
# infill (if desired)
if infill:
dset_out[target_var] = dset_out[target_var].interpolate_na(dim='lon', method='nearest', fill_value='extrapolate')
return dset_out Use: # imports
import xcdat as xc
# open file
fn = 'TREFHT.h0.2000.nc'
ds = xc.open_dataset(fn)
# define regrid targets
target_var = 'TREFHT'
nlat, _ = xc.create_axis('lat', np.arange(-88.75, 90, 2.5), attrs={'axis': 'Y', 'units': 'degrees_north'}, generate_bounds=False)
nlon, _ = xc.create_axis('lon', np.arange(1.25, 360, 2.5), attrs={'axis': 'X', 'units': 'degrees_east'}, generate_bounds=False)
# call simple regridder
dsr = simple_regrid(ds, target_var, nlat, nlon)
# plot results
dsr['TREFHT'][0].plot() |
Actually, if users are using E3SM native data with xCDAT then it probably makes sense to have in xCDAT so they can use the spatial averager. We would need to generalize it across different models though. |
@tomvothecoder - I think this code likely gives a reasonable result for most applications, but I think it is too simplistic to include in xcdat (unless it is advertised as a "quick look" regridder or something (it would still need to be generalized). I wonder how easily |
I think Regarding to how to invoke |
@chengzhuzhang – I was commenting on the longer-term (perhaps a new feature in the future). It would be helpful to be able to go from column data to a grid within Python. I assume nco does this in the shell and outputs to a file (rather than keeping the data in memory). Or does nco have a Python interface to keep the data in-memory? |
nco operations are just subprocess and save intermediate files on disk. For the longer-term, we have advocated to support map to regular lat lon grids in uxarry, before that, your code here will be very useful, not sure if we want to convert as a function into xcdat source. |
Create a Jupyter Notebook tutorial that covers how to convert column-based data to grid-based data for use with xCDAT APIs.
The text was updated successfully, but these errors were encountered: