[PR]: Add Parallel Computing with Dask Jupyter Notebook #489
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@           Coverage Diff           @@
##              main     #489   +/-  ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files           15       15
  Lines         1542     1542
=========================================
  Hits          1542     1542
=========================================
```

☔ View full report in Codecov by Sentry.
Hi @xCDAT/core-developers, I need some help developing test cases for this Dask notebook. Do any of you have any datasets with the following sizes in mind, and can you provide me a link to them on ESGF and locally on the Climate filesystem?
I will be comparing the sequential and parallel performance of xCDAT's spatial and temporal APIs against CDAT.
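A minimal, self-contained sketch of the kind of sequential vs. parallel comparison described here, using synthetic NumPy data in place of the ESGF datasets (the array shape and chunk sizes are illustrative assumptions, and `dask` is assumed to be installed):

```python
import time

import dask.array as da
import numpy as np

# Synthetic stand-in for a (time, lat, lon) variable; the shape is an
# illustrative assumption, not one of the PR's test datasets.
data = np.random.rand(40, 200, 200)

# Sequential NumPy reduction.
t0 = time.perf_counter()
np_mean = data.mean(axis=0)
t_seq = time.perf_counter() - t0

# Parallel Dask reduction: split along the time axis, compute chunk-wise
# partial means in parallel, then combine.
dask_data = da.from_array(data, chunks=(10, 200, 200))
t0 = time.perf_counter()
da_mean = dask_data.mean(axis=0).compute()
t_par = time.perf_counter() - t0

print(f"sequential: {t_seq:.4f}s, parallel: {t_par:.4f}s")
```

At this toy size the Dask version may actually be slower because of scheduling overhead; the benefit shows up on datasets that approach or exceed available memory.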
UPDATE: Never mind, I opened up the dataset using the code below:

```python
import os

import xarray as xr
import xcdat as xc  # imported for the xCDAT accessors

# 1. Set the directory and get the absolute filepath for each dataset.
dir = (
    "/p/css03/esgf_publish/CMIP6/CMIP/MOHC/HadGEM3-GC31-MM/historical/r2i1p1f3"
    "/day/ta/gn/v20191218"
)

filepaths = []
for root, dirs, files in os.walk(os.path.abspath(dir)):
    for file in files:
        filepaths.append(os.path.join(root, file))

# 2. Attempt to open all of the files with auto chunking -- this breaks with:
#    NotImplementedError: Can not use auto rechunking with object dtype. We
#    are unable to estimate the size in bytes of object data
try:
    ds = xr.open_mfdataset(f"{dir}/*.nc", chunks="auto")
except NotImplementedError as err:
    print(err)

# 3. Check the dataset dtypes -- the 'time' coordinate is an object dtype.
ds = xr.open_dataset(filepaths[0])
print(ds.coords.dtypes)
# Frozen({'time': dtype('O'), 'plev': dtype('float64'), 'lat': dtype('float64'), 'lon': dtype('float64')})
```
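One common workaround for this error is to pass explicit per-dimension chunk sizes instead of `chunks="auto"`, so Dask never has to estimate the byte size of the object-dtype coordinate. A hedged sketch with synthetic data (the variable names, dimensions, and chunk size are illustrative assumptions, not the PR's dataset):

```python
import numpy as np
import xarray as xr

# Synthetic stand-in with an object-dtype time coordinate, similar to the
# cftime-decoded 'time' in the real dataset.
times = np.array(["2000-01", "2000-02", "2000-03", "2000-04"], dtype=object)
ds = xr.Dataset(
    {"ta": (("time", "lat"), np.random.rand(4, 3))},
    coords={"time": ("time", times), "lat": [0.0, 1.0, 2.0]},
)

# Explicit chunk sizes sidestep auto-rechunking's size estimation.
chunked = ds.chunk({"time": 2})
print(chunked["ta"].chunks)  # ((2, 2), (3,))
```

The same idea applies to `xr.open_mfdataset(..., chunks={"time": ...})` with a concrete chunk size instead of `"auto"`.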
Some notes about Xarray loading
Notes from 5/24/23 meeting:
Force-pushed from 3ebf0eb to ddb509d.
Force-pushed from f01f3aa to 78cb7a5.
Hey @jasonb5 and @pochedls, this parallel computing notebook is ready for review. I tried to make the notebook easy for users to read, with simple code examples. If we want more advanced code examples for things like horizontal regridding, we can either update this notebook or add Dask usage to other notebooks. @jasonb5, can you make sure everything is accurate? And @pochedls, can you review how well the notebook flows and whether it is understandable? I'm also open to suggestions. Here's the link: https://xcdat.readthedocs.io/en/doc-485-dask-guide/examples/parallel-computing-with-dask.html
Todo (3/28/24)
@tomvothecoder – as you work on this – I did skim this notebook, which looks like an amazing resource. Some things that I was wondering about:
Force-pushed from cf4da48 to 0f11a32.
@tomvothecoder The notebook looks great, lots of good information and resources. I ran into something on Nimbus that could occur on other similar environments where […]. Not sure if this should be addressed in the notebook, but maybe the value should be set to a hard minimum requirement for the example, e.g. […]
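One way to make a notebook behave consistently across environments like Nimbus is to pin explicit worker counts and memory limits rather than relying on Dask's machine-dependent defaults. A hedged sketch (assumes `dask.distributed` is installed; the specific values are illustrative assumptions, not the minimums discussed in this thread):

```python
from dask.distributed import Client, LocalCluster

# Pin the cluster size explicitly instead of letting Dask infer it from the
# machine; the numbers below are illustrative, not recommended minimums.
cluster = LocalCluster(
    n_workers=2,
    threads_per_worker=1,
    memory_limit="1GiB",
    dashboard_address=None,  # disable the dashboard for this sketch
)
client = Client(cluster)

n_workers = len(client.scheduler_info()["workers"])
print(f"workers: {n_workers}")

client.close()
cluster.close()
```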
Thanks for the review, @jasonb5! I'll add your suggestion and make a note about the […]
Force-pushed from 416322f to 3fb1e6f.
Hey @pochedls, the notebook is ready for review. Here's a link to the current version of the notebook. Summary of changes:
RE: Your comments from above:
Force-pushed from fb923f2 to 0d61eb5.
I like your idea of having a companion notebook to showcase real-world examples of parallelizing xCDAT analysis code. You can open another issue for this. I should have emphasized that the current notebook is intended to introduce scientists to the basic concepts of Dask while connecting them to Xarray and xCDAT (at a very high level). There are a ton of great resources cited throughout the notebook, and I was hoping to consolidate the most important info here. I addressed your list of comments below:
Force-pushed from a2d38f3 to d582c00.
Need to address #662 before this PR can be merged.
Force-pushed from f3720b5 to 1302acb.
Description
- `sphinx-autosummary-accessors` in conda env yml files

Checklist
If applicable: