
Can't download data when running .sh file #6

Closed
ghost opened this issue May 22, 2023 · 11 comments

Comments

@ghost

ghost commented May 22, 2023

Hello Mr. Andersson,

Could you please help me figure out why this problem happens while I run the icenet project code? Thank you.

I try to run the following commands:

./download_era5_data_in_parallel.sh

./download_cmip6_data_in_parallel.sh

./rotate_wind_data_in_parallel.sh

./download_seas5_forecasts_in_parallel.sh

The first three .sh files finish in about a second, and the last one, download_seas5_forecasts_in_parallel.sh, runs like this:

[Screenshot 1]

I am still waiting for ECMWF to approve my data access application, but I have already set up a CDS account and populated the .cdsapirc file according to the guidance.
It seems that

./download_era5_data_in_parallel.sh

./download_cmip6_data_in_parallel.sh

don't download anything, and I am confused why these two .sh files do not work.

Then when I execute python3 icenet/biascorrect_seas5_forecasts.py:

[Screenshot 2]

I know this is because I don't have access to the ECMWF data yet, but it seems I can't download the ERA5 and CMIP6 data either.

gen_masks.py and download_sic_data.py work and download data correctly.

Thank you for your help

@tom-andersson
Owner

@bryandunn614, the ./download_era5_data_in_parallel.sh runs the download processes in the background (note the & commands at the end of each line in the script). They should be running, which you can sanity check with ps and top in the command line (assuming you are now running in the Windows Subsystem for Linux). Note that this outputs logs - check for example cat logs/era5_download_logs/tas.txt. What do you see? Has the ERA5 data downloaded to the data/obs/ folder?
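The background-and-log pattern those scripts use can be sketched with a toy job; the file names below are demo placeholders, not the repo's actual paths:

```shell
# Toy version of the pattern the parallel scripts use: a command ending in '&'
# runs in the background and writes its output to a log file you can inspect.
# In the repo you would look at e.g. logs/era5_download_logs/tas.txt instead.
mkdir -p logs
( sleep 1; echo "download finished" ) > logs/demo_download.txt 2>&1 &
jobs -l                      # the backgrounded job shows up here while it runs
wait                         # block until all background jobs complete
cat logs/demo_download.txt   # prints: download finished
```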

Are you sure you need the CMIP6 data for your project? It is very large and doesn't lead to a substantial performance boost (see our paper).

Good luck with the SEAS5 access request. You can also use the SEAS5 performance metrics that we computed by downloading the paper generated data. This would save you a lot of time.

@tom-andersson
Owner

FYI I have updated the README to make it more clear that the parallel bash scripts run python processes in the background, and state the log file folder paths: 4f5fb27

@ghost
Author

ghost commented Jun 24, 2023

Hello @tom-andersson and @JimCircadian:
Thank you for your reply. I was busy with my finals for the last couple of weeks and have now started working on this project again. However, there are two data-download issues I can't solve.
When running ./rotate_wind_data_in_parallel.sh, all the log files report an error about a missing file, uas_EASE_cmpr.nc, like below:

```
Rotating wind data in data/cmip6/EC-Earth3/r2i1p1f1
Traceback (most recent call last):
  File "icenet/rotate_wind_data.py", line 91, in <module>
    wind_cubes[var] = iris.load_cube(EASE_path)
  File "/home/bryan/anaconda3/envs/icenet/lib/python3.7/site-packages/iris/__init__.py", line 387, in load_cube
    cubes = _load_collection(uris, constraints, callback).cubes()
  File "/home/bryan/anaconda3/envs/icenet/lib/python3.7/site-packages/iris/__init__.py", line 325, in _load_collection
    result = iris.cube._CubeFilterCollection.from_cubes(cubes, constraints)
  File "/home/bryan/anaconda3/envs/icenet/lib/python3.7/site-packages/iris/cube.py", line 157, in from_cubes
    for cube in cubes:
  File "/home/bryan/anaconda3/envs/icenet/lib/python3.7/site-packages/iris/__init__.py", line 312, in _generate_cubes
    for cube in iris.io.load_files(part_names, callback, constraints):
  File "/home/bryan/anaconda3/envs/icenet/lib/python3.7/site-packages/iris/io/__init__.py", line 193, in load_files
    all_file_paths = expand_filespecs(filenames)
  File "/home/bryan/anaconda3/envs/icenet/lib/python3.7/site-packages/iris/io/__init__.py", line 176, in expand_filespecs
    raise IOError(msg)
OSError: One or more of the files specified did not exist:
"/mnt/e/icenet-paper/data/cmip6/EC-Earth3/r2i1p1f1/uas_EASE_cmpr.nc" didn't match any files
```

I have checked all the data/cmip6 subfolders; each one contains only a single siconca_latlon.nc file, and no uas_EASE_cmpr.nc is generated.

I think this is because download_cmip6_data.py doesn't generate or download uas_EASE_cmpr.nc correctly. The only modification I made to the project code was commenting out line 310 of download_cmip6_data.py (query['data_node'] = data_node), following another issue raised by @xinaesthete.

Thank you for your help

@ghost
Author

ghost commented Jul 11, 2023

nco needs to be installed to run download_seas5_forecasts.py successfully. When I run download_seas5_forecasts.py it keeps reporting:

```
Regridding to EASE... /bin/sh: 1: ncatted: not found
ut_scale(): NULL factor argument
ut_are_convertible(): NULL unit argument
```

I tried to find what's wrong with the code, but everything seemed fine, so I asked ChatGPT. It turns out that

conda install -c conda-forge nco

needs to be run in the icenet environment for the command to succeed; nco was not installed before.
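As a quick pre-flight check for this, one can test for the tool before running the script (command -v is standard POSIX; the install hint just echoes the conda command above):

```shell
# Print a hint if ncatted (part of NCO, which the regridding step shells out
# to) is not on PATH; otherwise confirm it was found.
if command -v ncatted > /dev/null; then
    echo "ncatted found"
else
    echo "ncatted missing: run 'conda install -c conda-forge nco'"
fi
```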

@tom-andersson
Owner

Hi @bryandunn614.

> When running the command ./rotate_wind_data_in_parallel.sh, all log files report an error that there is a missing file uas_EASE_cmpr.nc, just like below:

The issue with downloading MRI-ESM2.0 data not returning files has now been fixed (#4). Can you try git pulling and rerunning the MRI-ESM2.0 download (assuming your laptop has space) and check that the log files show uas/vas wind data downloading correctly? You can re-open if you find a bug in the code.

@tom-andersson
Owner

Regarding the download_seas5_forecasts.py issue you mentioned, these are just warnings about missing metadata in the NetCDF file and can be ignored.

I ran python3 icenet/download_seas5_forecasts.py --leadtime 1 locally on my laptop and I was able to download the data to latlon/ successfully but then the process was killed during regridding because my laptop doesn't have enough memory. I then manually reduced the size of seas5_leadtime1_latlon.nc by selecting just a year of data and confirmed that the regridding worked.

I did look into using dask to regrid the SEAS5 forecasts in chunks and avoid memory issues, but it looks like iris doesn't support regridding with dask: https://scitools-iris.readthedocs.io/en/latest/userguide/real_and_lazy_data.html#:~:text=lazy%20evaluation.-,Certain%20operations,-%2C%20including%20regridding%20and

There should be a workaround to avoid a large memory footprint during regridding (e.g. by looping over the data in years), but unfortunately I don't have time to implement it. Your best bet will be to move to an HPC where memory is not a constraint.
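A minimal sketch of that looping idea, with a stub standing in for the real per-year regridding call (process_year and the year range are illustrative, not part of the repo):

```shell
# process_year is a placeholder for a per-year regridding step (e.g. a python
# script invoked once per year). Looping keeps only one year's worth of data
# in memory at a time instead of the full record.
process_year() { echo "regridded year $1"; }
for year in 2012 2013 2014; do
    process_year "$year"
done
# prints one "regridded year ..." line per year
```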

@ghost
Author

ghost commented Jul 16, 2023

Hello @tom-andersson :
I don't think this works on my end. I ran download_seas5_forecasts.sh on my cloud server, and the procedure gets stuck in the regridding step; eventually the 190 GB of memory fills up and it crashes. Here is the message I get:

```
Regridding to EASE... ut_scale(): NULL factor argument
ut_are_convertible(): NULL unit argument

nqstat_anu 90554322
                                %CPU  WallTime  Time Lim     RSS    mem memlim cpus
 90554322 R cd8380 zv32  job2.sh  74  01:04:49  10:00:00  176GB  189GB  190GB     1
```

I have checked the disk space: the download finished, but nothing was generated on disk, yet the process is still running.

I don't think it is normal for a single ./download_seas5_forecasts_in_parallel.sh run to take up 190 GB of memory.

@tom-andersson
Owner

Hi @bryandunn614

190 GB sounds correct given what is happening computationally. For each of the 6 lead times, the SEAS5 lat/lon forecast is 1440x360x240x25 which is roughly 13 GB when loaded uncompressed into memory. I ran python3 icenet/download_seas5_forecasts.py --leadtime 1 on my HPC and monitored the memory usage and it went up to around 80 GB. So if running all 1-6 monthly lead times in parallel with the bash script, and if multiple downloads complete at the same time leading to multiple regriddings happening in parallel, you could easily end up using 100s of GB of memory.
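A back-of-envelope check of that 13 GB figure (grid dimensions from the comment above; float32, i.e. 4 bytes per value, is an assumption):

```shell
# 1440 x 360 x 240 x 25 values at 4 bytes each (assuming float32)
awk 'BEGIN { bytes = 1440 * 360 * 240 * 25 * 4; printf "%.1f GB\n", bytes / 1e9 }'
# prints: 12.4 GB
```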

You can modify the SEAS5 parallel download script to run the commands in sequence rather than in parallel by changing the & to &&.
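The two launch patterns can be sketched with a stand-in command (fake_download is a placeholder for the real python3 icenet/download_seas5_forecasts.py calls):

```shell
# Stand-in for the real download command, just to contrast the two patterns.
fake_download() { echo "leadtime $1 done"; }

# Parallel (as in the repo's script): all jobs start at once, so peak memory
# is the sum over all jobs.
fake_download 1 & fake_download 2 & fake_download 3 &
wait

# Sequential: each job must finish (successfully) before the next starts, so
# peak memory is that of a single job.
fake_download 4 && fake_download 5 && fake_download 6
```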

@ghost
Author

ghost commented Jul 16, 2023

Thank you @tom-andersson for your reply,
Another issue is also related to data downloading. I submitted download_cmip6_data_in_parallel.sh to the cloud server to run, but none of the jobs downloaded the complete dataset. EC_r14i1p1f1 and EC_r12i1p1f1 even report errors, and the error messages are the same:

```
siconca: searching ESGF... found historical, found ssp245, found 251 files. loading metadata... downloading with xarray... saving to regrid in iris... regridding... done in 0.0m:17s... compressing & saving... done in 2.0m:56s... Done.

tas: searching ESGF... found historical, found ssp245, found 251 files. loading metadata... downloading with xarray... saving to regrid in iris... regridding... done in 0.0m:18s... compressing & saving... done in 1.0m:8s... Done.

ta: searching ESGF... found 0 files. 500.0 hPa, loading metadata... Traceback (most recent call last):
  File "icenet/download_cmip6_data.py", line 334, in <module>
    cmip6_da = xr.open_mfdataset(results, combine='by_coords', chunks={'time': '499MB'})[variable_id]
  File "(my cloud server path)/mambaforge/envs/icenet/lib/python3.7/site-packages/xarray/backends/api.py", line 921, in open_mfdataset
    raise OSError("no files to open")
OSError: no files to open
```

I am wondering whether this was caused by running 5 programs in parallel (I only ran the 5 EC-Earth3 download commands) and running out of memory (190 GB), though I checked and that does not seem to be the case, or whether r14i1p1f1 and r12i1p1f1 need to be changed for the data to download successfully. The program used up my single-node time allocation (10 hours) and did not report failure. The download speed should be fast, because I connected to a supercomputer node based in Australia to run the command.

Thank you for your help!

@tom-andersson
Owner

Hi @bryandunn614, I can't reproduce your error:

```
> python3 icenet/download_cmip6_data.py  --source_id EC-Earth3 --member_id r14i1p1f1

Downloading data for EC-Earth3, r14i1p1f1
...
ta: searching ESGF... found historical, found ssp245, found 251 files. 500.0 hPa, loading metadata... downloading with xarray...
```

It's possible there was a connection issue during the download, either from the ESGF data node or your HPC. Did you try running the download for the missing variables again? icenet/download_cmip6_data.py doesn't let you select specific variables but you can easily edit it to do this.

@ghost
Author

ghost commented Jul 17, 2023

Hello @tom-andersson, I don't think this issue can be ignored:

> Regarding the download_seas5_forecasts.py issue you mentioned, these are just warnings about missing metadata in the NetCDF file and can be ignored.

If I just ignore it, I get an error when running the biascorrect_seas5_forecasts.py command:

```
Traceback (most recent call last):
  File "icenet/biascorrect_seas5_forecasts.py", line 63, in <module>
    [da.mean('number') for da in seas5_forecast_da_list], 'leadtime')
  File "/scratch/zv32/cd8380/mambaforge/envs/icenet/lib/python3.7/site-packages/xarray/core/concat.py", line 174, in concat
    raise ValueError("must supply at least one object to concatenate")
ValueError: must supply at least one object to concatenate
```

I had successfully run all the commands above, but then ran into this problem.
