Skip to content

Comments

Breakout user-guide, add num_sample param option to natural_breaks(), fix slow import#123

Merged
brendancol merged 8 commits intoxarray-contrib:masterfrom
thuydotm:master
Aug 24, 2020
Merged

Breakout user-guide, add num_sample param option to natural_breaks(), fix slow import#123
brendancol merged 8 commits intoxarray-contrib:masterfrom
thuydotm:master

Conversation

@thuydotm
Copy link
Contributor

No description provided.

@thuydotm thuydotm changed the title Breakout user-guide, add num_sample param option to natural_breaks() Breakout user-guide, add num_sample param option to natural_breaks(), fix slow import Aug 20, 2020
@brendancol
Copy link
Contributor

brendancol commented Aug 21, 2020

@thuydotm Looks like there is a cache parameter which may help on import to get the best of both worlds:

image

Also, I'm happy merging this and then we can experiment with cache parameter separately

@thuydotm
Copy link
Contributor Author

@brendancol interesting, as I investigated from Numba documentation, we can set cache=True to avoid compilation times each time invoking a Python program. Numba does this by writing the result of function compilation into a file-based cache. Thus I think this can be useful from the second import (e.g: after restarting the Kernel). I'm trying it out to see the actual result.

@thuydotm
Copy link
Contributor Author

Here I tried to import the package and run viewshed for experiments

%time import xrspatial
%time viewshed(normal_agg, OBSERVER_X, OBSERVER_Y)
  1. Eager compilation with cache=True
# first run
import time:           16 s
viewshed run time: 3.11 s
# 2nd run after restarting the kernel
import time:           5.07 s
viewshed run time: 2.94 s
  1. Lazy compilation with cache=True
# first run
import time:           4.24 s
viewshed run time: 18.2 s
# 2nd run after restarting the kernel
import time:           4.2 s
viewshed run time: 2.94 s

By setting cache=True, the second run of the lazily compiled function is improved significantly but it will consume some memory also.
@brendancol How do you think?

@jbednar
Copy link
Contributor

jbednar commented Aug 21, 2020

I thought that Numba caching is persisted indefinitely, i.e. unrelated to any kernel restarts, and not using memory (just disk). As I understood it, it would be cached the first time you import that code on a given system, then it would be reused. We tried enabling it globally within Datashader but had various issues, so it's great if it can be enabled in xarray-spatial and spatialpandas instead (which is where Datashader's import slowdowns were coming from at the time).

@thuydotm
Copy link
Contributor Author

thuydotm commented Aug 21, 2020

@jbednar Thanks for pointing out. I've seen a setting for numba cache, NUMBA_CACHE_DIR, to override the location of the cache directory if we want. Also, I agree that it is unrelated to any kernel restarts. Restarting kernel here is just to have a second run, we can definitely reuse it by different ways. Can you share the issues you had when trying to enable it within Datashader?

@jbednar
Copy link
Contributor

jbednar commented Aug 21, 2020

Datashader has a convenience utility ngjit = nb.jit(nopython=True, nogil=True) that is used all over the place, and when I added cache=True there, I found all sorts of errors. I don't recall whether that was immediately on import, or when running the tests. I then tried adding it locally just in a few places, and couldn't figure out which ones worked well and which didn't. But Datashader's doing a lot of crazy stuff internally (pasting strings to make code dynamically), and you probably won't have the same issues here. At least I hope you don't!

@brendancol
Copy link
Contributor

@thuydotm Lazy compilation with cache=True sounds perfectly fine.
@jbednar thanks for the insights. I think we're safe to deprecate viewshed from datashader. I opened an issue here and will follow up with a PR holoviz/datashader#953

@thuydotm
Copy link
Contributor Author

We have almost all tests passed when setting cache=True, except for the focal tests. I've been working around but not yet figured out why and how to resolve it. How about having cache=False as defaulted in focal functions and having it True anywhere else?

@brendancol brendancol merged commit cb46cec into xarray-contrib:master Aug 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants