Vector Zonal Stats: Colab crashing due to RAM consumption #71
Comments
Hi @alronlam
Hi @alronlam, the Ookla dataset 'indonesia-ookla-2020-q1-fixed.csv' is missing as well.
Oh hi @butchtm, the link to the GDrive folder for these files is in the topmost part of the notebook.
Additional detail: I think I also ran into this RAM issue when aligning with the raw Ookla dataset (https://registry.opendata.aws/speedtest-global-performance/), where I tried utilizing the latest fixed-line data from Ookla.
My workaround was to use an older, filtered version of the data that covered Indonesia only (the raw data covers the whole world). So I guess one principle here is that we should always filter the feature datasets as much as we can before aligning them to the AOIs, to avoid such issues. But in the case of HRSL, the data is already for Indonesia alone. I'm not sure what else we can do to make it work for such big datasets (some kind of parallel processing?). Or in these cases, are we forced to use other tools like BQ?
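As an illustration of that filter-before-aligning principle, here is a minimal geopandas sketch that restricts a worldwide feature dataset to the AOIs' extent before any alignment. The file names are hypothetical placeholders; the key ideas are the bounding-box filter at read time plus an exact clip:

```python
import geopandas as gpd

# Hypothetical AOI file covering Indonesia.
aois = gpd.read_file("indonesia_aois.geojson")

# Read only the features intersecting the AOIs' bounding box instead of
# loading the whole-world dataset into memory.
features = gpd.read_file(
    "ookla_fixed_tiles_global.gpkg",  # hypothetical path to the raw Ookla tiles
    bbox=tuple(aois.total_bounds),
)

# Optionally tighten to the exact AOI geometry before computing zonal stats.
features = gpd.clip(features, aois)
```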
Hi @alronlam, I'm trying to see if I can just convert the HRSL data (a 1.8 GB CSV file) to a GeoJSON file and load it as such, but even that is already crashing Colab. Colab might not be ideal for working with production-sized datasets, but rather for learning/exploring the modules.
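One way to attempt that conversion while reducing peak memory is to stream the CSV in chunks and downcast as you go, rather than parsing the whole 1.8 GB file at once. A minimal sketch, assuming the HRSL CSV has latitude/longitude columns (the column names and file paths here are assumptions):

```python
import pandas as pd
import geopandas as gpd

chunks = []
# Stream the CSV so the full file is never parsed in one shot.
for chunk in pd.read_csv("hrsl_indonesia.csv", chunksize=500_000):
    # Downcast coordinates to float32 to roughly halve the memory footprint.
    chunk["latitude"] = chunk["latitude"].astype("float32")
    chunk["longitude"] = chunk["longitude"].astype("float32")
    chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",
)
# GeoJSON is verbose; a binary format like GeoPackage is usually a
# lighter target for data of this size.
gdf.to_file("hrsl_indonesia.gpkg", driver="GPKG")
```

Even with chunked reading, building geometries for this many points may still exceed Colab's default RAM, so writing out a filtered or tiled subset may remain necessary.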
low prio; for further discussions |
This issue was moved to a discussion.
You can continue the conversation there.
Colab notebook for testing:
https://colab.research.google.com/drive/147HWUgaBztsZuBPrI_HTckBrz_vl9l1l#scrollTo=wvLenjgDUgod
Scenario
Error
Colab crashes due to exceeding the RAM limit.
Just creating this issue to check if there are straightforward ways to optimize. Otherwise, are there workarounds for handling relatively large vector datasets like these?
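For reference, one possible workaround along the "parallel processing" line raised in the comments is to process the AOIs in independent batches, so the full spatial-join result is never materialized at once. This is only a plain-geopandas sketch, not the library's own zonal-stats API; the file names and the "population" column are hypothetical:

```python
import geopandas as gpd
import pandas as pd

aois = gpd.read_file("aois.geojson")       # hypothetical inputs
features = gpd.read_file("features.gpkg")

results = []
batch_size = 100
for start in range(0, len(aois), batch_size):
    batch = aois.iloc[start : start + batch_size]

    # Restrict features to the batch's extent before the expensive join.
    minx, miny, maxx, maxy = batch.total_bounds
    subset = features.cx[minx:maxx, miny:maxy]

    # Spatial join + per-AOI aggregation for this batch only.
    joined = gpd.sjoin(batch, subset, how="left", predicate="intersects")
    results.append(joined.groupby(level=0)["population"].sum())

zonal_sums = pd.concat(results)  # one aggregated value per AOI row
```

Since the batches are independent, each could also be dispatched to a separate worker (e.g. via multiprocessing or dask-geopandas) when more than one core is available.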