Optimizing grid tile generation #35
Comments
@joshuacortez |
@joshuacortez For the workflow you mention, I have a few questions.
|
Ahh are you referring to the risk of having square cells whose corners don't align with each other? Good point. This should be addressed by having a consistently defined overall bounding box + resolution (e.g. bounding box of the entire country with x-meter resolution cells). With an x,y coordinate system being based on one overall bounding box + resolution, each polygon in the for-loop in the original post should be referring to the same set of cells (i.e. they should be aligned by default).
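A minimal sketch of the alignment argument above, assuming the overall bounding box's lower-left corner is the origin and cell (0, 0) sits there. `ORIGIN_X`, `ORIGIN_Y`, `RESOLUTION`, `cell_index`, and `cell_bounds` are hypothetical names for illustration, not part of geowrangler:

```python
import math

# Hypothetical origin (lower-left corner of the overall bounding box)
# and cell size, both in the meter-based projected CRS.
ORIGIN_X, ORIGIN_Y = 0.0, 0.0
RESOLUTION = 100.0


def cell_index(x, y):
    """Return (x_idx, y_idx) of the cell containing the point (x, y)."""
    return (
        math.floor((x - ORIGIN_X) / RESOLUTION),
        math.floor((y - ORIGIN_Y) / RESOLUTION),
    )


def cell_bounds(x_idx, y_idx):
    """Return (xmin, ymin, xmax, ymax) of a cell from its indices."""
    xmin = ORIGIN_X + x_idx * RESOLUTION
    ymin = ORIGIN_Y + y_idx * RESOLUTION
    return (xmin, ymin, xmin + RESOLUTION, ymin + RESOLUTION)


# Points coming from two different polygons resolve to the same cell,
# so the cell corners agree no matter which polygon produced them.
point_a = (250.0, 130.0)
point_b = (299.9, 199.9)
index_a = cell_index(*point_a)
index_b = cell_index(*point_b)
shared_bounds = cell_bounds(*index_a)
```

Because every index is derived from the same origin and resolution, no polygon needs to know about any other polygon for the cells to line up.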
Oh what did you mean by this? I imagine the x_idx/y_idx of each cell is based on the overall bounding box + resolution. I.e. cell (0,0) is in one corner of the country bounding box. Let me know if I understood it correctly. The questions are very related to #33 since the overall bounding box can optionally be set for coordinate system consistency across different AOIs. |
Ah, x_idx and y_idx are the columns and row numbers in reference to the origin. It is unclear to me how we can derive these based on the bounding box without generating all the grids of a country. |
@jtmiclat yep! There might be different ways of implementing it. One way is to have a lookup table for the x-axis and a lookup table for the y-axis. Each lookup table can be a pandas series. The indices are the column numbers and the row numbers, and the corresponding values are the reprojected coordinates (meter-based). The lookup tables are a "compressed" representation of the SquareGrid since you don't have to take the cartesian product of the two axes. E.g. IDN x-axis lookup table in reference to country bounding box using epsg:23845 with 100m spacing
With these lookup tables, it's easy to get cells with respect to a bounding box since it only relies on checking which coordinates are within the xmin, xmax, ymin, ymax of the bounding box. Might also be a good idea to pad the borders for good measure (i.e. xmax = xmax + spacing, xmin = xmin - spacing, etc.). Just not sure what this would look like for H3Grid or S2Grid though. I also acknowledge we're introducing new objects like the 2 lookup tables here, but it might be a good tradeoff in exchange for a lot of efficiency. Let me know what you think! |
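A rough sketch of the two-lookup-table idea, assuming each table stores the lower edge of every column (or row) and using small made-up extents instead of a real country bounding box. `build_axis_lookup` and `indices_within` are hypothetical helpers, not existing geowrangler functions:

```python
import numpy as np
import pandas as pd

SPACING = 100  # cell size in meters (illustrative value)


def build_axis_lookup(start, stop, spacing):
    """Series mapping column/row number -> lower-edge coordinate on one axis."""
    return pd.Series(np.arange(start, stop, spacing), name="coord")


def indices_within(lookup, lo, hi, spacing, pad=True):
    """Return the axis indices whose coordinate lies within [lo, hi].

    With pad=True the range is widened by one spacing on each side,
    as suggested in the comment above.
    """
    if pad:
        lo, hi = lo - spacing, hi + spacing
    mask = (lookup >= lo) & (lookup <= hi)
    return lookup.index[mask].tolist()


# Made-up overall extent: 10 columns and 5 rows of 100 m cells.
x_lookup = build_axis_lookup(0, 1000, SPACING)
y_lookup = build_axis_lookup(0, 500, SPACING)

# Made-up AOI bounding box: (xmin, ymin, xmax, ymax).
aoi = (250, 120, 430, 260)
x_ids = indices_within(x_lookup, aoi[0], aoi[2], SPACING)
y_ids = indices_within(y_lookup, aoi[1], aoi[3], SPACING)
```

Under the lower-edge assumption, the padding is more than a safety margin on the minimum side: without it, the column starting at 200 (which contains x = 250) would be dropped. The padded result can include one extra cell on the maximum side, which a later exact intersection check would filter out.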
I see, I kinda get what #33 is. We have a Class or Datatype we can pass that would make it easier to reference. Some pseudo-code
might replace geowrangler/geowrangler/grids.py Lines 70 to 74 in 86561e6
And a possible usage within SquareGridGenerator
Regarding H3Grid and S2Grid, we don't need to worry about them as they are designed to have a unique index in the context of the entire world. We should only be facing this issue when using a custom index or grid. |
I see, the I was just thinking if In my head, the for-loop pseudo code above could look like the for-loop here for multiple polygons
This version doesn't save the actual coordinates though (just the indices) but yeah |
I think there’s room for further optimization, especially for generate_grids in the GridGenerator class. Right now the grid tiles are first generated across the entire span of xrange and yrange and then filtered out after. While this isn’t an issue for very coarse grids, this can easily run into runtime and memory issues for fine grids.
Instead of generating all the tiles and then filtering after, we can generate only the grid tiles we need.
To determine which grid tiles to generate in the first place, we can use the cheapest possible geometric operations. Generating tiles across xrange and yrange and intersecting tiles with the gdf’s unary_union can be expensive since the unary_union is a single geometry that most likely has a large number of points.
Can make a PR for this too!
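One way the "generate only what we need" idea could look, taking bounding boxes as the cheap geometric operation. This is a sketch under assumptions, not the current GridGenerator code: `candidate_cells` is a hypothetical helper, polygons are represented only by their (xmin, ymin, xmax, ymax) bounds (in geopandas these are available per row via `gdf.geometry.bounds`), and the origin is assumed to be the lower-left corner of the overall bounding box:

```python
import math


def candidate_cells(bounds_list, origin, resolution):
    """Collect (x_idx, y_idx) of cells overlapping any polygon's bounding box."""
    origin_x, origin_y = origin
    cells = set()
    for xmin, ymin, xmax, ymax in bounds_list:
        # Index range covered by this polygon's bounding box only.
        x0 = math.floor((xmin - origin_x) / resolution)
        x1 = math.floor((xmax - origin_x) / resolution)
        y0 = math.floor((ymin - origin_y) / resolution)
        y1 = math.floor((ymax - origin_y) / resolution)
        for x_idx in range(x0, x1 + 1):
            for y_idx in range(y0, y1 + 1):
                cells.add((x_idx, y_idx))
    return cells


# Two small, far-apart polygons, given by their bounds (made-up values).
bounds_list = [(10, 10, 150, 90), (9000, 9000, 9050, 9120)]
cells = candidate_cells(bounds_list, origin=(0, 0), resolution=100)

# For comparison: how many tiles the full xrange/yrange span would contain.
x_ids = [x_idx for x_idx, _ in cells]
y_ids = [y_idx for _, y_idx in cells]
full_span_count = (max(x_ids) - min(x_ids) + 1) * (max(y_ids) - min(y_ids) + 1)
```

In this toy case only 4 candidate cells are created instead of the 8372 that the full span would hold. The candidates are still a superset: an exact intersection test against each individual polygon (rather than the unary_union) would be the next filtering step.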