Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(arrow): add GeoArrow utils for ArrowLoader #2738

Closed
wants to merge 1 commit into from

Conversation

lixun910
Copy link
Collaborator

@lixun910 lixun910 commented Oct 26, 2023

Add GeoArrow util functions:

  1. parsing geoarrow geometry column to binaryGeometry format https://deck.gl/docs/api-reference/layers/geojson-layer#using-binary-data
  2. parsing geoarrow geometry to GeoJson Feature

The geometry types supported: (see https://github.com/geoarrow/geoarrow/blob/main/extension-types.md#extension-names):

  • geoarrow.point
  • geoarrow.multipoint
  • geoarrow.linestring
  • geoarrow.multilinestring
  • geoarrow.polygon
  • geoarrow.multipolygon

These functions are converted from keplergl/kepler.gl#2385.

ToDos:

  • use ArrowLoader to replace tableFromIPC in test cases
  • add test cases for getBinaryGeometriesFromArrow
  • add test cases for getGeometryColumnsFromArrowTable
  • add test cases for getGeoArrowEncoding

Signed-off-by: Xun Li <lixun910@gmail.com>

// get geometry columns from arrow table
export function getGeometryColumnsFromArrowTable(arrowTable: Table): Map<string, {encoding: string}> {
const geometryColumns = new Map<string, any>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better typing than any?

Comment on lines +70 to +95
function updateBoundsFromGeoArrowSamples(
flatCoords: Float64Array,
nDim: number,
bounds: [number, number, number, number],
sampleSize: number = 100
): void {
const numberOfFeatures = flatCoords.length / nDim;
const sampleStep = Math.max(Math.floor(numberOfFeatures / sampleSize), 1);

for (let i = 0; i < numberOfFeatures; i += sampleStep) {
const lng = flatCoords[i * nDim];
const lat = flatCoords[i * nDim + 1];
if (lng < bounds[0]) {
bounds[0] = lng;
}
if (lat < bounds[1]) {
bounds[1] = lat;
}
if (lng > bounds[2]) {
bounds[2] = lng;
}
if (lat > bounds[3]) {
bounds[3] = lat;
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't yet but plan to create a repo like geoarrow-js to consolidate some operations like this. I implemented this "totalBounds" operation directly on an Arrow JS Vector here: https://github.com/geoarrow/deck.gl-layers/blob/8449634e9403f8159b987264fa35b1085affbbc0/src/geoarrow/bbox.ts

Comment on lines +124 to +127
const ringData = polygonData.children[0];
const pointData = ringData.children[0];
const coordData = pointData.children[0];
const nDim = pointData.stride;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that accessing .children directly didn't maintain typing information. In deck.gl-layers I have helpers to maintain strong typing for geometry types: https://github.com/geoarrow/deck.gl-layers/blob/8449634e9403f8159b987264fa35b1085affbbc0/src/utils.ts#L468-L471

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the typing for pointData.type.listSize? Should the strideForType() or pointData.stride take care of it? See https://github.com/apache/arrow/blob/f2995b0eb9e6792a2cca8ce9e8d9b933d4a9bbfd/js/src/type.ts#L701 Thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use type.listSize myself. I haven't used stride

const coordData = pointData.children[0];
const nDim = pointData.stride;
const geomOffset = ringData.valueOffsets;
const flatCoordinateArray = coordData.values;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For arbitrary geoarrow input, you should assert that the low-level coords are interleaved, not separated (i.e. a struct of x and y columns) because geoarrow allows both in its spec

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion! Will update. Thanks!

@ibgreen
Copy link
Collaborator

ibgreen commented Oct 30, 2023

This was closed by #2744. Unfortunately, I suspect some of the comments provided on this PR were not addressed there.

@ibgreen ibgreen closed this Oct 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants