Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Switch geodata providers #7393

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

camdecoster
Copy link
Contributor

@camdecoster camdecoster commented Mar 20, 2025

Description

Adds scripts to build topojson from UN sourced data and updates build process to run these scripts.

Closes #7334

Changes

  • Adds script for downloading shapefiles
  • Adds script for converting shapefiles into topojson
  • Updates topojson path references

Testing

  • Run npm run build_topojson and make sure the script completes successfully
  • Try loading the maps in ./dist/topojson in Mapshaper
  • Open the test dashboard
  • Try loading some of the geo plots and look for errors

Notes

  • The new maps are built from this UN data source for countries, coastlines, and lands layers. The oceans, lakes, rivers, and subunits layers are built from Natural Earth Data.
  • The current maps come from sane-topojson, which is entirely derived from Natural Earth Data
  • I haven't been able to verify the resolution of the data from the UN. There's a reference here to the "UNGeodata 25 million scale", but I haven't been able to confirm what that means. As such, the labels '50m' and '110m' probably aren't accurate. But they need to remain as they are to not break references.
  • Mapshaper can't handle antimeridian cutting, so features that cross the antimeridian can look weird (source)
  • I want to credit sane-topojson for some of the design of these scripts. I'm attempting to create a (mostly) drop in replacement, so it made sense to follow those conventions.

TODO

  • Clean up shapefiles after converting to topojson
  • Fix Antarctica maps (or just remove them)
  • Commit new maps to ./dist/topojson folder
  • Fix broken tests (I'm sure there will be some)
  • Remove log statements in process_geodata.mjs (or hide them behind flag)
  • Add the 'usa' map (though it will be pretty bland without states)
  • Remove sane-topojson package
  • Verify resolutions
  • Only include actual coastlines in coastlines
  • Make sure 110m maps don't have any unintended artifacts (like South America)
  • Add UN geodata archive to repo and only attempt download if file can't be found
  • Add markdown log file in draftlogs: 7393_feat.md
  • Remove unnecessary properties info from final maps

@gvwilson gvwilson added feature something new P1 needed for current cycle labels Mar 20, 2025
@camdecoster camdecoster changed the title Switch geodata providers feat: Switch geodata providers Mar 20, 2025
import config from './config.json' assert { type: 'json' };

try {
// Download data from UN
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As these origin URLs may change (link rot), I worry that this piece of the script could become out of date quickly. I'm not sure there's too much we can do here but I might recommend committing the current source file (but not adding it to the dist/. Maybe someone else has a better idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems reasonable. The script could be updated to look for the file first and only download if it doesn't exist.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marthacryan do you have an opinion or ideas on this? What's the ideal dev UX here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, in 3b47c67 I added the archive from the UN.

@ndrezn ndrezn requested a review from emilykl March 20, 2025 17:03
@gvwilson
Copy link
Contributor

  1. Sync with this branch.
  2. npm i to install/update packages.
  3. npm run build_topojson to build JSON files.
  4. Go to mapshaper.org.
  5. Load europe_50m.json.
  6. Initially looks like the lakes are misplaced (for example, the Caspian Sea seems to overlap the Russian border in a weird way in the screenshot).
  7. On closer inspection, maybe this is valid: I'm unable to find a map showing how national borders cross the Caspian Sea, and Russia might very well have a claim to the northwest section.
  8. The three lakes in the north check out: one is in Sweden, the other two are near St. Petersburg.
    Screenshot 2025-03-25 at 9 58 13 AM

@gvwilson
Copy link
Contributor

Q for @camdecoster : did the old usa map have the states? as you say, it's not much use without that…

@gvwilson
Copy link
Contributor

Trying to get npm run test-jasmine to work with the new topojson files, it looks like I'm tripping over a configuration issue.

  1. Original error message complained about not being able to find ../../../dist/topojsonworld_110m.json (note the missing '/' between 'topojson' and 'world').
  2. I modified getTopojsonPath in src/lib/topojson_utils.js as follows to insert the '/' if required:
 topojsonUtils.getTopojsonPath = function(topojsonURL, topojsonName) {
-    return topojsonURL + topojsonName + '.json';
+    if (topojsonURL.endsWith('/')) {
+       return topojsonURL + topojsonName + '.json';
+    } else {
+       return topojsonURL + '/' + topojsonName + '.json';
+    }
 };
  1. I also modified test/jasmine/karma.conf.js as follows because we're getting the JSON files from the local dist directory not the installed sane-topojson package:
-var pathToSaneTopojsonDist = path.join(__dirname, '..', '..', 'node_modules', 'sane-topojson', 'dist');
+var pathToTopojsonDist = path.join(__dirname, '..', '..', 'dist', 'topojson');

// ... and further down, edit to match the new variable name

-        {pattern: pathToSaneTopojsonDist + '/**', included: false, watched: false, served: true}
+        {pattern: pathToTopojsonDist + '/**', included: false, watched: false, served: true}

This still produces error messages like the one shown below saying that the JSON files can't be found:

	Failed: plotly.js could not find topojson file at ../../../dist/topojson/world_110m.json. Make sure the *topojsonURL* plot config option is set properly.
	Error: plotly.js could not find topojson file at ../../../dist/topojson/world_110m.json. Make sure the *topojsonURL* plot config option is set properly.
	    at Object.<anonymous> (/Users/gvwilson/plotly/plotly.js/src/plots/geo/geo.js:137:35 <- /private/var/folders/w2/l51fjbjd25n9zbwkz9fw9jp00000gn/T/0c95a0b81672d3a2e3f59fc0534657ef-bundle.js:147433:31)
	    at Object.event (/Users/gvwilson/plotly/plotly.js/node_modules/@plotly/d3/d3.js:504:42 <- /private/var/folders/w2/l51fjbjd25n9zbwkz9fw9jp00000gn/T/0c95a0b81672d3a2e3f59fc0534657ef-bundle.js:4875:48)
	    at XMLHttpRequest.respond (/Users/gvwilson/plotly/plotly.js/node_modules/@plotly/d3/d3.js:1951:24 <- /private/var/folders/w2/l51fjbjd25n9zbwkz9fw9jp00000gn/T/0c95a0b81672d3a2e3f59fc0534657ef-bundle.js:6326:30)
25 03 2025 10:40:03.170:WARN [web-server]: 404: /dist/topojson/world_110m.json
[note: the message above is then repeated several times]

I've tried variations on the path in test/jasmine/tests/geo_test.js specified by this line:

Plotly.setPlotConfig({ topojsonURL: '../../../dist/topojson' });

Adding an extra .. or removing one of the ones that's there doesn't affect the outcome.

@camdecoster
Copy link
Contributor Author

@gvwilson yes the old maps had 'subunits' layers for a few regions (USA and Brazil did for sure). The UN maps don't include that information. Thanks for looking into the tests. I'm working on an update to get the tests working but I haven't pushed that yet. I'll try to get that out later today.

@ndrezn
Copy link
Member

ndrezn commented Mar 25, 2025

@etpinard when you have a chance could you give this PR a look? Would love your eyes. Thank you!

In the short term what was the source for US states/Canadian provinces in the old dataset?

Refactor to simplify scripts

Switch to UN/NE geodata hybrid

Save UN geodata to archive

Remove extra info from topojson

Add centroids to geojson

Use 'simplify' to create 110m maps
@camdecoster camdecoster marked this pull request as ready for review April 8, 2025 18:28
@@ -1,8 +1,27 @@
'use strict';

var saneTopojson = require('sane-topojson');
const topojson = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any more concise syntax in js for achieving this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and no. This can be written more concisely to dynamically generate the file names or read the contents of a directory and use those to build the object, but that causes problems in the test suite. Karma runs in a browser context and therefore doesn't have the required Node libraries (like fs and path) without some changes to the config. Though it looks like a wall of text, it's simpler to just statically require everything. 🤷

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature something new P1 needed for current cycle
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use UN GeoJSON data for geo charts rather than Natural Earth
4 participants