
whosonfirst-brands

Brands in Who's On First documents.

Caveats

This is a work in progress and still very much "wet paint". There is little to no tooling for this stuff yet.

Where do all these #brands come from?

At the moment, they come from the Elasticsearch index running the Who's On First Spelunker. They are the product of a not very sophisticated faceting process on an unanalyzed copy of the wof:name field (called, unsurprisingly, name_not_analyzed). Like this:

curl -s -v --max-time 600 'http://localhost:9200/spelunker/_search?from=0&size=50' -d '{"query": {"term": {"wof:placetype": "venue"}}, "aggregations": {"brands": {"terms": {"field": "name_not_analyzed", "size": 0}}}, "size": 0}' > brands.json

That produces something like 16 million distinct names. We have not imported most of those. Instead we have limited the #brands included here to only those with 50 (or more) venues. So instead of 16 million #brands we have about 7,400 as of this writing. Maybe the cut-off point should be 25, maybe it should be 10. Maybe it should be 5. We don't know yet. We're figuring it out as we go.
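The cut-off described above can be sketched in a few lines of Python. This is only an illustration, not the tooling actually used to build this repository: it assumes brands.json has the usual shape of an Elasticsearch terms aggregation response (a list of buckets, each with a `key` and a `doc_count`), and the names `load_response` and `select_brands`, along with the sample data, are hypothetical.

```python
import json


def load_response(path):
    """Read a saved Elasticsearch response (for example brands.json) from disk."""
    with open(path) as fh:
        return json.load(fh)


def select_brands(response, min_venues=50):
    """Return (name, venue_count) pairs for names with at least min_venues venues."""
    # Assumption: the aggregation is called "brands", as in the curl query above.
    buckets = response["aggregations"]["brands"]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets if b["doc_count"] >= min_venues]


# Made-up sample data in the shape of a terms aggregation response.
sample = {
    "aggregations": {
        "brands": {
            "buckets": [
                {"key": "Example Coffee", "doc_count": 120},
                {"key": "Example Bank", "doc_count": 50},
                {"key": "Corner Store", "doc_count": 7},
            ]
        }
    }
}

# Compare how many names survive at each candidate cut-off.
for cutoff in (50, 25, 10, 5):
    print(cutoff, len(select_brands(sample, cutoff)))
```

Running the same loop against `load_response("brands.json")` instead of `sample` would be one way to see how quickly the list grows as the cut-off drops from 50 toward 5.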

It is assumed that a whole bunch of these records will be superseded or deprecated or both. That work remains tomorrow's problem.