Supervised Text-based Geolocation Using Language Models on an Adaptive Grid
This page explains the process of replicating the results of:
Stephen Roller, Michael Speriosu, Sarat Rallapalli, Benjamin Wing and Jason Baldridge. Supervised Text-based Geolocation Using Language Models on an Adaptive Grid. EMNLP 2012. Jeju, Korea.
Getting the code
The first step is to get the code. Check out or download the code from
Setting things up
You'll need to set up your environment as described in steps 2-3 of README.txt. Specifically, you must set the $TEXTGROUNDER_DIR variable to the root of the textgrounder source code, and add $TEXTGROUNDER_DIR/bin to your
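If you use bash, these settings can go in your ~/.bashrc so they persist across sessions. The sketch below assumes the source was checked out to $HOME/textgrounder (a hypothetical location; substitute your own checkout), and it also assumes that $TEXTGROUNDER_DIR/bin is meant to go on your shell's PATH:

```shell
# Hypothetical checkout location: point this at your own textgrounder source root.
export TEXTGROUNDER_DIR="$HOME/textgrounder"

# Assumption: the bin directory belongs on the shell's PATH, so that the
# textgrounder launcher can be run from any directory.
export PATH="$TEXTGROUNDER_DIR/bin:$PATH"
```

After editing ~/.bashrc, open a new shell (or run source ~/.bashrc) so the settings take effect.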
Getting the data
Next, you'll need the data. For Geotext and Wikipedia, follow step 4 in README.txt.
For the UtGeo data set, follow the README.txt in
As suggested by this document, you are strongly encouraged to contact the first author (firstname.lastname@example.org) when you begin this process, as obtaining the full data set may be difficult.
textgrounder build-all from the
To run the program, you'll need a command of the following form:
$ textgrounder -memory 30g geolocate-document --corpus $PATH_TO_CORPUS/$CORPUS_NAME --kd --kdbs $BUCKET_SIZE --kdsm (median|halfway) --cm (center|centroid) --eval-set (dev|test)
where median and halfway select the Friedman and midpoint methods of splitting, respectively, and center and centroid select the cell-prediction method.
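To see how the two splitting rules differ, here is a toy illustration (not textgrounder's code) of choosing a split value along a single coordinate. It assumes the median is taken over the points in the cell being split and the midpoint over the range of those points; the latitudes are made-up values:

```shell
#!/usr/bin/env bash
# Toy illustration only, not textgrounder's implementation.

# Median rule: sort the values and take the middle one
# (the lower of the two middle values when the count is even).
median_split() {
  printf '%s\n' "$@" | LC_ALL=C sort -n |
    awk '{ v[NR] = $1 } END { print v[int((NR + 1) / 2)] }'
}

# Halfway rule: split midway between the smallest and largest values.
halfway_split() {
  printf '%s\n' "$@" | LC_ALL=C sort -n |
    awk 'NR == 1 { lo = $1 } { hi = $1 } END { print (lo + hi) / 2 }'
}

# Made-up latitudes for five documents in one cell, with one outlier.
lats=(30.1 30.2 30.3 30.4 45.0)
echo "median split:  $(median_split "${lats[@]}")"
echo "halfway split: $(halfway_split "${lats[@]}")"
```

With the outlier at 45.0, the median split (30.3) leaves two points on each side, while the halfway split (37.55) cuts the range in two and leaves four of the five points on one side. In a k-d tree the chosen rule is typically applied recursively until no leaf holds more than the bucket size.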
For example, to run on UtGeo large and evaluate on the dev set using a KD tree with a bucket size of 500, Friedman splitting, and centroid cell prediction, I use:
$ textgrounder -memory 30g geolocate-document --corpus $SCRATCH/corpora/utgeo-large --kd --kdbs 500 --kdsm median --cm centroid --eval-set dev
Your settings will vary depending on your exact setup and on which method you wish to test.
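If you want to compare several methods, it can help to generate the command lines first and check them before launching anything. The sketch below uses a hypothetical helper, tg_cmd, that only prints commands built from the options shown above and rejects values outside the documented choices; the corpus path follows the example above, and the bucket size of 500 is just that example's value:

```shell
# Hypothetical helper: builds a geolocate-document command line from the
# documented options and prints it instead of running it.
# Usage: tg_cmd CORPUS BUCKET_SIZE SPLIT_METHOD CELL_METHOD EVAL_SET
tg_cmd() {
  local corpus=$1 bucket=$2 split=$3 cm=$4 eval_set=$5
  # Reject anything other than a positive whole number for the bucket size.
  case $bucket in ''|*[!0-9]*) echo "bad --kdbs: $bucket" >&2; return 1 ;; esac
  case $split in median|halfway) ;; *) echo "bad --kdsm: $split" >&2; return 1 ;; esac
  case $cm in center|centroid) ;; *) echo "bad --cm: $cm" >&2; return 1 ;; esac
  case $eval_set in dev|test) ;; *) echo "bad --eval-set: $eval_set" >&2; return 1 ;; esac
  printf '%s\n' "textgrounder -memory 30g geolocate-document --corpus $corpus --kd --kdbs $bucket --kdsm $split --cm $cm --eval-set $eval_set"
}

# Print one dev-set command per combination of split method and cell method.
for split in median halfway; do
  for cm in center centroid; do
    tg_cmd "$SCRATCH/corpora/utgeo-large" 500 "$split" "$cm" dev
  done
done
```

Because tg_cmd only prints, you can review the list and then, if it looks right, pipe the output to bash to run the jobs one after another. Each run requests 30g of memory, as in the commands above.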
Please contact Stephen Roller (email@example.com) with any questions about replicating the results. This program can take some effort to get up and running, so please feel free to ask for help.