Latest commit dfb1fee Dec 5, 2013 @bjarkih bjarkih Fixing data issues.

World Factbook Corpus

The CIA World Factbook is a public domain data set comprising geographical, economic and political data on every country in the world.

Data types include free text, currency, percentages, longitude and latitude, altitude, and taxonomies. This variety makes it a viable test and demonstration corpus for search applications, in addition to the intrinsic value of the data.
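
As a rough illustration of handling two of these data types, the sketch below converts a coordinate string and a percentage string into numbers. The helper names `parseCoordinates` and `parsePercentage` are hypothetical, and the input formats (for example "65 00 N, 18 00 W") are assumptions about how the values appear in the output, so check them against the actual data before relying on this.

```javascript
// Hypothetical helpers: the string formats below are assumed, not confirmed
// against the crawler output.

// Convert a coordinate string such as "65 00 N, 18 00 W" to decimal degrees.
function parseCoordinates(text) {
    var match = /^(\d+)\s+(\d+)\s+([NS]),\s*(\d+)\s+(\d+)\s+([EW])$/.exec(text.trim());
    if (!match) return null;
    var lat = Number(match[1]) + Number(match[2]) / 60;
    var lon = Number(match[4]) + Number(match[5]) / 60;
    return {
        latitude: match[3] === 'S' ? -lat : lat,   // south is negative
        longitude: match[6] === 'W' ? -lon : lon   // west is negative
    };
}

// Convert a string that starts with a percentage, such as "2.5% (2012 est.)",
// to a number. Returns null when no leading percentage is found.
function parsePercentage(text) {
    var match = /^(-?\d+(?:\.\d+)?)\s*%/.exec(text.trim());
    return match ? Number(match[1]) : null;
}

console.log(parseCoordinates('65 00 N, 18 00 W'));
console.log(parsePercentage('2.5% (2012 est.)'));
```

Returning `null` for unrecognised input keeps malformed values easy to spot when indexing, rather than silently storing `NaN`.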

Since the Factbook is not available in an easily machine-readable format, we've created a crawler to extract the data in a way that should be easier to consume.


The crawler is written in Node.js and outputs both XML and JSON. Pre-generated output is provided.

Run the crawler

The command below extracts data from the dataset in ./factbook-crawler/data and exports it to ./data:

    node factbook-crawler/index.js

Use the data

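    var fs = require('fs'),
        path = require('path');

    // Assumes ./data/json holds one JSON file per country.
    var dir = './data/json';
    fs.readdirSync(dir).forEach(function (file) {
        if (path.extname(file) !== '.json') return; // skip anything that is not JSON
        var country = JSON.parse(fs.readFileSync(path.join(dir, file), 'utf8'));
        console.log(file, Object.keys(country)); // top-level fields of this country
    });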
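
Since the corpus is aimed at search applications, here is a minimal sketch of a next step once the country files are parsed: a tiny in-memory inverted index over every string value. The functions `collectStrings` and `buildIndex` are hypothetical, and the sample documents use made-up field names (`name`, `geography`, `terrain`) rather than the actual structure of the JSON output.

```javascript
// Collect every string value found anywhere inside a parsed JSON value.
function collectStrings(value, out) {
    if (typeof value === 'string') {
        out.push(value);
    } else if (value !== null && typeof value === 'object') {
        Object.keys(value).forEach(function (key) {
            collectStrings(value[key], out);
        });
    }
    return out;
}

// Build an inverted index: lowercase token -> sorted list of document ids.
// Object.create(null) avoids clashes with inherited names such as "constructor".
function buildIndex(docs) {
    var index = Object.create(null);
    Object.keys(docs).forEach(function (id) {
        collectStrings(docs[id], []).forEach(function (text) {
            (text.toLowerCase().match(/[a-z0-9]+/g) || []).forEach(function (token) {
                if (!index[token]) index[token] = Object.create(null);
                index[token][id] = true;
            });
        });
    });
    Object.keys(index).forEach(function (token) {
        index[token] = Object.keys(index[token]).sort();
    });
    return index;
}

// Illustrative sample data only; the field names are assumptions.
var docs = {
    iceland: { name: 'Iceland', geography: { terrain: 'mostly plateau interspersed with mountain peaks' } },
    nepal: { name: 'Nepal', geography: { terrain: 'rugged Himalaya mountain range in the north' } }
};
var index = buildIndex(docs);
console.log(index['mountain']);
```

In practice `docs` would be filled from the files in ./data/json, keyed by file name. A real search application would add stemming, ranking and typed fields on top of this.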