Add Installation Instructions #2

Merged
merged 3 commits into tmcw:master

2 participants

@tristen

This adds installation instructions to the README file and links to your blog post :)

@tmcw tmcw merged commit f5aa5ff into tmcw:master
Showing with 30 additions and 5 deletions.
  1. +30 −5 README.md
@@ -1,11 +1,8 @@
# The DC Municipal Regulations
The [DC Municipal Regulations](http://www.dcregs.dc.gov/).
-
-* `scrape.js`: scraping. requires request, get, cheerio
-* `textify.py`: requires sh, catdoc
-
-This is also an experiment in open data design.
+This is also an experiment in open data design. See the relevant post,
+[Indexing and Searching Big Static Data](http://macwright.org/2012/11/14/indexing-searching-big-static-data.html).
Most open data sites run as traditional websites; they have a content
management function (a UI for editing content), a 'search engine',
@@ -31,3 +28,31 @@ could then pre-pay for years of service.
This should also permit incredibly easy clonability; federating the data
store will be as simple as downloading a copy.
+
+
+## Installation
+
+ mkdir docs text indexes
+
+Install node.js module dependencies:
+
+ npm install
+
+Scrape the data source:
+
+ node scrape/scrape.js
+
+Generate a JSON document from the titles collected in the `docs` directory:
+
+ node titles.json
+
+Convert the documents into plain text. This requires the Python modules
+`catdoc` and `sh`, which you can install via pip:
+
+ python textify.py
+
+Lastly, generate an index by running:
+
+ cd generate_index
+ node invert.js
+ node trie.js
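For reference, the steps above amount to roughly the following end-to-end run. This is a sketch rather than part of the patch: the `scrape/scrape.js` path, the `node titles.json` invocation, and the pip package names are taken from the instructions as written and may need adjusting to the actual repository layout.

    # Rough end-to-end run of the installation steps above (paths and names as assumed).
    mkdir docs text indexes      # working directories for scraped docs, plain text, and indexes
    npm install                  # node.js module dependencies
    pip install sh catdoc        # Python modules used by textify.py, per the note above
    node scrape/scrape.js        # scrape the DC regulations site into docs/
    node titles.json             # generate the titles JSON document (command as given)
    python textify.py            # convert the scraped documents to plain text
    cd generate_index            # index generation scripts
    node invert.js               # build the inverted index
    node trie.js                 # build the trie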