Skip to content

walsh/data-science-quickstart

Repository files navigation

Data Science Quickstart

Essential command-line tools for exploring data for Code for Boston and Massachusetts Legal Hackers. This is an introductory course designed for those just starting to explore some of the open data sets provided through the City of Boston's Data Portal or Cambridge Open Data.

Example: Cambridge Parking Tickets: Top Violations

$ cat tmp/Cambridge_Parking_Tickets_*.csv | csvcut -c 5 | sort | uniq -c | sort -rn | head -n 8
310955 METER EXPIRED
47956 OVERTIME
43464 RESIDENT PERMIT ONLY
16380 NO STOPPING
12065 STREET CLEANING
10548 NO PARKING
9496 LOADING ZONE
2471 STORAGE

Massachusetts Legal Hackers

This workshop was created for Code for Boston and Massachusetts Legal Hackers.

Topics

  • cat -- concatenate and print files
  • grep -- file pattern searcher
  • awk -- pattern-directed scanning and processing language
  • cut -- cut out selected portions of each line of a file
  • sort - sort lines of text files
  • uniq -- report or filter out repeated lines in a file

Contributing

Pull Requests are welcome!

Dependencies

We assume you are using a Mac and have Homebrew installed.

brew install jq
brew install xmlstarlet
brew install sqlite --with-dbstat --with-fts --with-functions --with-json1

OSX comes with an older sqlite, installing via brew keeps the system sqlite intact while enabling new features.

The new shiny sqlite is available @

/usr/local/opt/sqlite/bin/sqlite3

For csvkit will be using a virutalenv using a brew installed Python3. While the machinations are somewhat cumbersome, the risk to your system and your sanity are low.

brew install xz gdbm readline
brew install python3
pip install virtualenv
virtualenv --python=/usr/local/bin/python3 science.env
# activate the new environment
source science.env/bin/activate
pip install --upgrade pip

From this point on, all commands will assume the science.env is activated.

pip install csvkit

Resources

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages