
Meta Data Science

Thomas Levine

Meta data science

Datasets on Socrata portals feel like files rather than data.

  • Data science about data science
  • Science about metadata


  1. Data science mindset
  2. What I did
  3. What I learned
  4. Things to consider

Data science mindset

Exploit cheap computers to study how the world works.

  1. Store everything.
  2. Anything can be counted.
  3. Numbers can be turned into anything.
  4. Boring work should be sent to robots.
  5. Get more data rather than tuning your model.

Store everything

  • Storage is cheap.
  • You don't need a full research plan.

Anything can be counted

Numbers can be turned into anything

Boring work should be sent to robots

  • Computers can perform mindless tasks
  • Computers can also make complex decisions
  • All analyses should be scripted.

Get more data rather than tuning your model

  • Modeling problems versus computation/storage problems
  • Confidence versus validity

Banko & Brill

  • Don't collect new data to answer your new questions.
  • Look for new ways of using existing data sources.
  • Store raw data! Don't aggregate prematurely.
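
A minimal sketch of "store raw, aggregate later", assuming the raw data arrives as JSON arrays of records. The function names `store_raw` and `count_records` and the one-file-per-download layout are hypothetical choices for illustration, not the downloader's actual design.

```python
import json
from pathlib import Path


def store_raw(directory, name, payload):
    """Write the downloaded bytes to disk unchanged and return the path."""
    path = Path(directory) / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(payload)  # no parsing, no aggregation at storage time
    return path


def count_records(directory):
    """Aggregate later: count the records in every stored JSON array."""
    return sum(
        len(json.loads(p.read_bytes()))
        for p in Path(directory).glob("*.json")
    )
```

Because the files are kept as received, a question that comes up later can be answered by writing another function like `count_records` rather than by downloading again.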

What I did

Data science about open data

Store everything

Architecture of the Socrata downloader

Anything can be counted

Public meetings by day of week
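
One way a count like this could be produced, assuming each meeting has a date in ISO format (YYYY-MM-DD). The dates below are made up for illustration and are not taken from any portal.

```python
from collections import Counter
from datetime import date

# Index matches date.weekday(), where Monday is 0.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def count_by_weekday(iso_dates):
    """Count ISO-formatted dates by the name of their day of the week."""
    return Counter(WEEKDAYS[date.fromisoformat(d).weekday()] for d in iso_dates)


# Made-up meeting dates, for illustration only.
example_dates = ["2013-07-01", "2013-07-02", "2013-07-08", "2013-07-10"]
weekday_counts = count_by_weekday(example_dates)
```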

When datasets got uploaded

Numbers can be turned into anything


Boring work should be sent to robots

Site analytics

Scripted analyses

Get more data rather than tuning your model

What I learned

  1. Nobody knows much
  2. How the Socrata Open Data Portal is constructed
  3. How people use the Socrata Open Data Portal

What people know

  • Portal administrators
  • Portal developers
  • Anecdotes

Construction of the Socrata Open Data Portal

Data provenance

Every view on Socrata has an "owner" and a "table author". What's an owner, and what's a table author?

API limits

What are Socrata's API limits?

I don't know, but they apply across all portals.

Form validation

What must be true about the form fields?

"Suggest a Dataset" form

One web application

With some software, you have many different installations that might be able to communicate with each other.

  • WordPress
  • CKAN

With other software, a single web application runs everything.

  • Tumblr
  • Socrata

How people use Socrata

Analysis tools exist.

People use them.

But not really.

VinylFox tweet

Benefits of the data portal

(As I see it)

  1. Import data from various formats.
  2. Standard way of discovering datasets.
  3. Convert data to standard formats.
  4. Mark datasets as official in some sense.

But not a lot of analysis

Things to consider

Data science

  • Store/expose everything
  • Datasets are data points, and metadata is data
  • You can automate human work, even if it seems complicated.
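
A small sketch of treating datasets as data points: each dataset's metadata becomes one record, and the records are summarized like any other table. The records, portal names, and field names (`portal`, `owner`, `table_author`) are invented for illustration and do not reflect Socrata's actual metadata schema.

```python
from collections import Counter

# Hypothetical metadata records, one per dataset.
datasets = [
    {"id": "aaaa-0001", "portal": "data.example-a.org",
     "owner": "alice", "table_author": "alice"},
    {"id": "aaaa-0002", "portal": "data.example-a.org",
     "owner": "bob", "table_author": "alice"},
    {"id": "bbbb-0001", "portal": "data.example-b.org",
     "owner": "carol", "table_author": "carol"},
    {"id": "bbbb-0002", "portal": "data.example-b.org",
     "owner": "dave", "table_author": "carol"},
]


def datasets_per_portal(records):
    """Count how many datasets each portal hosts."""
    return Counter(r["portal"] for r in records)


def share_owner_differs(records):
    """Fraction of datasets whose owner is not the table author."""
    if not records:
        return 0.0
    differing = sum(1 for r in records if r["owner"] != r["table_author"])
    return differing / len(records)
```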


  • What if the different portals were more connected?
  • Are the analysis tools important?