Commit

new socrata post
tlevine committed Jul 19, 2013
1 parent a21c6a5 commit 17d2398
Showing 9 changed files with 470 additions and 78 deletions.
439 changes: 399 additions & 40 deletions !/feed.xml

Large diffs are not rendered by default.

12 changes: 10 additions & 2 deletions !/index.html
@@ -76,13 +76,21 @@
 </nav>
 <header class="title-card">
 <h1>
-<a href="r-spells-for-data-wizards/">R spells for data wizards</a>
+<a href="socrata-genealogies/">Progeny of Ten Socrata Datasets</a>
 </h1>
 <div class="date">
-July 10, 2013
+July 19, 2013
 </div>
 </header>
 <div class="clearfix" id="links">
+<div class="link">
+<strong>
+<a href="r-spells-for-data-wizards/">R spells for data wizards</a>
+</strong>
+<footer>
+Jul 10, 2013
+</footer>
+</div>
 <div class="link">
 <strong>
 <a href="socrata-summary/">Analyze all the datasets</a>
Binary file added !/socrata-genealogies/downloads.png
Binary file added !/socrata-genealogies/family.jpg
Binary file added !/socrata-genealogies/hits.png
97 changes: 61 additions & 36 deletions !/socrata-genealogies/index.html
@@ -7,18 +7,18 @@
 <!--<![endif]-->
 <head>
 <meta charset='utf-8'>
-<title>Progenies of Ten Socrata Datasets</title>
+<title>Progeny of Ten Socrata Datasets</title>
 <meta content='How are datasets are transformed in Socrata, and what can we can learn from that?' name='description'>
 <meta content='Thomas Levine' name='author'>
 <link href='http://domain/humans.txt' rel='author' type='text/plain'>
 <meta content='nanoc 3.6.4' name='generator'>
 <meta content='width=device-width' name='viewport'>
 <meta content='summary' name='twitter:card'>
 <meta content='@thomaslevine' name='twitter:site'>
-<meta content='Progenies of Ten Socrata Datasets' name='twitter:title'>
+<meta content='Progeny of Ten Socrata Datasets' name='twitter:title'>
 <meta content='How are datasets are transformed in Socrata, and what can we can learn from that?' name='twitter:description'>
 <meta content='@thomaslevine' name='twitter:creator'>
-<meta content='http://thomaslevine.com/!/socrata-genealogies/screenshot.png' name='twitter:image:src'>
+<meta content='http://thomaslevine.com/!/socrata-genealogies/family.png' name='twitter:image:src'>
 <meta content='thomaslevine.com' name='twitter:domain'>
 <meta content='' name='twitter:app:name:iphone'>
 <meta content='' name='twitter:app:name:ipad'>
@@ -31,9 +31,9 @@
 <meta content='' name='twitter:app:id:googleplay'>
 <meta content='http://thomaslevine.com/!/socrata-genealogies/' property='og:url'>
 <meta content='thomaslevine.com' property='og:site_name'>
-<meta content="It's cool what you can do when data analysis is logged and exposed publically over the web." property='og:description'>
+<meta content="It's cool what you can do when everyone's data analysis is logged and exposed publicly over the web." property='og:description'>
 <meta content='How are datasets are transformed in Socrata, and what can we can learn from that?' property='og:title'>
-<meta content='http://thomaslevine.com/!/socrata-genealogies/screenshot.png' property='og:image'>
+<meta content='http://thomaslevine.com/!/socrata-genealogies/family.png' property='og:image'>
 <link href='/favicon.ico' rel='icon' type='image/x-icon'>
 <link href='/!/feed.xml' rel='alternate' title='Thomas Levine' type='application/atom+xml'>
 <link href='http://fonts.googleapis.com/css?family=Open+Sans:400,700' rel='stylesheet' type='text/css'>
@@ -73,10 +73,10 @@
 </nav>
 <header class='title-card'>
 <h1>
-Progenies of Ten Socrata Datasets
+Progeny of Ten Socrata Datasets
 </h1>
 <div class='date'>
-
+July 19, 2013
 </div>
 </header>
 <div id='article-wrapper'>
@@ -89,7 +89,7 @@ <h1>
 
 <p>I recently downloaded all of the metadata about all of the datasets from all
 of the Socrata portals and then posted this <a href="/!/socrata-summary">summary</a> of
-the data. Now on to some deeper further analysis.</p>
+the data. Now on to some deeper analysis.</p>
 
 <h2 id="what-is-a-dataset">What is a dataset?</h2>
 <p>As the Twitters have pointed out,the dataset counts that I presented in my
@@ -105,7 +105,7 @@ <h2 id="what-is-a-dataset">What is a dataset?</h2>
 <ol>
 <li>Socrata concepts and terminology</li>
 <li>Ways that we can arrive at apparent duplicates in Socrata data</li>
-<li>The progenies of ten Socrata datasets</li>
+<li>The progeny of ten Socrata datasets</li>
 </ol>
 
 <h2 id="socrata-terminology">Socrata terminology</h2>
@@ -148,7 +148,7 @@ <h3 id="filtered-views">Filtered views</h3>
 <a href="https://data.oaklandnet.com/Environmental/Public-Works-Volunteer-Opportunities/sduu-bfki">Public Works Volunteer Opportunities</a>
 to include only opportunities on July 29.</p>
 
-<p><a href="filter.png"><img src="filter.png" alt="Filtering on date July 29" class="wide" /></a></p>
+<p><img src="filter.png" alt="Filtering on date July 29" class="wide" /></p>
 
 <p><a href="https://data.oaklandnet.com/Environmental/Volunteer-Opportunities-on-July-29/vyhb-nqtw">Here</a>’s the resulting filtered view.</p>
 
@@ -165,6 +165,9 @@ <h3 id="charts-and-maps">Charts and maps</h3>
 for now.</p>
 
 <h3 id="tables">Tables</h3>
+<p><img src="/!/socrata-genealogies/family.jpg" alt="A table family, containing a dataset and several filtered views, charts and maps" class="wide" />
+<!-- Icons from https://explore.data.gov/stylesheets/images/icons/type_icons_30.png?1 --></p>
+
 <p>There is also a concept of a <strong>table</strong>, and
 it is somewhat abstract. Here are two ways of thinking of it.</p>
 
@@ -189,7 +192,7 @@ <h3 id="federation">Federation</h3>
 the datasets.) But it is possible for one data portal to include all of
 another portal’s datasets.</p>
 
-<p>Sometimes, you’ll see a view in the search &amp; browse pane with a grey background,
+<p>Sometimes, you’ll see a view in the search &amp; browse pane with a gray background,
 instead of white. Hawaii has a bunch of these.</p>
 
 <p><a href="https://data.hawaii.gov/"><img src="hawaii.png" alt="Hawaii data portal" class="wide" /></a></p>
@@ -201,7 +204,9 @@ <h3 id="federation">Federation</h3>
 
 <p>This request shows up in the administrator interface for the source portal.
 If the source portal accepts the request, all of the views from the source portal
-are provided to the destination portal as in the screenshot above.</p>
+are provided to the destination portal as in the screenshot above. Here are
+<a href="http://www.socrata.com/video/socrata-open-data-federation-demonstration/">two</a>
+<a href="http://www.socrata.com/datagov/open-data-federation-video/">videos</a> about that.</p>
 
 <p>If you look closely, you’ll notice that the federated views are actually just
 links to the source portal; the views show up in the search, but they aren’t
@@ -216,10 +221,7 @@ <h2 id="types-of-duplicate-datasets">Types of duplicate datasets</h2>
 
 <h3 id="soda-queries-filtered-views-charts-maps">SODA queries: Filtered views, charts, maps</h3>
 <p>After a dataset is uploaded, people can create many views that derive from it.
-Depending on what you want to know, it might not make sense to treat these as
-separate entities.</p>
-
-<p>In my previous analysis, I did count filtered views, charts and maps all as separate
+In my previous analysis, I counted filtered views, charts and maps all as separate
 entities. I think it’s worth separating these because they can be derived from the
 source datasets.</p>
 
@@ -278,11 +280,10 @@ <h3 id="copied-rather-than-elegantly-linked">Copied rather than elegantly linked
 I haven’t done it on a larger scale, but that would be fun to do later.</p>
 
 <h2 id="ten-large-dataset-families">Ten large dataset families</h2>
-<p>It took me quite a while to figure out how all of this works.
+<p>It took me quite a while to figure out everything that I explained above.
 (That’s a story in itself.) My goal all along was to start looking
-at how families of datasets are related. I figured I’d make something
-a bit less sloppy than ggplot plots tiny text and with legends
-hanging off of the page.</p>
+at how families of datasets are related, so now I’ll talk about what I
+did on that front.</p>
 
 <h3 id="methodology">Methodology</h3>
 <p>I grouped all of the views that I had collected by table. (Recall that
@@ -296,7 +297,7 @@ <h3 id="methodology">Methodology</h3>
 I’ve included that figure in the present report.)</p>
 
 <p>Out of these datasets, I took the top ten datasets, and I
-show their families in the table at the end of this page. Select a dataset,
+show their families in the fancy table at the end of this page. Select a dataset,
 and then you can see all of that dataset plus all of the filtered views,
 maps and charts of that dataset. You can also see which portals each of
 these datasets is federated to. You can sort by the different columns,
@@ -305,7 +306,10 @@ <h3 id="methodology">Methodology</h3>
 <p>And In case you’re reading this a year later, the data were collected from
 Socrata portals at the end of May 2013.</p>
 
-<h3 id="why-its-not-a-tree">Why it’s not a tree</h3>
+<h3 id="discussion">Discussion</h3>
+<p><em>This section might make more sense if you play with the fancy table first.</em></p>
+
+<h4 id="why-its-not-a-tree">Why it’s not a tree</h4>
 <p>In Socrata, you can create a filtered view, chart or map based on a dataset,
 and the link to the source dataset will be preserved. This is represented
 in the table below.</p>
@@ -316,45 +320,66 @@ <h3 id="why-its-not-a-tree">Why it’s not a tree</h3>
 represented as a child of the original dataset rather than a child of the old
 filtered view.</p>
 
-<h3 id="things-to-look-for">Things to look for</h3>
-
-<h4 id="the-source-dataset">The source dataset</h4>
-<p>If you sort by “Created” date, the first one should be the source dataset.</p>
+<p>Thus, we don’t get the full family tree that you might have expected.</p>
 
 <h4 id="compare-family-statistics-with-view-statistics">Compare family statistics with view statistics</h4>
 <p>In some cases, like with the White House visitor records requests, most of the
 downloads and hits for the whole family are from this source dataset.
 In other cases, like the World Bank major contract awards, only a small
-minority comes from this source dataset. This might tell us something about
+minority comes from this source dataset. This occurrence is illustrated by the
+plots below.</p>
+
+<p>The first plot looks at hits, and the second at downloads. Within each plot,
+the left (red) dot is the number of hits/downloads that the source dataset
+received and the right (blue) dot is the total hits/downloads across the whole
+family.</p>
+
+<p>If these are close to each other (that is, the black line is short),
+most of the hits/downloads came from the source dataset.
+If they are far apart, most
+hits/downloads came from filtered views, charts and maps.</p>
+
+<p><img src="/!/socrata-genealogies/hits.png" alt="Hits by dataset family" class="wide" /></p>
+
+<p><img src="/!/socrata-genealogies/downloads.png" alt="Downloads by dataset family" class="wide" /></p>
+
+<p>This information might tell us something about
 how people like to use the data. Perhaps people working with the World Bank
 contracts are interested in subsets for their particular region and time.
 And maybe people are just playing with the White House data because it’s the
 first one in the list.</p>
 
-<h4 id="view-size">View size</h4>
-<p>The view size gives us an idea of what sort of queries people are running.
+<h4 id="view-size-and-shape">View size and shape</h4>
+<p>The view size and shape give us an idea of what sort of queries people are running.
 Are people selecting certain variables, or are they aggregating or subsetting
 the records?</p>
 
+<p><img src="/!/socrata-genealogies/query-1.jpg" alt="A rectangle indicating the original dataset" /></p>
+
+<p><img src="/!/socrata-genealogies/query-2.jpg" alt="The same rectangle, with a shorter one for a record subset" /></p>
+
+<p><img src="/!/socrata-genealogies/query-3.jpg" alt="The same rectangles, with a tall, thin one for a selection of variables" /></p>
+
 <h4 id="federation-2">Federation</h4>
 <p>As I discussed earlier, federation is all-or-nothing; you either include all
 of the source portal’s datasets or none of them. So you would expect that the
 “Federation” column would list the same number of copies for each dataset.
-In at least one instance (FEC contributions), this is not the case. what’s
-going on there?</p>
+In at least one instance (FEC contributions), this is not the case.
+I haven’t figured out what’s going on there.</p>
 
 <h3 id="relevance">Relevance</h3>
-<p>Frankly, this table is a rather terrible way of exploring these broader trends,
-but it conveys the scale with which datasets are being adapted on Socrata and
-lets us drill down to the views on Socrata to see more detail.</p>
-
 <p>Socrata exposes enough of the data analysis process that we can start to see
 what sorts of analyses different people are doing. We can see what sorts of
 datasets are interesting to people. We may even be able to develop new
 guidelines for publishing datasets through analysis of what makes datasets more
 likely to be viewed, downloaded and filtered on Socrata.</p>
 
-<p>And now, the dataset progeny explorer:</p>
+<h3 id="data-family-explorer">Data family explorer</h3>
+<p>And now, the aforementioned fancy table. As I said above, this table contains
+the families/tables associated with the ten datasets with the largest families.
+Select a dataset, and then you can see all of that dataset plus all of the
+filtered views, charts and maps, with some information about each. And if you
+sort by “Created” date, the first one should be the source dataset.</p>
 
 <!-- Scripts after the introduction so you don't notice the table loading -->
 <script src="angular.min.js"></script>
Binary file added !/socrata-genealogies/query-1.jpg
Binary file added !/socrata-genealogies/query-2.jpg
Binary file added !/socrata-genealogies/query-3.jpg
