“Risks and Rewards of Rolling Your Own Criminal Justice Data”
Discussion at IRE’s 2016 CAR conference in Denver, Colo.
Our panelists’ projects
- The Guardian’s “The Counted”, represented by Kenan Davis
- The Marshall Project’s “The Next To Die”, represented by Gabriel Dance and Tom Meagher
- USA Today’s “Behind the Bloodshed”, represented by Jodi Upton
- Washington Post’s “Police Shootings”, represented by Steven Rich
Other recent projects worth checking out
- “The Lost and the Found” from Center for Investigative Reporting
- “Over the Line” by the Atlanta Journal-Constitution
- “Deaths after police use of force in Minnesota” by the Minneapolis Star Tribune
Advice on how to build your own criminal justice database
Do your research (and don’t reinvent the wheel). Don’t waste your time doing what someone else is already doing well.
Define your universe carefully. It’s worth a lot of early discussion about exactly what you plan to accomplish and how, because the work goes more easily later if you can tell someone exactly what fits, what doesn’t, and why. It may sound obvious, but when you’re creating your own data set, it’s really easy to include incidents that made headlines but don’t really fit. That makes for a good read, but it introduces bias and sloppiness. Stay true.
Beware of small sample size theater. Sometimes we measure rare events, which means one badly coded incident or missing data can push a trend. This is especially true when multiple news outlets are measuring similar things and reporting contradictory ‘trends.’
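To make the small-sample point concrete, here is a minimal sketch of how far a reported year-over-year change can move when a single incident is miscoded. The function name `trend_range` and all of the counts are hypothetical, invented for illustration; none of them come from the panelists’ projects.

```python
from typing import Tuple


def trend_range(before: int, after: int, miscoded: int = 1) -> Tuple[float, float]:
    """Return the (lowest, highest) percent change from `before` to `after`
    if up to `miscoded` incidents in the later period were wrongly
    included or wrongly left out."""
    low = (after - miscoded - before) / before * 100
    high = (after + miscoded - before) / before * 100
    return round(low, 1), round(high, 1)


# Hypothetical counts: both pairs show a nominal 10 percent increase.
print(trend_range(10, 11))      # rare events: one error swings the trend widely
print(trend_range(1000, 1100))  # common events: one error barely matters
```

With counts this small, one bad record is the difference between “no change” and “up 20 percent,” which is one way two outlets measuring similar things can end up reporting contradictory trends.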
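Have serious discussions about what data is missing and how it hurts the story. We don’t like to think about our data shortcomings, but myopia will hurt you later. Remember Abraham Wald and the missing bullet holes: studying damage on bombers that returned from World War II missions, he argued for armoring the spots with no holes, because planes hit there never made it back to be counted.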
Have a definitely-can-get-that story. Editors hate dry wells. Make sure you have a good story either way.
Let the data tell you what the story is. You may have preconceived notions of what your data will say when you collect it. Write based on what your data says. When a notion is torn down, that might be a good story, too.
Define a duration for the project. At The Marshall Project, we decided to track executions from September 2015 through December 2016. This was important because of our thin resources, but also because we partnered with many organizations and they needed to know how long the project would last. As many of us have surely learned, you don’t want to fire up an intensive data-gathering project without talking about the time and resource commitments it will require.
Create a community around your database. From the outset, you should design and build with this in mind. At the Guardian, we worked closely with our audience team to add entry points for our readers to contribute to the project. We also made internal tools for the audience team to create elements from the database to be shared with our dedicated community of readers on Facebook and Twitter.
Be transparent. Make sure the public knows your methodology and collection methods. You may know what goes in and what does not, but everyone else should know, too.
Figure out what items of information you want to collect when you start. It’s one thing to define a universe; it’s quite another to decide what pieces of information to collect on the items in your universe. The more you pick, the more work you create for yourself, but sometimes that’s a better option than going back and updating every single item.
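One way to pin down those decisions early is to write the record layout down as code before collecting anything. The sketch below is only an illustration: the `Incident` class, its field names, and the sample record are all hypothetical, not the schema any of the panelists used.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Incident:
    # Fields every record must have before it enters the database (hypothetical).
    incident_id: str
    date: str          # ISO format, e.g. "2016-03-10"
    state: str
    source_url: str
    # Fields that may take follow-up reporting to fill in (hypothetical).
    age: Optional[int] = None
    armed: Optional[str] = None


def missing_fields(record: Incident) -> List[str]:
    """List the fields still empty, so gaps are visible as you collect."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Made-up record for illustration only.
record = Incident("a1", "2016-03-10", "CO", "http://example.com/story")
print(missing_fields(record))
```

Adding a field to the class later makes the cost of that choice obvious: every existing record immediately shows up as incomplete until someone goes back and fills it in.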
Think deeply about presentation. The visual form that you choose for your database will define how people interpret the data. For “The Counted,” the interactive team considered several designs before choosing a look that highlights the lives lost and the circumstances of each incident without serving simply as a memorial. I’m not sure that we completely succeeded, but that goal helped frame our design decisions so that we could call out what we felt was most important.
Provide both key insights and the ability to explore. Your readers should very quickly see the main takeaway(s) from your database. But you should also provide multiple ways to view and filter the data.