Have suggestions or feedback? Please let me know at rob [at] uohack [dot] com, or file a pull request!
We'll be hosting this workshop Tuesdays from 6-8 p.m. in Knight Library room 144. Schedule below:
| Date | Topic |
|---|---|
| Oct. 17 | Getting started, the basics of data |
| Oct. 24 | Cleaning data, spreadsheets and formulas |
| Oct. 31 | Halloween, we'll take this week off |
| Nov. 7 | Sharing data, the basics of data visualization |
| Nov. 14 | Advanced analysis with Python Pandas |
Data is everywhere.
We interact with data every day in thousands of ways, yet data literacy is seldom taught in schools. No matter the industry you wish to enter, no matter what it is you want to do with your life, data will be a part of it.
Data is certainly nothing new.
But recently, processing data has become extremely cheap and efficient. More than 30 years ago, accountants had large spreadsheets laid out over big desks and counted numbers by hand. If one cell changed, it might take a day or more to update the rest of the sheet.
Now computers can render changes to a spreadsheet instantaneously and process thousands of data points in near-real time to give you complex analysis as the data changes.
Everyone should learn basic data literacy skills.
To emphasize the point that data has become an extremely important concept to learn about, not just in America but around the world, let's look at some data.
I did a few quick Google searches to see how influential three key phrases are on the internet, as a rough snapshot of their overall popularity. The following information was gathered in June 2017.
First, I tried searching for the phrase `university of oregon`, which returned 124 million results. Not bad, but let's expand that a little to a worldwide topic like `football`, which returns 1.4 billion results (about 11 times as many as `university of oregon`).

Now, let's try the word `data`. This returns 5.6 billion results, four times as many as `football` and about 45 times as many as `university of oregon`.
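If you want to check the comparisons yourself, here is a minimal Python sketch of the arithmetic. The result counts are the ones quoted above. The dictionary and the `times_as_many` helper are names made up for this illustration.

```python
# Result counts quoted in the text (Google searches, June 2017).
results = {
    "university of oregon": 124_000_000,
    "football": 1_400_000_000,
    "data": 5_600_000_000,
}


def times_as_many(term, baseline):
    """Return how many times as many results `term` has as `baseline`."""
    return results[term] / results[baseline]


# The three comparisons made in the text.
comparisons = [
    ("football", "university of oregon"),
    ("data", "football"),
    ("data", "university of oregon"),
]

for term, baseline in comparisons:
    ratio = times_as_many(term, baseline)
    print(f"{term!r} has about {ratio:.0f}x the results of {baseline!r}")
```

Rounding to whole numbers is a choice made here to match the rough, anecdotal tone of the comparison.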
These numbers are big and can seem fuzzy when compared anecdotally. Let's look at them in two other ways. First, as a table:
| Search term | Results |
|---|---:|
| university of oregon | 124,000,000 |
| football | 1,400,000,000 |
| data | 5,600,000,000 |
Since this is a small, simple data set, seeing the figures in a table can help put them in perspective. In this case, right-aligning the numbers helps add context. Another way to visualize the data is, of course, a chart.
This simple chart, made in Google Sheets, gives the reader an additional way to compare the numbers.
While all three ways of viewing the data (anecdote, table and chart) are technically correct, each provides a different experience. If you are going to analyze data, you also need to keep in mind how you will communicate your findings.
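To make the table view concrete, here is a small Python sketch that prints the three search counts as a plain-text table with right-aligned figures. It only illustrates the alignment idea and is not how the original table was made. `format_table` is a hypothetical helper name.

```python
# Result counts quoted in the text (Google searches, June 2017).
results = {
    "university of oregon": 124_000_000,
    "football": 1_400_000_000,
    "data": 5_600_000_000,
}


def format_table(rows):
    """Return one line per row: term left-aligned, count right-aligned."""
    term_width = max(len(term) for term in rows)
    # Width of the longest count once thousands separators are added.
    count_width = max(len(f"{count:,}") for count in rows.values())
    return [
        f"{term:<{term_width}}  {count:>{count_width},}"
        for term, count in rows.items()
    ]


table_lines = format_table(results)
print("\n".join(table_lines))
```

Because the counts are right-aligned, the digits line up by place value, so the extra length of the larger numbers is visible at a glance.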
Data literacy is a skill that can be learned and should be practiced.
This is a four-course introduction to data literacy. We will start from zero, with no prior knowledge required, and work our way up to advanced data analysis using Python.
My goal is to introduce you to these topics and give you the tools to begin working with data. You will see several examples, all of which use real-world data, and learn different techniques to work with the various types of information.
As with learning anything else, you will need to practice in order to get better. Unfortunately, this takes time and effort. Fortunately, the tools are largely open-source and free. If you pay attention to these four courses, you will be equipped with a solid foundation to tackle any data set.
Let's get going.
- 01 - Getting started
  - Basic steps of working with data
  - CSV (Comma-Separated Values)
  - Get data into a spreadsheet
  - Example: Lane County pot shop delivery
- 02 - Cleaning data
  - Basic spreadsheet formulas
  - Percent change
  - Example: Extract data from PDF (city budget) and clean
- 03 - Sharing data
  - Being transparent with data
  - Types of data visualizations
  - Example: Query federal data and create map
- 04 - Advanced analysis
  - Use Python pandas to analyze data
  - Example: Examine campus parking citations using Pandas
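As a small preview of two topics from the outline above (CSV and percent change), here is a hedged sketch that uses only Python's standard library. The CSV text, department names and figures are all made up for illustration and are not taken from the workshop's data sets. The formula is the usual percent change: (new - old) / old * 100.

```python
import csv
import io

# Hypothetical CSV text (made-up figures), standing in for a downloaded file.
csv_text = """department,2016,2017
Parks,200,250
Library,400,380
"""


def percent_change(old, new):
    """Percent change from old to new: (new - old) / old * 100."""
    if old == 0:
        raise ValueError("percent change is undefined when the old value is 0")
    return (new - old) / old * 100


# DictReader uses the first line as column names and yields one dict per row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

changes = {
    row["department"]: percent_change(float(row["2016"]), float(row["2017"]))
    for row in rows
}

for department, change in changes.items():
    print(f"{department}: {change:+.1f}%")
```

A spreadsheet formula would do the same calculation cell by cell. The Python version is shown only to connect the early sessions to the final one.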
Of course, my four courses here are a very short introduction to a massive world of information. Here are some additional resources depending on what you're interested in.
- Data journalism handbook (220 pages print, free online)
- Map types and data types - UC Santa Barbara (PowerPoint lecture notes)
- There Are Many Ways to Map Election Results. We’ve Tried Most of Them. - New York Times (article)
- Election maps - Mark Newman, Department of Physics and Center for the Study of Complex Systems, University of Michigan (short paper)
- First Python Notebook (getting started with Jupyter and Pandas) - Ben Welsh, LA Times Data Desk editor
- Here are 9 email newsletters about data… I think you’ll like at least 4 of them - Online journalism blog
- Data viz project
What great books and blog posts am I missing? Let me know! rob [at] uohack [dot] com