Python for Journalists
Notebooks and files for the Python for Journalists course on Datajournalism.com
- What is Python anyway
- About the course
- Course Modules
- 1 Set Up
- 2 Clean data
- 3 Analyse data
- 4 Scrape data
- Learning More, Reference And Tools
- About Us
What Is Python Anyway
Python is a general-purpose programming language. It's popular among data journalists for its readability, ease of use and efficiency.
About the Course
The Python for Journalists course is meant for journalists who want to learn the most common uses of Python in data journalism. The course consists of four modules. In the first module you'll set up Python and all Python-related tools on your own computer. Next you'll learn how to clean up messy datasets using the Pandas library. In the third module you'll learn how to analyse data, again using the Pandas library. In the fourth and final module you'll dabble in web scraping, learning how to automatically download data from the web using the Requests and Beautiful Soup libraries.
This course is meant for those who have dabbled in Python but somehow didn't persevere, and for those who can't wait to dive in head first. Though no programming knowledge is required, it helps if you know what a terminal or command prompt is and if you are familiar with Excel.
For all modules except module 1 Set up, there is a Jupyter Notebook available to follow along during the course. Each notebook contains exercises and explanations. Happy Pythoning!
1 Set Up
This module revolves around installing the right tools on your laptop. To follow along in the coming modules, you'll need Python 3 and several Python libraries, including Requests, Pandas and BeautifulSoup. Jupyter Notebooks are highly recommended as well. We suggest installing all of this software in one go, using the Anaconda distribution. This first module does not include a Jupyter Notebook.
On your computer:
- Install the Anaconda distribution to get Python 3, the Requests, Pandas and BeautifulSoup libraries, and Jupyter Notebooks on your computer all at once.
- Note: choose the Anaconda installer that includes Python 3; at the time of writing that would be Python 3.6.
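Once the installation has finished, a quick check like the one below can confirm that the libraries are available. This is only a minimal sketch and not part of the course materials: the function name `check_libraries` is made up for illustration, and note that BeautifulSoup is imported under the package name `bs4`.

```python
import importlib.util
import sys


def check_libraries(names):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Import names of the tools used in this course.
# BeautifulSoup is installed as "beautifulsoup4" but imported as "bs4".
REQUIRED = ["requests", "pandas", "bs4", "notebook"]

print("Python version:", sys.version.split()[0])
for name, available in check_libraries(REQUIRED).items():
    print(f"{name}: {'found' if available else 'MISSING'}")
```

If a library shows up as missing, reinstalling Anaconda or installing that single package is usually enough.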
Extra preparation: If you want to make sure you have a solid foundation to build on, you might want to learn about Python syntax first. Here are some places where you can learn about different data types in Python, which might help before continuing with this course: (Since the following tutorials overlap, choosing one is highly recommended.)
2 Clean data
In this second module we'll show you how to get into your Python conda environment, and how to start a Jupyter Notebook. Once that's out of the way, you'll learn how to import a CSV file into your Jupyter Notebook, to get ready for some data cleaning. Among other things you'll learn how to search and replace values inside a column; how to change the datatype of a column; and how to extract data from a column to populate a new column. This module includes two Jupyter Notebooks: one empty and another one completed - both named 'clean data'.
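To give a feel for these cleaning steps, here is a small sketch using Pandas. It is not taken from the course notebooks: the dataset and the column names (`name`, `city`, `amount`) are invented, and the CSV is read from an in-memory string so the example runs without any file on disk.

```python
import io

import pandas as pd

# Hypothetical messy data standing in for a CSV file on disk.
raw_csv = io.StringIO(
    "name,city,amount\n"
    'Anna,Amsterdam (NL),"1,200"\n'
    "Bram,Brussels (BE),950\n"
    'Cato,Rotterdam (NL),"3,400"\n'
)

# Import the CSV into a DataFrame.
df = pd.read_csv(raw_csv)

# Search and replace: drop the thousands separator inside the column.
df["amount"] = df["amount"].str.replace(",", "", regex=False)

# Change the datatype of the column from text to whole numbers.
df["amount"] = df["amount"].astype(int)

# Extract the country code between brackets into a new column.
df["country"] = df["city"].str.extract(r"\((\w+)\)", expand=False)

print(df)
```

With a real file you would pass its path to `pd.read_csv` instead of the in-memory string.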
3 Analyse data
In this third module, you'll learn how to analyse data using the Pandas library. You'll learn how to explore your dataset, looking at summary statistics - count, median, mean, percentiles, standard deviation etc. - for each column. Next we'll look into how to sort, filter, sum and count values in columns. Finally you'll learn how to group data, creating (for those familiar with Excel) pivot tables, using the Pandas library. This module includes two Jupyter Notebooks: one empty and another one completed - both named 'analyse data'.
Extra exercises: If you want to make sure you have fully grasped this module, you can take on the extra notebook that contains some exercises. Since this is a later addition to the course, there is no video to accompany this notebook. However, you should be able to pull through without video. :) Of course there are two extra notebooks: one completed, and one for you to work in.
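The operations mentioned above could look roughly like this in Pandas. This is an illustrative sketch rather than the contents of the course notebook; the dataset and its column names (`city`, `year`, `amount`) are made up.

```python
import pandas as pd

# Hypothetical dataset, invented for this example.
df = pd.DataFrame(
    {
        "city": ["Amsterdam", "Utrecht", "Amsterdam", "Rotterdam", "Utrecht"],
        "year": [2017, 2017, 2018, 2018, 2018],
        "amount": [100, 250, 300, 150, 200],
    }
)

# Summary statistics: count, mean, standard deviation, percentiles, min and max.
summary = df["amount"].describe()

# Sort rows from the largest to the smallest amount.
sorted_df = df.sort_values("amount", ascending=False)

# Filter: keep only rows with an amount of 200 or more.
large = df[df["amount"] >= 200]

# Sum a column and count how often each city occurs.
total = df["amount"].sum()
counts = df["city"].value_counts()

# Group: total amount per city.
per_city = df.groupby("city")["amount"].sum()

# Pivot table, as in Excel: cities as rows, years as columns.
pivot = df.pivot_table(
    index="city", columns="year", values="amount", aggfunc="sum", fill_value=0
)

print(summary)
print(pivot)
```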
4 Scrape data
The final module revolves around scraping data using both the Requests and the BeautifulSoup libraries. Though in practice you'll likely want to scrape data first and clean and analyse it afterwards, this module comes last for training purposes. The modules on cleaning and analysing data introduced you to Python, Pandas and Jupyter Notebooks, paving the way for some basic web scraping, including a for loop to collect data as efficiently as possible. After finishing this module you should be able to write some basic web scrapers to collect data from the internet. This module includes two Jupyter Notebooks: one empty and another one completed - both named 'scrape data'.
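A basic scraper of the kind described here might look like the sketch below. It is an assumption about the general approach, not the code from the course notebook: the helper names `fetch_html` and `parse_rows` and the sample HTML are invented, and the example parses an inline HTML string so it runs without an internet connection.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a web page with Requests and return its HTML as text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on 404s and other HTTP errors
    return response.text


def parse_rows(html):
    """Collect the text of every table cell, one list per table row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # The for loop visits each table row in turn.
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # header rows only contain <th> cells, so skip them
            rows.append(cells)
    return rows


# Hypothetical page content standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Amount</th></tr>
  <tr><td>Amsterdam</td><td>100</td></tr>
  <tr><td>Utrecht</td><td>250</td></tr>
</table>
"""

rows = parse_rows(SAMPLE_HTML)
print(rows)
```

For a live page you would call `parse_rows(fetch_html(url))` with the address of the page you want to scrape.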
Learning More, Reference And Tools
- Allen B. Downey's digital book Think Python: How to Think Like a Computer Scientist
- Swaroop's free online book A Byte of Python
- Dan Bader's Python video tutorials on YouTube
- Al Sweigart's Automate the boring stuff with Python site
- Coding for Journalists
- Your First Python Notebook: a step-by-step guide to analyzing data with Python and the Jupyter Notebook.
- Data Camp Python Courses
- Zed Shaw's Learn Python the Hard Way
- EDx's Course Introduction to Computer Science and Programming Using Python
- Coursera Python for Everybody Specialization
- Coursera Applied Data Science with Python Specialization
About Us
The European Journalism Centre believes that the use of data in journalism is a cornerstone of building resilience in any newsroom. After 10 years of experience running data journalism programmes, it created DataJournalism.com. The site provides data journalists with free resources, materials, online video courses and community forums. Once you sign in, you can enrol for free in one of the premium online courses or join the discussion in the community forums. Whether you are new to data journalism or deeply familiar with it, membership will expose you to like-minded data journalists and give you a free space to learn or improve your data skills.
About Winny de Jong
Winny works as a data journalist for the Dutch national news broadcaster NOS. There she interviews datasets instead of people, trying to find news before it is news. Winny usually speaks about the importance of data literacy, how to develop ideas, and her data journalism workflow. She has presented for organizations such as TEDx, Brussels News Summit, DataHarvest+ and multiple journalism colleges. Every Sunday she shares the best of the data journalism web in her data journalism newsletter. Visit her online at winnymedia.nl or at her data blog.