Working with Web Data in Python 🐍

Course materials for working with web data in Python. Originally developed for a full-day course at the Methods Institute @Sheffield.


About the course

This course shows how to treat the Internet as a source of data.

What will be covered?

  • Using Python to scrape, parse and read web data
  • Understanding Application Programming Interfaces (APIs) and using them to collect data
  • Querying APIs with the appropriate requests (case studies: GitHub, Twitter)
  • Commonly returned data formats: HTML, JSON, XML
  • Programmatic web data collection and streams
  • Regular expressions 📃
  • Manipulating web data with pandas (visualisation included)
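As a small taste of the scraping topics above, the sketch below extracts links from an HTML fragment using only the standard library (the course itself uses beautifulsoup4; this is just a minimal illustration, and the HTML string is invented):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# A made-up HTML fragment standing in for a downloaded page
html = '<p>See <a href="https://example.org/data">the data</a> and <a href="/about">about</a>.</p>'

parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # ['https://example.org/data', '/about']
```

In the course, beautifulsoup4 and lxml do the heavy lifting, but the idea is the same: walk the parsed tag tree and pull out the attributes you care about.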

The materials are split into broad sections containing exercises, explanations, and most of the information you need to complete them. The material covered is far from exhaustive; instead, we aim to provide enough information and tasks to get you started, and to bring you quickly to a level where you can continue using Python to collect and handle web data on your own.

Prerequisites

To follow the course you need a basic knowledge of Python as well as a basic understanding of:

  • Functions
  • Loops
  • Nested data structures
  • Variable assignment and types
  • How to import modules in Python
  • Basic HTML tagging
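For reference, the kind of nested data structure and loop we assume familiarity with looks like this (the data are made up):

```python
# A list of dictionaries -- a nested data structure you will meet
# constantly when handling web data (e.g. parsed JSON responses)
people = [
    {"name": "Ada", "languages": ["Python", "SQL"]},
    {"name": "Grace", "languages": ["COBOL"]},
]

# Loop over the outer list and index into each inner dictionary
for person in people:
    print(person["name"], "knows", ", ".join(person["languages"]))
```

If reading and writing code like this feels comfortable, you have the background the course assumes.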

💻 Software requirements

We recommend using the Anaconda distribution of Python. It's free and comes with a large number of additional modules ready to import into your scripts, the IPython shell and notebook interfaces, a powerful Python editor (Spyder), and a good package manager (conda) for installing and updating packages.

You need to have the following installed on your laptop for the course:

  • Anaconda (get it from here)
  • Python > 3.5
  • pip
  • Jupyter notebooks (already installed with Anaconda)
  • beautifulsoup4
  • requests
  • scrapy
  • lxml
  • a shell (on Windows we recommend Git Bash, installed with Git, or Cmder)

The easiest and fastest way to get most of these is to download and install Anaconda. Make sure to add it to your Path during installation. Once you have Anaconda installed, open your shell (terminal/command line) and clone this repository:

$ git clone https://github.com/trallard/WebData_Python.git

Then navigate to the directory containing the materials for the session. So if you have them in Documents/WebData_Python you'll type:

$ cd Documents/WebData_Python

Next, we'll use a conda environment to install all the packages needed for the course:

$ conda env create -f environment.yml

Finally, you need to activate the environment you just created:

$ source activate webdata

To deactivate the environment:

$ source deactivate

⚡️ Course content

  1. Introduction to regular expressions
  2. Working with dictionaries
  3. Web scraping
  4. Working with JSON data
  5. Accessing APIs
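To give a flavour of how two of the modules fit together, the sketch below pulls a value out of a JSON string (module 4) and sanity-checks it with a regular expression (module 1); the JSON payload is invented for illustration:

```python
import json
import re

# A made-up API-style JSON response, as a plain string
payload = '{"user": {"login": "octocat", "email": "octocat@example.com"}}'

data = json.loads(payload)      # parse JSON text into nested dicts
email = data["user"]["email"]   # index into the nested structure

# A deliberately loose email pattern -- fine for a quick sanity check,
# not a full validator
match = re.fullmatch(r"[\w.+-]+@[\w-]+\.[\w.]+", email)
print(bool(match))  # True
```

Real API responses (such as those from GitHub, covered in module 5) return exactly this kind of JSON text, just with many more fields.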

Acknowledgements

The development of this material was funded by OpenDreamKit, a Horizon2020 European Research Infrastructure project (676541) that aims to advance the open source computational mathematics ecosystem.


This work is licensed under a Creative Commons Attribution 4.0 International License.