WebScraping

Web scraping is useful for gathering data that is not available through an API. This folder provides sample code for Beautiful Soup, an easy-to-use Python library for parsing HTML.
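As a rough illustration of what working with Beautiful Soup looks like, the sketch below parses a small HTML string and pulls out a heading and the link targets. The HTML snippet is made up for this example and is not taken from the scripts in this folder; a real scraper would download the page first.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page (an assumption for this example).
html = """
<html><body>
  <h1>Example page</h1>
  <a href="/about">About</a>
  <a href="https://example.com/docs">Docs</a>
</body></html>
"""

# "html.parser" is Python's built-in parser, so no extra dependency is needed.
soup = BeautifulSoup(html, "html.parser")

# Text of the first <h1> element, with surrounding whitespace stripped.
title = soup.h1.get_text(strip=True)

# The href attribute of every <a> tag that has one.
links = [a["href"] for a in soup.find_all("a", href=True)]

print(title)
print(links)
```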

What is included?

  1. link_web.py: a script that uses Beautiful Soup and NetworkX to build a graph of the links between web pages, starting from a given page.
  2. preprocessing: a Python script that scrapes a web page containing FAQs and prints them in JSONL format.
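The two ideas above can be sketched roughly as follows. This is not the code from link_web.py or the preprocessing script. It is a minimal approximation under stated assumptions: the pages live in a hypothetical in-memory dictionary (`PAGES`) instead of being downloaded, the FAQ markup (`div.faq` containing an `h3` question and a `p` answer) is invented, and the names `fetch`, `build_link_graph`, and `faqs_to_jsonl` are illustrative only.

```python
import json
from collections import deque
from urllib.parse import urljoin

import networkx as nx
from bs4 import BeautifulSoup

# Hypothetical in-memory "site" (URL -> HTML); a real script would download pages.
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/b">B</a>',
    "https://example.com/b": '<a href="/">Home</a>',
}


def fetch(url):
    """Return the HTML for a URL, or an empty string if the page is unknown."""
    return PAGES.get(url, "")


def build_link_graph(start_url, max_pages=50):
    """Breadth-first crawl from start_url, adding a directed edge per link."""
    graph = nx.DiGraph()
    graph.add_node(start_url)
    queue = deque([start_url])
    visited = set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        soup = BeautifulSoup(fetch(url), "html.parser")
        for a in soup.find_all("a", href=True):
            # Resolve relative links against the page they appear on.
            target = urljoin(url, a["href"])
            graph.add_edge(url, target)
            if target not in visited:
                queue.append(target)
    return graph


# Hypothetical FAQ markup; the real page structure is an assumption here.
FAQ_HTML = """
<div class="faq"><h3>What is scraping?</h3><p>Extracting data from web pages.</p></div>
<div class="faq"><h3>Is an API better?</h3><p>Usually, when one exists.</p></div>
"""


def faqs_to_jsonl(html):
    """Convert each FAQ entry into one JSON object per line (JSONL)."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for item in soup.find_all("div", class_="faq"):
        records.append(json.dumps({
            "question": item.h3.get_text(strip=True),
            "answer": item.p.get_text(strip=True),
        }))
    return "\n".join(records)


graph = build_link_graph("https://example.com/")
print(sorted(graph.edges()))
print(faqs_to_jsonl(FAQ_HTML))
```

The `max_pages` cap is a design choice for the sketch: without some limit, a crawl that follows every link can grow without bound on a real site.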