
A set of functions and classes to help web scraping and simple web audits


ttavni/PyWebScraper


Web Scraping for Python


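from pyscraper.sitemapper import Sitemapper
from pyscraper.scrapper import BatchScrape

sitemap = 'https://www.datascience.com/sitemap.xml'

# Collect the page URLs listed in the sitemap
page_urls = Sitemapper(sitemap)
# Scrape every page, separating pages that loaded from broken ones
completed_urls, broken_urls = BatchScrape(page_urls)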
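The README does not show how `Sitemapper` or `BatchScrape` work internally. As a rough illustration of the same two steps, here is a minimal standard-library sketch. It assumes a standard sitemaps.org `<urlset>` file and treats any page that raises an error or returns a non-200 status as broken. `parse_sitemap`, `fetch_status` and `batch_scrape` are hypothetical names, not part of the pyscraper API.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List, Tuple

# Namespace used by sitemaps.org-compliant sitemap files (assumption: a plain <urlset>)
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> List[str]:
    """Hypothetical helper: return every <loc> URL listed in a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url; 4xx/5xx responses raise HTTPError."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def batch_scrape(
    urls: Iterable[str], fetch: Callable[[str], int] = fetch_status
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split urls into (completed, broken) lists."""
    completed, broken = [], []
    for url in urls:
        try:
            ok = fetch(url) == 200
        except (OSError, ValueError):
            # URLError/HTTPError subclass OSError; malformed URLs raise ValueError
            ok = False
        (completed if ok else broken).append(url)
    return completed, broken


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/old-page</loc></url>"
        "</urlset>"
    )
    urls = parse_sitemap(sample)
    # A stub fetcher keeps this example offline; omit fetch= to make real requests
    completed, broken = batch_scrape(urls, fetch=lambda u: 404 if "old" in u else 200)
    print(completed, broken)
```

Passing the fetcher in as a parameter keeps the splitting logic testable without network access. The real `BatchScrape` may well use a different HTTP client or a different definition of "broken".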

In addition, you can now visualise the hierarchical structure of the sitemap and produce a d3.js visualisation:

# Visualise pages
from pyscraper.viz import VisualiseSitemap
VisualiseSitemap(page_urls)
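The README does not document the data format `VisualiseSitemap` produces. One common way to express a sitemap hierarchy for d3.js is a nested `{"name": ..., "children": [...]}` object, which `d3.hierarchy` accepts with its default `children` accessor. The sketch below builds such a tree from page URLs by splitting on host and path segments. `build_tree` is a hypothetical name, and the nesting scheme is an assumption, not necessarily what the package does.

```python
import json
from typing import Dict, Iterable
from urllib.parse import urlparse


def build_tree(urls: Iterable[str]) -> Dict:
    """Hypothetical helper: nest URLs by host, then by each path segment."""
    root = {"name": "root", "children": []}
    for url in urls:
        parsed = urlparse(url)
        # Host first, then the non-empty path segments, e.g. example.com > blog > post
        parts = [parsed.netloc] + [p for p in parsed.path.split("/") if p]
        node = root
        for part in parts:
            # Reuse an existing child with this name, otherwise create one
            child = next((c for c in node["children"] if c["name"] == part), None)
            if child is None:
                child = {"name": part, "children": []}
                node["children"].append(child)
            node = child
    return root


if __name__ == "__main__":
    pages = [
        "https://example.com/",
        "https://example.com/blog/post-1",
        "https://example.com/blog/post-2",
    ]
    # JSON output that a d3.js page could load and pass to d3.hierarchy()
    print(json.dumps(build_tree(pages), indent=2))
```

Leaf nodes keep an empty `children` list, which `d3.hierarchy` treats as a leaf.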

[Image: sitemap visualisation]

The text from each page could then be visualised using this repository
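Visualising page text first requires pulling the readable text out of each page's HTML. The README does not say how PyWebScraper does this. A minimal standard-library sketch might look like the one below. It assumes plain HTML input and assumes that `<script>` and `<style>` contents should be ignored. `TextExtractor` and `page_text` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect visible text, skipping <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside a skipped element
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def page_text(html: str) -> str:
    """Return the visible text of an HTML document as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    print(page_text("<body><h1>Hello</h1><script>var x = 1;</script><p>World</p></body>"))
```

The joined string could then be handed to whatever text-visualisation tool is used.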
