Skip to content
No description, website, or topics provided.
Python
Branch: master
Clone or download
Latest commit e98489b Aug 6, 2015
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
LICENSE Initial commit Jul 21, 2015
README.md
broken_links.py 1. use urllib2 Jul 22, 2015
setup.py setup.py Jul 21, 2015

README.md

Crawling Broken Links with 404 Error in Python

The sample demonstrates how to use Python to crawl Website pages via sitemap.xml and check broken links for every page.

Installation

  • pip install beautifulsoup4

Basic Steps

  1. Read sitemap.xml of the target Website.
  2. Search for attribute 'href' to get all valid links in every page.
  3. Check link response code.
  4. Dump 404 error URLs to a text file.

How to Run

  1. Specify sitemap: request = build_request("http://kb.dynamsoft.com/sitemap.xml").
  2. python broken_links.py.
  3. Use ctrl+c to stop the program.

Blog

How to Check Broken Links with 404 Error in Python

You can’t perform that action at this time.