bs4 (Beautiful Soup) uses a pluggable XML or HTML parser to parse a (possibly invalid) document into a tree representation. It provides methods and Pythonic idioms that make it easy to navigate, search, and modify the parse tree.
bs4 (Beautiful Soup) works with Python 2.7 and up. It works better if lxml and/or html5lib is installed.
urllib is a package that collects several modules for working with URLs:
- urllib.request for opening and reading URLs
- urllib.error containing the exceptions raised by urllib.request
- urllib.parse for parsing URLs
- urllib.robotparser for parsing robots.txt files
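
As a rough illustration of how the four urllib modules listed above fit together, here is a minimal sketch using only the standard library. The example.com URLs, the robots.txt rules, and the fetch helper are hypothetical and exist only for this example; nothing here touches the network unless fetch is called.

```python
from urllib.error import URLError
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

# urllib.parse: split a URL into components and resolve a relative link.
parts = urlparse("https://example.com/docs/index.html?page=2")
print(parts.netloc)  # example.com
print(parts.path)    # /docs/index.html
print(parts.query)   # page=2

next_url = urljoin("https://example.com/docs/index.html", "guide.html")
print(next_url)      # https://example.com/docs/guide.html

# urllib.robotparser: evaluate robots.txt rules. The rules are supplied
# as a list of lines here (made up for this sketch) instead of being
# downloaded from a site.
rules = RobotFileParser()
rules.parse(["User-agent: *", "Disallow: /private/"])
print(rules.can_fetch("*", "https://example.com/private/data.html"))  # False
print(rules.can_fetch("*", "https://example.com/docs/index.html"))    # True


# urllib.request + urllib.error: open a URL and handle failures.
# fetch is a hypothetical helper name, not part of urllib.
def fetch(url, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except URLError as exc:
        # HTTPError is a subclass of URLError, so HTTP error statuses
        # are caught here as well.
        print(f"request failed: {exc.reason}")
        return None
```

In a real crawler one would typically check can_fetch before calling a helper like fetch, and the bytes it returns could then be handed to an HTML parser such as Beautiful Soup.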