A simple Python crawler example

Crawls 1000 python-related entries from Baidu Baike.

Environment

python3

pip install beautifulsoup4

python spider_main.py

If crawling fails, Baidu has probably changed its page layout. Update the crawl rules in html_parser.py to match the new pages.
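As a rough illustration of what such a crawl rule might look like, the sketch below collects entry links from a page with BeautifulSoup. It is not the repository's actual code: the function name `parse_entry_links`, the `/item/` link pattern, and the example URL are all assumptions, and the pattern is exactly the kind of thing that would need adjusting when Baidu changes its pages.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumption: entry links have the form /item/<name>.
# Change this pattern if the site's markup changes.
ENTRY_LINK = re.compile(r"^/item/")


def parse_entry_links(page_url, html):
    """Return the set of absolute entry URLs linked from one page."""
    soup = BeautifulSoup(html, "html.parser")
    links = set()
    for anchor in soup.find_all("a", href=ENTRY_LINK):
        # Resolve the relative href against the URL of the current page.
        links.add(urljoin(page_url, anchor["href"]))
    return links
```

Keeping the pattern in a single constant means a layout change on the site should only require editing one line.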

.gitignore
README.md
__init__.py
html_downloader.py
html_outputer.py
html_parser.py
outputer_html
spider_main.py
url_manager.py
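
The file names suggest the usual split of a small crawler: a URL manager, a downloader, a parser, and an outputer, driven by spider_main.py. The repository's code is not shown here, so the following is only a minimal sketch of how a URL manager of this kind commonly works, assuming it tracks pending and already-visited URLs in two sets. The class name `UrlManager` and its method names are hypothetical.

```python
class UrlManager:
    """Tracks URLs waiting to be crawled and URLs already handed out."""

    def __init__(self):
        self.new_urls = set()  # pending URLs
        self.old_urls = set()  # URLs already handed out for crawling

    def add_new_url(self, url):
        # Skip empty values and anything already seen, so no page is crawled twice.
        if url and url not in self.new_urls and url not in self.old_urls:
            self.new_urls.add(url)

    def add_new_urls(self, urls):
        for url in urls or ():
            self.add_new_url(url)

    def has_new_url(self):
        return bool(self.new_urls)

    def get_new_url(self):
        # Move one pending URL to the visited set and return it.
        url = self.new_urls.pop()
        self.old_urls.add(url)
        return url
```

A main loop would then repeat `get_new_url()`, download, parse, and `add_new_urls()` until `has_new_url()` is false or the 1000-entry limit is reached.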