GitHub - zzycami/AutoDataGrabber

##A Spider Crawler for Website Content

###1. About this program

A spider crawler developed to add content to a website automatically.

###2. System design

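The idea behind the scraper is that adding a new site should only require writing that site's page-parsing logic according to a fixed set of rules and placing it in the scraping engine folder. The rest of the program automatically calls every site parser found in that directory and scrapes each site in turn. Supporting a new site therefore means writing the parsing code and dropping it into the engine folder, nothing more.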
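The folder-based discovery described above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual loader: the function name `load_site_parsers` and the convention that each site module exposes a `parse(html)` function are assumptions made for the example.

```python
import importlib.util
from pathlib import Path


def load_site_parsers(engine_dir):
    """Import every .py file in engine_dir and collect the ones exposing parse()."""
    parsers = {}
    for path in sorted(Path(engine_dir).glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helper modules
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Assumed convention: a site module defines parse(html).
        parse = getattr(module, "parse", None)
        if callable(parse):
            parsers[path.stem] = parse
    return parsers
```

With this layout, a driver would call `load_site_parsers` on the engine folder and loop over the returned dictionary, so a newly added file is picked up on the next run without touching the driver itself.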

###3. Dependencies

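The program is written in Python. It supports different databases: to use your own database, add your own database engine under dbengine. The basic data-operation interfaces are already defined, so a new engine only needs to implement them. This project uses MySQL, so the Python MySQL support library must be installed; MySQLdb is used here. Pages are parsed with BeautifulSoup.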

MySQLdb: download link
BeautifulSoup: download link
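As a rough sketch of what a pluggable database engine might look like: the class and method names below (`DBEngine`, `save`, `exists`) are hypothetical, since the real interface in dbengine is not shown here. The standard-library `sqlite3` module stands in for MySQLdb so that the example runs without a MySQL server.

```python
import sqlite3
from abc import ABC, abstractmethod


class DBEngine(ABC):
    """Hypothetical engine interface; the real method names may differ."""

    @abstractmethod
    def save(self, site, title, body):
        """Store one scraped article."""

    @abstractmethod
    def exists(self, site, title):
        """Return True if the article has already been stored."""


class SQLiteEngine(DBEngine):
    """Stand-in engine backed by SQLite instead of MySQL."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS articles (site TEXT, title TEXT, body TEXT)"
        )

    def save(self, site, title, body):
        self.conn.execute(
            "INSERT INTO articles VALUES (?, ?, ?)", (site, title, body)
        )
        self.conn.commit()

    def exists(self, site, title):
        row = self.conn.execute(
            "SELECT 1 FROM articles WHERE site = ? AND title = ?", (site, title)
        ).fetchone()
        return row is not None
```

A MySQL-backed engine would implement the same two methods on top of a MySQLdb connection, and the scraping code would depend only on the abstract interface.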

###4. Scheduled execution

The crontab command that ships with Linux can also run the program automatically at a given time, but I want each site to have its own execution schedule.
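One possible way to give each site its own schedule is to keep a per-site interval and check which sites are due on every tick. The sketch below is an assumption about how this could work, not the project's implementation; `due_sites`, the interval table, and the site names are illustrative only.

```python
def due_sites(intervals, last_run, now):
    """Return the sites whose interval (in seconds) has elapsed since their last run.

    intervals: dict mapping site name to seconds between runs
    last_run:  dict mapping site name to the timestamp of its last run
    now:       current timestamp, e.g. time.time()
    """
    due = []
    for site, interval in intervals.items():
        last = last_run.get(site)
        # A site that has never run is always due.
        if last is None or now - last >= interval:
            due.append(site)
    return due
```

A single crontab entry could then start the program every minute, and the program itself would decide which sites to scrape by calling `due_sites` and recording the new timestamp for each site it processes.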

