0x08. A Simple Web Crawler In Go
Tran Phong Phu edited this page Apr 28, 2019
- Design a database for two missions: (1) store the crawled info/data, and (2) monitor/analyze/alert when the software does not work normally.
- Design a crawler that is easy to scale.
- Use a plugin/extension/add-on architecture so new sites can be attached quickly without downtime.
- Avoid being banned by using proxies.
First, let's create some structs and interfaces to describe what a crawler will look like.
```go
type ICrawler interface {
	Parse(res *http.Response) Data
}

type Crawler struct {
	selector Selector
	parser   Parser
}

type Data struct {
	Title         string
	PublishedDate time.Time
	Author        string
	Content       string
}
```
To tell the Parser how to extract each field of the content, we define a Selector:
```go
type Selector struct {
	Title         string
	PublishedDate string
	Author        string
	Content       string
}
```
ⓒ 2019 Phú, Trần Phong & NordicCoder