Skip to content

0x08. A Simple Web Crawler In Go

Tran Phong Phu edited this page Apr 28, 2019 · 2 revisions

Coding a simple crawler in Go

What are most challenges of a crawler

  • Design Database for two missions: (1) to store crawled info/data, (2) to monitor/analyze/alert if software does not not work normally.
  • Design a crawler which is easy to scale
  • Plugin/Extension/Add-on architect to attach new site quickly without downtime
  • Avoid to be banned by using proxy

Defining core structures and intefaces

The first thing we do, let's create some structures and interfaces to describle how a crawler will be.

Defining Crawler Interface

type ICrawler interface {
	Parse(res *http.Response) Data
}
type Crawler struct {
	selector Selector
	parser   Parser
}

Defining Result

type Data struct {
	Title         string
	PublishedDate time.Time
	Author        string
	Content       string
}

Defining Selector

to tell the Parser how extract the content

type Selector struct {
	Title         string
	PublishedDate string
	Author        string
	Content       string
}