-
Notifications
You must be signed in to change notification settings - Fork 47
02. Week 2 Exercise
Tran Phong Phu edited this page May 15, 2019
·
3 revisions
We will consider to build a crawler using Golang
- Design Database for two missions: (1) to store crawled info/data, (2) to monitor/analyze/alert if software does not not work normally.
- Design a crawler which is easy to scale
- Plugin/Extension/Add-on architect to attach new site quickly without downtime
- Avoid to be banned.
De bo sung them mot website mot cach de dang
Xu ly dong thoi 3 nghiep vu:
- Doc du lieu (url) can crawl tu db 1.1 Tai sao khong lam theo offset va limit. Khi db lon minh, offset ma lon nhu 1M la cau query sai 1.2 La luon order by theo id va lay nhung phan tu lon hon cai lan cuoi lay ra
SELECT *
FROM url
WHERE status = 1 AND state = 1 AND id > 30
ORDER BY id ASC
LIMIT 10 OFFSET 0
1.3 De lock lai cai du lieu khong co thang khac query trung, thi minh lock row db
SELECT FOR UPDATE *
FROM url
WHERE status = 1 AND state = 1 AND id > 30
ORDER BY id ASC
LIMIT 10 OFFSET 0
UPDATE url
SET state = 2
WHERE id IN (1, 2, 3, 4, 5)
LIMIT 5
1.4 Co mot giai phap khac la sharing theo application
- id % 10 = 0 may so 1
- id % 10 = 1 may so 2
- id % 10 = 2 may so 3
SELECT *
FROM url
WHERE status = 1 AND state = 1 AND id > 30 and routing = 1
ORDER BY id ASC
LIMIT 10 OFFSET 0
- (a) Download noi dung html tu url va parse no ra. (b) neu parse bi loi thi` mark du lieu cua buoc 1 la loi.
- (a) Insert noi dung parse dc vao db. (b) Cap nhat cai url tu buoc 1 la DONE 3.1. La doi co du n article moi insert 1 lan (gia su 5) 3.1. Sau thoi gian 3 giay, thi van insert
Vi buoc 2 cac nghiep vu pipeline la can monitor out/in co hop ly hay khong? Neu no khong hop ly thi la code co van de.
Vi du:
- Minh goi ra ngoai la 100 request => ma hien tai no tra ve co 50 => 50 cai dang pending.
- Luc do minh chi can bo sung them 50 cai nua thoi.
ⓒ 2019 Phú, Trần Phong & NordicCoder