webcat12345/brand-scraping

brand-scraping

Web scraping of the Korean brand website - http://kdtj.kipris.or.kr/kdtj/searchLogina.do?method=loginTM#page10

Stacks we use

Node.js and Puppeteer

Usage

  • Clone the repository with git
  • Run npm install to install dependencies
  • Run npm run start to start the Node.js server

Data will be saved as brands/[pagenumber].pdf
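
The naming scheme above could be implemented roughly as follows. This is a minimal sketch, not the repository's actual code: `pdfPathFor` and `savePageAsPdf` are hypothetical helper names, and `page` is assumed to be a Puppeteer `Page` object, whose `page.pdf({ path })` call writes the PDF to disk.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: builds the output path for a page number, e.g. brands/12.pdf.
function pdfPathFor(pageNumber, outDir = 'brands') {
  return path.join(outDir, `${pageNumber}.pdf`);
}

// Hypothetical helper: saves the currently loaded page as a PDF.
// `page` is assumed to be a Puppeteer Page; page.pdf({ path }) writes the file.
async function savePageAsPdf(page, pageNumber, outDir = 'brands') {
  fs.mkdirSync(outDir, { recursive: true }); // create the output folder if it is missing
  const target = pdfPathFor(pageNumber, outDir);
  await page.pdf({ path: target });
  return target;
}
```

Passing `page` in as a parameter keeps the helper independent of how the browser was launched.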

Challenge points

  • Pagination skips every 10 pages
  • Waiting for images to finish downloading into the browser cache
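
Both challenge points can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the repository's implementation: it assumes the site's pager lists ten page links at a time, so a "next group" control has to be clicked when crossing from page 10 to 11, 20 to 21, and so on. `PAGES_PER_GROUP`, `groupIndex`, `crossesGroup`, and `waitForImages` are hypothetical names, and `page` is assumed to be a Puppeteer `Page`.

```javascript
// Assumption: the pager shows ten page links at a time.
const PAGES_PER_GROUP = 10;

// Zero-based index of the pager group containing `pageNumber` (pages are 1-based).
function groupIndex(pageNumber) {
  return Math.floor((pageNumber - 1) / PAGES_PER_GROUP);
}

// True when moving from `current` to `next` leaves the visible pager group,
// meaning the link for `next` is not on screen yet.
function crossesGroup(current, next) {
  return groupIndex(current) !== groupIndex(next);
}

// Waits until every <img> in the document reports that loading has finished.
// The callback runs inside the browser via Puppeteer's page.waitForFunction.
async function waitForImages(page, timeout = 30000) {
  await page.waitForFunction(
    () => Array.from(document.images).every((img) => img.complete),
    { timeout }
  );
}
```

Note that `img.complete` is also true for images that failed to load, so this check waits for loading to settle rather than guaranteeing every image is present.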

TODO

  • Start from a specific page
  • Error handling: a failed extraction should stop the process and notify the user
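
One possible shape for these two TODO items is sketched below. Nothing here exists in the repository yet: `parseStartPage`, `scrapeRange`, and the `scrapeOne` callback are hypothetical names, and reading the start page from the first command-line argument is only one option.

```javascript
// Hypothetical helper: reads the start page from the first CLI argument
// (process.argv[2]); defaults to page 1 when no argument is given.
function parseStartPage(argv) {
  const raw = argv[2];
  if (raw === undefined) return 1;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 1) {
    throw new Error(`Invalid start page: ${raw}`);
  }
  return n;
}

// Hypothetical helper: calls `scrapeOne(pageNumber)` for each page in order
// and stops at the first failure, reporting which page failed.
async function scrapeRange(startPage, lastPage, scrapeOne) {
  for (let p = startPage; p <= lastPage; p += 1) {
    try {
      await scrapeOne(p);
    } catch (err) {
      console.error(`Extraction failed on page ${p}: ${err.message}`);
      return { ok: false, failedPage: p };
    }
  }
  return { ok: true, failedPage: null };
}
```

Returning the failed page number lets a later run resume from that page through `parseStartPage(process.argv)`.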