uknth/crawler

Problem Statement

Given a list of URLs, download the contents of each URL and store them at a location. Parse the downloaded content for more URLs and keep downloading until a given depth is reached.

Input

  • initial list of URLs
  • depth to which content needs to be downloaded

Output

Downloaded contents of the URLs crawled
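
At its core this is a depth-limited traversal: fetch a URL, store the response body under the download location, extract links from the body, and repeat one level deeper. Below is a minimal sketch of that idea in plain Go; the function names, the regexp-based link extraction, and the printed output are assumptions for illustration, not the repository's actual implementation (which writes content to disk and distributes work across workers).

package main

import (
	"fmt"
	"io"
	"net/http"
	"regexp"
)

// Simplified href extraction; the real crawler would use a proper HTML parser.
var hrefPattern = regexp.MustCompile(`href="(https?://[^"]+)"`)

// crawl fetches url, reports its content, extracts links, and recurses
// one level deeper until depth is exhausted. visited prevents revisits.
func crawl(url string, depth int, visited map[string]bool) {
	if depth <= 0 || visited[url] {
		return
	}
	visited[url] = true

	resp, err := http.Get(url)
	if err != nil {
		fmt.Println("fetch failed:", url, err)
		return
	}
	body, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		return
	}
	// The actual tool would write the body to the download location instead.
	fmt.Printf("fetched %s (%d bytes)\n", url, len(body))

	for _, m := range hrefPattern.FindAllStringSubmatch(string(body), -1) {
		crawl(m[1], depth-1, visited)
	}
}

func main() {
	// Seed URL and depth are placeholders; the binary reads them from -file and -depth.
	crawl("https://example.com", 3, map[string]bool{})
}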

Build

Ensure that dependencies are loaded

  • go mod tidy

Build the Binary

  • go build -o crawler.bin git.sr.ht/~uknth/crawler/crawler

This creates a binary named crawler.bin in the working directory.

Run the Binary

./crawler.bin -file="sample/urls.txt" -depth=4 -download=/tmp/crawler -count=5

CLI Parameters of the Binary

➜  $:(master) ./crawler.bin --help
Usage of ./crawler.bin:
  -count int
    	worker count (default 4)
  -depth int
    	depth to which the application needs to crawl (default 3)
  -download string
    	download location (default "/tmp/crawler")
  -file string
    	File containing initial list of URLs (default "urls.txt")
  -inactivity int
    	time a worker remains idle (default 15)
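
The help text above matches the format produced by Go's standard flag package. As a rough sketch of how such flags could be declared (the flag names and defaults are taken from the output above; the Go variable names and the final print are assumptions for illustration):

package main

import (
	"flag"
	"fmt"
)

func main() {
	// Flag names and defaults mirror the --help output shown above.
	count := flag.Int("count", 4, "worker count")
	depth := flag.Int("depth", 3, "depth to which the application needs to crawl")
	download := flag.String("download", "/tmp/crawler", "download location")
	file := flag.String("file", "urls.txt", "File containing initial list of URLs")
	inactivity := flag.Int("inactivity", 15, "time a worker remains idle")
	flag.Parse()

	fmt.Println(*count, *depth, *download, *file, *inactivity)
}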

About

Simple Web Crawler, Example in Go
