forked from cuducos/chunk

🧱 Chunk is a download manager for slow and unstable servers

vmesel/chunk

Chunk

chunk is a sort of download manager written in pure Go. The project emerged because Minha Receita struggled to download 37 files that add up to only about 5 GB. Most download solutions out there (e.g. got) seem to be designed for downloading large files, not for downloading from slow and unstable servers, which is the case at hand.

Main features

Download using HTTP range requests

In order to complete downloads from slow and unstable servers, the download should be done in “chunks” using HTTP range requests. This does not rely on long-lived HTTP connections, and it makes it predictable how long is too long to wait for a response.
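
As an illustration of a single chunk request, here is a minimal sketch using only the Go standard library. The function name `fetchRange`, its signature, and the local test server are assumptions made for this example; they are not taken from chunk's code.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// fetchRange downloads bytes [start, end] (inclusive) of url and gives up
// after timeout. Name and signature are hypothetical, for illustration only.
func fetchRange(url string, start, end int64, timeout time.Duration) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// A server that honors the range answers 206 Partial Content.
	if resp.StatusCode != http.StatusPartialContent {
		return nil, fmt.Errorf("expected 206, got %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// demo serves a small in-memory file from a local test server and
// fetches bytes 4 to 8 of it.
func demo() (string, error) {
	content := strings.NewReader("0123456789abcdef")
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "file.bin", time.Time{}, content)
	}))
	defer srv.Close()

	b, err := fetchRange(srv.URL, 4, 8, 5*time.Second)
	return string(b), err
}

func main() {
	got, err := demo()
	fmt.Println(got, err)
}
```

Because the timeout covers one short request rather than a whole file, a stalled connection costs at most one chunk's worth of waiting.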

Retries by chunk, not by file

In order to be quicker and avoid rework, the primary way to handle failure is to retry only that “chunk” (that byte range), not the whole file.
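
A rough sketch of what retrying a single byte range could look like. The `byteRange` type, `retryChunk`, and the `flaky` helper are hypothetical names for this example, and the fetch function is simulated rather than doing real HTTP.

```go
package main

import (
	"errors"
	"fmt"
)

// byteRange is an inclusive range of bytes within a file (hypothetical type).
type byteRange struct{ start, end int64 }

// retryChunk calls fetch for a single range until it succeeds or until
// maxRetries extra attempts are used up. Only this range is retried.
func retryChunk(r byteRange, maxRetries int, fetch func(byteRange) ([]byte, error)) ([]byte, error) {
	var lastErr error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		b, err := fetch(r)
		if err == nil {
			return b, nil
		}
		lastErr = err
	}
	return nil, fmt.Errorf("bytes %d-%d: giving up after %d retries: %w",
		r.start, r.end, maxRetries, lastErr)
}

// flaky returns a simulated fetch function that fails its first n calls,
// plus a pointer to its call counter.
func flaky(n int) (func(byteRange) ([]byte, error), *int) {
	calls := 0
	fetch := func(r byteRange) ([]byte, error) {
		calls++
		if calls <= n {
			return nil, errors.New("simulated timeout")
		}
		return make([]byte, r.end-r.start+1), nil
	}
	return fetch, &calls
}

func main() {
	fetch, calls := flaky(2)
	b, err := retryChunk(byteRange{0, 1023}, 3, fetch)
	fmt.Println(len(b), err, *calls)
}
```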

Control of which chunks are already downloaded

In order to avoid restarting from the beginning in case of unhandled errors, chunk knows which ranges of each file were already downloaded; so, when restarted, it only downloads what is really needed to complete the downloads.
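
One possible way to keep this record is a small JSON file listing the finished chunk indexes per URL. The sketch below assumes that format; the `progress` type, the file name, and the example URL are all made up for illustration and may differ from what chunk actually stores.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// progress records, per URL, which chunk indexes are already on disk.
// The JSON layout is an assumption for this sketch.
type progress struct {
	Done map[string][]int `json:"done"`
}

// loadProgress reads the control file, or returns an empty record if the
// file does not exist yet.
func loadProgress(path string) (progress, error) {
	p := progress{Done: map[string][]int{}}
	b, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return p, nil
	}
	if err != nil {
		return p, err
	}
	if err := json.Unmarshal(b, &p); err != nil {
		return p, err
	}
	return p, nil
}

// save writes the record back to disk.
func (p progress) save(path string) error {
	b, err := json.Marshal(p)
	if err != nil {
		return err
	}
	return os.WriteFile(path, b, 0o644)
}

// pending lists the chunk indexes in [0, total) not yet downloaded.
func (p progress) pending(url string, total int) []int {
	done := map[int]bool{}
	for _, i := range p.Done[url] {
		done[i] = true
	}
	var out []int
	for i := 0; i < total; i++ {
		if !done[i] {
			out = append(out, i)
		}
	}
	return out
}

// demo records chunks 0 and 2 as done, simulates a restart by reloading
// the file, and returns what is still pending out of 4 chunks.
func demo() ([]int, error) {
	dir, err := os.MkdirTemp("", "chunk-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "progress.json")
	const url = "https://example.com/file.zip"

	p, err := loadProgress(path)
	if err != nil {
		return nil, err
	}
	p.Done[url] = append(p.Done[url], 0, 2)
	if err := p.save(path); err != nil {
		return nil, err
	}

	restored, err := loadProgress(path)
	if err != nil {
		return nil, err
	}
	return restored.pending(url, 4), nil
}

func main() {
	fmt.Println(demo())
}
```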

Detect server failures and give it a break

In order to avoid unnecessary stress on the server, chunk relies not only on HTTP responses but also on other signs that the connection is stale, so that it can:

  1. recover from that and
  2. give the server some time to recover from stress.
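
A simple way to picture this behavior is a per-server counter that pauses new requests after several consecutive failures. The sketch below is an assumption about how such a pause could work, not a description of chunk's internals; `breaker`, its fields, and the thresholds are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errTimeout = errors.New("simulated timeout")

// breaker tracks consecutive failures for one server and tells workers how
// long to wait before sending the next request (hypothetical type).
type breaker struct {
	mu          sync.Mutex
	failures    int
	threshold   int           // consecutive failures that trigger a pause
	cooldown    time.Duration // how long to leave the server alone
	pausedUntil time.Time
}

// report records the outcome of one request made at time now.
func (b *breaker) report(err error, now time.Time) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if err == nil {
		b.failures = 0
		return
	}
	b.failures++
	if b.failures >= b.threshold {
		b.pausedUntil = now.Add(b.cooldown)
		b.failures = 0
	}
}

// wait returns how long a worker should sleep before its next request.
func (b *breaker) wait(now time.Time) time.Duration {
	b.mu.Lock()
	defer b.mu.Unlock()
	if now.Before(b.pausedUntil) {
		return b.pausedUntil.Sub(now)
	}
	return 0
}

// demo compares a server that fails three times in a row with one whose
// failures are interrupted by a success.
func demo() (paused, healthy time.Duration) {
	now := time.Now()

	failing := &breaker{threshold: 3, cooldown: 30 * time.Second}
	for i := 0; i < 3; i++ {
		failing.report(errTimeout, now)
	}

	ok := &breaker{threshold: 3, cooldown: 30 * time.Second}
	ok.report(errTimeout, now)
	ok.report(nil, now) // a success resets the counter
	ok.report(errTimeout, now)
	ok.report(errTimeout, now)

	return failing.wait(now), ok.wait(now)
}

func main() {
	fmt.Println(demo())
}
```

Timeouts and connection errors can be fed to `report` the same way as HTTP error statuses, which is one way to treat a stale connection as a failure signal.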

Tech design

Input

  • List of URLs
  • Directory where to save the files
  • Configuration (each option can have a default and be optional; customizing them can be a stretch goal):
    • Chunk download attempt timeout
    • Maximum parallel connections to each server
    • Max retries per chunk (must have an option for unlimited retries)
    • Range maximum size (chunk size)
    • Time to wait on server failure
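
The configuration list above could map onto a struct along these lines. Only `MaxRetriesPerChunk` and `NewDownloader` appear elsewhere in this README; every other field name is hypothetical. The 45s timeout and 3 retries mirror the prototype's defaults, while the remaining default values are placeholders chosen for this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// Downloader holds the options listed above. Apart from MaxRetriesPerChunk,
// the field names are assumptions made for this sketch.
type Downloader struct {
	Timeout              time.Duration // chunk download attempt timeout
	MaxParallelPerServer int           // maximum parallel connections to each server
	MaxRetriesPerChunk   int           // assumption: a negative value means unlimited
	ChunkSize            int64         // range maximum size, in bytes
	WaitOnFailure        time.Duration // time to wait on server failure
}

// NewDownloader returns a Downloader with default values.
func NewDownloader() Downloader {
	return Downloader{
		Timeout:              45 * time.Second, // prototype default
		MaxParallelPerServer: 8,                // placeholder
		MaxRetriesPerChunk:   3,                // prototype default
		ChunkSize:            8 << 20,          // placeholder: 8 MiB
		WaitOnFailure:        time.Minute,      // placeholder
	}
}

func main() {
	d := NewDownloader()
	d.MaxRetriesPerChunk = 42 // partial customization
	fmt.Printf("%+v\n", d)
}
```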

Prepare downloads

For each URL of the list (this can be done in parallel):

  • Make sure the server accepts HTTP range requests (stretch goal)
    • Can fail if it doesn't
    • Or can default to regular HTTP request to download
  • Find out the total file size
  • Determine all the chunks to be downloaded (the start and end byte of each one)
  • Read or create a temporary record of downloaded and pending chunks
  • Enqueue all the pending chunks

With all this information, show a progress bar with the total work remaining.
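
The first steps of the preparation can be sketched as follows: a HEAD request to learn the file size and whether the server advertises range support, then splitting the size into inclusive byte ranges. `probe`, `splitChunks`, and `byteRange` are hypothetical names, and the example runs against a local test server.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// byteRange is an inclusive range of bytes within a file (hypothetical type).
type byteRange struct{ start, end int64 }

// probe issues a HEAD request and reports the file size and whether the
// server advertises byte-range support. A real implementation would
// probably use a client with a timeout instead of the default one.
func probe(url string) (size int64, acceptsRanges bool, err error) {
	resp, err := http.Head(url)
	if err != nil {
		return 0, false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, false, fmt.Errorf("HEAD %s: %s", url, resp.Status)
	}
	return resp.ContentLength, resp.Header.Get("Accept-Ranges") == "bytes", nil
}

// splitChunks divides size bytes into ranges of at most chunkSize bytes.
func splitChunks(size, chunkSize int64) []byteRange {
	if chunkSize <= 0 {
		return nil
	}
	var chunks []byteRange
	for start := int64(0); start < size; start += chunkSize {
		end := start + chunkSize - 1
		if end >= size {
			end = size - 1
		}
		chunks = append(chunks, byteRange{start, end})
	}
	return chunks
}

// demo probes a 16-byte file on a local test server and splits it into
// chunks of at most 5 bytes.
func demo() ([]byteRange, error) {
	content := strings.NewReader("0123456789abcdef")
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "file.bin", time.Time{}, content)
	}))
	defer srv.Close()

	size, ok, err := probe(srv.URL)
	if err != nil {
		return nil, err
	}
	if !ok {
		return nil, fmt.Errorf("server does not accept range requests")
	}
	return splitChunks(size, 5), nil
}

func main() {
	fmt.Println(demo())
}
```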

Download

  • Set a timeout
  • Start the HTTP range request
  • In case of failure or timeout, re-queue this chunk
  • In case of success, send the chunk contents to a results channel
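
The steps above can be sketched as a small worker pool that re-queues failed chunks and sends finished ones to a results channel. This is one possible arrangement, not necessarily chunk's; `job`, `result`, `run`, and `flakyFetch` are hypothetical names, and the per-attempt timeout is assumed to live inside the `fetch` function.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type job struct{ index, attempts int }

type result struct {
	index int
	data  []byte
	err   error
}

// run downloads n chunks with the given number of workers. A failed chunk
// goes back to the queue until it has been retried maxRetries times.
func run(n, workers, maxRetries int, fetch func(index int) ([]byte, error)) <-chan result {
	jobs := make(chan job, n) // capacity n, so re-queueing never blocks
	results := make(chan result, n)

	var pending sync.WaitGroup // chunks not yet resolved
	pending.Add(n)
	for i := 0; i < n; i++ {
		jobs <- job{index: i}
	}

	for w := 0; w < workers; w++ {
		go func() {
			for j := range jobs {
				data, err := fetch(j.index)
				switch {
				case err == nil:
					results <- result{index: j.index, data: data}
					pending.Done()
				case j.attempts < maxRetries:
					j.attempts++
					jobs <- j // re-queue this chunk only
				default:
					results <- result{index: j.index, err: err}
					pending.Done()
				}
			}
		}()
	}

	// Close both channels once every chunk has succeeded or been given up on.
	go func() {
		pending.Wait()
		close(jobs)
		close(results)
	}()
	return results
}

// flakyFetch simulates a server that fails the first attempt for each chunk.
func flakyFetch() func(int) ([]byte, error) {
	var mu sync.Mutex
	seen := map[int]bool{}
	return func(i int) ([]byte, error) {
		mu.Lock()
		defer mu.Unlock()
		if !seen[i] {
			seen[i] = true
			return nil, errors.New("simulated timeout")
		}
		return []byte{byte(i)}, nil
	}
}

func main() {
	for r := range run(5, 2, 3, flakyFetch()) {
		fmt.Println(r.index, r.data, r.err)
	}
}
```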

Writing files

  • Read the bytes from the results channel
  • Write to the file on disk
  • Update a progress bar to give the user an idea about the status of the downloads
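
Since chunks can finish out of order, the writer has to place each one at its own offset. A minimal sketch using `os.File.WriteAt` is shown below; `chunkData`, `writeChunks`, and the progress callback are hypothetical names for this example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// chunkData is one downloaded chunk and its position in the file
// (hypothetical type).
type chunkData struct {
	offset int64
	data   []byte
}

// writeChunks writes every chunk received on results at its own offset and
// calls onProgress with the running total of bytes written.
func writeChunks(path string, results <-chan chunkData, onProgress func(written int64)) error {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()

	var written int64
	for c := range results {
		if _, err := f.WriteAt(c.data, c.offset); err != nil {
			return err
		}
		written += int64(len(c.data))
		onProgress(written)
	}
	return f.Sync()
}

// demo delivers two chunks out of order and returns the assembled file.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "chunk-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "out.bin")

	results := make(chan chunkData, 2)
	results <- chunkData{offset: 5, data: []byte("world")}
	results <- chunkData{offset: 0, data: []byte("hello")}
	close(results)

	report := func(written int64) { fmt.Println("bytes written:", written) }
	if err := writeChunks(path, results, report); err != nil {
		return "", err
	}
	b, err := os.ReadFile(path)
	return string(b), err
}

func main() {
	fmt.Println(demo())
}
```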

Prototype

The prototype is a CLI that wraps a GET HTTP request in a 45s timeout independent of the HTTP client's timeout. It also includes 3 retries by default.

$ go run main.go <URL> # e.g. go run main.go https://github.com/cuducos/chunk

The API should work like this:

// simple use case
d := NewDownloader()
ch := d.Download(urls)

// partial customization
d := NewDownloader()
d.MaxRetriesPerChunk = 42
ch := d.Download(urls)

// full control
d := chunk.Downloader{...}
ch := d.Download(urls)

The resulting channel will transmit data about each download:

type DownloadStatus struct {
	URL                 string
	DownloadedFilePath  string
	FileSizeBytes       uint64
	DownloadedFileBytes uint64
	Error               error
}
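
A caller might consume that channel roughly like this. The struct is copied from above; the `percent` helper, the simulated sender, and the assumption that the channel is closed when all downloads finish are illustrative only.

```go
package main

import (
	"errors"
	"fmt"
)

// DownloadStatus is copied from the README.
type DownloadStatus struct {
	URL                 string
	DownloadedFilePath  string
	FileSizeBytes       uint64
	DownloadedFileBytes uint64
	Error               error
}

// percent is a hypothetical helper that reports how much of a file is done.
func percent(s DownloadStatus) float64 {
	if s.FileSizeBytes == 0 {
		return 0
	}
	return 100 * float64(s.DownloadedFileBytes) / float64(s.FileSizeBytes)
}

func main() {
	// Stand-in for the channel returned by Download; here the statuses are
	// simulated, and the channel is closed once there is nothing left to send.
	ch := make(chan DownloadStatus)
	go func() {
		defer close(ch)
		ch <- DownloadStatus{URL: "https://example.com/a.zip", FileSizeBytes: 200, DownloadedFileBytes: 50}
		ch <- DownloadStatus{URL: "https://example.com/b.zip", Error: errors.New("simulated failure")}
	}()

	for s := range ch {
		if s.Error != nil {
			fmt.Printf("%s: failed: %v\n", s.URL, s.Error)
			continue
		}
		fmt.Printf("%s: %.1f%%\n", s.URL, percent(s))
	}
}
```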
