[![Tests](https://github.com/cuducos/chunk/actions/workflows/tests.yaml/badge.svg)](https://github.com/cuducos/chunk/actions/workflows/tests.yaml)
[![gofmt](https://github.com/cuducos/chunk/actions/workflows/gofmt.yaml/badge.svg)](https://github.com/cuducos/chunk/actions/workflows/gofmt.yaml)
[![golint](https://github.com/cuducos/chunk/actions/workflows/golint.yaml/badge.svg)](https://github.com/cuducos/chunk/actions/workflows/golint.yaml)
[![GoDoc](https://godoc.org/github.com/cuducos/chunk?status.svg)](https://godoc.org/github.com/cuducos/chunk)
# chunk

Chunk is a download tool for slow and unstable servers.
## Usage
### CLI
Install it with `go install github.com/cuducos/chunk@latest`, then:
```console
$ chunk <URLs>
```
Use `--help` for detailed instructions.
### API
The [`Download`](https://pkg.go.dev/github.com/cuducos/chunk#Download) method returns a channel of [`DownloadStatus`](https://pkg.go.dev/github.com/cuducos/chunk#DownloadStatus) updates. The channel is closed once all downloads finish, but the user is in charge of handling any errors reported in each status.
#### Simplest use case
```go
d := chunk.DefaultDownloader()
ch := d.Download(urls)
```
#### Customizing some options
```go
d := chunk.DefaultDownloader()
d.MaxRetries = 42
ch := d.Download(urls)
```
#### Customizing everything
```go
d := chunk.Downloader{...}
ch := d.Download(urls)
```
## How?
It uses HTTP range requests, retries per chunk (not per file), avoids re-downloading content ranges that are already complete, and supports a wait time to give servers a chance to recover.
### Download using HTTP range requests
In order to complete downloads from slow and unstable servers, the download is done in “chunks” using [HTTP range requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests). This avoids relying on long-standing HTTP connections and makes it predictable how long is too long to wait for a response.
### Retries by chunk, not by file
In order to be quicker and avoid rework, the primary way to handle failure is to retry that “chunk” (content range), not the whole file.
### Control of which chunks are already downloaded
In order to avoid restarting from the beginning in case of unhandled errors, `chunk` knows which ranges of each file were already downloaded; when restarted, it only downloads what is still needed to complete the files.
### Detect server failures and give it a break
In order to avoid unnecessary stress on the server, `chunk` relies not only on HTTP responses but also on other signs that the connection is stale; it can then recover from the failure and give the server some time to recover from stress before retrying.
## Why?
The idea for the project emerged when it was difficult for [Minha Receita](https://github.com/cuducos/minha-receita) to handle the download of [37 files that add up to just approx. 5 GB](https://www.gov.br/receitafederal/pt-br/assuntos/orientacao-tributaria/cadastros/consultas/dados-publicos-cnpj). Most of the download solutions out there (e.g. [`got`](https://github.com/melbahja/got)) seem to be designed for downloading large files, not for downloading from slow and unstable servers, which is the case at hand.