# Chunk
`chunk` is a sort of download manager written in pure Go. The idea for the project emerged when it proved difficult for [Minha Receita](https://github.com/cuducos/minha-receita) to handle the download of [37 files that add up to just approx. 5 GB](https://www.gov.br/receitafederal/pt-br/assuntos/orientacao-tributaria/cadastros/consultas/dados-publicos-cnpj). Most of the download solutions out there (e.g. [`got`](https://github.com/melbahja/got)) seem to be designed for downloading large files, not for downloading from slow and unstable servers, which is the case at hand.
## Main features
### Download using HTTP range requests
In order to complete downloads from slow and unstable servers, the download should be done in “chunks” using [HTTP range requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests). This avoids relying on long-standing HTTP connections, and it makes it predictable how long is too long to wait for a response.
### Retries by chunk, not by file
In order to be quicker and avoid rework, the primary way to handle failure is to retry that “chunk” (that byte range), not the whole file.
### Control of which chunks are already downloaded
In order to avoid restarting from the beginning in case of unhandled errors, `chunk` knows which ranges of each file were already downloaded; so, when restarted, it only downloads what is really needed to complete the downloads.
### Detect server failures and give it a break

In order to avoid unnecessary stress on the server, `chunk` relies not only on HTTP responses but also on other signs that the connection is stale, and it can:

1. recover from that and
2. give the server some time to recover from stress.
## Tech design

### Input

* List of URLs
* Directory where to save the files
* Configuration (these can have defaults and be optional; customizing them can be a stretch goal):
  * Chunk download attempt timeout
  * Maximum parallel connections to each server
  * Max retries per chunk (must have an option for unlimited retries)
  * Maximum range size (chunk size)
  * Time to wait on server failure
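The inputs above could be gathered in a single struct. Field names and default values here are hypothetical, not the project's actual API:

```go
package main

import (
	"fmt"
	"time"
)

// Config mirrors the inputs listed above.
type Config struct {
	URLs          []string      // list of URLs to download
	OutputDir     string        // directory where to save the files
	ChunkTimeout  time.Duration // per-chunk download attempt timeout
	MaxParallel   int           // max parallel connections to each server
	MaxRetries    int           // retries per chunk; -1 means unlimited
	ChunkSize     int64         // maximum size of each range request, in bytes
	WaitOnFailure time.Duration // time to wait after a server failure
}

// DefaultConfig fills every optional setting with an illustrative default.
func DefaultConfig(urls []string, dir string) Config {
	return Config{
		URLs:          urls,
		OutputDir:     dir,
		ChunkTimeout:  90 * time.Second,
		MaxParallel:   8,
		MaxRetries:    -1, // unlimited
		ChunkSize:     8 << 20, // 8 MB
		WaitOnFailure: 10 * time.Second,
	}
}

func main() {
	c := DefaultConfig([]string{"https://example.com/file.zip"}, "/tmp")
	fmt.Println(c.MaxParallel, c.MaxRetries)
}
```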
### Prepare downloads

For each URL in the list (this can be done in parallel):

* Make sure the server accepts HTTP range requests (stretch goal)
  * Can fail if it doesn't
  * Or can fall back to a regular HTTP request for the download
* Find out the total file size
* Determine all the chunks to be downloaded (the start and end bytes of each one)
* Read or create a temporary control of downloaded and pending chunks
* Enqueue all the pending chunks

With all this information, show a progress bar with the total work remaining.
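The “determine all the chunks” step above is a pure calculation from the total file size and the maximum chunk size. A sketch (the real project may slice differently):

```go
package main

import "fmt"

// chunkRange is the start and end byte (inclusive) of one HTTP range request.
type chunkRange struct{ start, end int64 }

// splitIntoChunks determines every chunk to download for a file of total
// bytes, given the maximum chunk size in bytes.
func splitIntoChunks(total, size int64) []chunkRange {
	var chunks []chunkRange
	for start := int64(0); start < total; start += size {
		end := start + size - 1
		if end >= total {
			end = total - 1 // the last chunk may be shorter
		}
		chunks = append(chunks, chunkRange{start, end})
	}
	return chunks
}

func main() {
	fmt.Println(splitIntoChunks(10, 4)) // [{0 3} {4 7} {8 9}]
}
```

The length of this slice, minus the chunks already marked as done, is exactly the “total work remaining” a progress bar needs.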
### Download

* Set a timeout
* Start the HTTP range request
* In case of failure or timeout, re-queue this chunk
* In case of success, send the chunk contents to a `results` channel
### Writing files

* Read the bytes from the `results` channel
* Write them to the file on disk
* Update a progress bar to give the user an idea of the status of the downloads