3 Tools for downloading data
Note: this page is under construction (there are no complete sections).
3.1 Command-line utilities
3.1.2 wget / wget2
GNU wget and its successor wget2 are command-line download utilities that support a wide array of protocols and offer many helpful features, including recursive retrieval, WARC output, URL parameters and POST data, rate limiting, and automatic retries.
wget is designed to run non-interactively: once you invoke the wget command with your arguments (retrieval options), it runs unsupervised until all specified URLs (and, when retrieving recursively, their child URLs) have been retrieved.
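A minimal sketch of such an unsupervised recursive download is below, written as a shell function that wraps wget. The function name mirror_dir, the example URL, and the recursion depth and wait values are illustrative assumptions, not settings prescribed by this guide; check man wget for how each option behaves on your version.

```shell
# Hypothetical helper: recursively download everything below a starting URL.
# Usage: mirror_dir URL [DEST_DIR]
mirror_dir() {
  local url="$1" dest="${2:-.}"   # DEST_DIR defaults to the current directory
  # --recursive             follow links found in downloaded HTML pages
  # --no-parent             never ascend above the starting directory
  # --level=5               maximum recursion depth (5 is also wget's default)
  # --wait=1                wait about 1 second between requests...
  # --random-wait           ...varied between 0.5x and 1.5x to spread out the load
  # --directory-prefix=DIR  save the downloaded tree under DIR
  wget --recursive --no-parent --level=5 \
       --wait=1 --random-wait \
       --directory-prefix="$dest" \
       "$url"
}
```

For example, mirror_dir https://example.com/data/ ./data would save the files under ./data/example.com/data/, because wget recreates the host name and path as directories by default.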
Installation instructions:
- Windows: Through the Windows Subsystem for Linux.
- macOS: Through Homebrew (brew install wget) or by compiling from source.
- Linux: Generally included by default; if not, consult your distribution’s package repository.
Usage
Useful command-line arguments
For an exhaustive list, see the wget documentation or run man wget or wget --help.
Examples
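The sketch below shows a non-recursive pattern that is common when the URLs are already known: read them from a file and let wget retry, resume, and throttle on its own. The function name fetch_list, the file names, and the specific retry and rate values are hypothetical choices for illustration.

```shell
# Hypothetical helper: download every URL listed (one per line) in a text file.
# Usage: fetch_list URL_FILE [DEST_DIR]
fetch_list() {
  local list="$1" dest="${2:-.}"
  # --input-file=FILE       read URLs from FILE instead of the command line
  # --tries=5               attempt each URL up to 5 times
  # --waitretry=10          back off linearly (up to 10 s) between retries of a failed URL
  # --continue              resume partially downloaded files instead of restarting them
  # --limit-rate=500k       cap bandwidth at roughly 500 KB/s
  # --directory-prefix=DIR  save files under DIR
  wget --input-file="$list" \
       --tries=5 --waitretry=10 \
       --continue \
       --limit-rate=500k \
       --directory-prefix="$dest"
}
```

wget can also record the requests and responses in a WARC archive via --warc-file=NAME; it disables --continue when WARC output is enabled, so the two are best kept in separate runs.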
3.1.3 curl
curl is another common data transfer tool. It supports a much wider array of protocols than wget and is much better suited to interactive use (for example, exploring APIs) and to scripts that parse server responses. Although it can also be used as a standalone download utility, it does not support recursive retrieval.
Installation instructions:
- Windows: Included by default in Windows 10 (version 1803) and later; also available through the Windows Subsystem for Linux.
- macOS: Included by default.
- Linux: Generally included by default; if not, consult your distribution’s package repository.
Usage
Useful command-line arguments
For an exhaustive list, see the curl man page or run man curl or curl --help all.
Examples
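The following is a minimal sketch of curl in a scripted API query, assuming a JSON API that takes its parameters in the query string. The function name api_get, the endpoint, and the parameter names are hypothetical; the options themselves are standard curl flags.

```shell
# Hypothetical helper: send a GET request to an HTTP API and print the body.
# Usage: api_get URL [NAME=VALUE ...]
api_get() {
  local url="$1" pair
  shift
  # Turn each remaining NAME=VALUE pair into "--data-urlencode NAME=VALUE".
  # The for-list is expanded once up front, so appending to "$@" while
  # shifting off the original pairs rewrites the argument list in place.
  for pair in "$@"; do
    set -- "$@" --data-urlencode "$pair"
    shift
  done
  # --fail                 exit non-zero and print no body on HTTP errors (>= 400)
  # --silent --show-error  hide the progress meter but still report errors
  # --location             follow redirects
  # --retry 3              retry transient failures up to 3 times
  # --get                  send the --data-urlencode pairs as a URL query string
  curl --fail --silent --show-error --location --retry 3 \
       --get "$@" "$url"
}
```

For instance, api_get https://api.example.com/items 'q=sea surface temperature' 'limit=50' | jq . would send a GET request with a URL-encoded query string and pretty-print the reply, assuming jq (a separate JSON tool) is installed. Because of --fail, an HTTP error makes the function return a non-zero status that a calling script can check.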
3.1.4 aria2
aria2 is another command-line download tool similar to wget; it supports fewer protocols but may be preferable for downloading a large number of files whose URLs are already known. Unlike wget, aria2 marks incomplete or failed downloads as such (by keeping a .aria2 control file next to each unfinished file), so they can be retried automatically on the next run and you can see at a glance which files are incomplete. It also has better support for concurrency, with URL-based, domain-based, and chunk-based parallelism.
Installation instructions:
- Windows: Binaries available from the aria2 website.
- macOS: Through Homebrew (brew install aria2), or from the aria2 website.
- Linux: Consult your distribution’s package repository.
Usage
Examples
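A sketch of aria2's parallel mode for a list of known URLs follows. The function name parallel_fetch, the session file name, and the concurrency values are illustrative assumptions; tune them to what the server tolerates.

```shell
# Hypothetical helper: download many known URLs in parallel with aria2.
# Usage: parallel_fetch URL_FILE [DEST_DIR]
parallel_fetch() {
  local list="$1" dest="${2:-.}"
  # --input-file=FILE              read downloads from FILE, one URL per line
  # --dir=DIR                      save files under DIR
  # --max-concurrent-downloads=8   work on up to 8 files at once (URL-level parallelism)
  # --max-connection-per-server=4  allow up to 4 connections to each server
  # --split=4                      fetch each file in up to 4 chunks (chunk-level parallelism)
  # --continue=true                resume partial files left by an earlier run
  # --max-tries=5                  try each download up to 5 times
  # --save-session=FILE            on exit, write unfinished/failed downloads to FILE
  aria2c --input-file="$list" --dir="$dest" \
         --max-concurrent-downloads=8 \
         --max-connection-per-server=4 --split=4 \
         --continue=true --max-tries=5 \
         --save-session="$dest/aria2-session.txt"
}
```

In the input file, each URL goes on its own line; an indented option line directly below a URL, such as "  out=station-042.csv" (a hypothetical name), applies only to that download. If some downloads fail or the run is interrupted, the session file lists what remains, so passing it back as the input file (for example, parallel_fetch ./out/aria2-session.txt ./out) retries only those.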