How to download Hiro archives
Comprehensive guide for downloading large archive files reliably with troubleshooting tips.
Overview
Hiro Archive files are large datasets (ranging from several GB to several hundred GB) hosted on Google Cloud Storage. Due to their size, downloads can be interrupted by network issues, rate limits, or connection timeouts. This guide provides multiple download methods and troubleshooting solutions.
File sizes and requirements
Before downloading, ensure you have sufficient:
- Disk space: Archives range from 10GB (APIs) to several hundred GB+ (blockchain data)
- Bandwidth: Downloads can take hours or days depending on your connection
- Storage for extraction: Archives expand significantly when extracted
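As a rough pre-flight check, you can compare free disk space against an estimated requirement. A minimal sketch (the 10GB archive size and the ~3x extraction multiplier below are illustrative assumptions; substitute numbers for the archive you plan to download):

```shell
#!/bin/sh
# Rough pre-flight disk space check (illustrative numbers).
ARCHIVE_GB=10                                  # assumed compressed size
REQUIRED_KB=$((ARCHIVE_GB * 4 * 1024 * 1024))  # compressed + ~3x for extraction

# Free space, in 1K blocks, on the current filesystem (POSIX df output).
AVAIL_KB=$(df -Pk . | awk 'NR==2 {print $4}')

if [ "$AVAIL_KB" -lt "$REQUIRED_KB" ]; then
  echo "Insufficient disk space: have ${AVAIL_KB} KB, need ${REQUIRED_KB} KB"
else
  echo "Disk space OK: ${AVAIL_KB} KB available"
fi
```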
Download methods
Method 1: wget with resume (Recommended for most users)
The `wget` command with the `-c` flag enables resuming interrupted downloads. You may need to install wget first (`brew install wget` on macOS); alternatively, use the curl method below, which is pre-installed on macOS.

```shell
wget -c https://archive.hiro.so/mainnet/stacks-blockchain/mainnet-stacks-blockchain-latest.tar.gz
```

To force a single-line progress bar (useful when output is redirected to a log):

```shell
wget -c --progress=bar:force:noscroll https://archive.hiro.so/mainnet/stacks-blockchain/mainnet-stacks-blockchain-latest.tar.gz
```
Advantages:
- Resumes interrupted downloads automatically
- Built into most Unix systems
- Simple to use
Disadvantages:
- Single-threaded downloads
- May still experience connection resets
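One way to mitigate connection resets is a small retry wrapper that keeps re-invoking `wget -c`, which resumes from wherever the previous attempt stopped. A sketch (the attempt limit and delay are arbitrary choices, not values from this guide):

```shell
#!/bin/sh
# retry MAX DELAY CMD...: run CMD until it succeeds, up to MAX attempts,
# sleeping DELAY seconds between failed tries.
retry() {
  max=$1; delay=$2; shift 2
  i=1
  until "$@"; do
    if [ "$i" -ge "$max" ]; then
      echo "Giving up after $max attempts" >&2
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Real usage would look like:
#   retry 20 10 wget -c https://archive.hiro.so/mainnet/stacks-blockchain/mainnet-stacks-blockchain-latest.tar.gz
retry 3 0 true && echo "retry: command succeeded"
```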
Method 2: curl with retries
Use `curl` with automatic retries for robust downloads. The `--continue-at -` flag resumes partial downloads, while `--output` specifies the filename:

```shell
curl --continue-at - --retry 10 --retry-delay 5 --retry-max-time 0 \
  --progress-bar \
  --output mainnet-stacks-blockchain-latest.tar.gz \
  https://archive.hiro.so/mainnet/stacks-blockchain/mainnet-stacks-blockchain-latest.tar.gz
```
Advantages:
- Automatic retry mechanism
- Resume capability with `--continue-at -` (short form `-C -`)
- More configuration options
Method 3: gcloud storage cp (Fastest, requires authentication)
Google Cloud CLI provides the fastest download speeds with parallel transfers. First authenticate with `gcloud auth login`, then either download the file to disk or stream directly to extraction:

```shell
# Download the file to the current directory
gcloud storage cp gs://archive.hiro.so/mainnet/stacks-blockchain/mainnet-stacks-blockchain-latest.tar.gz .
```

```shell
# Or stream directly to extraction (saves disk space, but slower: the download becomes sequential)
gcloud storage cp gs://archive.hiro.so/mainnet/stacks-blockchain/mainnet-stacks-blockchain-latest.tar.gz - | tar -xz
```
Advantages:
- Significantly faster downloads (2-3x speed improvement)
- Built-in parallel transfers
- Automatic retry handling
- Can stream directly to extraction (useful when disk space is limited, but disables parallel transfers)
Disadvantages:
- Requires Google account authentication
- Additional software installation needed
Method 4: Download managers (JDownloader)
For users who prefer GUI applications or need advanced download management:
1. Download and install JDownloader.
2. Copy the archive URL into JDownloader.
3. Configure parallel connections for faster downloads.
Advantages:
- Graphical interface
- Parallel downloading
- Advanced retry mechanisms
- Cross-platform support
Verification and extraction
After downloading, verify the file integrity:
SHA256 checksum files are available for all archives to verify download integrity.
1. Download the checksum file:

```shell
wget https://archive.hiro.so/mainnet/stacks-blockchain/mainnet-stacks-blockchain-latest.sha256
```

2. Verify the download (note the two spaces between the hash and the filename, which the checksum format requires):

```shell
echo "$(awk '{print $1}' mainnet-stacks-blockchain-latest.sha256)  mainnet-stacks-blockchain-latest.tar.gz" | shasum -a 256 -c
```

3. Extract the archive:

```shell
tar -zxvf mainnet-stacks-blockchain-latest.tar.gz -C /target/directory
```
The `marf.sqlite.blobs` file can be very large and may take significant time to extract. Ensure you have sufficient disk space and be patient during extraction.
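The verification step can be rehearsed end-to-end on a throwaway local file before committing to a multi-hour download. The file names below are illustrative, and the snippet assumes (as the verify command above does) that the first field of the `.sha256` file is the hash; on macOS, replace `sha256sum` with `shasum -a 256`:

```shell
#!/bin/sh
# Rehearse the checksum workflow on a small local file.
printf 'example archive contents\n' > sample.tar.gz

# Produce a checksum file whose first field is the hash.
sha256sum sample.tar.gz | awk '{print $1}' > sample.sha256

# Verify: rebuild a "<hash>  <filename>" line and feed it to the checker.
echo "$(awk '{print $1}' sample.sha256)  sample.tar.gz" | sha256sum -c
# prints: sample.tar.gz: OK

rm -f sample.tar.gz sample.sha256
```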
Performance tips
1. Use gcloud for the fastest downloads: it requires authentication but provides significant speed improvements.
2. Download during off-peak hours, typically late night or early morning.
3. Use wired connections; avoid Wi-Fi for large downloads when possible.
4. Monitor disk space: extracted archives can be 2-3x larger than compressed files.
5. Consider streaming extraction with gcloud to save disk space.