There's an alternative implementation of Restic, rewritten from scratch in Rust: Rustic. It offers some neat extra features, and one of them is support for cold storage. This blog post started as my research notes to figure out whether using it makes financial and technical sense. I've ended up writing two articles in the series. This part focuses on the economic cost of such a solution, both in money and in complexity.
This is definitely something to consider. Quoting the Readme:
rustic currently is in beta state and misses regression tests. It is not recommended to use it for production backups, yet.
So perhaps don't migrate to rustic just yet with your very important production data. The repository format is pretty stable, as it's shared between Rustic and Restic, but the Rust implementation is quite recent and not as battle-tested.
Cold storage support in Rustic is implemented in a pretty clever way. There are two repositories: one stores just the 🌶️ "hot" data (snapshots, index, ...), while the other ❄️ "cold" one holds all of the data and is normally write-only until you actually need to restore some files.
This is because reading from cold storage typically incurs extra cost, and it's often very slow to actually get the data back. (More on that later.)
What that means is that you need to manage multiple buckets, you need to always use both of them for backups, and you need to make sure the cold and hot storage are each using the correct storage tier. You also need to configure a warm-up command for restoration to work. In other words, the setup isn't that simple.
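To make that concrete, here's a rough sketch of what the invocations could look like. The bucket names, region, and snapshot paths are made up, the flag names follow rustic's documented options but may differ between versions, and how the pack id maps to an object key is an assumption; treat this as an illustration, not copy-paste material:

```bash
# Back up to the cold repository while keeping metadata in the hot one.
# Bucket names and the repository URL scheme are placeholders.
rustic backup /home \
  -r s3:https://s3.eu-west-1.amazonaws.com/my-cold-bucket \
  --repo-hot s3:https://s3.eu-west-1.amazonaws.com/my-hot-bucket

# Restores need the cold objects "warmed up" first. rustic can run a
# user-supplied command per pack file; the key layout and placeholder
# behavior here are assumptions - check the rustic docs for your version.
rustic restore latest /tmp/restored \
  -r s3:https://s3.eu-west-1.amazonaws.com/my-cold-bucket \
  --repo-hot s3:https://s3.eu-west-1.amazonaws.com/my-hot-bucket \
  --warm-up-command "aws s3api restore-object --bucket my-cold-bucket --key data/%id --restore-request '{\"Days\":5,\"GlacierJobParameters\":{\"Tier\":\"Bulk\"}}'"
```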
On top of that you need to consider the restore limitations. Besides cost (more on that later), there are delays involved. To give some examples: S3 Glacier Flexible Retrieval can restore objects within a couple of minutes or up to 12 hours, depending on how much you're willing to pay to expedite the process. With S3 Glacier Deep Archive the restoration times are measured in hours, and for cost-effective retrieval it will be days before you can restore your backups. ⏰ If you need to restore ASAP, perhaps the lowest tiers of cold storage aren't for you.
First of all, I'm not trying to single out Wasabi. It's just something I have experience with, and it's also a good example of affordable cloud storage.
Estimating cost with something like Wasabi is very straightforward. As of writing this article they charge $6.99/TB/month, so if you want to store 2 TB worth of backups, you'll be paying around $14/month (depending on how well rustic manages to compress the data and on the overhead). There is no extra charge to upload the data, nor any charge for restoration. The only things to keep in mind are that Wasabi charges a minimum of 90 days per object (so perhaps don't prune too early) and that there's a minimum storage charge in case you want to store significantly less than 1 TB.
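For completeness, the Wasabi estimate really is a single multiplication; a quick sketch using the price quoted above:

```bash
# 2 TB at $6.99/TB/month, ignoring compression and overhead
echo "scale=2; 6.99 * 2" | bc   # -> 13.98, i.e. ~$14/month
```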
For AWS I'm going to focus on S3 Glacier Deep Archive, as both the savings and the extra costs are most pronounced there. I'm using the eu-west-1 region, so keep in mind that prices differ between regions. The prices are as of writing this article. There are other cold storage providers with different parameters; do your own research. 😬 Also do your own math, don't rely on my poor skills.
To actually migrate to cold storage, we need to upload the data. Data moving from the Internet into AWS is free, but we still need to PUT it there, so we're going to pay for the requests made. How many requests will that be? Since we're doing a migration, we know exactly how many files we have. Let's look at the rustic repo-info output; the first table is the helpful one here:
File type | Count | Total Size
---|---|---
Key | 1 | 450 B
Snapshot | 180 | 55.9 kiB
Index | 169 | 327.1 MiB
Pack | 349433 | 1.9 TiB
Total | 349783 | 1.9 TiB
With about 350k files and the price at $0.05 per 1k requests, we're looking at $0.05 * 350 = $17.50. There is some extra overhead for storing some files in the hot storage as well, but given the limited set of files it holds and the much cheaper request price of the Standard tier ($0.005 per 1k requests), that cost is negligible. Let's just round the whole thing up to $18.
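As a sanity check, here's the same back-of-envelope calculation in shell form, using the object count and eu-west-1 PUT price from above:

```bash
# ~350k PUT requests at $0.05 per 1,000 requests (Glacier Deep Archive, eu-west-1)
echo "scale=2; 350000 / 1000 * 0.05" | bc   # -> 17.50
```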
On top of that, if you're moving from restic to rustic at the same time, the default pack size in rustic is much bigger, so you'll likely end up with far fewer objects in the new repository. (I have personally observed about a 10x reduction.) I'm going to assume 350k objects going forward, as it doesn't change the overall math that much; just keep in mind that it could be less.
The storage cost is just about $1/TB/month, so about $2 to store 2 TB worth of backups. There's also some extra overhead directly on the AWS side (see the S3 pricing page for details). Per object there is:
- 32 KB of Glacier Deep Archive storage overhead, for S3 Glacier's index and metadata
- 8 KB of Standard storage overhead, for the user-defined name and metadata

Together they add up to just under 10c in storage costs for my repository. The average object size in a Restic repository is in the megabytes, so a few kB on top of that doesn't change the overall cost much.
There's also the overhead of storing a copy of the "hot" data in the hot bucket at regular storage prices. This is where the second table generated by rustic repo-info comes in handy:
Blob type | Count | Total Size | Total Size in Packs
---|---|---|---
Tree | 912505 | 3.6 GiB | 3.6 GiB
Data | 1930596 | 1.9 TiB | 1.9 TiB
Total | 2843101 | 1.9 TiB | 1.9 TiB
Index and Snapshot from the first table are less than 400 MB, Tree from the second is 3.6 GiB, so about 4 GiB altogether in hot storage. At $0.023 per GB/month that's another 10c. Also not a significant portion of the storage costs.
Summed up, that's about $2.20/month, which compared to Wasabi is just over $11 saved per month. We should recoup the upfront cost in under two months.
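The break-even on the ~$18 migration cost, given the monthly saving versus Wasabi:

```bash
# upfront PUT cost divided by monthly saving ($14 Wasabi vs ~$2.20 AWS)
echo "scale=2; 18 / (14 - 2.20)" | bc   # -> 1.52 months
```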
There's also a minimum storage duration of 180 days: objects deleted before that incur a pro-rated charge equal to the storage charge for the remaining days. That's generally not a concern for backups, which are usually kept much longer. For this exact reason, Rustic by default won't repack the cold data when pruning unless forced with --repack-cacheable-only=false. With cold storage pricing it's usually cheaper to keep a bit more data around.
So the ongoing cost of cold storage feels pretty pleasant and is actually quite straightforward. The restore cost, however, is where the math gets quite unpredictable.
Let's take the worst-case scenario: we've lost all of the data and want to restore everything from backups ASAP.
First we need to ask AWS to restore the data from the cold archive. This costs $0.10 per 1k requests for the faster Standard retrieval (within 12 hours) or $0.025 per 1k requests for Bulk retrieval (within 48 hours). So with ~350k objects, that's $35 or $8.75, depending on how urgent it is. This gives us readable objects in S3.
We also need to tell AWS how long the objects should stay restored, and the Standard storage rate applies while they're in this state. That's another ~$1.50/day while you're fetching the data.
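For illustration, this is roughly what the per-object restore request looks like with the AWS CLI; the bucket name and object key are made up, and in practice rustic's warm-up command would issue something like this for every pack file:

```bash
# Ask S3 to restore one Deep Archive object for 5 days using the Bulk tier.
aws s3api restore-object \
  --bucket my-cold-bucket \
  --key data/ab/abcdef0123456789 \
  --restore-request '{"Days": 5, "GlacierJobParameters": {"Tier": "Bulk"}}'

# Check progress: the Restore field in the response flips to
# ongoing-request="false" once the temporary copy is readable.
aws s3api head-object --bucket my-cold-bucket --key data/ab/abcdef0123456789
```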
To actually download the backups, the rate is $0.09 per GB, which comes to about $185 in data transfer fees for our 2 TB backup. (Worst case scenario, restoring out to the Internet.)
The GET requests are cheap at $0.0004 per 1k requests, so about 14c total for our repository. A rounding error compared to the transfer costs.
Summed up, the restore cost is somewhere between $200 and $230 depending on urgency. That's about 20 months' worth of saved storage costs. In other words, we'd have to average about two years between full restores to break even compared to Wasabi.
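Spelled out with the prices used above (retrieval requests plus transfer plus GET requests; the temporary Standard storage while fetching comes on top and nudges the totals toward the rounded figures):

```bash
# Standard retrieval: 350k requests at $0.10/1k, ~2 TiB egress at $0.09/GB,
# 350k GETs at $0.0004/1k
echo "scale=2; 350 * 0.10 + 2048 * 0.09 + 350 * 0.0004" | bc    # -> ~219.46
# Bulk retrieval: same, but $0.025/1k for the restore requests
echo "scale=2; 350 * 0.025 + 2048 * 0.09 + 350 * 0.0004" | bc   # -> ~193.21
# plus ~$1.50/day of temporary Standard storage while the data is fetched
```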
The above is quite a pessimistic approach, but it gives us a ceiling on the costs involved in disaster recovery. Depending on what and how you back up, it also might not be a very realistic estimate.
First of all:
AWS customers receive 100GB of data transfer out to the internet free each month
This is aggregated across all services and regions, but if you only use AWS for backups, most of that allocation is available for the occasional restore. If you need to restore within the same AWS region (for example to an instance in AWS), there are no data transfer costs either; you just pay for the retrieval requests.
You just lost all of the family photos: do you need to restore 1 TB of them, or do you just need a few GB from last year to put together nice pictures for the family calendar? You can see where this is going. Does your small company need all of its data instantly, or just the files for currently ongoing projects?
In my experience, I don't think I've restored 100 GB in aggregate over the last 5 years or so. Most of the time I got the lost data back from an earlier local snapshot or some other copy. I can recall only one case of silent data corruption that went unnoticed long enough that I actually had to reach for the backups, and as you can imagine, it wasn't particularly urgent either.
To put it another way: $230 might be the cost of restoring everything, but it's also the amount of money that will keep the files stored for about a decade. So unless you need the files instantly, you can decide later what's actually important enough to restore.
As you can see, it's not just the per-gigabyte price that needs to be compared when choosing your storage provider; there are many more factors to consider. Sometimes the cheaper option might be more expensive in the long run, and sometimes paying upfront will save you costs down the line.
The unique property of S3 Glacier Deep Archive is that it lets you store data for a very reasonable price while letting you decide later whether the cost of restoring it is actually worth it. Which is not a bad proposition.
After all, the most expensive backups are the ones you don't have but really wish you did.