Questions on backing up to S3 Glacier Deep Archive.

MigratingtoLemmy@lemmy.world to Selfhosted@lemmy.world – 27 points –

Hi everyone. I was considering backup options to Glacier Deep Archive, and wanted to know:

  1. Which software do you use to encrypt client-side, obfuscate, compress and deduplicate the data before you send it to S3?
  2. What is the difference between Restore Requests (bulk) and Outbound data transfer, and which one will I be using when I want to pull my data from AWS?

I'll be storing approximately 8TB or so of data, which is why I was looking at inexpensive ways to back it up other than buying an HDD outright.

Thanks!

Lots of answers in the comments about this particular storage type/vendor. Regardless, to answer your original question: rclone. Hands down. If you spend 30-60 minutes actually reading their documentation, you are set and will understand so much more of what’s going on under the hood.

Thanks, I do know of rclone and intend to study it. I was just wondering if the likes of borg, duplicati etc could be used.

Can’t speak for those but I tried Kopia and it did the job okay. Ultimately tho I landed on rclone.

Which cloud provider do you use?

I’m currently using Backblaze. I also researched Wasabi and AWS.

How much storage do you use on B2? Does it not feel quite expensive to you? Even Wasabi is quite expensive, although it's not as bad as AWS.

I was recommended iDrive e2 by another commenter, and now that I look at it, it is likely the best product I have come across, reliability aside. I have never heard of this company before, and since this is very important data to me, I'd like to have a reliable company behind it.

Well here’s my very abbreviated conclusion (provided I remember the details appropriately) when I did the research about 3 months ago.

Wasabi - okay pricing, reliable, S3 compatible, no charges to retrieve my data, pay for 1 TB blocks (wasn’t a fan of this one), penalty for data retrieval prior to a “vesting” period (if I remember correctly, you had to leave the data there for 90 days before you could retrieve it at no cost; also not a big fan of this one).

AWS - I’m very familiar with it due to my job, pricing is largely influenced by access requirements (how often and how fast do I want to retrieve my data), very reliable, S3, charges for everything (list, read, retrieve, etc). Those per-operation charges are the real killer and a largely unaccounted-for cost of AWS.

Backblaze - okay pricing, reliable, S3 compatible, free retrieval of data up to the same amount that you store with them (read below), pay by the gig (much more flexible than Wasabi). My heartburn with Backblaze was that retrieval stipulation. However, they have recently increased it to free up to 3x of what you store with them, which is super awesome and made my heartburn go away really quickly.

I actually chose Backblaze before the retrieval policy change and it has been rock solid from the start. It works seamlessly with the vast majority of utilities that can leverage S3-compatible storage. Pricing-wise, I honestly don’t think it’s that bad.

Hope this helps.

Here's my situation: I anticipate about 8TB that I will need to store reliably.

That's about $50 a month with Backblaze B2.

I can get two 12TB drives for $500 total and keep one or both of them in remote locations (they may or may not be connected to the internet, so I suppose the convenience of the cloud just isn't there).
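
To make the comparison above concrete, here is a minimal sketch of the break-even arithmetic using only the two figures quoted in the thread ($50 a month for cloud storage, $500 up front for two drives). The function name `breakeven_months` is made up for illustration, and the calculation assumes flat pricing and ignores retrieval fees, drive replacement, and the cost of getting drives to a remote location.

```python
def breakeven_months(upfront_cost: float, monthly_fee: float) -> float:
    """Return the number of months after which cumulative monthly fees
    equal a one-time upfront purchase (hypothetical helper)."""
    if monthly_fee <= 0:
        raise ValueError("monthly_fee must be positive")
    return upfront_cost / monthly_fee


# Figures quoted in the thread; everything else is an assumption.
cloud_monthly_usd = 50.0    # ~8TB on Backblaze B2, per month
drives_upfront_usd = 500.0  # two 12TB drives, one-time

months = breakeven_months(drives_upfront_usd, cloud_monthly_usd)
print(f"Break-even after {months:.0f} months")
# prints "Break-even after 10 months"
```

On these numbers the drives pay for themselves in under a year, which is why the remaining question is about convenience and reliability rather than price.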

The supposed value of the cloud is becoming a bit difficult for me to justify TBH. No doubt B2 is reliable, but if I have 2 drives acting as cold storage in different locations (I will be encrypting the contents), is that a better idea than cloud storage, or Backblaze specifically? I have been assured that the remote locations should be fine for the most part, other than for natural calamities.

Honestly, what really matters (imo) is that you do offsite storage. Cloud, a friend’s house, your parents, your buddy’s NAS, whatever. Just get your data away from your “production/main” site.

For me, I chose cloud for two main reasons. First, convenience. I could use a tool to automate the process of moving data offsite in a reliable manner, thus keeping my offsite backups almost identical to my main array and allowing easy retrieval should I need it. Second, I don’t really have family or friends nearby and/or with the hardware to support my need for offsite storage.

There are lots of pros and cons to each, before you even add your specific needs and circumstances on top.

If you can use the additional drives later on in your main array, some other server, or for a different purpose, then it may be worthwhile exploring the drives (my concern would be the ease of keeping offsite data up to par with main data). If you don’t like it for one reason or another, you can always repurpose the drives and give cloud storage a try. Again, the important thing is to do it in the first place (and encrypt it client side).

There are 3 main reasons (in my particular scenario) that might prompt me to go for the cloud:

  1. Reliability of infrastructure.
  2. Convenience.
  3. (Supposed) Bitrot protection (I won't have the protection, just the detection, since I'll be using standalone drives with ZFS).

I need to think a bit more. Thanks!