Backing up to AWS Glacier using the AWS Command Line Interface (tutorial)
Introduction:
In the process of finding a long-term, durable backup solution for my personal photos I stumbled upon AWS Glacier. There may be a better alternative to my problem, but as I see it Glacier ticks all the boxes:
- I am the client. Free services such as OneDrive and Google Drive concern me: if I am not paying them, who is? Could they change their conditions, and since I am not exactly the customer, would I have any say in it?
- Price. At $0.007 per GB/month, my use case of 20 GB comes to $1.68 per year. Cheap.
- Redundancy. I want to upload and forget about it. Glacier's 99.999999999% annual durability per archive does the trick. That works out to a 1-in-100,000,000,000 chance of losing a given archive in a year, so if you store 100 archives you would expect to lose one every 1,000,000,000 years. I'll most likely be dead by then, so forget about it.
Introduction to Glacier:
Glacier is a cheap, durable backup solution which comes with the trade-off of slow retrieval times. It is great when you want to protect against complete data loss. It is worth noting that although Glacier is incredibly cheap per GB of storage, the retrieval process has its own pricing structure. As noted by this blog, you get charged on the peak data retrieval rate per hour, and because the blogger's client crashed and initiated duplicate retrieval jobs, the peak billable rate was driven up drastically, to $150 for 60GB.
Glacier has its own terminology:
- Archives are stored files
- Vaults are containers for these files
- Inventory is the list of archives within a vault
Using Glacier:
There are GUI Glacier clients, however reading the previous post gave me some trepidation, so I thought: hell, why not just use the CLI?
The reference for the glacier commands can be found here.
Glacier works around jobs that take 4-5 hours to complete; everything apart from uploading needs to be initiated as a job.
The gist of the process is:
- configure (aws credentials and choose default region)
- create-vault
- upload-archive
- initiate-job (start a 4-5 hour job to list the archives in the vault; note that the inventory list is updated once a day)
- list-jobs (optional: to remind yourself of the job id)
- get-job-output (get the output of the job which contains the archive id)
- initiate-job (initiate the retrieval job)
- list-jobs (optional: to remind yourself of the job id)
- get-job-output (download archive from the previous job)
Run-through of the commands:
1. configure
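Assuming the AWS CLI is already installed, running the following prompts for your credentials and a default region (the values shown below are placeholders):
aws configure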
AWS Access Key ID [None]: accesskey
AWS Secret Access Key [None]: secretkey
Default region name [None]: us-west-2
Default output format [None]:
2. create-vault
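A minimal sketch, assuming a vault named photos (the name is arbitrary; --account-id - means "the account of the configured credentials"):
# create a vault to hold the archives
aws glacier create-vault --account-id - --vault-name photos
On success this returns the vault's location, e.g. /<account-id>/vaults/photos.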
3. upload-archive
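Assuming the photos have been zipped up as photos.zip, an upload looks something like this (the description is optional but helps identify the archive later):
# upload a single archive to the vault
aws glacier upload-archive --account-id - --vault-name photos --archive-description "photos backup" --body photos.zip
The response includes an archiveId; it is worth saving it somewhere, since getting it back via the inventory takes a 4-5 hour job.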
4. initiate-job
This job gets an inventory listing of your vault. Note that the inventory is updated only once a day, so if you upload and then immediately request an inventory, your archive may not be on it.
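Starting an inventory-retrieval job looks something like this, with the job type passed as JSON job parameters (again assuming the photos vault):
# ask Glacier to produce an inventory of the vault (a 4-5 hour job)
aws glacier initiate-job --account-id - --vault-name photos --job-parameters '{"Type": "inventory-retrieval"}'
The response contains a jobId, which you will need in step 6.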
5. list-jobs
This command lists all jobs that are either in progress or ready to download (completed jobs are kept for roughly 24 hours).
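For example:
# show in-progress and recently completed jobs for the vault
aws glacier list-jobs --account-id - --vault-name photos
Each entry in the returned JobList shows the job's JobId, Action and StatusCode.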
6. get-job-output (to download inventory list)
With the job ID from the previous command you can get the inventory list, which is in JSON.
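Assuming <inventory-job-id> stands in for the job ID from the previous step, something like this writes the inventory to a local file:
# save the JSON inventory listing to inventory.json
aws glacier get-job-output --account-id - --vault-name photos --job-id <inventory-job-id> inventory.json
The file contains an ArchiveList with, among other things, each archive's ArchiveId and ArchiveDescription.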
7. initiate-job (to retrieve archive)
The inventory listing will contain all of the archive IDs, which can be substituted into this command.
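With an ArchiveId copied out of the inventory, a retrieval job can be started along these lines (<archive-id> is a placeholder):
# ask Glacier to stage the archive for download (another 4-5 hour job)
aws glacier initiate-job --account-id - --vault-name photos --job-parameters '{"Type": "archive-retrieval", "ArchiveId": "<archive-id>"}'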
8. list-jobs (get the job id)
As in step 5, this lists the vault's jobs; once the retrieval job shows as complete, note down its job ID.
9. get-job-output (to download archive)
This final command downloads the output from the retrieval job, which will be the file you uploaded.
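Assuming the retrieval job has completed, the same get-job-output command saves the archive back to disk:
# download the staged archive to a local file
aws glacier get-job-output --account-id - --vault-name photos --job-id <retrieval-job-id> photos.zip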
Final notes
- Jobs are kept for roughly 24 hours, so that is how long you have to download their output
- Inventory lists are updated once a day
- Honestly, it may be easier just to use the S3 interface with policies, as noted here