Filesystem over S3 : s3backer

Note : This note is a reprint of an article from my personal blog, with minor modifications.

What is S3?

Amazon S3, or Amazon Simple Storage Service, is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its global e-commerce network. It can store any type of object, which makes it suitable for Internet application storage, backup and recovery, disaster recovery, data archives, data lakes for analytics, and hybrid cloud storage.

Amazon S3 manages data with an object storage architecture that aims to provide scalability, high availability, and low latency, with so-called 11-9s durability, that is, 99.999999999% durability.

Wide adoption of Amazon S3 has spawned many competing offerings that implement cloud storage behind an S3-compatible API. Some of these S3-compatible implementations are provided by :

  • Apache CloudStack
  • DigitalOcean Spaces
  • OpenIO
  • Wasabi

In my recent experiments to settle on an efficient way to expand the storage on offer for my server, I discovered s3backer.

The Problem

Storage in S3 happens via “objects” in “buckets”. An object can be anything from an empty file up to several terabytes in size. But an object cannot be modified in place. Suppose you store myFile.odt as an object on your S3 cloud. If you then edit the file and want to store the changes on S3, you must push the entire file to the cloud again, and the new upload replaces the copy S3 already had. While this is not a problem for read-only data, it quickly adds a lot of overhead for “hot data” : data that is read from and written to often. Not only is this method wasteful for repeated rewrites of a file, but in my experiments, writing big files directly to S3 (over S3FS) was also painfully slow.

The Solution

Presenting s3backer. This neat little program can connect to S3-compatible storage and treat a bucket like a physical storage device. It abstracts S3 away as a disk with a block size of your choosing (128k is the recommended value to get started with), and presents it as one large disk image on your system. You can then format this disk image with a filesystem of your choice, and mount it as if it were just a regular storage device.
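To get a feel for what this abstraction means on the S3 side, here is a small back-of-the-envelope sketch. It assumes the 1t disk size and 128k block size used in the example command later in this article, and it assumes both suffixes are powers of 1024; the variable names are mine, not anything s3backer defines.

```shell
# Disk geometry (assumed): 1 TiB disk, 128 KiB blocks.
disk_size=$((1024 * 1024 * 1024 * 1024))   # "1t" in bytes
block_size=$((128 * 1024))                 # "128k" in bytes

# Each block of the disk image corresponds to at most one object in the bucket.
max_blocks=$((disk_size / block_size))
echo "up to $max_blocks objects of $block_size bytes each"
# prints: up to 8388608 objects of 131072 bytes each
```

This is an upper bound: a block that has never been written does not need to exist as an object, so a freshly created disk occupies far less than its nominal size.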

At this point, all of your applications are blind to the fact that this newly mounted storage is backed by S3, and all the usual file operations are supported. Since this is a regular filesystem, the disk caching that the kernel performs for regular drives is also applied here, further increasing performance. You circumvent the problem of having to rewrite entire objects on S3 because your files are now spread across blocks of 128k each, every block is its own object, and only the blocks modified by your program need to be re-uploaded.
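The saving is easy to estimate with a little arithmetic. The sketch below is only an illustration: blocks_touched is a hypothetical helper of my own, not part of s3backer, and it counts only the data blocks overlapped by a write. A real filesystem also updates metadata and journal blocks, so the true number will be somewhat higher.

```shell
block_size=$((128 * 1024))   # 128k blocks, as assumed throughout

# Print how many blocks a write of $2 bytes at byte offset $1 overlaps.
# The length must be at least 1. Variables are global, as in plain POSIX sh.
blocks_touched() {
    offset=$1
    length=$2
    first=$((offset / block_size))
    last=$(((offset + length - 1) / block_size))
    echo $((last - first + 1))
}

# A 4 KiB edit 500 MiB into the disk touches a single 128 KiB block...
blocks_touched $((500 * 1024 * 1024)) 4096   # prints 1
# ...unless it happens to straddle a block boundary.
blocks_touched $((block_size - 10)) 4096     # prints 2
```

So a small edit to a multi-gigabyte file costs one or two 128 KiB uploads for its data, instead of a re-upload of the whole file.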

The Drawbacks

This was not a problem in my use case, but it should be mentioned that using a bucket with s3backer makes it incompatible with simultaneous read/write mounts. Moreover, as far as I could tell, it offers no safeguard against such a situation. It is up to the end user to make sure this does not happen.

Because of the way s3backer works, data in a bucket can only meaningfully be accessed via s3backer, or something similar. You will not be able to access your files via webview, S3FS, or other methods.

Installation and Usage

For Debian, there is no prebuilt package for s3backer, so you must download the latest release code and compile it yourself. A link to the GitHub page is provided at the end of the article. The build dependencies are provided as an apt gettable list:

libcurl4-openssl-dev libfuse-dev libexpat1-dev libssl-dev zlib1g-dev pkg-config autoconf automake
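As a convenience, the list can be turned into an install command. The sketch below only assembles and prints the command so that you can review it first; the -y flag and the use of sudo are my assumptions about how you want to run it, and a C compiler and make are assumed to be installed already.

```shell
# Build dependencies, exactly as listed above.
deps="libcurl4-openssl-dev libfuse-dev libexpat1-dev libssl-dev zlib1g-dev pkg-config autoconf automake"

# Assemble the install command and print it for review.
# To actually install, run the printed command (it needs root privileges).
install_cmd="sudo apt-get install -y $deps"
echo "$install_cmd"
```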

Then, installation is pretty standard.

$ ./configure
$ make
$ sudo make install

Once s3backer is installed, you will need to create two mountpoints. The first is where s3backer connects to S3 and exposes a bucket of your choosing as a disk image. The second is where you mount that disk image as a loop device. Call them s3-b and s3-fs.

It is then possible to mount a demo bucket provided on the official wiki page, just to see how it works.
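Creating the mountpoints is a one-liner. This sketch assumes you want them as directories under the current working directory, which is how the relative paths in the commands that follow are written.

```shell
# Create both mountpoints; -p makes this safe to re-run if they already exist.
mkdir -p s3-b s3-fs
```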

$ s3backer --readOnly s3backer-demo s3-b
$ sudo mount -o loop s3-b/file s3-fs

Unmount by first unmounting the filesystem, and then unmounting the backing store.

$ sudo umount s3-fs && sudo umount s3-b

Once you are convinced that s3backer works as advertised, you can configure it to use your own S3 bucket.

  1. Put your accessID and accessKey in a conveniently located file (e.g.: ~/.passwd-s3b) in the format accessID:accessKey
  2. Fire up s3backer to create a backing store in your bucket

    $ s3backer --accessFile="$HOME/.passwd-s3b" --blockSize=128k --size=1t --listBlocks mybucket s3-b

    Note : Find the full list of options at the s3backer ManPage

  3. Now create a filesystem, treating s3-b/file as if it were a block device

    $ mkfs.ext4 s3-b/file

    Note that creating an ext4 filesystem can take a long time, because mkfs has to initialize the whole block device. Make sure s3backer was started with the --listBlocks option (as in step 2), otherwise you will incur a ton of network transfer.

  4. Finally, mount the filesystem you have just created, and then use it as normal!

    $ sudo mount -o loop s3-b/file s3-fs
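For step 1, the credentials file can be created from the shell. The sketch below is one way to do it, assuming you want the file readable only by your own user; write_access_file is a hypothetical helper of my own, and the credentials in the usage comment are placeholders, not real keys.

```shell
# Write an "accessID:accessKey" line to the file given as $1.
#   $1 = path of the access file, $2 = access ID, $3 = access key
write_access_file() {
    # The restrictive umask applies only inside this subshell.
    ( umask 077 && printf '%s:%s\n' "$2" "$3" > "$1" ) || return 1
    # Tighten permissions even if the file already existed.
    chmod 600 "$1"
}

# Example call with placeholder credentials:
# write_access_file "$HOME/.passwd-s3b" "MY_ACCESS_ID" "MY_ACCESS_KEY"
```

Passing the path as "$HOME/.passwd-s3b" rather than relying on ~ avoids any doubt about whether the shell expands the tilde in that position.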

s3backer in Action

[Figure: Files as stored in the filesystem backed by S3]
[Figure: Blocks as stored in S3]
[Figure: Filesystems as mounted]

archiecobbs/s3backer: FUSE-based single file backing store via Amazon S3
