
July 28

Hi everyone! Hope you're doing okay :) A lot to talk about, had quite an adventure of sorts figuring out some web service stuff XD

Most of my focus over the last few weeks has been on getting image delivery right. First, I'll explain the problem.

For an art website, there are a lot of images to handle: an artist uploads an artwork to your server, and then you need to upload that image to file storage such as Amazon S3. A CDN, like CloudFront, then delivers that image to users viewing the website. The problem comes in because you need to be able to resize the image, often to several different sizes. For example, you need:

  • An unmodified copy of the original image
  • A 4k version
  • A 1080p version
  • A small, e.g. max 640x480 version for thumbnails

The simple way to do this would be to do all the resizing on the web server, then upload all the different files to S3 (there's a rough sketch of what that looks like after the list below). This comes with three problems:

  • The web server will have a ton of extra processing load which it isn't really suited for
  • The resized images may never actually be viewed, so some of this may be wasted processing
  • ImageMagick and the other options typically available on a webserver aren't as efficient as more specialised libraries like Sharp (more on that below)
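
Just to make the eager approach concrete, it looks something like the sketch below. I'm using Node and Sharp here purely for illustration (a typical webserver stack would more likely reach for ImageMagick), and the bucket name and key layout are made up - the point is that every size gets generated and uploaded at submission time, whether anyone ever views it or not:

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import sharp from "sharp";

const s3 = new S3Client({});
const BUCKET = "artcentral-images"; // made-up bucket name

// Made-up presets matching the sizes listed above.
const PRESETS = {
  "4k":    { width: 3840, height: 2160 },
  "1080p": { width: 1920, height: 1080 },
  "thumb": { width: 640,  height: 480 },
} as const;

// Eager approach: at upload time, generate every preset and push them all
// to S3, whether or not anyone ever requests them.
export async function processUpload(key: string, original: Buffer): Promise<void> {
  // Keep the untouched original.
  await s3.send(new PutObjectCommand({
    Bucket: BUCKET,
    Key: `original/${key}`,
    Body: original,
  }));

  for (const [name, { width, height }] of Object.entries(PRESETS)) {
    // Resize while preserving aspect ratio and never upscaling.
    const resized = await sharp(original)
      .resize(width, height, { fit: "inside", withoutEnlargement: true })
      .toBuffer();

    await s3.send(new PutObjectCommand({
      Bucket: BUCKET,
      Key: `${name}/${key}`,
      Body: resized,
    }));
  }
}
```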

The answer to this problem is to use a 'serverless' computing service (it's not actually serverless, it just means you don't have to manage the servers yourself) to resize images as needed. AWS Lambda lets us create functions that do exactly this - I found a couple of relatively helpful tutorials here: https://aws.amazon.com/blogs/compute/resize-images-on-the-fly-with-amazon-s3-aws-lambda-and-amazon-api-gateway/ and https://blog.stefanolaru.com/on-the-fly-image-resizing-with-aws-lambda-s3-and-cloudfron, both of which cover the basics of getting this done.

Doing it this way, image resizing can be done with the extremely efficient Sharp library for Node.js, and only images that are actually requested by users will be processed and resized, saving on both processing costs and S3 storage costs. There is a minor problem, however, for ArtCentral's use case: for this to work, the S3 bucket in which resized images are stored must be fully open to public access.
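
To give a feel for the on-demand version, here's a minimal sketch of a resize handler using Sharp and the AWS SDK v3. The bucket names, the size-prefixed key layout and the event shape are assumptions I've made for illustration - this isn't the exact code from either tutorial:

```typescript
import { S3Client, GetObjectCommand, PutObjectCommand } from "@aws-sdk/client-s3";
import sharp from "sharp";

const s3 = new S3Client({});
const ORIGINALS_BUCKET = "artcentral-originals"; // assumed name
const RESIZED_BUCKET = "artcentral-resized";     // assumed name

// Assumed preset dimensions matching the sizes listed earlier.
const PRESETS: Record<string, { width: number; height: number }> = {
  "4k": { width: 3840, height: 2160 },
  "1080p": { width: 1920, height: 1080 },
  "thumb": { width: 640, height: 480 },
};

// The handler receives e.g. { size: "thumb", key: "artworks/123.png" }.
export const handler = async (event: { size: string; key: string }) => {
  const preset = PRESETS[event.size];
  if (!preset) throw new Error(`Unknown size: ${event.size}`);

  // Fetch the original image from the originals bucket.
  const original = await s3.send(
    new GetObjectCommand({ Bucket: ORIGINALS_BUCKET, Key: event.key })
  );
  const inputBuffer = Buffer.from(await original.Body!.transformToByteArray());

  // Resize with Sharp, preserving aspect ratio and never upscaling.
  const resized = await sharp(inputBuffer)
    .resize(preset.width, preset.height, { fit: "inside", withoutEnlargement: true })
    .toBuffer();

  // Store the result where CloudFront expects to find it.
  const resizedKey = `${event.size}/${event.key}`;
  await s3.send(
    new PutObjectCommand({
      Bucket: RESIZED_BUCKET,
      Key: resizedKey,
      Body: resized,
      ContentType: original.ContentType,
    })
  );

  return { key: resizedKey };
};
```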

This is because the mechanism the article's solution relies on is using the S3 bucket as a static web host, which lets it act as a redirect route when an image in the bucket doesn't exist yet. This means that when CloudFront tries to serve a resized image but can't find it in its cache, it goes to S3, and S3 redirects to an API that calls the Lambda function we create, which does the processing and creates the new resized image (there's a rough sketch of this wiring after the list below). This lets CloudFront, S3 and Lambda work together simply by passing an appropriate URL around, but it introduces two problems/vulnerabilities:

  • The S3 bucket, because it's publicly accessible, can be accessed without going through CloudFront (so a malicious user, if they found your S3 bucket's website address, could rack up significant S3 network costs)

  • Artists often don't want the full-resolution version of their artwork to be publicly accessible, and with this solution it's not possible to restrict access to it without also cutting off the API that calls AWS Lambda.
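
For completeness, this is roughly how the article's redirect trick gets wired up: a 404 on the resized-images bucket's static website endpoint bounces the request over to an API Gateway endpoint that fronts the resize Lambda. The bucket name, API hostname and path prefix below are placeholders - the linked tutorials walk through the real setup:

```typescript
import { S3Client, PutBucketWebsiteCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Placeholder names - substitute your own bucket and API Gateway endpoint.
const RESIZED_BUCKET = "artcentral-resized";
const API_HOST = "abc123.execute-api.eu-west-1.amazonaws.com";

export async function enableRedirectTrick(): Promise<void> {
  await s3.send(new PutBucketWebsiteCommand({
    Bucket: RESIZED_BUCKET,
    WebsiteConfiguration: {
      IndexDocument: { Suffix: "index.html" },
      RoutingRules: [
        {
          // When the requested rendition doesn't exist yet...
          Condition: { HttpErrorCodeReturnedEquals: "404" },
          // ...redirect to the API that invokes the resize Lambda.
          Redirect: {
            Protocol: "https",
            HostName: API_HOST,
            ReplaceKeyPrefixWith: "prod/resize?key=",
            HttpRedirectCode: "307",
          },
        },
      ],
    },
  }));
}
```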

I got the configuration in the article working, but I didn't want those vulnerabilities hanging around - so I found a solution. It involves using two S3 buckets: one for private content that is only downloadable if the artist allows it (full-size images), and one for resized images. In my case, both buckets are closed to public access, but allow access via CloudFront.
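
As a sketch of what "closed to public access, but open to CloudFront" means in practice: with CloudFront's newer origin access control (OAC) setup, the bucket policy only allows the CloudFront service principal, scoped to a single distribution. The bucket name, account ID and distribution ID below are placeholders:

```typescript
import { S3Client, PutBucketPolicyCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Placeholders - use your own bucket, account ID and distribution ID.
const BUCKET = "artcentral-resized";
const DISTRIBUTION_ARN = "arn:aws:cloudfront::111122223333:distribution/EDFDVBD6EXAMPLE";

export async function lockBucketToCloudFront(): Promise<void> {
  const policy = {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "AllowCloudFrontServicePrincipalReadOnly",
        Effect: "Allow",
        Principal: { Service: "cloudfront.amazonaws.com" },
        Action: "s3:GetObject",
        Resource: `arn:aws:s3:::${BUCKET}/*`,
        // Only this specific distribution may read from the bucket.
        Condition: { StringEquals: { "AWS:SourceArn": DISTRIBUTION_ARN } },
      },
    ],
  };

  await s3.send(new PutBucketPolicyCommand({
    Bucket: BUCKET,
    Policy: JSON.stringify(policy),
  }));
}
```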

The first bucket, which contains the private images, can make use of CloudFront's signed URLs feature - access is only possible when the ArtCentral servers generate a signed URL for a user, and only for a brief period of time. This makes it easy for artists to switch the "download my full size art" button on and off. The second bucket doesn't require a signed URL, but only allows access via CloudFront - so you can't fetch the images directly from S3.
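
To give an idea of the signed-URL part, here's roughly what generating one looks like with the AWS SDK's CloudFront signer. The domain, key pair ID and five-minute expiry are placeholders I've picked for illustration; the "download my full size art" toggle then simply controls whether the server is willing to hand one of these out at all:

```typescript
import { getSignedUrl } from "@aws-sdk/cloudfront-signer";

// Placeholder domain and key pair ID; the private key would come from config.
const PRIVATE_CDN_DOMAIN = "https://private-images.artcentral.example";
const KEY_PAIR_ID = "K2JCJMDEHXQW5F";

// Generate a short-lived signed URL for a private, full-size artwork.
export function signFullSizeUrl(objectKey: string): string {
  return getSignedUrl({
    url: `${PRIVATE_CDN_DOMAIN}/${objectKey}`,
    keyPairId: KEY_PAIR_ID,
    privateKey: process.env.CLOUDFRONT_PRIVATE_KEY!, // PEM-encoded signing key
    dateLessThan: new Date(Date.now() + 5 * 60 * 1000).toISOString(), // ~5 minutes
  });
}
```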

With this approach, the article's redirect trick no longer works - the bucket isn't a public static website any more, so its website endpoint will just reject attempts to connect. Instead, we skip S3 static hosting entirely and invoke Lambda on demand directly from the ArtCentral webserver. By keeping records in the user_artworks table showing whether a resized image of a given size exists for an artwork, we can either serve the image straight from CloudFront, or invoke Lambda to create it first and then return it through CloudFront.
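
In code, the webserver-side decision could look something like this sketch. The record shape, Lambda function name and CloudFront domain are all placeholders I've made up - the real user_artworks schema will differ:

```typescript
import { LambdaClient, InvokeCommand } from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});

// Hypothetical slice of a user_artworks row - the real schema will differ.
interface ArtworkRecord {
  id: number;
  s3Key: string;
  hasThumb: boolean; // whether the "thumb" rendition already exists
}

const CDN_BASE = "https://images.artcentral.example"; // placeholder CloudFront domain
const RESIZE_FUNCTION = "artcentral-resize-image";    // placeholder Lambda name

// Either serve the existing rendition straight from CloudFront, or invoke
// the resize Lambda first and then hand back the same CloudFront URL.
export async function thumbnailUrl(artwork: ArtworkRecord): Promise<string> {
  if (!artwork.hasThumb) {
    await lambda.send(new InvokeCommand({
      FunctionName: RESIZE_FUNCTION,
      // The default RequestResponse invocation waits for the resize to
      // finish before we hand the URL to the browser.
      Payload: Buffer.from(JSON.stringify({ size: "thumb", key: artwork.s3Key })),
    }));
    // The real implementation would also mark hasThumb = true in user_artworks here.
  }
  return `${CDN_BASE}/thumb/${artwork.s3Key}`;
}
```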

This solves the problem well for ArtCentral; no public access to buckets, easy control for artists over their full size files, and no wasted processing time or S3 storage for images that won't be viewed. It also offloads a lot of processing from the webservers to the AWS Lambda infrastructure, which is much more efficient and will also result in fewer delays when submitting artworks (and lower webserver usage overall).

Getting the article config working took a while, and implementing the necessary functions to make the above work will take a while longer; I've established the concept, now I just have to implement it. There's a lot more AWS detail in there, but I left it out for the sake of brevity, and to avoid boring everyone to death (if you do want a longer explanation, feel free to let me know and I can write it).

Once that's all done I can get back to actually implementing the rest of the site. There's still a very long way to go; many site functions work, but there's a lot of joining infrastructure to implement, such as cronjobs and possibly email notifications, plus the notifications structure and so on. I also need to add some media queries to the CSS stylesheets so the site looks alright on mobile devices. That shouldn't be too difficult, it's just a lot of individual things to do.