Using Varnish in front of your Amazon S3 static content
Many startups these days use Amazon S3 to serve their static assets directly. S3 is used as a simple CDN instead of more professional (and expensive) solutions (including Amazon’s own CloudFront) because it is very simple and cheap to use. Still, if you have a high-traffic site, this will no longer be so cheap, since you pay for every request and for all the bandwidth. In such cases, if you still want S3’s storage advantages (storing millions of files and treating it as practically unlimited storage space) without your bill going up like crazy, you can use a reverse proxy or web accelerator to cache your assets locally and reduce the number of requests that hit S3 directly. Squid or Varnish could be used for this; in this article I will show how to configure Varnish. We are using Varnish with S3 on various projects and it works very well, simplifying the setup and saving a lot of money on the Amazon S3 bill.
Varnish is a state-of-the-art, high-performance HTTP accelerator. It uses the advanced features in Linux 2.6, FreeBSD 6/7 and Solaris 10 to achieve its high performance. I will not go over the installation of Varnish here, but I would highly recommend using the latest version available at the time of writing, 2.0.4, as older versions have various issues.
We could try something simple like this in a Varnish VCL file:
backend s3 {
    .host = "my_bucket.s3.amazonaws.com";
    .port = "80";
}
sub vcl_recv {
    if (req.url ~ "\.(css|gif|ico|jpg|jpeg|js|png|swf|txt)$") {
        set req.backend = s3;
        lookup;
    }
}
Unfortunately, this will not work. The Amazon S3 servers look at the hostname sent in the request’s Host header to decide which bucket is meant. That hostname will most likely be different from the bucket’s own hostname (it will be something like static.mydomain.com), so S3 returns a 403 for every such request.
There are several ways to make this work. The first one I will present inserts the bucket name into the URL passed to the S3 backend. It looks like this:
backend s3 {
    .host = "s3.amazonaws.com";
    .port = "80";
}
sub vcl_recv {
    if (req.url ~ "\.(css|gif|ico|jpg|jpeg|js|png|swf|txt)$") {
        # Prefix the path with the bucket name (path-style S3 URL).
        set req.url = regsub(req.url, "^", "/my_bucket");
        # Path-style requests are addressed to the generic S3 endpoint.
        set req.http.host = "s3.amazonaws.com";
        set req.backend = s3;
        lookup;
    }
}
This works fine, inserting the bucket name into the URL passed to the backend. Still, I don’t like this solution very much, as it breaks the consistency between the URLs (the direct one and the forwarded one), so here is a much better solution:
backend s3 {
    .host = "s3.amazonaws.com";
    .port = "80";
}
sub vcl_recv {
    if (req.url ~ "\.(css|gif|ico|jpg|jpeg|js|png|swf|txt)$") {
        # Address the bucket through the Host header (virtual-hosted style).
        set req.http.host = "my_bucket.s3.amazonaws.com";
        set req.backend = s3;
        lookup;
    }
}
As we can see, we set the HTTP Host header to the one the Amazon S3 servers expect for our bucket. This way we keep the same URL and don’t have to alter the path we pass on.
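The Host header rewrite above applies to every request with a matching file extension, whatever hostname the client used. If the same Varnish instance also answers other hostnames, one option is to restrict the rewrite to the static hostname. The sketch below assumes that hostname is static.mydomain.com (the example name used earlier) and keeps my_bucket as a placeholder; both are assumptions, not part of the original setup. It uses the Varnish 2.0 backend syntax.

```vcl
backend s3 {
    .host = "s3.amazonaws.com";
    .port = "80";
}

sub vcl_recv {
    # Hypothetical guard: only rewrite requests that arrived for the static
    # hostname ("static.mydomain.com" and "my_bucket" are placeholders).
    if (req.http.host == "static.mydomain.com" &&
        req.url ~ "\.(css|gif|ico|jpg|jpeg|js|png|swf|txt)$") {
        # The path is passed through unchanged: a client request for
        # /img/logo.png is fetched from S3 as /img/logo.png with
        # Host: my_bucket.s3.amazonaws.com.
        set req.http.host = "my_bucket.s3.amazonaws.com";
        set req.backend = s3;
        lookup;
    }
}
```

Requests that do not match fall through to Varnish’s built-in vcl_recv and would still go to the first (and here only) declared backend, so a real setup would likely declare a second backend for the application traffic.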
A complete Varnish VCL configuration for the Amazon S3 backend might look like this:
backend s3 {
    .host = "s3.amazonaws.com";
    .port = "80";
}
sub vcl_recv {
    if (req.url ~ "\.(css|gif|ico|jpg|jpeg|js|png|swf|txt)$") {
        # Drop client headers (cookies, cache directives, etc.) before the lookup.
        unset req.http.cookie;
        unset req.http.cache-control;
        unset req.http.pragma;
        unset req.http.expires;
        unset req.http.etag;
        unset req.http.X-Forwarded-For;
        set req.backend = s3;
        # Select the bucket through the Host header, keeping the URL unchanged.
        set req.http.host = "my_bucket.s3.amazonaws.com";
        lookup;
    }
}
sub vcl_fetch {
    # Hide S3-specific response headers from clients.
    unset obj.http.X-Amz-Id-2;
    unset obj.http.X-Amz-Meta-Group;
    unset obj.http.X-Amz-Meta-Owner;
    unset obj.http.X-Amz-Meta-Permissions;
    unset obj.http.X-Amz-Request-Id;
    # Give every fetched object a one-week TTL...
    set obj.ttl = 1w;
    # ...and keep it up to 30 seconds past expiry (grace).
    set obj.grace = 30s;
}
If you found this post interesting, stay tuned for future posts on Varnish and how to use it in more complex setups.
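As written, vcl_fetch gives every response a one-week TTL, including error responses that Varnish decides to cache, such as a 404 for a file that has not been uploaded yet. A possible refinement, sketched here as a replacement vcl_fetch for Varnish 2.0, keeps the one-week TTL only for 200 responses and gives everything else a short TTL. The 60-second value is an arbitrary example.

```vcl
sub vcl_fetch {
    # Hide S3-specific response headers from clients (unchanged).
    unset obj.http.X-Amz-Id-2;
    unset obj.http.X-Amz-Meta-Group;
    unset obj.http.X-Amz-Meta-Owner;
    unset obj.http.X-Amz-Meta-Permissions;
    unset obj.http.X-Amz-Request-Id;

    if (obj.status == 200) {
        # Successful S3 responses: keep them for a week.
        set obj.ttl = 1w;
    } else {
        # Anything else (e.g. a 404 for an object not uploaded yet) is kept
        # only briefly, so a later upload becomes visible quickly.
        # 60s is an arbitrary example value.
        set obj.ttl = 60s;
    }
    set obj.grace = 30s;
}
```

The trade-off is a few more requests to S3 for missing files, in exchange for not serving a stale error for up to a week.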
6th August 2009, 17:46
[...] your interested in using Amazon’s S3 to server static content, check out “Using Varnish in front of your Amazon S3 static content” by Marius over at [...]
5th September 2009, 07:06
[...] Shared Using Varnish in front of your Amazon S3 static content | MDLog:/sysadmin [...]
9th September 2009, 15:47
Are you sure that this setup really caches files coming from S3?
In different cases S3 will return 302 Redirect responses which, instead of being handled by Varnish, are sent directly to the client. This results in the client bypassing Varnish for that particular query.
If you look up “Handling Dynamic Backend Redirects” on Google, you’ll find a mail from Poul-Henning Kamp, Varnish’s main author, which basically says that Varnish cannot understand such redirects.
Thanks for your comments on this issue.
10th September 2009, 07:14
@François: What do you mean by “different cases S3 will return 302 Redirect responses”? What particular cases are you talking about?
14th September 2009, 08:46
@Marius: The description I wrote was not entirely correct; it’s actually 307 Temporary Redirects that S3 sometimes returns. According to their documentation, this can happen in case of routing changes in their infrastructure.
http://docs.amazonwebservices.com/AmazonS3/2006-03-01/Redirects.html
It’s probably not a big deal if you only use Varnish to provide faster access to your files stored on S3. But in my case, it’s also used to ensure that incoming requests have a valid Referer header to prevent hotlinking.