Many startups these days use Amazon S3 to serve their static assets directly. S3 is used as a simple CDN instead of more professional (and expensive) solutions, including Amazon's own CloudFront, because it is very simple and cheap to use. Still, if you have a high-traffic site, this is no longer so cheap, since you pay for every request and for all the bandwidth. In such cases, if you still want the storage advantages of S3 (storing millions of files and treating it as unlimited storage space) without your bill going up like crazy, you can put a reverse proxy or web accelerator in front of it to cache your assets locally and reduce the number of direct hits on S3. Squid or Varnish would both work; in this article I will show how to configure Varnish. We use Varnish with S3 on various projects, and it works very well, simplifying the setup and saving a lot of money on the Amazon S3 bill.
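To get a feel for the savings, here is a small back-of-the-envelope sketch in Python. It assumes that only cache misses reach S3, that hits and misses have the same average object size, and that you plug in the current per-request and per-GB prices yourself. The prices in the example are made up for illustration, not taken from Amazon's price list, and the cost of running the Varnish server itself is ignored.

```python
GB = 1024 ** 3  # bytes in a gibibyte


def s3_bill(requests: float, bytes_out: float,
            price_per_request: float, price_per_gb: float) -> float:
    """Request charges plus transfer-out charges for traffic served by S3."""
    return requests * price_per_request + (bytes_out / GB) * price_per_gb


def s3_bill_behind_cache(requests: float, bytes_out: float, hit_ratio: float,
                         price_per_request: float, price_per_gb: float) -> float:
    """Only cache misses are fetched from S3, so only they are billed.

    Assumes hits and misses have the same average object size.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0.0 and 1.0")
    miss_ratio = 1.0 - hit_ratio
    return s3_bill(requests * miss_ratio, bytes_out * miss_ratio,
                   price_per_request, price_per_gb)


# Illustrative prices only; use the current values from Amazon's price list.
direct = s3_bill(10_000_000, 1000 * GB, 0.000001, 0.10)
cached = s3_bill_behind_cache(10_000_000, 1000 * GB, 0.90, 0.000001, 0.10)
print(f"direct: ${direct:.2f}, behind Varnish: ${cached:.2f}")
```

With a 90% hit ratio, S3 only sees a tenth of the requests and bandwidth, so both parts of the bill shrink by the same factor.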
Varnish is a state-of-the-art, high-performance HTTP accelerator. It uses the advanced features in Linux 2.6, FreeBSD 6/7 and Solaris 10 to achieve its high performance. I will not go over the installation of Varnish here, but I highly recommend using the latest version available at this time, 2.0.4, as older versions have various issues.
We could try something simple like this in a Varnish 2.0 VCL file:
backend s3 {
    .host = "my_bucket.s3.amazonaws.com";
    .port = "80";
}
sub vcl_recv {
    # Requests for static assets go to the s3 backend and are looked up in the cache
    if (req.url ~ "\.(css|gif|ico|jpg|jpeg|js|png|swf|txt)$") {
        set req.backend = s3;
        lookup;
    }
}
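A quick way to check which URLs the vcl_recv rule above will catch is to test the same regular expression in Python. This is only an approximation, since Varnish uses its own regex engine, but it shows one detail worth knowing: req.url includes the query string, so a versioned URL such as /css/site.css?v=3 does not match the $ anchor and falls through to the default vcl_recv logic instead. The URLs below are hypothetical.

```python
import re

# Same pattern as the vcl_recv rule; an approximation of Varnish's matching.
STATIC_ASSET = re.compile(r"\.(css|gif|ico|jpg|jpeg|js|png|swf|txt)$")


def goes_to_s3(url: str) -> bool:
    """True when the custom vcl_recv rule would set req.backend = s3."""
    return STATIC_ASSET.search(url) is not None


for url in ["/images/logo.png", "/css/site.css?v=3", "/index.php", "/robots.txt"]:
    print(url, "->", "matched by the rule" if goes_to_s3(url) else "falls through")
```

If your pages reference assets with cache-busting query strings, the pattern needs to allow for them.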
Tags: caching, proxy, varnish
As soon as a PHP application grows to run on more than one server, people normally start seeing problems caused by PHP sessions. If the application does not need persistent sessions, you are lucky and can ignore this; if it does, you will soon notice that no matter how well your load balancer handles stickiness (sending each user to the same real server), sessions slowly become a major issue. There are various solutions for storing PHP sessions in a shared location, but today I want to present one that is very simple to implement, yet very efficient and, in the long term, better suited than a database backend: using memcache to store the sessions.
The PECL memcache PHP extension has supported the memcache session.save_handler for a long time, but release 3.0.x (still in beta at this time) brings a set of interesting features for us:
- UDP support
- Binary protocol support
- Non-blocking IO using select()
- Key and session redundancy (values are written to N mirrors)
- Improved error reporting and failover handling
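To make the redundancy idea concrete, here is a toy Python model of a session store that writes each session to N mirrors and reads from the first live one. The class name, the dicts standing in for memcached servers, and the CRC32-based choice of mirrors are all hypothetical; this is not how the PECL extension picks its servers internally, just an illustration of why a session can survive the loss of one memcached instance.

```python
import zlib


class MirroredSessionStore:
    """Toy model of session redundancy: each session is written to N servers.

    Servers are plain dicts standing in for memcached instances. The mirror
    choice (CRC32 of the key, then the next N-1 servers) is an assumption.
    """

    def __init__(self, servers, redundancy=2):
        if not 1 <= redundancy <= len(servers):
            raise ValueError("redundancy must be between 1 and the number of servers")
        self.servers = list(servers)
        self.redundancy = redundancy

    def _replicas(self, key):
        # Indices of the N servers that hold copies of this key.
        start = zlib.crc32(key.encode()) % len(self.servers)
        return [(start + i) % len(self.servers) for i in range(self.redundancy)]

    def write(self, session_id, data):
        # Write the same value to every mirror that is still up.
        for idx in self._replicas(session_id):
            server = self.servers[idx]
            if server is not None:
                server[session_id] = data

    def read(self, session_id):
        # Fail over: return the first copy found on a live mirror.
        for idx in self._replicas(session_id):
            server = self.servers[idx]
            if server is not None and session_id in server:
                return server[session_id]
        return None

    def fail(self, idx):
        """Simulate a memcached instance going down (its data is lost)."""
        self.servers[idx] = None


store = MirroredSessionStore([{}, {}, {}], redundancy=2)
store.write("sess42", "user=alice")
store.fail(store._replicas("sess42")[0])
print(store.read("sess42"))  # -> user=alice, served by the second mirror
```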
Tags: memcached, pecl, php5, php_extensions, php_modules
The InnoDB team has just released InnoDB Plugin version 1.0.3. According to their announcement, the main points of this release are:
- Enhanced scalability: the Google SMP enhancement for synchronization
- More efficient memory allocation: ability to use platform allocator tuned for multi-core systems
- Improved out-of-the-box scalability: unlimited concurrent thread execution by default
- Dynamic tuning: at run-time, enable or disable insert buffering and adaptive hash indexing
Wow… this is indeed great news for InnoDB users. Even as I write this, I still can't believe that they've included the Google SMP patch in their official release. I can only assume that alternative projects such as XtraDB, Drizzle, the Percona patches, the Google patches, etc. pushed Oracle to take another look at InnoDB and do something besides the regular bug fixes. Even if we already use several of the great 'unofficial alternatives', this is good news for everyone.
Way to go, Oracle! I am looking forward to future performance improvements in the official InnoDB plugin; including patches that have been out there for some time is a good start, but internal improvements from the InnoDB team would also be great.
Here are some performance results based on their own tests:
http://www.innodb.com/innodb_plugin/plugin-performance/
Tags: innodb, mysql, mysql-5.1
Today Amazon announced the public beta of Amazon CloudFront, their AWS service for content delivery. This is the service that many users of Amazon S3 (Simple Storage Service) have been waiting for for a long time. Even though S3 was never a 'real' CDN (content delivery network), many sites used it to serve static content. The main limitation of this approach was the lack of the geographical awareness that content delivery networks usually have; still, the fact that S3 is highly scalable and well priced made the approach acceptable.
CloudFront is the answer to all the users' requests to use S3 as a CDN, delivering the content through a global network of 14 edge locations. CloudFront uses S3 to store the original files and caches copies of the content close to end users' locations, lowering latency when they download the objects.
Read the rest of this entry »
Tags: amazon, aws, cdn, Cloud Computing, CloudFront, s3
LVS has simple, built-in IP-based persistence that can be used to keep users on the same real servers for a configurable amount of time. I explained this in my previous post, and it works fine, but in real life users come in through dynamic connections or even browse the internet through their ISP's proxy servers. For such situations LVS supports a configurable persistence netmask, which widens the address range matched by persistence (normally to a /24) so that a bigger range of IPs is sent to the same server. This works in most cases, where users keep addresses from the same class C network or the ISP proxies sit in the same network range. Unfortunately it doesn't work for AOL, because AOL clients are always proxied through the huge AOL proxy cluster, which will send each request from a different real IP. These IPs are not even from the same range and tend to be completely different. This post will show how we can keep these AOL users on the same real server in an LVS-DR setup.
If this were a small ISP, I am sure people would simply have ignored its users; the users would have complained to the ISP that they couldn't reach some big sites, and in the end the ISP would have found a friendlier solution. Since this is AOL, with a huge customer base, we can't really ignore them, and we have to find a solution ourselves.
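The effect of the persistence netmask (ipvsadm's -M option, used together with -p) can be sketched with Python's ipaddress module: the client address is masked, and the resulting network is what persistence matches on. The helper name and the addresses (taken from the documentation ranges) are hypothetical; they show why a /24 keeps a dynamic-IP user together but cannot group proxies from unrelated ranges, as with AOL.

```python
import ipaddress


def persistence_key(client_ip: str, prefix_len: int = 24) -> ipaddress.IPv4Network:
    """Network that a persistence netmask of /prefix_len groups this client into."""
    # strict=False masks off the host bits instead of raising an error.
    return ipaddress.IPv4Network(f"{client_ip}/{prefix_len}", strict=False)


# A dynamic-IP user whose address changes within the same /24 stays together...
print(persistence_key("203.0.113.17") == persistence_key("203.0.113.200"))  # True
# ...but two proxies from unrelated ranges never share a key.
print(persistence_key("198.51.100.5") == persistence_key("192.0.2.77"))  # False
```

Widening the mask further (say /16) groups more clients, but it also concentrates more traffic on each real server, and it still cannot help when the proxy addresses are spread across completely different networks.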
Tags: aol, ipvs, ldirectord, load_balancing, lvs, Scaling
LVS has a simple, built-in IP-based persistence mechanism that can be used to keep users on the same real servers for a configurable amount of time. If your web application requires each user's requests to be processed by the same real server, you will probably want to enable this mechanism so that requests coming from the same IP are directed to the same real server. This article shows how to achieve this with plain ipvsadm commands as well as with ldirectord configuration.
IPVS is an advanced IP-based load balancer implemented inside the Linux kernel. Working at the IP level, LVS can't make decisions based on the content of the packets. Still, it can provide basic IP affinity by directing all connections from the same source IP to the same real server for a configurable amount of time. This is done with the -p ipvsadm option, which takes as its argument the time in seconds to keep entries in the persistence table.
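Here is a toy Python model of what the -p timeout does: the first connection from a client IP picks a real server, and later connections from the same IP reuse it until the entry has been idle for longer than the timeout. The class, the round-robin choice for new clients, and refreshing the expiry on every connection are simplifying assumptions for illustration; the kernel's handling of persistence entries has more details than this.

```python
import time


class PersistenceTable:
    """Toy model of LVS source-IP persistence (the ipvsadm -p timeout)."""

    def __init__(self, real_servers, timeout=300.0, clock=time.monotonic):
        self.real_servers = list(real_servers)
        self.timeout = timeout
        self.clock = clock
        self._next = 0      # round-robin position for clients without an entry
        self._entries = {}  # client_ip -> (real_server, expires_at)

    def pick(self, client_ip):
        now = self.clock()
        entry = self._entries.get(client_ip)
        if entry is not None and entry[1] > now:
            server = entry[0]  # still persistent: reuse the same real server
        else:
            server = self.real_servers[self._next % len(self.real_servers)]
            self._next += 1
        # Every connection pushes the expiry forward by the timeout.
        self._entries[client_ip] = (server, now + self.timeout)
        return server


now = [0.0]  # fake clock so the example is deterministic
table = PersistenceTable(["rs1", "rs2"], timeout=300, clock=lambda: now[0])
print(table.pick("198.51.100.7"))  # -> rs1
print(table.pick("192.0.2.9"))     # -> rs2
now[0] = 100.0
print(table.pick("198.51.100.7"))  # -> rs1 again, still within the timeout
```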
Tags: ipvs, ldirectord, load_balancing, lvs, Scaling