MDLog:/sysadmin

The Journal Of A Linux Sysadmin

Running S3sync in Parallel


s3sync is a great tool for synchronizing local data with Amazon S3, for backups or whatever other reason you might want to put your data on S3. It is very simple to install (gem install s3sync) and use (s3sync -v -s -r --progress source_dir s3_bucket:dir); it runs very well and can easily be scripted to do regular backups or even synchronize live data with S3. The only problem I found while using s3sync is that it can be very slow when uploading a lot of data (millions of files) to S3; partly because each upload is slow, but also because it transfers a single file at a time and doesn't do several uploads in parallel. I would have loved for s3sync to do this out of the box, but unfortunately it doesn't. For my particular need, though, I was able to work around this by running several s3sync commands at the same time, as sketched below. This will not apply directly to your data (unless it is structured the same way as here, which is unlikely), but it might give you an idea of how you could do this with your own data if it is structured in a feasible way.
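A minimal sketch of the idea, assuming the data is laid out in independent top-level subdirectories that can each be synced on their own (the local path and bucket name below are just placeholders):

for dir in /data/*/; do
  # start one s3sync per top-level subdirectory, in the background
  s3sync -v -s -r --progress "$dir" "my_bucket:$(basename "$dir")" &
done
# block until all the parallel uploads have finished
wait

With many subdirectories you will probably want to cap how many of these run at once, so you don't saturate your uplink or hit S3 limits.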

Using Varnish in Front of Your Amazon S3 Static Content


Many startups these days are using Amazon S3 to serve their static assets directly. S3 is being used as a simple CDN instead of more professional (and expensive) solutions (including Amazon's own CloudFront) because it is very simple and cheap to use. Still, if you have a high-traffic site this will no longer be so cheap, since you will be paying for all those requests and the bandwidth. In such cases, if you still want to use S3 for the storage advantage (like storing millions of files and treating it as unlimited storage space) but don't want your bill to go up like crazy, you can use a reverse proxy or web accelerator to cache your assets locally and reduce the number of direct hits on S3. We could use Squid or Varnish for this, and in this article I will show how to configure Varnish for it. We are using Varnish with S3 on various projects and it works very well, simplifying the setup and saving a lot of money on the Amazon S3 bill.

Varnish is a state-of-the-art, high-performance HTTP accelerator. It uses advanced features in Linux 2.6, FreeBSD 6/7 and Solaris 10 to achieve its high performance. I will not go over the installation of Varnish here, but I would highly recommend using the latest version available at this time (2.0.4), as older versions have various issues.

We could try to use something simple like this in a Varnish VCL:

backend s3 {
  # point varnish at the bucket's virtual host on S3
  .host = "my_bucket.s3.amazonaws.com";
  .port = "80";
}

sub vcl_recv {
  # send requests for static assets to the S3 backend and look them up in cache
  if (req.url ~ "\.(css|gif|ico|jpg|jpeg|js|png|swf|txt)$") {
    set req.backend = s3;
    lookup;
  }
}
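A quick way to sanity-check the setup once Varnish is running (the hostname and object path here are placeholders): a cached object should come back with an Age header, along with the X-Varnish and Via headers that Varnish adds.

# request an asset twice and look at the caching headers;
# on the second request the Age header should be greater than zero
curl -sI http://my_varnish_host/images/logo.png | grep -iE '^(via|age|x-varnish)'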

Apache2 Umask


Many times you might want to fine-tune the default permissions of files created on a Linux system. This is very simple: if you are using bash, all you usually have to do is define a new value for umask somewhere in the bash startup files (/etc/profile is a good place for this), like this:

umask 002

(this will allow group write permission by default on newly created files)
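The umask is a mask of permission bits to drop: files are created with mode 666 & ~umask and directories with 777 & ~umask, so a umask of 002 gives 664 and 775 respectively. A quick illustration from a shell (the file names are just examples, output abbreviated):

umask 002
touch somefile; mkdir somedir
ls -ld somedir somefile
drwxrwxr-x  2 user user 4096 ... somedir
-rw-rw-r--  1 user user    0 ... somefile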

On modern Linux distributions the default is usually 022, and you can easily find out what it is on your system by running the umask command:

umask

Contrary to what you might expect, this is not enough to make it work for all applications and daemons on the system. It works fine for any files created from a shell session, but files created by other processes, like the web server for example, will still use the default unless otherwise configured. To have Apache use a different umask, we can define it inside /etc/apache2/envvars (Debian and Ubuntu systems) or /etc/sysconfig/httpd (RHEL and CentOS systems) like this:

umask 002

and restart Apache to apply it.

Other daemons will have different locations where you can define this to override the default umask setting (check their documentation if you are unsure).

Debian Adopts Time-based Release Freezes


Earlier this week, at DebConf 9, the Debian team proposed a new approach to Debian's release cycle, which was later announced publicly on the Debian site:

“The Debian project has decided to adopt a new policy of time-based development freezes for future releases, on a two-year cycle. Freezes will from now on happen in the December of every odd year, which means that releases will from now on happen sometime in the first half of every even year. To that effect the next freeze will happen in December 2009, with a release expected in spring 2010. The project chose December as a suitable freeze date since spring releases proved successful for the releases of Debian GNU/Linux 4.0 (codenamed “Etch”) and Debian GNU/Linux 5.0 (“Lenny”).”

This doesn't mean that we will have time-based releases on a specific date as, for example, Ubuntu does, but it means that we will have a time-based freeze for each new release (in December of every odd year); the release will still become stable "when it is ready", but after this change we can generally expect new releases sometime in the spring of every even year.

“Time-based freezes will allow the Debian Project to blend the predictability of time based releases with its well established policy of feature based releases. The new freeze policy will provide better predictability of releases for users of the Debian distribution, and also allow Debian developers to do better long-term planning. A two-year release cycle will give more time for disruptive changes, reducing inconveniences caused for users. Having predictable freezes should also reduce overall freeze time.”

This new approach leaves a very short time for the next release, Debian 6.0 ("Squeeze"), which will be frozen later this year (Lenny was released earlier this year, in February). Here are the major release goals for Squeeze: multi-arch support, which will improve the installation of 32-bit packages on 64-bit machines, and an optimised boot process for better boot performance and reliability.

OSBridge: Configuration Management Panel


The moment I heard about the Open Source Bridge Configuration Management panel session on FLOSS Weekly a while ago, I was hoping I would be able to see a recording of the session (for obvious reasons I was not able to attend and see it live in Portland, Oregon). The fact that they managed to bring together (for the first time, to my knowledge) the creators or maintainers of all the major configuration management tools to date was very impressive; and obviously someone like myself, who has been working with many of these tools (I haven't tried automateit yet), would definitely see this as a great session.

Here are the members of the configuration management panel (from left to right):

Luckily the video of the session (among other videos from Open Source Bridge) was published and anyone can see this great event:

Now, after I saw it, I must admit I was hoping for a little more engagement and controversy. Instead we saw a friendly debate where everyone presented his own tool, without trying to cross the line and argue why it is better than someone else's (we have definitely seen several such blog posts from them in the past ;) ). Anyway, this was a great event and a great opportunity to have all the major people in this field come together and share their stories. I'm sure that once they get back to work after this, we will see new features and improvements in their tools.

FindMyHosting Review


This post is sponsored by FindMyHosting - a free and very comprehensive web hosting directory featuring the most popular web hosting companies and thousands of customer reviews.

I've been asked to review this site and give my impressions of it. The truth is that I don't have much experience with shared hosting, as most of my experience is with dedicated servers from various hosting companies, and any time a friend asked where I would recommend hosting his small site, I didn't know where to direct him. This is why I think a web hosting directory such as FindMyHosting would be a great start for anyone looking for a shared hosting account for a new site. You can search through a long list of hosting companies and see them ranked by user reviews (nice).

Debian Lenny 5.0.2 Updated


The Debian project has just announced 5.0.2, the second update of its stable distribution "lenny". Those installing regular updates from security.debian.org might not even notice this update, except for the version change to 5.0.2. As an interesting change, the debian-installer has been updated to allow the installation of the oldstable release (Debian 4.0 "etch").

"The Debian project is pleased to announce the second update of its stable distribution Debian GNU/Linux 5.0 (codename "lenny"). This update mainly adds corrections for security problems to the stable release, along with a few adjustments to serious problems. Please note that this update does not constitute a new version of Debian GNU/Linux 5.0 but only updates some of the packages included. There is no need to throw away 5.0 CDs or DVDs but only to update via an up-to-date Debian mirror after an installation, to cause any out of date packages to be updated. …

New version of the debian-installer

The debian-installer has been updated to allow the installation of the previous stable release (Debian 4.0 "etch") and to include an updated cdebconf package which resolves several issues with installation menu rendering using the newt frontend, including:

* explanatory text overlapping with the input box due to a height miscalculation
* overlapping of the "Go Back" button and the select list on certain screens
* suboptimal screen usage, particularly affecting debian-edu installations

The installer has been rebuilt to use the updated kernel packages included in this point release, resolving issues with installation on s390 G5 systems and IBM summit-based i386 systems."
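As the announcement notes, picking up the point release on an already installed system is just a regular upgrade against an up-to-date mirror, along the lines of:

apt-get update && apt-get upgrade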

Release Announcement: http://www.debian.org/News/2009/20090627

Linux Tips: Get the List of Subdirectories With Their Owner & Permissions and Full Paths


I needed to get a list of all the subdirectories under /var that were owned by some user other than root, together with their permissions/owner and full paths. My first thought was to use ls with something like this:

ls -dlR */
drwxr-xr-x  2 root root  4096 2009-06-05 06:25 backups/
drwxr-xr-x  8 root root  4096 2009-05-11 06:02 cache/
drwxr-xr-x  2 root root  4096 2009-05-06 04:49 ec2/
drwxr-xr-x 25 root root  4096 2009-05-25 14:55 lib/
...

This will show the subdirectories just as I needed, but only at one level; using */*/ would show the next level, and so on. This is obviously not a solution, and unfortunately I found no other way to do it with ls. Using:

ls -alR | grep ^d
drwxr-xr-x 15 root root  4096 2009-05-11 06:02 .
drwxr-xr-x 22 root root  4096 2009-06-03 15:02 ..
drwxr-xr-x  2 root root  4096 2009-06-05 06:25 backups
drwxr-xr-x  8 root root  4096 2009-05-11 06:02 cache
drwxr-xr-x  2 root root  4096 2009-05-06 04:49 ec2
drwxr-xr-x 25 root root  4096 2009-05-25 14:55 lib
....

works up to a point, but since it doesn't show the full paths it is useless for my purpose.
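A sketch of one way to get exactly this with find, assuming GNU find (for the -printf action):

# list the directories under /var not owned by root,
# printing their permissions, owner, group and full path
find /var -type d ! -user root -printf '%M %u %g %p\n'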

PHP Sessions in Memcached


The moment a PHP application grows to run on more than one server, people will normally start seeing problems caused by PHP sessions. If the application doesn't keep session state you are lucky and don't have to care about this; but if it does, then regardless of how well your load balancer handles stickiness (sending users back to the same real server), this will slowly become a major issue. There are various solutions that can be used to store PHP sessions in a shared location, but today I want to present one that is very simple to implement, yet very efficient, and in the long term better suited than using a database backend for this: using memcached to store the sessions.

The pecl memcache PHP extension has supported the memcache session.save_handler for a long time, but the 3.0.x release (still in beta at this time) brings in a set of interesting features for us:

- UDP support
- Binary protocol support
- Non-blocking IO using select()
- Key and session redundancy (values are written to N mirrors)
- Improved error reporting and failover handling
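A minimal sketch of wiring this up, assuming the pecl memcache extension is already installed; the memcached server addresses and the ini file path are placeholders:

# point PHP's session handler at a pool of memcached servers
cat >> /etc/php5/conf.d/memcache.ini <<'EOF'
session.save_handler = memcache
session.save_path = "tcp://10.0.0.1:11211, tcp://10.0.0.2:11211"
EOF

With the 3.0.x release you can additionally set memcache.session_redundancy, so that each session is written to several of the listed servers, matching the redundancy feature mentioned above.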

HowTo Update DNS Hostnames Automatically for Your Amazon EC2 Instances


A while ago, one of the major problems people faced using Amazon EC2 in production environments was the dynamic nature of instance IPs: every time an instance was started, it got a new, dynamic IP. This has been addressed with the introduction of Amazon Elastic IP Addresses, but even when using those, the private IPs are still dynamic, and most of the time people will want to communicate between instances on the private IPs rather than the public ones. This article will show how you can easily automate the process of updating DNS hostnames for your EC2 instances, by adding the logic for this to your AMIs. I will use a master DNS server running bind9 for this, but it can be adapted to any other DNS server.
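As a rough sketch of the kind of logic you would bake into the AMI (the hostnames, zone and key path below are placeholders; it assumes the bind9 master accepts TSIG-signed dynamic updates):

#!/bin/bash
# the EC2 metadata service exposes the instance's private IP at this address
PRIVATE_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
HOSTNAME=web1.internal.example.com

# send a signed dynamic update to the bind9 master, replacing any old A record
nsupdate -k /etc/bind/ddns.key <<EOF
server ns1.example.com
zone internal.example.com
update delete $HOSTNAME A
update add $HOSTNAME 60 A $PRIVATE_IP
send
EOF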