Monitoring with Icinga @ SF Bay Area LSPE meetup

Yesterday evening I presented a talk, “Monitoring with Icinga”, at the SF Bay Area Large-Scale Production Engineering meetup group at Yahoo HQ. This was an introductory talk intended to raise awareness of Icinga (only 3-4 people in the audience of about 75 had heard of it before), and I think it reached its goal very well: afterwards many people were interested in trying it out and had various questions about it. I was also very happy to have Matthew Brooks, one of the Icinga core developers, in the audience, backing me up on some of the more difficult questions people had. Thanks again, Matthew, for coming! Here are the slides from my presentation:

@LSPEMeetup made the video available on justin.tv; unfortunately the video and sound quality is not the best. You can find it here.


HowTo Improve IO Performance for KVM Guests

Recently I worked on a project where we deployed a bunch of KVM instances. Immediately we noticed horrible IO performance on all the guest instances. In this particular case the hosts and the guests were all Ubuntu 10.04 Lucid, and the guests were created with vmbuilder without any special settings, using the Ubuntu defaults. Here is a sample command similar to what we used to build the KVM images:

vmbuilder kvm ubuntu --suite=lucid --flavour=virtual --arch=amd64 --mirror=http://en.archive.ubuntu.com/ubuntu -o --libvirt=qemu:///system --ip=10.0.0.11 --gw=10.0.0.1 --part=vmbuilder.partition --templates=mytemplates --user=username --pass=password --firstboot=/var/vms/vm1/boot.sh --mem=1024 --hostname=myhost --bridge=br0

Even though we hadn’t tuned anything, I would have expected it to perform at least as well as, or even better than, a Xen instance. This was not the case: performance was really horrible, and any kind of IO-bound task would effectively lock up the instance. Looking into this, I was able to isolate the issue to instances that had ext4 as the filesystem (the default for Lucid); strangely enough, it didn’t happen on an older instance that was built with ext3 (actually a Debian lenny instance). All the images built with the above command use the sparse qcow2 format as the default disk format.



HowTo upgrade Chef from 0.10 to 0.10.2 – rubygems install

A few days ago Opscode released a security fix for chef server 0.10.0 and 0.9.16, and this post will show how to upgrade to chef-server 0.10.2. First, start by backing up your data. Seriously. In the past I’ve had serious problems when performing similar upgrades (even a minor one like this that looks harmless), and even if Opscode is now much better with this process, it never hurts to be cautious. Since I use a rubygems install, the next steps will focus on this type of installation. If you are using distribution or Opscode packages this will not be very helpful, as packages are probably not yet available for this upgrade; once they are, replace the gem upgrade part with the deb/rpm upgrade and you should be set.

1. Stop all the chef related services

Here is a handy command that will stop all the possible chef server related services:
for svc in server server-webui solr expander
do
    sudo /etc/init.d/chef-${svc} stop
done
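
If you are not sure which of these services are actually installed on a given box, the same loop can be wrapped in a small function that skips missing init scripts and can be run as a dry run first. This is only a sketch: the function name stop_chef_services and the INIT_DIR and DRY_RUN variables are made up for this example and are not part of Chef, and the dry run is on by default so that nothing is stopped by accident.

```shell
#!/bin/sh
# Sketch only: stop_chef_services, INIT_DIR and DRY_RUN are hypothetical
# names chosen for this example; they are not provided by Chef.
INIT_DIR="${INIT_DIR:-/etc/init.d}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print what would be done

stop_chef_services() {
    for svc in server server-webui solr expander; do
        script="${INIT_DIR}/chef-${svc}"
        # Skip services that are not installed on this machine.
        if [ ! -x "$script" ]; then
            echo "skip: ${script} not found"
            continue
        fi
        if [ "$DRY_RUN" = "1" ]; then
            echo "would run: sudo ${script} stop"
        else
            sudo "$script" stop
        fi
    done
}

stop_chef_services
```

Run it once as-is to see what would be stopped, then run it again with DRY_RUN=0 in the environment to actually stop the services.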


