Recently I worked on a project where we deployed a bunch of KVM instances. We immediately noticed horrible IO performance on all the guest instances. In this particular case the hosts and the guests were all Ubuntu 10.04 Lucid, and the guests were created with vmbuilder using the Ubuntu defaults, without any special settings. Here is a sample command similar to what we used to build the KVM images:
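The original command didn't survive, but a representative vmbuilder invocation with the Lucid defaults looks something like this (the hostname, memory, and destination path are example values, not the ones we actually used):

```shell
# Build a Lucid KVM guest with vmbuilder's defaults
# (example values -- adjust suite, arch, memory, hostname and dest to taste)
sudo vmbuilder kvm ubuntu \
    --suite lucid \
    --flavour virtual \
    --arch amd64 \
    --mem 512 \
    --hostname guest01 \
    --dest /var/lib/libvirt/images/guest01
```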
Now, even though we hadn't tuned anything, I would have expected it to perform at least on the same level as a Xen instance, or even better. That was not the case: the performance was really horrible, and any IO-bound task would effectively lock up the instance. Digging into the problem, I was able to isolate it to instances running ext4 as the filesystem (the default for Lucid); strangely enough, it didn't happen on an older instance built with ext3 (actually a Debian Lenny instance). All images built with the above command use the sparse qcow2 format as the default disk format.
In order to achieve good IO performance we had to use cache='writeback' for the instances. This significantly increases IO performance, bringing it almost to host-level performance, and in any case much better than the old Xen instances we had. Here is how you can enable writeback for an instance: stop the VM, edit the guest domain and add cache='writeback' in the driver section, then save and start the VM again:
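With libvirt, the sequence looks roughly like this (`guest01` is a placeholder for your domain name):

```shell
virsh shutdown guest01   # stop the VM cleanly
virsh edit guest01       # add cache='writeback' to the <driver> element of the disk
virsh start guest01      # boot it back up with the new cache mode
```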
Here is how the disk section of my guest domain looks after adding the writeback cache:
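A disk section with writeback caching enabled looks something like this (the image path and target device are examples, so adjust them to your setup):

```xml
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='writeback'/>
  <source file='/var/lib/libvirt/images/guest01/disk0.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>
```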
In the process of debugging and searching for a fix for this issue, I found that it can also be useful to set elevator=noop as the default kernel IO scheduler; this definitely helps, but not to the same extent as the writeback cache setting on the virtio disk. You can add elevator=noop to the kernel command line in your GRUB config, and I now have this enabled by default on all my instances.
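On Lucid's GRUB 2 this means editing /etc/default/grub; you can also switch the scheduler at runtime through sysfs to try it out first (vda is an example device name here):

```shell
# Try it at runtime first (no reboot needed); vda is an example device
echo noop | sudo tee /sys/block/vda/queue/scheduler

# Make it permanent: in /etc/default/grub append elevator=noop, e.g.
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet elevator=noop"
# then regenerate the GRUB config:
sudo update-grub
```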
Hopefully this will help you greatly improve IO performance for your KVM guests and save you the time I lost trying to find a solution to this problem. Please feel free to share your experience using the comment form below; I'm also curious whether you have any other tips on how to improve this even further.