In this post, I’ll walk you through the process of upgrading an existing managed Amazon Elasticsearch cluster to Graviton2. We will upgrade the cluster in place, without creating a new one.
This migration is much easier than the self-managed one; it requires only one step ;). First, we need to figure out which Graviton2 instance type to use. As with regular EC2 instances, AWS provides a wide range of instance types for Elasticsearch. The main families are the “m” instances, which are general-purpose, and the “r” instances, which are memory-optimized. The size suffix (e.g. “2xlarge” or “16xlarge”) indicates the number of vCPUs and the amount of memory available on the instance:
The available Amazon OpenSearch-optimized Graviton2 instances are:
m6g.medium.elasticsearch
m6g.large.elasticsearch
m6g.xlarge.elasticsearch
m6g.2xlarge.elasticsearch
m6g.4xlarge.elasticsearch
m6g.8xlarge.elasticsearch
m6g.12xlarge.elasticsearch
m6g.16xlarge.elasticsearch
r6g.large.elasticsearch
r6g.xlarge.elasticsearch
r6g.2xlarge.elasticsearch
r6g.4xlarge.elasticsearch
r6g.8xlarge.elasticsearch
r6g.12xlarge.elasticsearch
r6g.16xlarge.elasticsearch
Note: we also need to make sure the domain runs a version of managed AWS Elasticsearch/OpenSearch that supports Graviton2 instances. For the older Elasticsearch engine, anything newer than 7.8 should work; if you are on the OpenSearch engine, any version works, as Graviton2 has been supported since 1.0.0. If you are running an older version, you will first need to upgrade to a supported version before moving forward.
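Before attempting the instance swap, you can query the domain for its current engine version and compare it against the minimum supported release. A minimal sketch, assuming the minimum supported Elasticsearch release is 7.9 (i.e. anything newer than 7.8); the domain name is a placeholder and the `supported` helper is hypothetical:

```shell
# does a given Elasticsearch engine version support Graviton2? (7.9 and newer)
supported() { printf '%s\n7.9\n' "$1" | sort -V | tail -1 | grep -qx "$1"; }

# hypothetical domain name; requires AWS credentials to be configured
ver=$(aws es describe-elasticsearch-domain --domain-name search-domain \
        --query 'DomainStatus.ElasticsearchVersion' --output text 2>/dev/null) || ver=""
if [ -n "$ver" ] && supported "$ver"; then
  echo "$ver supports Graviton2 instances"
fi
```

If the check fails, upgrade the domain first and re-run it before changing the instance type.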
The actual migration only requires us to change the instance type. This can be done in the AWS console, with the AWS CLI, or with a tool like Terraform. Since I use Terraform to manage all my cloud assets, I will show how this is done with Terraform; it looks something like this:
resource "aws_elasticsearch_domain" "elasticsearch_domain" {
  domain_name           = "search-domain"
  elasticsearch_version = "7.10"

  cluster_config {
    instance_type            = "m6g.large.elasticsearch" # this replaces the previous m5 type of instance we had
    instance_count           = 3
    dedicated_master_enabled = true
    dedicated_master_count   = 3
  }

  ebs_options {
    ebs_enabled = true
    volume_type = "gp3"
    volume_size = 1000
  }

  # ... other elasticsearch cluster configs
}
I also want to point out that we can now use gp3 for the EBS volumes, which allows for much better performance and a larger maximum size per data node. This is a great optimization that can make the cluster much faster and reduce the need for extra data nodes (we were able to cut our node count in half from this combination: Graviton2 for better performance and gp3 for higher storage capacity per node).
Once you run terraform apply with the new instance type, this kicks off the automatic blue/green deployment of AWS managed Elasticsearch: it spins up a new set of nodes and migrates the data to them; once this is done, the original nodes are automatically removed. Depending on the size of the data in your cluster this might take a long time, and terraform might time out (60m by default). If this happens, you can use the AWS console or CLI to monitor the status of the migration:
aws es describe-elasticsearch-domain --domain-name <domain-name> --query 'DomainStatus.Processing'
returns true while the blue/green deployment is still in progress and false once it has completed. You can also check the health of the cluster after the migration:
curl -s https://<domain-endpoint>/_cluster/health?pretty
This returns the current health of the Elasticsearch cluster (the AWS CLI does not expose cluster health directly, so we query the domain endpoint). If the migration completed successfully, the cluster should report a green health status; yellow or red indicates problems that need to be addressed.
Note: in theory there should be no downtime during the process, but performance might be slightly impacted during the blue/green migration.
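If you prefer to script the wait instead of watching the console, the Processing flag of the domain can be polled in a loop. A rough sketch (the domain name is a placeholder, the `finished` helper is hypothetical, and the AWS CLI is assumed to be configured):

```shell
# the CLI prints the DomainStatus.Processing flag as the text "True"/"False"
finished() { [ "$1" = "False" ]; }

DOMAIN="search-domain"   # hypothetical domain name
while :; do
  p=$(aws es describe-elasticsearch-domain --domain-name "$DOMAIN" \
        --query 'DomainStatus.Processing' --output text 2>/dev/null) || p=""
  if finished "$p"; then echo "blue/green migration complete"; break; fi
  if [ -z "$p" ]; then echo "could not query the domain"; break; fi
  echo "migration still in progress..."; sleep 60
done
```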
As you can see, there is a huge advantage to performing such a migration with a managed service compared with the self-managed approach, where we had to handle everything ourselves.
Upgrading a managed Amazon Elasticsearch cluster to Graviton2 is a straightforward process that can provide significant benefits. By upgrading to Graviton2 instances, you can improve performance, reduce costs, and increase the efficiency of your infrastructure. AWS offers several Graviton2 instance types optimized for Amazon OpenSearch, each with its own set of advantages.
In this post, I have walked you through the process of upgrading an existing managed Amazon OpenSearch cluster to Graviton2 instances in place, without creating a new cluster, and provided Terraform examples and command-line steps to help you through the process.
Overall, upgrading your managed Amazon OpenSearch cluster to Graviton2 instances is a great way to take advantage of the latest technology and improve the performance and cost efficiency of your search application.
The first Elasticsearch version that added support for ARM processors was Elasticsearch 7.8. This version introduced official support for the ARM64 architecture and was released on May 26, 2020. Before this release, Elasticsearch was only officially supported on x86 platforms. In our case, this required us to migrate to a supported version first: we were running an older release in the stable 7.x branch and upgraded to 7.17 using the standard Elasticsearch rolling-upgrade docs.
Here are the steps needed for this migration:
The first step in the migration process is to create new Graviton2-based EC2 instances. You can do this using the AWS Management Console or the AWS CLI, or, even better, with terraform as I do. Various Linux distributions run on ARM, but I chose an Amazon Linux 2 AMI because it is very well supported by AWS. In the AWS console you can filter AMIs with “Architecture” set to “arm64” to find the latest Amazon Linux 2 AMI, or use a simple AWS CLI command like:
aws ssm get-parameters --names /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-arm64-gp2 --region us-east-1
This returns the Graviton2 AMI for the region we are using. We then use it in our terraform code to create the new Graviton2 instances; for example:
# Elasticsearch nodes
resource "aws_instance" "es_nodes" {
  count           = 3
  ami             = "ami-XXX" # Replace with the AMI we found above
  instance_type   = "c6g.large"
  security_groups = [aws_security_group.es_node_sg.name]

  user_data = <<-EOF
    #!/bin/bash
    echo "cluster.name: es-cluster" >> /etc/elasticsearch/elasticsearch.yml
    echo "node.name: ${format("es-node-%02d", count.index + 1)}" >> /etc/elasticsearch/elasticsearch.yml
    echo "network.host: [_ec2:privateIpv4_, _local_]" >> /etc/elasticsearch/elasticsearch.yml
    systemctl restart elasticsearch
  EOF

  tags = {
    Name = "es-node-${count.index + 1}"
  }
}
Normally we would install Elasticsearch on the nodes from the user_data script, but during this migration we went with a more manual method: you can install Elasticsearch using the RPM or DEB packages provided by Elastic. Here is an example of installing Elasticsearch on an Amazon Linux 2 instance:
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
sudo tee /etc/yum.repos.d/elasticsearch.repo <<EOF
[elasticsearch-7.x]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF
sudo yum install -y elasticsearch
After installing Elasticsearch on the new Graviton2-based EC2 instance, the next step is to configure Elasticsearch to use the existing data and settings from the old Elasticsearch cluster. You can do this by copying the Elasticsearch configuration files from the old cluster to the new instance.
This might look something like this:
rsync -avz --progress --delete /path/to/old/cluster/config/ ec2-user@new-instance-ip:/etc/elasticsearch/
Finally, once the Elasticsearch configuration files are copied to the new Graviton2-based EC2 instance, the next step is to start Elasticsearch on the new instance. You can do this using the Elasticsearch service command. Here is an example command to start Elasticsearch on the new instance:
sudo service elasticsearch start
The final step in the migration process is to verify that the data and settings from the old Elasticsearch cluster have been successfully migrated to the new Graviton2-based EC2 instance. You can do this by checking the Elasticsearch logs and running some search queries on the new instance.
Here is an example command to check the Elasticsearch logs on the new instance (the log file is named after the cluster, es-cluster in our config):
sudo tail -f /var/log/elasticsearch/es-cluster.log
This command shows the Elasticsearch logs on the new instance, and you can use it to check if any errors or warnings are reported during the migration process.
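Beyond tailing the logs, the cluster HTTP API is the quickest way to confirm the new node joined and the cluster is healthy. A small sketch, where the endpoint is a placeholder for one of your nodes and the `is_green` helper is hypothetical (it just greps the health JSON):

```shell
# succeed when a _cluster/health JSON body reports a "green" status
is_green() { echo "$1" | grep -q '"status" *: *"green"'; }

ES="http://localhost:9200"        # hypothetical endpoint reachable on the node
curl -s "$ES/_cat/nodes?v" || true   # did the new node join the cluster?
health=$(curl -s "$ES/_cluster/health?pretty") || health="{}"
if is_green "$health"; then
  echo "cluster is green"
else
  echo "cluster is not green yet"
fi
```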
After all the new Graviton2 instances are in sync in the cluster, you can go ahead and remove the old Intel instances one by one and allow the cluster to rebalance.
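A gentler variant of removing an old node is to drain it first with a shard-allocation exclusion, terminating the instance only after its shards have relocated. A sketch under the assumption that the cluster is reachable locally; the endpoint, the IP, and the `drain_payload` helper are placeholders:

```shell
# build the cluster-settings payload that excludes one node's IP from allocation
drain_payload() {
  printf '{"transient":{"cluster.routing.allocation.exclude._ip":"%s"}}' "$1"
}

ES="http://localhost:9200"   # hypothetical endpoint
OLD_IP="10.0.0.11"           # hypothetical IP of the old node to retire
curl -s -X PUT "$ES/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d "$(drain_payload "$OLD_IP")" || true
# watch relocating_shards drop to 0, then it is safe to terminate the instance
curl -s "$ES/_cluster/health?pretty" || true
```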
Migrating an Elasticsearch cluster to running on Graviton EC2 instances can provide significant performance and cost benefits. In this blog post, I walked you through the process of migrating an existing Elasticsearch cluster to new Graviton2-based EC2 instances. By following the steps outlined in this post, you can easily migrate your Elasticsearch cluster to Graviton2-based EC2 instances and take advantage of the cost/performance improvements they offer.
mysql> set global innodb_max_dirty_pages_pct = 0;
You can check the number of dirty pages with the command:
mysqladmin ext -i10 | grep dirty
Let the server run like this for a while and after you see it settle in, the restart (or stop) should be much faster.
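If you prefer a one-shot reading over the repeating mysqladmin loop, the counter can be pulled out of the extended-status output with a tiny awk filter. A sketch, assuming credentials come from ~/.my.cnf; the `dirty_count` helper is hypothetical and just parses the table-style output:

```shell
# pull the value out of a line like: | Innodb_buffer_pool_pages_dirty | 4203 |
dirty_count() {
  awk -F'|' '/Innodb_buffer_pool_pages_dirty/ { gsub(/ /, "", $3); print $3 }'
}

# hypothetical: server credentials come from ~/.my.cnf
mysqladmin extended-status 2>/dev/null | dirty_count || true
```

Run it a few times; once the number stops falling, the flush has settled.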
When you are ready to upgrade, you will notice that unfortunately there is no official migration path. This howto documents what I’ve used myself for such migrations, and hopefully it will help you if you are trying to perform a similar upgrade.
Opscode has done an amazing job with the omnibus installers, and starting with Chef 11 the chef server supports this too: you can install a new chef server simply by installing the rpm or deb for your platform, and everything is installed for you (ruby/gems, chef, rabbitmq, solr, erlang, postgresql, nginx). Just head over to http://www.opscode.com/chef/install/ and download the version for your OS from the chef-server tab.
In order to migrate to a new chef server, we first need to export all the existing data from the old server.
It is important to have all the clients with their proper public keys because if not we would have to re-register each one of them.
Personally, I’ve used this process to migrate several servers from open source chef 0.10.x to chef 11, but in theory it should work from any chef server implementation (hosted, private, etc.) because we download and upload the assets using API calls.
You can use my knife-backup plugin for this. Once you install the gem you can just run it and it will backup all the objects from the existing server:
gem install knife-backup
knife backup export
This might take a while depending on how many nodes/clients, cookbooks, etc. you have. Once completed, you will have all the needed files in .chef/chef_server_backup.
Optional: if you have many unused cookbook versions, you might want to clean them up before the backup. You can use my knife-cleanup plugin for this:
gem install knife-cleanup
knife cleanup versions -D
I would recommend setting up a new server, as this is the safest approach in case something doesn’t work out, and you don’t have to touch your current environment. As mentioned earlier, you can install the new server very easily with the omnibus installer. For example, for Ubuntu 12.04 this looks like:
wget https://opscode-omnitruck-release.s3.amazonaws.com/ubuntu/12.04/x86_64/chef-server_11.0.6-1.ubuntu.12.04_amd64.deb
sudo dpkg -i chef-server*
sudo chef-server-ctl reconfigure
You can also use the chef-server cookbook to install your new server if you prefer that.
Once you have the new chef server up and running, you will need to set up a new admin account and a new knife config. I would recommend using a dedicated user for this, so it does not interfere with the users we are importing from the old server; I would call it transfer. From the local server this would look like:
mkdir -p ~/.chef
sudo cp /etc/chef-server/chef-webui.pem ~/.chef/
sudo cp /etc/chef-server/chef-validator.pem ~/.chef/
marius@chef:~# knife configure -i
WARNING: No knife configuration file found
Where should I put the config file? [/marius/.chef/knife.rb]
Please enter the chef server URL: [http://localhost:4000] https://localhost
Please enter a clientname for the new client: [transfer]
Please enter the existing admin clientname: [chef-webui]
Please enter the location of the existing admin client's private key: [/etc/chef/webui.pem] ~/.chef/chef-webui.pem
Please enter the validation clientname: [chef-validator]
Please enter the location of the validation key: [/etc/chef/validation.pem] ~/.chef/chef-validator.pem
Please enter the path to a chef repository (or leave blank):
Creating initial API user…
Created client[transfer]
Configuration file written to /marius/.chef/knife.rb
Note: the default server keys are now located in /etc/chef-server/ and not in /etc/chef like they used to be, which is definitely a welcome change. Also, knife configure still suggests a default server url with http and port 4000, but chef 11 runs behind an nginx load balancer and listens by default on the standard https port.
Finally, we can restore all the data from the old server. Transfer the backup and, for simplicity, drop it in your user’s .chef folder under .chef/chef_server_backup; be sure to install the knife-backup gem on the server, and then you should be able to run:
marius@chef:~# knife backup restore
WARNING: This will overwrite existing data!
WARNING: Backup is at least 1 day old
Do you want to restore backup, possibly overwriting exisitng data? (Y/N) y
Restoring clients
...
And this should restore all the data in the new server. Final step would be to regenerate the indexes:
chef-server-ctl reindex
Note: I want to point out that currently knife-backup skips any clients that already exist on the server, as I could not find a way to overwrite them using the API. This means the validation key will almost certainly need to be changed, since that client will already exist on the newly installed server.
After the data migration is completed, you will probably just have to point your DNS alias to the new server. One issue I’ve noticed is that the chef server, when installed, uses the local DNS record in various places in its config files. When working on a temporary server, this caused problems once we changed the DNS and activated the server: the chef server sends clients links from which to download assets (cookbook parts, for example), and if this was misconfigured at install time you might have to correct it to a DNS entry the clients can actually reach; check it out:
grep s3_url /var/opt/chef-server/erchef/etc/app.config
and restart the chef server after correcting the s3_url:
chef-server-ctl restart
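Before flipping the DNS, it is worth a quick sanity check that every object made it across. One way, assuming you keep two knife configs around (both file names here are hypothetical), is to diff the object lists from the old and new servers:

```shell
# hypothetical knife configs pointing at the old and the new chef server
knife node list -c ~/.chef/knife-old.rb 2>/dev/null | sort > /tmp/nodes-old || true
knife node list -c ~/.chef/knife.rb     2>/dev/null | sort > /tmp/nodes-new || true
# nodes present on the old server but missing on the new one
comm -23 /tmp/nodes-old /tmp/nodes-new
```

The same comparison works for clients, roles, environments, and data bags.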
Hopefully this post will help you migrate to Chef 11. Feel free to let me know in the comments below if you had any issues following this process, or if it worked without any problems. Also, if you find any problems with the tools used here (knife-cleanup or knife-backup), please open a ticket on github or submit a patch. Good luck!
knife-backup will back up all the cookbook versions available on the chef server. Cookbooks are normally kept in a repository and should be easy to upload from there, but if you are using different cookbook versions in each environment then it might not be trivial to find and upload them back to the server; downloading them and having them ready to upload is simple and clean. If you have too many cookbook versions, you might want to clean them up first using something like knife-cleanup.
If you want to check it out, just install the gem:
gem install knife-backup
and then just point it to an existing chef server to backup all its objects with:
knife backup export
If you need to restore then it is simple as:
knife backup restore [-d DIR]
Hope you find this useful; I’m looking forward to your feedback.
Patches are welcome: knife-backup on github
hadoop 0.1.118 0.1.116 0.1.115 0.1.114 0.1.113 0.1.111 0.1.109 0.1.108 0.1.106 0.1.105 0.1.104 0.1.103 0.1.102 0.1.101 0.1.99 0.1.98 0.1.97 0.1.96 0.1.95 0.1.94 0.1.93 0.1.92 0.1.91 0.1.90 0.1.89 0.1.88 0.1.87 0.1.86 0.1.85 0.1.84 0.1.83 0.1.82 0.1.81 0.1.80 0.1.79 0.1.78 0.1.77 0.1.76 0.1.75 0.1.74 0.1.73 0.1.72 0.1.71 0.1.70 0.1.69 0.1.68 0.1.67 0.1.66 0.1.65 0.1.64 0.1.63 0.1.62 0.1.61 0.1.60 0.1.59 0.1.58 0.1.57 0.1.56 0.1.55 0.1.54 0.1.53 0.1.52 0.1.51 0.1.50 0.1.49 0.1.48 0.1.47 0.1.46 0.1.45 0.1.44 0.1.43 0.1.42 0.1.41 0.1.40 0.1.39 0.1.38 0.1.37 0.1.36 0.1.35 0.1.34 0.1.33 0.1.32 0.1.31 0.1.30 0.1.29 0.1.28 0.1.25 0.1.24 0.1.23 0.1.22 0.1.21 0.1.20 0.1.19 0.1.18 0.1.17 0.1.16 0.1.15 0.1.13 0.1.12 0.1.11 0.1.10 0.1.9 0.1.8 0.1.7 0.1.6 0.1.5 0.1.4 0.1.3 0.1.2 0.1.0
(and this was the cookbook with the least versions that I’ve found to paste here).
While working on knife-backup I realized what a huge waste this was, and decided that I needed a way to clean these and keep on the server just the relevant ones.
To solve this problem I wrote knife-cleanup, and if you have similar needs you might find it useful. It will clean up all the unused versions of the cookbooks on your chef server (this might be the hosted opscode platform or the open source server). Before deleting anything, it backs up each version it touches (just in case).
If you want to check it out, just install the gem:
gem install knife-cleanup
and assuming you have a working knife config you can run it with:
knife cleanup versions
and this will output the versions it would delete.
If you are ready to delete, you can do that with:
knife cleanup versions -D
and you can find the backups of the versions deleted under .cleanup/cookbook_name
Notes: I’ve seen various cases where it is impossible to download a cookbook version (and knife will error out). From my experience there is not much we can do about that, so the script will skip the backup but still delete the corrupt version. You might want to take a full chef server backup first (see knife-backup) just in case. The way I use this is with exact version pinning of cookbooks in environments (for more details see chef-jenkins); if you use environments and cookbook versions differently, then this might not make sense for you.
Hope you find this useful; I’m looking forward to your feedback.
Patches are welcome: knife-cleanup on github
One of the first things we did last year was to introduce the Chef Cafes. These are small events (capped at 10 people) held consistently at the same time (the 1st and 3rd Thursday of the month) at the best coffee shop in Mountain View (Red Rock Coffee), with the intent to facilitate interaction between people and give them a place where they can regularly meet, discuss chef, ask questions, and help other members in the spirit of the open source community. The first Chef Cafe was on March 1st, 2012, and it was just me and Rob (we had a good time preparing the future events and just catching up). After that, we had 16 Chef Cafes throughout the year, many with 10 or even more people, and each one was unique and special in its own way. At some, new chef users brought various questions on how to use chef, and we tried to help them resolve their blockers and get up to speed. At others, really advanced users brainstormed about various unresolved problems and shared their take on things like cookbook testing, workflow, or orchestration. Overall, I think it was a great success that allowed us to be more connected with members, and more open and helpful to new chef users.
In 2013 we look forward to your suggestions on how we can improve the Chef Cafes, and we will try to keep them going. We hope to move one to San Francisco and keep the other in the South Bay, as we have had various requests for that. So if you are in the City and want to get involved, please ping me.
One other thing we have tried to do was to bring consistency and have at least one meetup every month with an awesome presentation on some hot topic in the chef community. This ended up being a little too optimistic :(. Still, we had 6 cool meetups with speakers like:
and we also had Aaron Peterson running an introductory Chef Workshop; considering the big and diverse audience I think we have done quite a great job with that.
With the experiences we had last year, we are more confident that this year we will be able to run one meetup every month, but we need your help: we are always looking for great speakers and interesting topics; if you want to present at one of our meetups please let us know; also if you know someone that we should invite to present to a meetup please let us know.
Most of our meetups last year were hosted by Survey Monkey in Palo Alto and we can’t thank them enough for their support (special thanks to Tim Sabat for making them possible). We also had one meetup in San Francisco hosted at Scalr offices (thanks Sebastian). This year, we hope to diversify and run each meetup in a different place to make things more interesting; and hopefully more meetups in the City. If you are interested in hosting and sponsoring one of our future meetups please contact me privately and let me know.
During the last year our group has grown a lot: we started with 132 members on the first day of January 2012 and ended the year with more than 400 members. This shows that the interest in Chef is clearly growing, and hopefully the events we have been organizing are helping grow our local chef community.
If you have any suggestions on what you would like us to do in the future, please let us know. Use the comments below, send us a message, whatever works for you; we would love to hear from you and see how we can serve you better. Overall, 2012 was great, and with your help we can make 2013 even better!
Believe it or not, I had 364 blog posts when I started the migration, meaning a lot of energy was spent importing the old articles. I used exitwp to convert the wordpress-xml export of the blog posts, which produced a reasonably good result. Still, I had to run some fixes…
for code blocks:
perl -pi -e 's/([^\`]|^)(\`)([^\`]|$)/$1\n\`\`\`\n$3/g' *
to enable comments (as ‘comments: true’ was missing from all posts)
find source/_posts/ -type f -print0 | xargs -0 -I file sed -i '' '2 i \
comments: true' file
I enabled the octopress category list plugin and tags plugin, which you can see in the sidebar. Since I already had tags and categories on all posts, it was very important to keep the same urls and not break them; the same goes for regular post urls. Here are the relevant settings from the octopress config file:
root: /
permalink: /:year/:month/:day/:title/
category_dir: category
tag_dir: "tag"
Just keep in mind that if you have many tags, as I do, page generation will take much longer after you enable the tags plugin. You’ve been warned!
Not working at all… I wrote a post specifically about this; check it out here
My wordpress blog has been around for a while (6 years, more or less), and even though I have always used feedburner, for some strange reason I always advertised my own feed url. This of course no longer worked with octopress, so I had to set up a rewrite rule to avoid breaking everyone’s feed reader:
RewriteEngine On
Options +FollowSymLinks -Multiviews
# Feed url
RewriteRule ^feed/?$ atom.xml [QSA,L]
Redirecting the bare domain to www was done automatically by wordpress, but octopress will happily serve the non-www domain as well. This can cause issues with search engines and such, so I wanted the same behaviour. Apache again to the rescue:
RewriteCond %{HTTP_HOST} !^www [NC]
RewriteRule $ http://www.%{HTTP_HOST}%{REQUEST_URI} [L,R]
After you generate your octopress site, everything is static and fast by default. Still, you want to make sure apache has some basic caching and compression settings to make it even better. Here are the relevant parts from my config:
#### CACHING ####
<IfModule mod_expires.c>
ExpiresActive On
# 1 MONTH
<FilesMatch "\.(ico|gif|jpe?g|png|flv|pdf|swf|mov|mp3|wmv|ppt)$">
ExpiresDefault A2419200
Header append Cache-Control "public"
</FilesMatch>
# 3 DAYS
<FilesMatch "\.(xml|txt|html|htm|js|css)$">
ExpiresDefault A259200
Header append Cache-Control "private, must-revalidate"
</FilesMatch>
# NEVER CACHE
<FilesMatch "\.(php|cgi|pl)$">
ExpiresDefault A0
Header set Cache-Control "no-store, no-cache, must-revalidate, max-age=0"
Header set Pragma "no-cache"
</FilesMatch>
</IfModule>
### Compression ####
<IfModule mod_deflate.c>
<IfModule mod_setenvif.c>
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html
</IfModule>
<IfModule mod_headers.c>
Header append Vary User-Agent env=!dont-vary
</IfModule>
<IfModule mod_filter.c>
AddOutputFilterByType DEFLATE text/css application/x-javascript text/x-component text/html text/richtext image/svg+xml text/plain text/xsd text/xsl text/xml image/x-icon
</IfModule>
</IfModule>
If you have many posts, generating the octopress site becomes extremely slow (in my case a full generate takes about 2 minutes), which makes it basically impossible to work on a new post and preview the feedback locally. The solution is well documented: isolate the post you are working on, and when you are done, integrate the other posts back before publishing:
rake new_post['Finally Migrated to Octopress']
rake isolate[finally-migrated-to-octopress]
and now rake generate and rake preview will only work with the new post. Finally, when done and ready to publish the awesome new post on the internets:
rake integrate
rake generate
rake deploy
Trying to understand and debug this issue, I looked in source/_includes/disqus.html and found the code that generates the javascript variable disqus_identifier for the posts. Looking at the html generated for some blog posts, the variables disqus_url and disqus_identifier looked ok, like this:
var disqus_identifier = 'http://www.ducea.com//2012/11/12/disqus-comments-not-visible-in-octopress/';
var disqus_url = 'http://www.ducea.com//2012/11/12/disqus-comments-not-visible-in-octopress/';
var disqus_script = 'embed.js';
Still, at a closer look I was able to identify the issue: the url above contains a double / and, even though that should resolve to the same url, Disqus was actually treating it as a separate identifier and hence not showing the comments associated with it. Once I figured that out, it was very simple to see where it came from: the site url in _config.yml was:
url: http://www.ducea.com/
and fixing it, by removing the trailing slash:
url: http://www.ducea.com
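A quick way to spot this class of problem before deploying is to grep the generated site for identifiers containing a double slash. A small sketch (octopress generates into public/ by default; the `find_bad_ids` helper is hypothetical, and the "com//" pattern is tailored to this site's domain):

```shell
# list disqus identifiers that contain a double slash after the host name
find_bad_ids() {
  grep -rh "var disqus_identifier" "$1" 2>/dev/null | grep "com//"
}

find_bad_ids public || echo "no double-slash identifiers found"
```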
Regenerating and deploying the site:
rake generate
rake deploy
fixed the issue, and the comments are now back on the site (you can even try it out here on this post ;).
Hopefully this will help others in the same situation… if you just added an extra slash to the Octopress site url config and didn’t realize it broke the Disqus comments.
Even though I did not attend any workshop (there were 2 flavors, one targeted at a sysadmin workflow and one at developers), the general feeling from the people I talked with who attended was that it was a very good experience, with a lot of hands-on practical examples. On Tuesday afternoon I attended the “ChefConf Pre-event Hackday: TEST ALL THE THINGS!!!” organized by Bryan Berry, and it was great; it showed how many people are interested in testing their infrastructure as code, focusing on cookbook testing (unit and integration), continuous integration with jenkins, and other things like that ;)
The first full day of ChefConf was Wednesday. The conference was structured with main presentations during the mornings and breakout sessions in the afternoons (with 2 main tracks plus a vendor one). From the beginning you could tell that this would be a very well run conference; even though it was the first one, people like Jesse Robbins have a lot of experience running such events. Not surprisingly, ChefConf kicked off with Adam Jacob’s “State of the Union Part 1: Chef, Past and Present” (video); Jesse Robbins talked about the community around chef, how it is a key part of Opscode’s strategy, and their efforts to take it to the next level. He showed a very nice visualization of the commits to the chef github repo.
There were many interesting talks during the day; most of them were recorded and hopefully they will be available online soon, so you can watch them if you didn’t have the chance to be here (or want to review them again). I particularly enjoyed:
Ron Vidal - Operations Secret Sauce: Incident Management (video); similar to Jesse Robbins’ GameDay talk, it was a very nice addition, inspirational and full of interesting points.
Jim Hopp - Test-driven Development for Chef Practitioners (video); very well prepared and presented. I hope to have Jim at our Chef Bay Area meetup group to present something similar and run a testing hackathon.
Patrick McDonnell - Lessons from Etsy: Avoiding Kitchen Nightmares; people seem to love everything Etsy is doing, and they share a lot of their chef workflow and open source various tools they write.
and many others…
In the evening we had a great Ignite event run by Andrew Shafer in his inimitable way. We had 10 ignite speakers, and in the middle there was a fun karaoke ignite where 10 volunteers riffed on slides they had never seen before. If they recorded this and put it online, look up the ones by Stephen Nelson-Smith and John Vincent, as they were very entertaining.
The second day of the conference started with Christopher Brown’s “State of the Union Part 2: Chef, the Future”, where he outlined some of the future features and main focus areas for Opscode and Chef: becoming easier to install and use (the omnibus installer), enterprise readiness, a focus on Windows, and a big focus on quality. Opscode is working on a project called kitchen chef that will make it possible to test the functionality of cookbooks on various environments and platforms, and quickly ensure cookbook quality is maintained across iterations. A lot of work has also been put into reporting and handlers. The server side has been completely rewritten in erlang and sql (from ruby and couchdb), and we should see this soon in the open-source and private chef servers. From the work done you can easily tell that a lot of effort has gone into private chef, and this is quickly becoming an important asset for Opscode going forward.
There were many great talks during the day from speakers like Artur Bergman, Ben Rockwood, Jason Stowe, John Esser, Rob Hirschfeld, Theo Schlossnagle, etc. I finished my day just like I started Tuesday, with another event focused on testing: the “Test Driven Development Roundtable”, run by Stephen Nelson-Smith with a panel of Seth Chisamore, Jim Hopp and my friend Rob Berger. They went over the tools people are using these days and what is still missing and needs to be worked on regarding testing.
Overall, I think this was an awesome event and I hope to be able to attend the next one too (hopefully at the same place). My impression is that Opscode is ready to take the next step and grow the community even bigger: “The revolution will not be televised - it will be coded with chef”.
First we need to identify the file that is causing this issue; to do so, we verify all the packed objects and look for the biggest ones:
git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -5
(and grab the object IDs of the biggest files). Then find the names of the files behind those objects:
git rev-list --objects --all | grep <object_id>
Next, remove the file from all revisions:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch <filename>'
rm -rf .git/refs/original/
Edit .git/packed-refs and remove/comment any external pack-refs. Without this the cleanup might not work. In my case I had refs/remotes/origin/master and some other branches.
vim .git/packed-refs
Finally, repack, clean up, and remove those objects:
git reflog expire --all --expire-unreachable=0
git repack -A -d
git prune
Hopefully these steps will help you completely remove those unwanted files from your git history. Let me know if you have any problems after following these simple steps.
Note: if you want to test these steps here is how to quickly create a test repo:
# Make a small repo
mkdir test
cd test
git init
echo hi > there
git add there
git commit -m 'Small repo'
# Add a random 10M binary file
dd if=/dev/urandom of=testme.txt count=10 bs=1M
git add testme.txt
git commit -m 'Add big binary file'
# Remove the 10M binary file
git rm testme.txt
git commit -m 'Remove big binary file'
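Putting it all together, here is a hedged end-to-end sketch that builds the test repo above (with a 1MB file instead of 10MB, to keep it quick), purges the binary from history using the steps from this post, and checks that it is gone. The `-- --all` on filter-branch and the `FILTER_BRANCH_SQUELCH_WARNING` variable are my additions to make the run non-interactive across all refs on newer git versions:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q test && cd test
git config user.email test@example.com   # local identity so commits work anywhere
git config user.name test

# build the small test repo from the post
echo hi > there
git add there && git commit -qm 'Small repo'
dd if=/dev/urandom of=testme.txt count=1 bs=1M 2>/dev/null
git add testme.txt && git commit -qm 'Add big binary file'
git rm -q testme.txt && git commit -qm 'Remove big binary file'

# purge the file from every revision
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
  --index-filter 'git rm --cached --ignore-unmatch testme.txt' -- --all
rm -rf .git/refs/original/
git reflog expire --all --expire-unreachable=0
git repack -A -d
git prune

# the file should no longer appear in any revision
if git rev-list --objects --all | grep -q testme.txt; then
  echo "testme.txt still present"
else
  echo "testme.txt purged"
fi
```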
If you are going to LISA11 in Boston next week, we should definitely meetup. Contact me on twitter or email.
The Limoncelli Test was a very interesting presentation by Tom Limoncelli, based on a blog post he wrote earlier this year. If you haven’t done so already, I would strongly recommend taking the test to see how your sysadmin team ranks on “The Limoncelli Test”.
Recovering From Linux Hard Drive Disasters is Theodore Ts’o’s signature training material on what to do if you have any sort of hard drive failure; it covers in depth how to recover from such disasters, whether caused by software or hardware failures.
GameDay: Creating Resiliency Through Destruction (slides): I very much enjoyed Jesse Robbins’s presentation, where he draws parallels between two of his greatest passions: firefighting and operations. Watch the video.
SRE@Google: Thousands of DevOps Since 2004: Tom Limoncelli describes the technologies and policies that Google uses to do what is (now) called DevOps. Watch the video.
Also, my colleagues on the LISA11 blogging team (Ben, Rikki and Matt) have done some very interesting interviews with key people from LISA11 to get you prepared for the event. Check out the USENIX blog for more from us in the next week.
Here is also a quick intro of our team: “LISA11 Next Week – Meet your blog team!”
Luckily, Jordan Sissel has built a tool called FPM (Effing Package Management) exactly for this: to ease the pain of building new packages; packages that you will use in your own infrastructure and want customized to your own needs, without caring about upstream rules, standards, and other limitations. This can be very useful for people deploying their own applications as rpms (or debs) and can simplify a lot of the process of building those packages.
FPM can be easily installed on your build system using rubygems:
gem install fpm
Once installed, you can use fpm to build packages (the targets) from any of several input sources:
Use the command line help (fpm --help) or the wiki to see full details on how to use it. I’ll show some simple examples of building packages from various input sources that I’ve found useful myself.
This is how you would usually package an application that you would install with:
./configure; make; make install
For example, here is how you can create an rpm of the latest version of memcached:
wget http://memcached.googlecode.com/files/memcached-1.4.7.tar.gz
tar -zxvf memcached-1.4.7.tar.gz
cd memcached-1.4.7
./configure --prefix=/usr
make
So far everything looks like a normal manual installation (which would be followed by make install). Instead, we will now install it into a separate folder so we can capture the output:
mkdir /tmp/installdir
make install DESTDIR=/tmp/installdir
and finally using fpm to create the rpm package:
fpm -s dir -t rpm -n memcached -v 1.4.7 -C /tmp/installdir
where -s is the input source type (directory), -t is the type of package (rpm), -n is the name of the package, -v is the version, and -C is the directory where fpm will look for the files. Note: you might need to install various libraries to build your package; for example, in this case I had to install libevent-dev.
If you are packaging your own application you can do this just by pointing to your build folder and setting the version of the app. Here is an example for a deb package:
fpm -s dir -t deb -n myapp -v 0.0.1 -C /build/myapp/0.0.1/
There are various other parameters that you can use but basically this is how simple it is to build a package from a directory. Here is an example on how to define some dependencies on the package you are building (using -d; repeat it as many times as needed):
fpm -s dir -t deb -n memcached -v 1.4.7 -C /tmp/installdir \
-d "libstdc++6 (>= 4.4.5)" \
-d "libevent-1.4-2 (>= 1.4.13)"
You can create a deb or rpm from a gem very simply with fpm:
fpm -s gem -t deb <gem_name>
this will download the gem and create a package named rubygem-<gem_name>. For example:
fpm -s gem -t deb fpm
will create a debian package for fpm: rubygem-fpm_0.3.7_all.deb
You can inspect it with dpkg --info and notice that in this case it nicely fills in all the fields with the maintainer and the dependencies on various other gems. Very cool.
If you use python and want to package various python eggs, this works exactly the same; you just use -s python (it will download the python packages with easy_install first).
Overall FPM is a great tool and can help simplify the way you build your own packages. Check it out and let me know what you think and whether you found it useful. And if you did, don’t forget to thank Jordan for his great work on this awesome tool.
So if you have an idea for a chef cookbook, now is the time to start working on it. I’m offering my help for free to all my blog readers: I will help you write a cookbook by implementing your ideas, help review it or suggest improvements, or whatever else you might need help with. Use the contact form to email me (or DM me on twitter) and let me know how I can help.
If you don’t have time to write a new cookbook but you have a great idea for one that is missing from the opscode community site, please post it below in the comments section and I’m sure some of my blog readers will help create it.
Again this is a brilliant idea from Opscode and it creates a win-win situation for everyone. I’m just curious, is this the first idea from their new community manager? If this is the case, great job Jesse ;).
So let’s see how we can use veewee. I’m assuming you already have vagrant installed (and virtualbox), but if you don’t, please install them first. To install veewee we just have to install the veewee gem:
gem install veewee
Once veewee is installed you will see a new task added to vagrant: basebox.
Here is the list of the templates we get out of the box once we install veewee:
vagrant basebox templates
The following templates are available:
vagrant basebox define '' 'archlinux-i686'
vagrant basebox define '' 'CentOS-4.8-i386'
vagrant basebox define '' 'CentOS-5.6-i386'
vagrant basebox define '' 'CentOS-5.6-i386-netboot'
vagrant basebox define '' 'Debian-6.0.1a-amd64-netboot'
vagrant basebox define '' 'Debian-6.0.1a-i386-netboot'
vagrant basebox define '' 'Fedora-14-amd64'
vagrant basebox define '' 'Fedora-14-amd64-netboot'
vagrant basebox define '' 'Fedora-14-i386'
vagrant basebox define '' 'Fedora-14-i386-netboot'
vagrant basebox define '' 'freebsd-8.2-experimental'
vagrant basebox define '' 'freebsd-8.2-pcbsd-i386'
vagrant basebox define '' 'freebsd-8.2-pcbsd-i386-netboot'
vagrant basebox define '' 'gentoo-latest-i386-experimental'
vagrant basebox define '' 'opensuse-11.4-i386-experimental'
vagrant basebox define '' 'solaris-11-express-i386'
vagrant basebox define '' 'Sysrescuecd-2.0.0-experimental'
vagrant basebox define '' 'ubuntu-10.04.2-amd64-netboot'
vagrant basebox define '' 'ubuntu-10.04.2-server-amd64'
vagrant basebox define '' 'ubuntu-10.04.2-server-i386'
vagrant basebox define '' 'ubuntu-10.04.2-server-i386-netboot'
vagrant basebox define '' 'ubuntu-10.10-server-amd64'
vagrant basebox define '' 'ubuntu-10.10-server-amd64-netboot'
vagrant basebox define '' 'ubuntu-10.10-server-i386'
vagrant basebox define '' 'ubuntu-10.10-server-i386-netboot'
vagrant basebox define '' 'ubuntu-11.04-server-amd64'
vagrant basebox define '' 'ubuntu-11.04-server-i386'
vagrant basebox define '' 'windows-2008R2-amd64-experimental'
This means that we can build a box based on any of the above templates. That’s awesome! Let’s say we want to build a debian squeeze box using veewee; we would have to run:
vagrant basebox define 'debian-60' 'Debian-6.0.1a-amd64-netboot'
and this will create a folder definitions/debian-60 with the following files (the content of the veewee template):
definition.rb
postinstall.sh
preseed.cfg
We can modify/tune any of those files based on our custom needs. The file definition.rb is the main definition of the template: here you define the memory size, disk size, iso file, etc. The content is very easy to understand, and you would normally not have to change much here. preseed.cfg is a standard preseed file where you customize the actual install process (you could change the partitions or their type, the timezone setup, etc.). Finally, postinstall.sh is a bash script that runs at the end of the installation process and installs ruby, gems, chef and puppet, plus the virtualbox guest additions (needed for shared folders).
If you already have the iso, place it in ‘currentdir’/iso. If not, veewee will download it and place it in the appropriate folder before starting the install process:
vagrant basebox build 'debian-60'
this will start the installation and you can see all the steps it takes (the keystrokes as they are entered, etc.). This can take a while… Once it is done you can validate the build with:
vagrant basebox validate 'debian-60'
(this will run a few basic tests to see if it can connect to the vm as user vagrant, if chef and puppet were installed, if the shared folders are accessible, etc).
And finally you can export it as a vagrant box with:
vagrant basebox export 'debian-60'
and add it to vagrant:
vagrant box add 'debian-60' debian-60.box
and now you can use it in vagrant with:
vagrant init 'debian-60'
That’s it. Very simple, and now we have our own box built from scratch. As a side note, I found this very useful for testing and troubleshooting preseed configurations ;). As you can see there are plenty of templates available in veewee, but if you create a new one please consider sharing it with others and sending it to Patrick on github. I’m sure he will be happy to include it in newer versions of veewee. And if you found this useful don’t forget to thank Patrick for his great work on this awesome tool.
Monitoring with Icinga @ SF Bay Area LSPE meetup
@LSPEMeetup made the video available on justin.tv; unfortunately the video/sound quality is not the best; you can find it here.
vmbuilder kvm ubuntu --suite=lucid --flavour=virtual --arch=amd64 --mirror=http://en.archive.ubuntu.com/ubuntu -o --libvirt=qemu:///system --ip=10.0.0.11 --gw=10.0.0.1 --part=vmbuilder.partition --templates=mytemplates --user=username --pass=password --firstboot=/var/vms/vm1/boot.sh --mem=1024 --hostname=myhost --bridge=br0
Now, even without any tuning, I would have expected it to perform at least at the same level as a Xen instance, or even better. Still, this was not the case: the performance was really horrible, and any kind of IO-bound task would effectively lock up the instance. Looking into this and trying to understand the problem, I was able to isolate the issue to instances that had ext4 as the filesystem (the default for lucid); strangely enough, it didn’t happen on an older instance that was built with ext3 (actually a debian lenny instance). All the images built with the above command use the qcow2 sparse format as the default disk format.
In order to achieve good IO performance we had to use cache='writeback' for the instances; this significantly increases IO performance, bringing it almost to host-level performance, and in any case much better than the old xen instances we had. Here is how you can enable writeback for an instance: stop the vm, edit the guest domain and add cache='writeback' in the driver section, save, and start the vm again:
virsh --connect qemu:///system
shutdown guestdomain
edit guestdomain <-- add cache='writeback' in the driver section
start guestdomain
Here is how the disk section of my guest domain looks after adding the writeback cache:
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='writeback'/>
<source file='/var/vms/vm2/ubuntu-kvm/tmphAUcOB.qcow2'/>
<target dev='hda' bus='ide'/>
</disk>
In the process of debugging and searching for a fix for this issue, I found that it can also be useful to use elevator=noop as the default kernel IO scheduler; this definitely helps, but not to the same extent as the writeback cache setting on the virtio disk. You can add elevator=noop to your kernel command line in your grub config, and I have this by default on all the instances.
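As a sketch of that grub change, assuming a Debian/Ubuntu guest with the standard grub2 layout (the file path and variable name may differ on other distributions or with legacy grub):

```shell
# /etc/default/grub -- append elevator=noop to the kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet elevator=noop"
```

After editing, regenerate the config (update-grub on Debian/Ubuntu) and reboot the guest for the scheduler change to take effect.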
Hopefully this will help you greatly improve IO performance for your KVM guests and save you the time I lost while trying to find a solution to this problem. Please feel free to share your experiences using the comment form below; I’m also curious whether you have any other tips on how to improve this even more.
Here is a handy command that will stop all the chef server related services:
for svc in server server-webui solr expander
do
sudo /etc/init.d/chef-${svc} stop
done
Simply run:
sudo gem update chef chef-server --no-ri --no-rdoc
and this should upgrade all the other gems it needs to. A sample output will look like this:
gem update chef chef-server --no-ri --no-rdoc
Updating installed gems
Updating chef
Successfully installed chef-0.10.2
Updating chef-expander
Successfully installed chef-expander-0.10.2
Updating chef-server
Successfully installed chef-server-api-0.10.2
Successfully installed chef-server-webui-0.10.2
Successfully installed chef-solr-0.10.2
Successfully installed chef-server-0.10.2
Gems updated: chef, chef-expander, chef-server-api, chef-server-webui, chef-solr, chef-server
Optional: if you want, you can clean up old, unused gems with:
sudo gem cleanup
Again in a single command, now start them back up:
for svc in server server-webui solr expander
do
sudo /etc/init.d/chef-${svc} start
done
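The stop/upgrade/start dance above can also be wrapped in a small helper. This is a sketch of my own (the chef_services name and the INITDIR parameter are not from the post; INITDIR defaults to the real /etc/init.d but can be pointed at a scratch directory for a dry run; run the real thing as root or with sudo):

```shell
# directory holding the chef init scripts; parameterized for dry runs
INITDIR=${INITDIR:-/etc/init.d}

# run the given init-script action on every chef server service
chef_services() {
  local action=$1 svc
  for svc in server server-webui solr expander; do
    "$INITDIR/chef-${svc}" "$action"
  done
}

# The upgrade then becomes:
#   chef_services stop
#   gem update chef chef-server --no-ri --no-rdoc
#   chef_services start
```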
That’s it, now you should be running the latest and greatest chef server version 0.10.2.
First of all, it is a free event (compared with a regular O’Reilly conference, where prices usually start at $1k).
It is much more interactive: while Velocity is a classic conference where you normally have a presenter showing off something (hopefully not selling or hiring), with maybe some questions at the end, DevOpsDays is more like an open discussion, with people either on a panel or in open spaces.
The food was way better at DevOpsDays, no question about it. And the ice cream on Saturday added an extra special touch ;).
The first day, Friday, started with the “Devops State of the Union” by John Willis. This was a very good introduction to what DevOps means and a look back at what has happened during the past couple of years, especially considering that many people were there for the first time. For example, I met someone from Microsoft who was sent there to find out “what is this devops thing” and how they can use it; this just shows what huge progress the devops movement has made in such a short amount of time, and how many people are now interested in it. (In this particular case I’m not sure he returned to Microsoft with something useful, but just the fact that they are interested demonstrates my point.)
Next, we had some very interesting panels (4-5 people in general) like: “To Package or not to Package”, “Orchestration at Scale”, “DevOps Metrics and Measurement”, “DevOps..Where’s the QA?” and finally “Escaping the DevOps Echo Chamber”. Even though I believe some moderators could have done a better job (not leaving people standing and waiting 10-15 minutes to ask a question), I believe this is a great format: very informal and interactive, promoting open discussion and people sharing their experiences. We also had some great ignite presentations, and by far the most interesting and unexpected one was David Lutz with his DevOps song.
The second day was in the format of an unconference, with several open spaces and some short presentations around lunch time. Many people left, as they probably wanted to spend the weekend home with their families, but many (about half) stayed for Saturday too. From the sessions I attended, I really enjoyed the one about Kanban; it was very useful to see how others have used it in operations teams and what problems they had implementing it. I also enjoyed Patrick’s presentation about vagrant (I’m already playing with veewee), and I was very proud of Nate and Rich releasing their product Reactor8 on this occasion.
Overall I think this was an awesome event, much improved compared with last year: two days instead of only one, and I liked the format a lot (day 1 panels & day 2 open spaces). Personally I will probably skip Velocity next year (unless I have a talk accepted) and stick with DevOpsDays only. If you are in the area there is no reason to miss DevOpsDays; I would highly recommend it, or any of the other DevOpsDays events close to your area.
ps: They recorded the whole event (very professionally, with multiple angles, etc.); the content will probably come online very soon, and once that happens I will link it here as well.