Moving your website to another server? Tune your DNS for minimum downtime.

I hate moving a website to another server, but in real life there are many situations where this will happen (maybe you need to upgrade your current server to a better one, maybe your current server is in a bad datacenter, maybe you just found a better financial deal, or maybe you only need to change the IPs on the same server). Besides some other problems that might appear (for example software incompatibilities, with the site not running the same way on the new server), the major problem you might face is DNS caching. Depending on your current DNS configuration, for a period of time after the move some resolvers will still hand out the old IP while others already return the new one, so your site will be reachable on both servers. This might be OK, but in most situations it is not. For example, remote mail servers will deliver email to either server, and some users will still be browsing your site on the old server (causing problems with e-commerce sites, or any site that depends on data saved while users browse it). I have successfully completed many such moves with minimum downtime, and I will show you how this can be done with a very simple DNS trick.

To do this in the best possible way we need to understand a little about how DNS caching works. When a remote DNS resolver queries an authoritative DNS server (let’s say for the domain we are moving) and the query is successful, it caches the response for a predefined time. For that amount of time the resolver answers any further requests from its cache and no longer queries the authoritative server. That timer is defined on the authoritative DNS server, and as long as all the remote servers follow the standards, we can make this work. To be successful in this operation you **NEED** to have control over your authoritative DNS server.
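To make the caching behaviour concrete, here is a minimal sketch of a toy resolver cache. It is not how any real resolver is implemented; the `ResolverCache` class, the `authoritative_lookup` callback and the injectable `clock` are all hypothetical names made up for this illustration.

```python
import time


class ResolverCache:
    """Toy model of a caching resolver: an answer is reused until its TTL expires."""

    def __init__(self, authoritative_lookup, clock=time.monotonic):
        # authoritative_lookup(name) -> (address, ttl_seconds) is a hypothetical
        # stand-in for a query to the authoritative server.
        self._lookup = authoritative_lookup
        self._clock = clock
        self._cache = {}  # name -> (address, expiry time)

    def resolve(self, name):
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            # Still fresh: the authoritative server is not asked again.
            return cached[0]
        # Missing or expired: ask the authoritative server and cache the answer.
        address, ttl = self._lookup(name)
        self._cache[name] = (address, now + ttl)
        return address
```

With a TTL of 86400, a resolver that cached the record just before you changed it keeps returning the old address for up to a full day, which is exactly the window we want to shrink.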

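The parameter that we are going to tweak is the **TTL value**. In BIND the default TTL for the records in a zone is set with the $TTL directive. The last field of the SOA record (Minimum) used to determine the default (technically, minimum) TTL (time-to-live) for DNS entries, but is now used for negative caching. Here is the definition of that field from RFC 1912: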

“The default TTL (time-to-live) for resource records – how long data will remain in other nameservers’ cache. ([RFC 1035] defines this to be the minimum value, but servers seem to always implement this as the default value) This is by far the most important timer. Set this as large as is comfortable given how often you update your nameserver. If you plan to make major changes, it’s a good idea to turn this value down temporarily beforehand. Then wait the previous minimum value, make your changes, verify their correctness, and turn this value back up. 1-5 days are typical values. Remember this value can be overridden on individual resource records.”

So as you can see this is not a big secret at all, and the RFC even explains what you need to do in such situations. I will now illustrate this with a small example, assuming an authoritative nameserver running BIND9 and the zone domain_to_move.com. If you are running a different DNS server the procedure is the same; only the configuration syntax will differ. Let’s say that we have a very simple zone file that looks like this (the IPs are private ones, used just for illustration):

; zone 'domain_to_move.com'
$TTL 86400
@       IN      SOA     ns1.domain_to_move.com. hostmaster.domain_to_move.com. (
2006052101     ; Serial
10800          ; Refresh 3 hours
3600           ; Retry 1 hour
604800         ; Expire 1 week
86400          ); Minimum 24 hours

@                       NS      ns1.domain_to_move.com.
@                       NS      ns2.domain_to_move.com.

@                       A       192.168.0.10
@                       MX      10 mail.domain_to_move.com.

; Nameservers
ns1                     A       192.168.0.1
ns2                     A       192.168.0.2
; Mail
mail                    A       192.168.0.10
; Web
www                     CNAME   domain_to_move.com.

As you can see, the $TTL 86400 directive at the top of the zone file sets the default TTL for all records that do not set their own TTL to 86400 seconds (24 hours). So the first thing we need to do before starting the actual move is to lower this to a very small value; 60 seconds sounds good. This means that no remote server will cache the records for more than 1 minute. Increase the serial at the same time, so that any secondary nameservers pick up the change:

; zone 'domain_to_move.com'
$TTL 60
@       IN      SOA     ns1.domain_to_move.com. hostmaster.domain_to_move.com. (
2006052102     ; Serial
...
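
If you have several zones to prepare, the same edit can be scripted. The following is only a sketch under strong assumptions: the zone has exactly one $TTL directive at the start of a line, and the serial sits on its own line followed by a `; Serial` comment, as in the example zone. The function name `lower_ttl` is made up for this illustration.

```python
import re


def lower_ttl(zone_text, new_ttl=60):
    """Return zone_text with $TTL set to new_ttl and the SOA serial increased by one."""
    # Replace the value of the first $TTL directive found at the start of a line.
    text, ttl_count = re.subn(
        r"(?m)^\$TTL[ \t]+\S+", f"$TTL {new_ttl}", zone_text, count=1
    )

    def bump(match):
        # Keep indentation and the trailing comment, increment only the number.
        return f"{match.group(1)}{int(match.group(2)) + 1}{match.group(3)}"

    # Assumes the serial is the number on the line commented "; Serial".
    text, serial_count = re.subn(
        r"(?im)^([ \t]*)(\d+)([ \t]*;[ \t]*serial\b)", bump, text, count=1
    )
    if ttl_count != 1 or serial_count != 1:
        raise ValueError("expected one $TTL directive and one '; Serial' line")
    return text
```

Always review the result (and keep a copy of the original file) before reloading the nameserver; a zone laid out differently from the example will need a different pattern.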

Now we need to reload the DNS server to activate the new configuration. After this we have to wait for the previous TTL (here 1 day) to be sure that no remote DNS server still has the old records, with their long TTL, in its cache. If you run secondary nameservers, also make sure they have picked up the new serial before you start counting. Once that time has passed we can safely proceed with the move and change the actual IPs to point to the new server; from then on, remote resolvers will pick up the new IPs within about a minute. I have assumed in this example that the nameservers remain unchanged; if you are also moving them to the new server, be sure to configure them the same way. Once the move is over, don’t forget to return to a normal TTL value, as this will decrease your overall DNS traffic and allow the information to be properly cached again.
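The waiting period is simple arithmetic: the moment the lowered TTL went live plus the old TTL. A small sketch, with the hypothetical helper names `earliest_cutover` and `safe_to_move`, and assuming all remote resolvers honour the TTL:

```python
from datetime import datetime, timedelta


def earliest_cutover(ttl_lowered_at: datetime, old_ttl_seconds: int) -> datetime:
    """Earliest time at which no compliant cache can still hold a record with the old TTL."""
    # A record cached just before the reload expires old_ttl_seconds later.
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)


def safe_to_move(now: datetime, ttl_lowered_at: datetime, old_ttl_seconds: int) -> bool:
    """True once the old TTL has fully elapsed since the lowered TTL was activated."""
    return now >= earliest_cutover(ttl_lowered_at, old_ttl_seconds)
```

For the example zone, lowering the TTL at noon means the IPs should not be changed before noon the next day.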

Note: if your nameservers are hosted on some remote service and you don’t have full control of your DNS zone, you might not be able to do this yourself and will be limited to whatever the control panel there offers. In this case ask your hosting support to lower the TTL for you.

A nice and quick way to check how the remote servers will see your DNS zone and check all the parameters is: http://www.dnsreport.com/
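If you prefer to check from the command line, answer lines in the usual presentation format (name, TTL, class, type, data), such as those printed by `dig +noall +answer`, can be inspected with a few lines of code. This is a rough sketch: the helper names are hypothetical, and it assumes one well-formed record per line.

```python
def parse_answer_line(line):
    """Split one answer line (name, TTL, class, type, data) into its fields."""
    name, ttl, rr_class, rr_type, rdata = line.split(None, 4)
    return {
        "name": name,
        "ttl": int(ttl),  # remaining seconds the resolver will keep the record
        "class": rr_class,
        "type": rr_type,
        "rdata": rdata.strip(),
    }


def ttl_is_lowered(line, max_ttl=60):
    """True if the TTL reported on this answer line is at most max_ttl seconds."""
    return parse_answer_line(line)["ttl"] <= max_ttl
```

A caching resolver reports the remaining TTL, so a value above 60 after you lowered it means that resolver is still holding the old record.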

Please feel free to share your experiences in moving to another server. Did you have problems, or was it smooth?
