The Address Resolution Protocol (ARP) is the method for finding a host’s link layer (hardware) address when only its Internet Layer (IP) or some other Network Layer address is known. ARP is a Link Layer protocol (Layer 2) because it only operates on the local area network or point-to-point link that a host is connected to.
When we migrate an IP from one machine to another, we might run into problems caused by ARP caching. Various devices cache ARP information for a certain amount of time, so even after we have moved the IP, some of them will keep using the cached entry and send traffic to the old machine. I am talking about directly connected switches or routers, which we may or may not control. If we control all the external devices, we normally just connect to the router or switch and remove the ARP entry, forcing the device to query for the information again. This post tries to help in the situation where we don’t have direct control over the external devices (we are colocated, use rented servers in a remote datacenter, etc.), so as to minimize the downtime associated with this type of IP migration.
It is quite common to use separate IPs for the various services on a machine, and to move those IPs to another server when needed. These are sometimes called portable IPs, and they can be migrated to any server in a particular colo/LAN. This is normally done to minimize downtime and keep the maintenance effort of such operations low (and to avoid relying on DNS changes). Still, ARP caching on various network devices can cause big problems. Let’s assume we move an IP from one server to another in the same LAN, to move some service off our main web server. If we don’t have access to the switches/routers in front of us, taking the IP down on the existing server and bringing it up on the new one is all the direct work we can do. Again, if you control all the devices, just connect to them and delete the ARP cache entry for this IP so that it is learned again from the new machine.
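On a Linux-based router or host that we do control, removing the stale entry could look roughly like the sketch below. It assumes the iproute2 ip tool and an interface called eth0 (both are assumptions, not part of the setup described in this post); managed switches and hardware routers have their own commands for this. The function name flush_arp_entry and the RUN variable are made up for this example; setting RUN=echo prints the commands instead of running them.

```shell
# Sketch only: remove one cached ARP entry on a Linux box we control (run as root).
# RUN is a hypothetical helper variable: RUN=echo prints the commands (dry run).
RUN="${RUN:-}"

# flush_arp_entry IP DEV
flush_arp_entry() {
    # Show what is currently cached for the IP (prints nothing if no entry exists).
    $RUN ip neigh show to "$1" dev "$2"
    # Delete the entry; the next packet for the IP triggers a fresh ARP query.
    $RUN ip neigh del "$1" dev "$2"
}

# Example: flush_arp_entry 192.168.0.101 eth0
```

A dry run first (RUN=echo) is a cheap way to check the IP and interface name before touching a live device.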
So now the IP has been moved to the new machine and we have to wait… The ARP cache timeout depends on the actual device and can be anything from 5 minutes to never expiring. Let’s assume for this example that the IP is 192.168.0.101 and that, after running ifdown on the old server and ifup on the new one, the IP shows up correctly on the new server, srv02. If we don’t want to wait helplessly for the caches to expire on their own, the solution is to broadcast ARP packets from the new machine with the migrated IP as the source. Hopefully this will make the remote device verify and invalidate its existing cache entry. For this we can use arping; installation is simple, as it should be available in most modern Linux distributions. On Debian you would install arping just by running:
aptitude install arping
Finally, we use a command like:
arping -S <our_IP> -B
This broadcasts with our IP as the source and directs the packets to the broadcast address (255.255.255.255). If your arping command uses a different parameter notation, you should look for something similar (to set the source and ping the broadcast address). In our example with the IP 192.168.0.101 we would use:
srv02:~# arping -S 192.168.0.101 -B
ARPING 255.255.255.255
--- 255.255.255.255 statistics ---
16 packets transmitted, 0 packets received, 100% unanswered
(Stop it with CTRL-C once it is working. The 100% unanswered figure is expected, since no host replies on behalf of the broadcast address.)
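Some distributions ship a different arping, the one from the iputils package, which uses other options than the -S/-B pair shown above. As a rough equivalent, assuming that version and an interface named eth0, its -U option sends unsolicited (gratuitous) ARP for the given address. The sketch also includes a tcpdump filter to watch the announcements leave the interface. The names announce_ip, watch_arp and RUN are invented for this example; RUN=echo only prints the commands.

```shell
# Sketch only, assuming the iputils arping and tcpdump are installed (run as root).
# RUN is a hypothetical helper variable: RUN=echo prints the commands (dry run).
RUN="${RUN:-}"

# announce_ip IP DEV [COUNT]
# Send COUNT (default 5) unsolicited ARP packets for IP out of interface DEV.
announce_ip() {
    $RUN arping -U -I "$2" -c "${3:-5}" "$1"
}

# watch_arp IP DEV
# Print ARP traffic mentioning IP on DEV, with link-level (MAC) headers (-e).
watch_arp() {
    $RUN tcpdump -n -e -i "$2" arp and host "$1"
}

# Example: announce_ip 192.168.0.101 eth0
```

Running watch_arp in a second terminal while announcing shows which MAC address the packets carry, which helps confirm they come from the new server.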
Normally, after this all should be OK: the remote device caches the new ARP entry, invalidating the old one. If this is not the case, call your datacenter to minimize the downtime ;-) . I always suggest testing this first with a non-production test IP, to see what downtime to expect and whether arping can reduce it. Don’t do this with a live, production IP/service until you know what to expect. I hope this post helps if you have to deal with a similar situation. If this doesn’t work as expected in your case, please let us know which devices you found problematic, and whether you were able to use a different workaround.
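To get a feel for the downtime during such a test, a small loop run from a machine outside the LAN can count unanswered probes while the test IP is being moved. This is only a sketch: it assumes the Linux iputils ping (-c 1 sends one probe, -W 1 waits at most one second for the reply), and count_failures, PROBE and INTERVAL are names invented for this example.

```shell
# Sketch only: count how many probes to an IP go unanswered.
# PROBE and INTERVAL are hypothetical knobs; the defaults assume Linux iputils ping.
PROBE="${PROBE:-ping -c 1 -W 1}"
INTERVAL="${INTERVAL:-1}"

# count_failures IP TRIES
# Probe IP TRIES times, pausing INTERVAL seconds between probes,
# then print the number of probes that failed.
count_failures() {
    cf_ip="$1"; cf_tries="$2"; cf_failed=0; cf_i=0
    while [ "$cf_i" -lt "$cf_tries" ]; do
        if ! $PROBE "$cf_ip" >/dev/null 2>&1; then
            cf_failed=$((cf_failed + 1))
        fi
        cf_i=$((cf_i + 1))
        sleep "$INTERVAL"
    done
    echo "$cf_failed"
}

# Example: count_failures 192.168.0.101 120
```

Comparing the count with and without the arping step gives a rough idea of how much the broadcast actually helps on a given network.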