Sunday, April 27, 2008

IP takeover, IP failover, how it works ...

I've been interested lately in IP failover for Linux/FreeBSD machines, and I've been looking at the HA Linux web pages from time to time. Frankly, I didn't find any document describing how it is done "the right way" or "how it is usually done" ... probably because I usually pick the wrong keywords to google for.

Anyway, I decided I'd try to understand how it works and maybe implement something that does it. My theory was that it should work using a ping test plus ifconfig to bring up an alias IP on the interfaces of the machines that should fail over for each other, and maybe reloading some services with new configs. I didn't really know, so I started looking at how this could be done and, as usual, writing a post over the span of multiple days ... or weeks :)
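To make that "ping test plus ifconfig alias" theory concrete, here is a minimal Python sketch of the commands involved. The interface alias `eth0:0`, the netmask, and the helper names are my own hypothetical choices, not taken from any tool mentioned here. The `ping -c 1 -W 1` form is Linux iputils syntax (FreeBSD's `ping -W` means something different), and the `eth0:0` alias naming is Linux-style; on FreeBSD you would use `ifconfig <iface> alias <ip>` instead.

```python
import subprocess

# Hypothetical example values, not from a real setup.
ALIAS_IFACE = "eth0:0"
SERVICE_IP = "192.168.1.1"
NETMASK = "255.255.255.0"


def ping_cmd(host, timeout_s=1):
    """Build a single-probe ping command (Linux iputils syntax)."""
    return ["ping", "-c", "1", "-W", str(timeout_s), host]


def alias_up_cmd(iface=ALIAS_IFACE, ip=SERVICE_IP, netmask=NETMASK):
    """Build the ifconfig command that adds the shared IP as an interface alias."""
    return ["ifconfig", iface, ip, "netmask", netmask, "up"]


def alias_down_cmd(iface=ALIAS_IFACE):
    """Build the ifconfig command that removes the alias again."""
    return ["ifconfig", iface, "down"]


def is_alive(host, timeout_s=1):
    """Return True if a single ping to `host` gets a reply (exit status 0)."""
    result = subprocess.run(
        ping_cmd(host, timeout_s),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Running `subprocess.run(alias_up_cmd(), check=True)` as root would actually add the alias. Only `is_alive()` touches the network; the other helpers just build argument lists, which keeps them easy to check by hand.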

A quick search on Google brought up this page, which is a little old (written in 2003) but made me feel good, since it described pretty much what I thought the idea behind IP takeover is, with some very important additions: mainly send_arp, and the need for a mechanism that brings things back to normal when the other IP starts responding again.

After fiddling around the net for a while, I found a very nice article/document, probably the best I've found so far; you can see it here on linux-ha.org. It mentions a piece of software called fake, so I quickly searched the packages provided by Ubuntu and found a package named fake. I installed it (apt-get install fake) and looked up where it lives: /usr/sbin/fake, and it turns out to be a bash script! (YAY) ... A quick look at it shows that it uses the above-mentioned send_arp ... it's nice when you find consistent information on the net ;)
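As far as I understand it, the point of send_arp is to broadcast a "gratuitous" ARP packet, so that the other machines on the LAN update their ARP caches and start sending traffic for the shared IP to the new box's MAC address. The Python sketch below builds such a frame by hand: a broadcast ARP request whose sender and target IP are both the shared IP. It is only an illustration of the packet format, not send_arp's actual code, and the function names and example MAC are hypothetical.

```python
import socket
import struct


def mac_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def gratuitous_arp(src_mac, ip):
    """Build an Ethernet frame carrying a gratuitous ARP request for `ip`."""
    mac = mac_bytes(src_mac)
    ip_raw = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our MAC as source, EtherType ARP.
    eth_header = b"\xff" * 6 + mac + struct.pack("!H", 0x0806)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,              # hardware type: Ethernet
        0x0800,         # protocol type: IPv4
        6, 4,           # hardware / protocol address lengths
        1,              # opcode 1 = ARP request
        mac, ip_raw,    # sender MAC / sender IP (the shared IP)
        b"\x00" * 6,    # target MAC: unknown
        ip_raw,         # target IP: the same shared IP
    )
    return eth_header + arp_payload  # 14 + 28 = 42 bytes


def send_frame(frame, iface="eth0"):
    """Linux only, needs root: push the raw frame out of `iface`."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as s:
        s.bind((iface, 0))
        s.send(frame)
```

For example, `send_frame(gratuitous_arp("00:11:22:33:44:55", "192.168.1.1"))` would announce that 192.168.1.1 is now at that MAC. Real tools usually send a few of these in a row in case one gets lost.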

So the basic idea is this: you have an IP you want to keep up, let's say 192.168.1.1, and you have two boxes, A and B, each with its own IP; let's say A has 192.168.1.10 and B has 192.168.1.20. Each box monitors the other (via ping, arping, etc.). Initially 192.168.1.1 lives on box A alongside 192.168.1.10. Whenever box B stops getting responses from box A (on 192.168.1.10), it runs ifconfig to acquire 192.168.1.1, plus any other desired scripts and services. Whenever 192.168.1.10 starts responding again (box A is back up, and it is the default machine for 192.168.1.1), box B brings 192.168.1.1 down so that box A can answer requests again ... tada!
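The takeover/give-back logic for box B is really a tiny state machine, so here is a minimal Python sketch of it. It only decides *what* to do from a stream of health-check results (True = box A answered); actually running ifconfig and announcing the IP is left out. The `fail_threshold` (several consecutive misses before taking over) is my own assumption to avoid flapping on a single lost ping, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BackupNode:
    """Box B: watches box A and holds the shared IP only while A is down."""
    fail_threshold: int = 3   # assumption: consecutive misses before takeover
    misses: int = 0
    holding_vip: bool = False

    def observe(self, a_responded):
        """Feed one health-check result; return 'takeover', 'release' or None."""
        if a_responded:
            self.misses = 0
            if self.holding_vip:
                # Box A is back and is the default owner: give the IP back.
                self.holding_vip = False
                return "release"
            return None
        self.misses += 1
        if not self.holding_vip and self.misses >= self.fail_threshold:
            # A has been silent long enough: acquire the shared IP.
            self.holding_vip = True
            return "takeover"
        return None


def run(results, threshold=3):
    """Replay a sequence of ping results and collect (index, action) pairs."""
    node = BackupNode(fail_threshold=threshold)
    actions = []
    for i, ok in enumerate(results):
        action = node.observe(ok)
        if action:
            actions.append((i, action))
    return actions


# A answers, misses four checks, then comes back.
print(run([True, False, False, False, False, True, True]))
# → [(3, 'takeover'), (5, 'release')]
```

On "takeover" box B would bring up the 192.168.1.1 alias and announce it; on "release" it would bring the alias down. Note the real-world catch: box A must not re-add the IP until B has released it, or both will answer for a moment.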
