Thomas Christory's Blog When Cisco, VMware and Linux are your everyday


Your ideal CCIE R&S rack?

Hello folks,

I've been given the opportunity to build a CCIE R&S rack with a quasi-unlimited budget.
So here is the question: what would you guys put in it?
Let's rock the comments!


Cisco Contest #1

Because it's fun, here is a quick contest where you have to spot the difference between two runs of the same show command, and explain why the output is not the same both times:

First :

SW3#sh int po 2 tru
Port        Mode             Encapsulation  Status        Native vlan
Po2         on               n-802.1q       trunking      1
Port        Vlans allowed on trunk
Po2         1-4094
Port        Vlans allowed and active in management domain
Po2         1,100,200,300
Port        Vlans in spanning tree forwarding state and not pruned
Po2         none

Second :

SW3#sh int po 2 tru         
Port        Mode             Encapsulation  Status        Native vlan
Po2         on               802.1q         trunking      1
Port        Vlans allowed on trunk
Po2         1-4094
Port        Vlans allowed and active in management domain
Po2         1,100,200,300
Port        Vlans in spanning tree forwarding state and not pruned
Po2         none

Don't focus on the fact that it is an EtherChannel; that's not the subject here.

So, what is different, and -why-?

Have Fun,



Building an easy and scalable load-balanced high-availability web-hosting solution. Part One: The front

I was recently asked at my job to set up a high-availability solution to make sure our websites, and those we host, will never be down (or for as little time as possible); on top of that, we wanted some load balancing.

Our hosting architecture was dead simple: a front server with the websites' content, and a back-end server with the MySQL server.
What we were aiming at: two front servers, two back-end servers. (We could also have done this with 3/4/5/6 servers and so forth.)

So what I needed was:

  • Four servers
  • A load-balancer: HAProxy
  • A way to detect if the load-balancer goes down and replace it immediately and automatically: Heartbeat
  • A way to replicate the MySQL server and replace it immediately and automatically if it fails: MySQL replication (+ Heartbeat)
  • A way to sync data uploaded by users between X servers (in our case, small files like PDFs, images, etc.): Inotify + Rsync
  • A bunch of spare IP addresses that can travel from one server to another
  • Google
  • Optionally, a way to have your vhost files managed from one central place and downloaded onto your servers automatically (think Puppet here).

In this article I will focus on the front part of the job and leave the MySQL-back-end part for another time/article.

1/ First: Basic checks

I started this project with one of my front servers already in production, so I had to be careful with what I was doing. If you start from scratch, you might not have to check everything three times to be sure you're not going to destroy something. (Backups are the master keyword here.)

So we have two servers, Front1 and Front2.

Here is the list of things that need to be done:

  1. Same configuration on each side; it eases the whole process.
  2. Same data on each side: you need to serve the same content, otherwise the whole thing is useless. We build Ruby on Rails websites and use Capistrano to deploy our applications, so we just had to add another server to our deployment file. Rsync works here too. Be sure to check the permissions if you are making any kind of archive.
  3. Same vhost files on each side. Modifying/adding vhost files is no fun when you have to log on to every server to do so. I would recommend using a tool like Puppet to deploy or modify them easily, as you can push them really quickly and have Puppet restart the web-servers automatically.
  4. Keep your files synced: if your users can upload files to your servers, you need to have them on both (all) servers. There are plenty of ways to do so, from an automatic Rsync (bad) to remote shared storage (think iSCSI + OCFS2 + dm-multipath) (better). We currently use a custom script with Inotify + Rsync. I won't go into details, as it will be the subject of another article sometime soon.

This was the easy part; you should be able to get through it quickly.

Before going on, let me throw some IPs into the mix:

  • Front1 will have its own public IP
  • Front2 will have its own public IP
  • And we will have a special, floating IP for the load-balancer, named HAP in the DNS

2/ Next: The load-balancer, HAProxy

One of the goals of this architecture was being able to split the load between the servers, and one of the best software solutions for this is HAProxy.

You might be telling yourself that, to do this properly, we would need a third server to act as a dedicated load-balancer (and a fourth to back it up). Under constant heavy load that might be true, but for a lot of companies, it is not.

We will instead use one of our front servers as the load-balancer, the other one being the backup. This way, if both servers are up, Front1 is the load-balancer and redirects connections to itself and Front2; if it dies, Front2 becomes the load-balancer and redirects to itself and Front1 (which is dead, so that is silly, but with three or more servers it would make sense).

Here is a picture of the setup:


Note: sorry for the lame graphic, I'm terrible at drawing!

As I mentioned earlier, there is a third IP that will float between the servers; the A records of our websites point at it, so that no matter what, those records will always be valid, as the IP will (should) always answer any connection. My servers are rented at OVH, where these floating IPs are called FailOver IPs, which is the name I will use from here on.

So for now, we just keep that IP to one side; we will see later how to have it attached automatically to the "master" server.

I'm using Debian, so to install HAProxy a simple "aptitude install haproxy" is enough.

We now have to stop and think a little. Using one server for both the proxy and the web-server instead of two has a little catch: HAProxy and your web-server can't both listen on port 80, and you sure don't want your customers to have to remember another port. What we are going to do instead is move the web-server's port to something else, like 5080.

This is quite simple to do: in your web-server's configuration, you go from listen 80; to listen 5080;
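With Nginx (the web-server we use, as mentioned later in this article), the change looks like this in each vhost's server block (the server name and document root are made-up examples):

```nginx
server {
    # was: listen 80;  -- port 80 is now reserved for HAProxy
    listen 5080;
    server_name www.example.com;
    root /var/www/example;
}
```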

There is another trick we will be using, as we are cool admins: if we have to perform scheduled maintenance on one of the servers, we don't want to kick the users off of it and have them restart their sessions.

What we are going to do instead is block new connections, but let the old ones finish. For that purpose we will have a main server and a backup one, the backup accepting no new connections but handling the already-opened ones. In fact, the main server and the backup will be the exact same machine: we will play with iptables to make HAProxy think there are two servers.

So from HAProxy's point of view there will be four servers, two main and two backup, but really, only two.

Here is a commented configuration file for HAProxy (the global and defaults section headers are restored here, as HAProxy requires them):

front1:~# cat /etc/haproxy/haproxy.cfg
global
 log    local0
 #log    local1 notice
 log    syslog debug
 #log loghost    local0 info
 maxconn 4096
 #chroot /usr/share/haproxy
 user haproxy
 group haproxy

defaults
 log    global
 mode    http
 option    httplog
 option    dontlognull
 retries    3
 option redispatch
 maxconn    2000
 contimeout    5000
 clitimeout    50000
 srvtimeout    50000
 stats enable
 stats scope    .
 ## here you can define a custom url for the stats
 stats uri     /haproxy?stats
 ## and a custom user-name and password
 stats auth     USER:PASS

## you can have haproxy listen on every address (0.0.0.0) or a specific one, plus you name your "cluster" (WebFarm here)
listen WebFarm 0.0.0.0:80
 mode http
 ## we want every new connection to be balanced between the servers
 balance    roundrobin
 ## we insert a cookie to memorize which server previous connections from that client were sent to
 cookie SERVERID insert indirect
 ## we forward the real ip address of the client to the web-server (useful for logs)
 option forwardfor
 ## this is the file HAProxy is going to poll to see if your web-server is still alive
 option httpchk HEAD /check.txt HTTP/1.0
 ## here is the fun part: two blocks of servers, the main one and the backup one
 ## this first block is the main servers
 ## you can see two cookies, to differentiate each server, and the check that is done on port 5381 every 2 seconds
 server  Front1 cookie A check port 5381 inter 2000
 server  Front2 cookie B check port 5381 inter 2000
 ## here are the backup servers; as you can see we reuse the same cookies. Same check as above, but on port 5380.
 ## we added the word backup to specify that HAProxy must use these only if the main servers are down. A trick we play with iptables.
 server  Front1bck cookie A check port 5380 inter 2000 backup
 server  Front2bck cookie B check port 5380 inter 2000 backup
 #errorloc    502
 #errorfile    503    /etc/haproxy/errors/503.http
 errorfile    400    /etc/haproxy/errors/400.http
 errorfile    403    /etc/haproxy/errors/403.http
 errorfile    408    /etc/haproxy/errors/408.http
 errorfile    500    /etc/haproxy/errors/500.http
 errorfile    502    /etc/haproxy/errors/502.http
 errorfile    503    /etc/haproxy/errors/503.http
 errorfile    504    /etc/haproxy/errors/504.http

As you can see in the configuration file, the checks for the main servers are made on port 5381, but the servers listen on 5380. The trick is that we add a little iptables rule redirecting everything sent to port 5381 to port 5380, so that when we want the front to serve requests, it does. When we want to take it down, we remove the rule, letting HAProxy think the main server is down and forwarding requests to the backup, which is in fact the same server. Then we wait a little to let the users finish their connections, and we can take the server down properly. When everything that needed fixing is done, we re-add the rule and voila! It's working again.

Here are the rules :

iptables -t nat -A OUTPUT -d -p tcp --dport 5381 -j DNAT --to-dest :5380
iptables -t nat -A OUTPUT -d -p tcp --dport 5381 -j DNAT --to-dest :5380

But as you may know, if you reboot your server, your iptables rules will be lost. So I created a script that is launched every time the network interface is started/restarted:

iptables -t nat -D OUTPUT -d -p tcp --dport 5381 -j DNAT --to-dest :5380
iptables -t nat -D OUTPUT -d -p tcp --dport 5381 -j DNAT --to-dest :5380
iptables -t nat -A OUTPUT -d -p tcp --dport 5381 -j DNAT --to-dest :5380
iptables -t nat -A OUTPUT -d -p tcp --dport 5381 -j DNAT --to-dest :5380

The lines with -D first remove anything that could still be there from a previous run of the script; otherwise you'll see your rules ten times in an iptables -L -t nat, which I think might cause delays.

We saved the script under /etc/haproxy/ and added the following line to /etc/network/interfaces (on Debian), right after the IP configuration of the interface:

post-up /bin/bash /etc/haproxy/

Now, if we want to take down a server, we have two scripts, one per server:

iptables -t nat -D OUTPUT -d -p tcp --dport 5381 -j DNAT --to-dest :5380

iptables -t nat -D OUTPUT -d -p tcp --dport 5381 -j DNAT --to-dest :5380

When all the fixes are finished, we just run the start-up script again to re-add the rules, and we are good to go.

That's pretty much it regarding HAProxy. I'm sure the config file can be tweaked a little more, but it's working for us as it is.

Also, be sure to apply these configs on BOTH your servers, as we want either of them to be able to take the proxy role at any time. (Here again, Puppet is the way to go.)
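As an illustration, a minimal Puppet sketch of that idea (the class, module, and file names are made up, not our actual manifest): keep haproxy.cfg identical on both front servers and reload HAProxy whenever it changes.

```puppet
class haproxy_front {
  package { 'haproxy':
    ensure => installed,
  }

  # Same haproxy.cfg pushed to front1 and front2 from the puppetmaster
  file { '/etc/haproxy/haproxy.cfg':
    ensure  => file,
    source  => 'puppet:///modules/haproxy_front/haproxy.cfg',
    require => Package['haproxy'],
    notify  => Service['haproxy'],
  }

  service { 'haproxy':
    ensure => running,
    enable => true,
  }
}
```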

One quick note before continuing to the high-availability part of this article.

We are using Nginx as the web-server, and one thing we didn't do the first time was modify the way Nginx logs things. We quickly realized that our log files only contained the IP of the active proxy server, which was normal because, well, it's a proxy. It's easy to change the way Nginx saves logs: open the file /etc/nginx/nginx.conf and add/modify:

# configure log format
 log_format main '$http_x_forwarded_for - $remote_user [$time_local] '
 '"$request" $status  $body_bytes_sent "$http_referer" '
 '"$http_user_agent" "$remote_addr"';
 access_log  /var/log/nginx/access.log main;

This way the IP address of the client is logged first, and at the end of the line we keep the proxy address, just for reference.

Another thing you might want to add to your Nginx (or Apache) configuration is a rule that prevents the logging of check.txt, as HAProxy is going to access it every two seconds. This is a quick rule you have to add in your vhost file(s), under the server section:

location = /check.txt {
 root /whatever/is/your/www/root;
 access_log off;
 expires 30d;
}
Before going on with the third part, you can test your HAProxy setup by attaching the FailOver IP to one of the servers and checking that everything works, then attaching the IP to the other server and testing it too.
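A quick way to script that smoke test (a sketch: FAILOVER_IP is a placeholder for your FailOver IP, and curl is assumed to be installed):

```shell
#!/bin/sh
# check_url URL -> prints "up" if the URL answers, "down" otherwise.
check_url() {
    if curl -sf -o /dev/null "$1"; then echo up; else echo down; fi
}

# Against the live setup you would run, for example:
#   check_url http://FAILOVER_IP/check.txt        # through HAProxy, on port 80
#   check_url http://FAILOVER_IP:5080/check.txt   # the web-server directly
```

Run it once with the FailOver IP on each server; both checks should print "up" each time.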

3/ The High-Availability: Heartbeat

Now comes the tricky part: the high availability. We want our mini-cluster (which can easily be extended) to always be up, no matter what happens to any machine. At the same time, we don't have a very big budget, so we want to keep things minimal; that's why we don't have an external load-balancer, since we would need two of them for HA.

What we're going to use to make the HA possible is Heartbeat, from the Linux-HA suite. Heartbeat can be difficult to configure, but I've summed up a few things to make it easier:

First, install Heartbeat:

aptitude install heartbeat

Then we need to adapt/create some configuration files. First, ha.cf (in /etc/ha.d/):

# keepalive: how many seconds between heartbeats
keepalive 2
# deadtime: seconds before declaring a host dead
deadtime 10
# What UDP port to use for udp or ppp-udp communication?
udpport        694
ucast eth0 # ip address of the HA peer, don't forget to edit that on each server
# What interfaces to heartbeat over?
udp     eth0
# Facility to use for syslog()/logger (alternative to log/debugfile)
logfacility     local0
# Tell what machines are in the cluster
# node    nodename ...    -- must match uname -n
node    front1
node    front2
# If on, the master gets control back as soon as it returns; you might want to check for errors before allowing that, so leave it off
auto_failback off

Note: if you set auto_failback to off, your master server won't become master again automatically when it restarts; you will have to restart the current Heartbeat master so that the original one can get its status back.

authkeys (also in /etc/ha.d/):

auth 3
3 md5 yourubersecretkey


haresources (also in /etc/ha.d/):

#Here you specify the uname -n of your master Heartbeat server,
#and what to do when it goes down.
#Here we use a pre-made script (IPaddr) that takes the FailOverIp and assigns it to the active node.
Front1 IPaddr::

That last configuration file does all the magic regarding the FailOverIP: it uses a script that ships with Heartbeat and that, when something happens, like a peer going down or coming up, configures the IP on the server or takes it down automatically. This way, if a server crashes in the night, the takeover happens by itself and nothing is down for more than a few seconds.

And that's it for Heartbeat; hopefully I didn't forget anything and you're good to go. You can now test your setup by shutting down the master Heartbeat and watching the second one grab the FailOverIp address. Try this while pinging the IP and you should see a very short downtime.

4/ Conclusion

That is it for the first part of this series of articles.
We now have a load-balanced front-end that is able to survive the death of one of the servers.
Next time we will see how to effectively share data between those servers, and how to apply the same load-balancing/high-availability concepts to the MySQL back-end.

I hope you enjoyed this article; feel free to comment/ask questions!

Sources :

Here is a list of websites I used while designing the whole thing:

Guiguiabloc's blog





Welcome!

Hi All,

Welcome to my new blog, focused on networking, systems, virtualization, and other stuff. :)

See you soon.

