It seems that the Rails deployment dilemma is finally getting the care that it desperately needed to make the whole situation less of a pain in the neck. For a while there, everyone was hanging on the edge of their seats, hoping that Apache developers would fix Apache’s FastCGI interface, which had fallen out of maintenance. While waiting for that, many people flocked to Lighttpd as a promising faster/lighter alternative to Apache that seemed to have its FastCGI interface under control.
Meanwhile, a guy named Zed Shaw was developing an alternative to WEBrick called Mongrel. It seems Zed just got fed up and decided to change the Rails deployment world with his own bare hands. This is good news for all of us, and the best thing about Zed is how much he cares about putting together a setup that works for everyone. (Also, if you ever need help with Mongrel, Zed is always right there with the answer.) So, this seemingly simple little pure HTTP web server has turned out to be much more useful than anticipated. With the introduction of the mongrel_cluster gem, serving Rails applications with a small pack of Mongrel processes and a load balancer is a snap.
Software load balancers that people are using include Pen, Pound, and Apache2’s mod_proxy_balancer. Recently, on the main Rails blog, there was a post about setting up Lighttpd with a single proxy to Pound, which in turn served up a cluster of Mongrel processes. In reading the post and its comments I realized there seems to be some confusion about where Pound can exist within a typical deployment setup. A few people commented that with Lighttpd in front of Pound, the value of request.remote_ip was 127.0.0.1 (localhost) or something other than the IP of each external request.
There is no reason that Pound can’t sit out in front of Lighttpd, a pack of Mongrels, or any other web servers waiting to process and respond to requests. Because of the way Pound handles headers, the correct value of request.remote_ip is preserved by the time the request is received by Rails. In any case, the Pound docs send the vibe that the intention is to have Pound in front of other servers. Here’s a bit from the latest Pound README that talks about what Pound is and how it can be used:
Pound-2.0.9/README
- a reverse-proxy: it passes requests from client browsers to one or more back-end servers.
- a load balancer: it will distribute the requests from the client browsers among several back-end servers, while keeping session information.
- an SSL wrapper: Pound will decrypt HTTPS requests from client browsers and pass them as plain HTTP to the back-end servers.
- an HTTP/HTTPS sanitizer: Pound will verify requests for correctness and accept only well-formed ones.
- a fail over-server: should a back-end server fail, Pound will take note of the fact and stop passing requests to it until it recovers.
- a request redirector: requests may be distributed among servers according to the requested URL.
It’s the last item above (request redirector) that gives Pound its flexibility in terms of serving different requests to different back-end web servers. So, on with a simple demo of a Pound setup that passes requests back to a cluster of Mongrels, an Apache server, and a Lighttpd server.
Step 1. Get/Install Pound
Start by downloading the latest version of Pound and unpacking it somewhere nice (like /usr/local/src).
Wait a second… Pound, like many tools that make liberal use of regular expressions, prefers that you have PCRE (Perl Compatible Regular Expressions) installed. If you don’t, download and install it with:
$ wget ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-5.0.tar.gz
$ tar xvzf pcre-5.0.tar.gz
$ cd pcre-5.0
$ ./configure
$ make
$ sudo make install
Next, move into /usr/local/src (or wherever you downloaded Pound) and configure and build with:
$ tar xvzf Pound-2.0.9.tgz
$ cd Pound-2.0.9
$ ./configure \
> --with-ssl=ssl_dir # SSL support, if needed.
$ make
$ sudo make install
Debian users can “apt-get install” Pound but likely won’t get the latest version without some sources.list hackery. If you have installed Pound in Debian, you’ll need to edit the following file and flip the startup bit:
/etc/default/pound
startup=1
On my current system (where this blog lives), I installed Pound (v2.0) with apt-get install pound and then realized that I wanted the latest version of Pound (v2.0.9), so I built it from source. But the nice thing about the Debian package is that it gives you a start-up script (/etc/init.d/pound), which is very handy, especially for a service that should always be up.
So, after installing Pound from source, I ended up with apt’s Pound in /usr/sbin/pound, and my Pound in /usr/local/sbin/pound. To get the start-up script to use the newer Pound, I made this change:
/etc/init.d/pound
#DAEMON=/usr/sbin/pound
DAEMON=/usr/local/sbin/pound
While apt’s Pound stores its configuration file in /etc/pound, the new Pound looks for its config info in /usr/local/etc/pound.cfg. To make things work I created a symlink with:
$ sudo ln -s /etc/pound/pound.cfg /usr/local/etc/pound.cfg
With Pound installed and acting as the ringleader for requests to the various listening web servers, the next step is to configure it. But wait! We need a nice figure that illustrates a deployment plan.
Step 2. Plan Your Deployment Setup
What we want is for Pound to do some request routing for us as well as some load balancing. All incoming requests to blog.tupleshop.com should be sent to a small cluster of two Mongrel processes. Requests for www.tupleshop.com should be sent to Apache running PHP. Finally, any requests for “mov” files should be handled by Lighttpd.
Let’s start by configuring Pound.
Step 3. Configure Pound
The pound configuration file contains three types of directives: global, listener, and service. The global directives in this configuration specify the user and group that the pound service is to run as. The log level states how much logging we want pound to send to syslog, if any. Loglevel takes the following values:
- 0 – for no logging
- 1 – (default) for regular logging
- 2 – for extended logging (show chosen backend server as well)
- 3 – for Apache-like format (Common Log Format with Virtual Host)
- 4 – (same as 3 but without the virtual host information)
The listener directive, ListenHTTP, specifies the IP address and port on which Pound listens for requests (you’ll want a real address here).
The remainder of the configuration file contains service directives that define what back end servers are to handle various types of requests. The first Service directive states that anything with a Host header containing www.tupleshop.com should be routed to port 8080 of the localhost address (127.0.0.1). In this case Apache, running PHP (among other things), is listening on port 8080, waiting to handle whatever requests Pound passes to it. (Note: There’s no reason this IP couldn’t be on another physical server, but in this case all three web servers are on the same box.)
The next Service directive uses URL ".*.mov" to match requests for QuickTime movie files. For performance reasons, we want Lighty to handle these requests exclusively. So while a request for http://blog.tupleshop.com would be handled by the Mongrel cluster, a request for http://blog.tupleshop.com/zefrank.mov would never make it to Mongrel and would instead be served by Lighty. The location of .mov files on the server is pretty much irrelevant here; they can be anywhere as long as Lighty knows where to find them.
The final Service directive effectively serves as a catch-all because it’s the last one in the file, and because there is no URL or Header matching criteria defined. This is the one doing actual load balancing to the Mongrel processes. In this case there are two Mongrel processes listening on ports 9000 and 9001, on the local IP address.
/etc/pound/pound.cfg
User "www-data"
Group "www-data"
LogLevel 2
Alive 30
ListenHTTP
Address 123.123.123.123
Port 80
End
Service
HeadRequire "Host:.*www.tupleshop.com.*"
BackEnd
Address 127.0.0.1
Port 8080
End
Session
Type BASIC
TTL 300
End
End
Service
URL ".*.mov"
BackEnd
Address 127.0.0.1
Port 8081
End
Session
Type BASIC
TTL 300
End
End
Service
# Catch All
BackEnd
Address 127.0.0.1
Port 9000
End
BackEnd
Address 127.0.0.1
Port 9001
End
Session
Type BASIC
TTL 300
End
End
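To make the matching order concrete, here is a minimal Python sketch of how the three Service blocks above split up requests. This is not Pound's code. The SERVICES table and the pick_service function are hypothetical names made up for illustration, and the sketch assumes what the text above describes: services are tried in file order, the first one whose conditions all match wins, and patterns are matched case-insensitively anywhere in the string. Back-ends are listed by port only.

```python
import re

# Hypothetical mirror of the three Service blocks in pound.cfg, in file order.
# Each entry: (label, HeadRequire pattern or None, URL pattern or None, ports).
SERVICES = [
    ("apache", r"Host:.*www.tupleshop.com.*", None, [8080]),
    ("lighttpd", None, r".*.mov", [8081]),
    ("mongrel", None, None, [9000, 9001]),  # catch-all: no conditions
]


def pick_service(host, path):
    """Return (label, ports) of the first Service whose conditions all match."""
    header_line = "Host: " + host
    for label, head_require, url, ports in SERVICES:
        if head_require and not re.search(head_require, header_line, re.IGNORECASE):
            continue
        if url and not re.search(url, path, re.IGNORECASE):
            continue
        return label, ports
    return None, []


if __name__ == "__main__":
    for host, path in [
        ("www.tupleshop.com", "/index.php"),
        ("blog.tupleshop.com", "/zefrank.mov"),
        ("blog.tupleshop.com", "/"),
        ("blog.tupleshop.com", "/remove"),
    ]:
        print(host, path, "->", pick_service(host, path))
```

Under these assumptions the last test request, /remove, also lands on the .mov service, because the dots in ".*.mov" are unescaped and the pattern is not anchored. A stricter pattern such as ".*\.mov$" may be closer to the intent, but check it against the Pound man page for your version before relying on it.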
Okay, with Pound all configured, we can start the service with:
$ sudo /etc/init.d/pound start
If there’s a problem with your configuration file, Pound won’t say much about it to STDERR, so it’s a good idea to be watching /var/log/syslog as you start Pound until you’re confident that your configuration is solid.
None of the services that Pound directs requests to have to be running when you start Pound. But if they aren’t, you’ll get HTTP 503 errors from requests bound for servers that aren’t running or are improperly configured. One way to look at the Pound configuration file is as a specification for how the rest of your services should be set up. If you forget what port a server should listen on, always refer back to Pound’s config file.
Tracking down problems with so many web servers running can get a little hairy, but if you stay organized and are methodical about your setup (like knowing where each server logs events), it shouldn’t be too bad at all.
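A quick way to see which routes are returning 503 is to send one request per route with the right Host header and look at the status code. The following is a rough sketch using only Python's standard http.client module. The function names status_for and describe are my own, not part of Pound or any library, and the hint text simply restates the behavior described above.

```python
import http.client


def status_for(host_header, path, address, port, timeout=5.0):
    """Send one GET with the given Host header and return the HTTP status code."""
    conn = http.client.HTTPConnection(address, port, timeout=timeout)
    try:
        conn.request("GET", path, headers={"Host": host_header})
        return conn.getresponse().status
    finally:
        conn.close()


def describe(status):
    """Turn a status code into a short hint for this setup."""
    if status == 503:
        return "503: Pound could not reach a back-end for this request"
    return f"{status}: a back-end answered"
```

For example, describe(status_for("www.tupleshop.com", "/", "123.123.123.123", 80)) would exercise the Apache route through the listener address used in the configuration above; swap in the Host header and path for whichever Service you want to test.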
This post is already too long so I’m not going to get into configuring a Mongrel cluster, Lighttpd, or Apache. Instead, I’ll just include my config files for reference.
Step 4. Configure the Rest of Your Servers
First, my mongrel_cluster config file.
/var/www/robblelog/config/mongrel_cluster.yml
---
user: mongrel
cwd: /var/www/robblelog
port: "9000"
environment: production
group: www-data
pid_file: log/mongrel.pid
servers: 2
which I start with:
$ sudo mongrel_rails cluster::start
A slicker way to handle this is to copy the included mongrel_cluster start-up file to your system’s initialization scripts directory so your Mongrels will survive a server reboot.
Next is my Lighty config file. It’s pretty simple, with the document root pointing to the public directory of the Rails project: robblelog.
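The port and servers settings are what tie this file to the Pound configuration: mongrel_cluster starts servers processes on consecutive ports beginning at port, which is where 9000 and 9001 come from. Here is a small sketch that works out the ports from the file above so you can compare them with the BackEnd entries in pound.cfg. The parser is a deliberately tiny stand-in that only handles flat "key: value" lines like these, not general YAML, and both function names are invented for this example.

```python
CONFIG_TEXT = """\
---
user: mongrel
cwd: /var/www/robblelog
port: "9000"
environment: production
group: www-data
pid_file: log/mongrel.pid
servers: 2
"""


def parse_flat_yaml(text):
    """Parse flat 'key: value' lines (not a general YAML parser)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "---" or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def cluster_ports(settings):
    """Ports the cluster listens on: 'servers' consecutive ports from 'port'."""
    first = int(settings["port"])
    return [first + i for i in range(int(settings["servers"]))]


if __name__ == "__main__":
    print(cluster_ports(parse_flat_yaml(CONFIG_TEXT)))  # prints [9000, 9001]
```

If you bump servers to 3, remember to add a third BackEnd (port 9002) to the catch-all Service, or Pound will never send anything to the extra Mongrel.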
/etc/lighttpd/lighttpd.conf
server.modules = (
"mod_access",
"mod_alias",
"mod_accesslog",
)
server.port = 8081
server.bind = "127.0.0.1"
server.document-root = "/var/www/robblelog/public/"
server.username = "www-data"
server.groupname = "www-data"
server.pid-file = "/var/run/lighttpd.pid"
server.errorlog = "/var/log/lighttpd/error.log"
index-file.names = ( "index.php", "index.html",
"index.htm", "default.htm" )
accesslog.filename = "/var/log/lighttpd/access.log"
## mimetype mapping
include_shell "/usr/share/lighttpd/create-mime.assign.pl"
Finally, a small chunk of my Apache2 configuration:
Listen 8080
NameVirtualHost *:8080
<VirtualHost *:8080>
ServerAdmin admin@tupleshop.com
ServerName www.tupleshop.com
ServerAlias tupleshop.com
DocumentRoot /var/www/tupleshop.com
# ...
</VirtualHost>
Appendix A: Debugging
If you’re used to running only a single web server on your system, it may be a little daunting to have more servers, all listening on different ports. How can you know what is up and running and which ports are available? Install and run nmap. Use the following command to display what services are listening on different ports:
$ sudo nmap -sT -O localhost
To see what Internet services are currently tied up, use the following lsof command.
$ lsof -i -P
Finally, Pound logs to /var/log/syslog, and Mongrel, Apache, and Lighttpd all have their own logging configurations. Between network inspection and watching your logs, you should be able to nail down most configuration issues.
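If you don't have nmap handy, a few lines of Python can answer the narrower question of whether each back-end port from the Pound config is accepting connections. This is only a sketch: it assumes all the back-ends live on 127.0.0.1 as described in this post, the EXPECTED table is just the ports from pound.cfg with labels I added for display, and is_listening is a helper name I made up.

```python
import socket

# Ports taken from the Pound configuration above; labels are for display only.
EXPECTED = {8080: "Apache", 8081: "Lighttpd", 9000: "Mongrel", 9001: "Mongrel"}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for port, label in sorted(EXPECTED.items()):
        state = "up" if is_listening("127.0.0.1", port) else "DOWN"
        print(f"{port:5d} {label:10s} {state}")
```

A port that shows as up only tells you something is listening there, not that it is the server you expect, so lsof is still the tool for confirming which process owns it.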
One of the most obvious tweaks you can make to the Mongrel cluster is to run more or fewer Mongrel processes. You have to play with this number based on your anticipated traffic load and your available system resources (mostly RAM). A standard tool for measuring how well any of your servers is performing is httperf. Here’s an example that blasts port 8080 with 100 requests:
$ httperf --port 8080 --server 127.0.0.1 \
> --num-conn 100 --timeout 5
The number you want to dig out of the output of httperf is req/s (requests per second). Of course, a higher number is better.
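If you are comparing several runs (say, two Mongrels versus four), it helps to pull that one number out automatically. The sketch below assumes httperf reports the figure on a line beginning with "Request rate:", which is how the versions I have seen print it; the sample line and its numbers are invented purely to exercise the function, and request_rate is a name of my own.

```python
import re

# Illustrative line in the shape httperf prints; the numbers are made up.
SAMPLE_OUTPUT = "Request rate: 152.3 req/s (6.6 ms/req)\n"


def request_rate(httperf_output):
    """Return the requests-per-second figure as a float, or None if absent."""
    match = re.search(
        r"^Request rate:\s*([0-9.]+)\s*req/s", httperf_output, re.MULTILINE
    )
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    print(request_rate(SAMPLE_OUTPUT))
```

Save the output of each httperf run to a file, feed the text to request_rate, and you have one comparable number per configuration.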
Appendix B: Logging Remote IP Addresses
One problem that is a show stopper for many people who might otherwise put their web servers behind Pound is that access logs no longer preserve the IP address of the original request. Instead it shows up as 127.0.0.1.
Luckily, one of the very few modifications Pound makes to requests is to add an X-Forwarded-For header containing the IP address of the original request. The general format is:
X-Forwarded-for: client-IP-address
Note that other proxies may already have added an X-Forwarded-for header (there can be more than one, as allowed by the HTTP RFCs). In this case, Pound adds its own X-Forwarded-for header last, after the others.
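Since Pound's header comes last, an application that wants the address Pound saw can simply take the final X-Forwarded-For header. Here is a minimal sketch of that rule. The function name and the sample addresses (taken from the documentation-only ranges) are made up, and the extra step of splitting on commas is my assumption to cope with proxies that pack several addresses into one header.

```python
def client_ip_from_pound(headers):
    """Return the address in the last X-Forwarded-For header, or None.

    `headers` is a list of (name, value) pairs in the order they were received.
    """
    values = [value for name, value in headers if name.lower() == "x-forwarded-for"]
    if not values:
        return None
    # A single header may itself carry a comma-separated list; take the last entry.
    return values[-1].split(",")[-1].strip()


if __name__ == "__main__":
    sample = [
        ("Host", "blog.tupleshop.com"),
        ("X-Forwarded-For", "192.0.2.10"),   # added by some earlier proxy
        ("X-Forwarded-for", "203.0.113.9"),  # added last, by Pound
    ]
    print(client_ip_from_pound(sample))  # prints 203.0.113.9
```

Keep in mind that this is the address of whatever connected to Pound. If another proxy sits in front of Pound, the last header holds that proxy's address and the original client is further up the list.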
To capture the IP address from this header in your Apache access log, replace “%h” (the remote host format directive) with \"%{X-Forwarded-for}i\". The whole format definition:
LogFormat "\"%{X-Forwarded-for}i\" %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
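One side effect of this format is that the first field is now wrapped in double quotes, which can trip up log analysis scripts that expect a bare address. As a rough sketch, here is a regular expression that reads a line written with the format above. The sample line is invented for illustration (it is not from a real log), and the field names in the pattern are my own labels.

```python
import re

# Matches lines written with the LogFormat above (quoted client address first).
LINE_RE = re.compile(
    r'^"(?P<client>[^"]*)" (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Made-up example line, for illustration only.
SAMPLE_LINE = (
    '"203.0.113.9" - - [10/Oct/2006:13:55:36 -0700] '
    '"GET /zefrank.mov HTTP/1.1" 200 2326 "-" "Mozilla/5.0"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    fields = parse_line(SAMPLE_LINE)
    print(fields["client"], fields["status"], fields["request"])
```

Requests that arrive without the header are logged by Apache with "-" in that field, so a script built on this should be ready for a client value of "-".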
Appendix C: More Logging (Extra Credit)
Another “interesting” solution (more to demonstrate an advanced customization option) to this is to have Pound add an additional header to each request, called something like “REAL_REMOTE_ADDR”. This can be done easily by recompiling Pound with a small addition to the source. Don’t worry, you don’t have to be a C Guru for this. It’s very simple. The following excerpt from Pound’s http.c shows where you want to add the one line that adds the “REAL_REMOTE_ADDR” header.
/usr/local/src/Pound-2.0.9/http.c (~line 850)
/* put additional client IP header */
BIO_printf(be, "X-Forwarded-For: %s\r\n", inet_ntoa(from_host));
BIO_printf(be, "REAL_REMOTE_ADDR: %s\r\n", inet_ntoa(from_host));
Save the file with the change you made and recompile Pound with:
$ cd /usr/local/src/Pound-2.0.9
$ ./configure
$ make
$ sudo make install
Now that the new header is being added to each request, you have to alter the log file format for each web server to acknowledge the additional field. This is also pretty simple to do. In the case of Apache, you just make a small change to the definition of the log file format you’re using. I use the “combined” format, and here is a definition that replaces “%h” with our custom header string.
LogFormat "\"%{REAL_REMOTE_ADDR}i\" %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
The result is that your access logs should appear just as they would if Apache were receiving external requests directly.
So, that’s it. Good luck and let me know if I missed anything, in the comments.