This is the Title of the Book, eMatter Edition
Copyright © 2004 O’Reilly & Associates, Inc. All rights reserved.
If you are new to mod_perl, this is probably the best way to get yourself started.
And of course, if your site is serving only mod_perl scripts (and close to zero static
objects), this might be the perfect choice for you!
Before trying the more advanced setup techniques we are going to talk about now,
it’s probably a good idea to review the simpler straightforward installation and con-
figuration techniques covered in Chapters 3 and 4. These will get you started with
the standard deployment discussed here.
One Plain and One mod_perl-Enabled Apache Server
As mentioned earlier, when running scripts under mod_perl you will notice that the
httpd processes consume a huge amount of virtual memory, typically from 5 MB to
15 MB and sometimes even more. That is the price you pay for the enormous speed
improvements under mod_perl, mainly because the code is compiled once and needs
to be cached for later reuse. In practice less memory is used if memory sharing
takes place; Chapter 14 covers this issue extensively.
Using these large processes to serve static objects such as images and HTML docu-
ments is overkill. A better approach is to run two servers: a very light, plain Apache
server to serve static objects and a heavier, mod_perl-enabled Apache server to serve
requests for dynamically generated objects. From here on, we will refer to these two
servers as httpd_docs (vanilla Apache) and httpd_perl (mod_perl-enabled Apache).
This approach is depicted in Figure 12-2.
The advantages of this setup are:
• The heavy mod_perl processes serve only dynamic requests, so fewer of these
large servers are deployed.
• MaxClients, MaxRequestsPerChild, and related parameters can now be optimally
tuned for both the httpd_docs and httpd_perl servers (something we could not do
before). This allows us to fine-tune the memory usage and get better server
performance.
Now we can run many lightweight httpd_docs servers and just a few heavy
httpd_perl servers.
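The tuning benefit can be illustrated with a rough estimate. The following shell sketch is not part of the original setup: the helper name estimate_max_clients and all memory figures are assumptions chosen for illustration, and the calculation ignores shared memory (see Chapter 14), so it errs on the conservative side.

```shell
#!/bin/sh
# Rough upper bound for MaxClients: memory budget divided by per-process size.
# Both arguments are whole megabytes. Hypothetical helper, not an Apache tool.
estimate_max_clients() {
    budget_mb="$1"
    process_mb="$2"
    echo $(( budget_mb / process_mb ))
}

# Illustrative numbers only (assumptions): 300 MB reserved for httpd_perl
# children of about 15 MB each, 100 MB for httpd_docs children of about 1 MB.
echo "httpd_perl MaxClients <= $(estimate_max_clients 300 15)"
echo "httpd_docs MaxClients <= $(estimate_max_clients 100 1)"
```

Because the two servers have separate configuration files, each result can be applied to its own MaxClients directive without compromising the other.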
The disadvantages are:
• The need for two configuration files, two sets of controlling scripts (startup/
shutdown), and watchdogs.
• If you are processing log files, you will probably have to merge the two separate
log files into one before processing them.
Chapter 12: Server Setup Strategies
• Just as in the one-server approach, we still have the problem of a mod_perl pro-
cess spending its precious time serving slow clients when the processing portion
of the request was completed a long time ago. (Deploying a proxy, covered in
the next section, solves this problem.)
As with the single-server approach, this is not a major disadvantage if you are on
a fast network (i.e., an intranet). You probably do not need a buffering server
in this case.
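Merging the two access logs, mentioned in the list above, could be sketched in shell as follows. This is only a sketch under stated assumptions: both logs use the common log format with the timestamp in the fourth whitespace-separated field, both servers log in the same time zone, and the function name merge_logs and the file paths are hypothetical.

```shell
#!/bin/sh
# Merge common-log-format files into one stream ordered by timestamp.
# Assumes field 4 looks like "[18/Nov/2004:12:41:05" and that all files
# use the same time zone (the zone offset is ignored).
merge_logs() {
    cat "$@" | awk '
        BEGIN {
            split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
            for (i = 1; i <= 12; i++) mon[m[i]] = sprintf("%02d", i)
        }
        {
            # Build a sortable key YYYYMMDDhhmmss and prepend it to the line.
            split(substr($4, 2), t, "[/:]")
            printf "%s%s%s%s%s%s\t%s\n", t[3], mon[t[2]], t[1], t[4], t[5], t[6], $0
        }' | sort -s -k1,1 | cut -f2-
}

# Example call (paths follow this chapter's layout; the log file names are
# an assumption):
# merge_logs /home/httpd/httpd_docs/logs/access_log \
#            /home/httpd/httpd_perl/logs/access_log > merged_access_log
```

The stable sort keeps entries with identical timestamps in their original relative order.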
Note that when a user browses static pages and the base URL in the browser’s
location window points to the static server (for example
http://www.example.com/index.html), all relative URLs (e.g.,
<a href="/main/download.html">) are served by the plain Apache server. But this is
not the case with dynamically generated pages. For example, when the base URL in
the location window points to the dynamic server (e.g.,
http://www.example.com:8000/perl/index.pl), all relative URLs in the dynamically
generated HTML will be served by heavy mod_perl processes.
To keep static objects off the heavy servers, the generated HTML must use fully
qualified URLs, not relative ones: http://www.example.com/icons/arrow.gif is a full
URL, while /icons/arrow.gif is a relative one. Using
<base href="http://www.example.com/"> in the generated HTML is another way to
handle this problem. The httpd_perl server could also rewrite such requests back
to httpd_docs, but this is much slower, and each request still takes the attention
of a heavy server.
This is not an issue if you hide the internal port implementations, so the client sees
only one server running on port 80, as explained later in this chapter.
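As an illustration of the fully qualified URL approach, the following shell sketch post-processes generated HTML so that root-relative href and src attributes point at the static server. It is a hypothetical filter, not something the original setup prescribes: the function name qualify_urls is invented here, and it only handles double-quoted attributes that start with a single slash.

```shell
#!/bin/sh
# Prefix root-relative href/src attributes with the static server's base URL.
# Usage: generate_html | qualify_urls http://www.example.com
qualify_urls() {
    base="${1%/}"    # drop one trailing slash, if present
    # Leave protocol-relative URLs ("//host/...") untouched.
    sed -E "s#(href|src)=\"/([^/\"])#\\1=\"${base}/\\2#g"
}

printf '%s\n' '<img src="/icons/arrow.gif">' | qualify_urls http://www.example.com/
# prints <img src="http://www.example.com/icons/arrow.gif">
```

In practice it is usually simpler to emit full URLs when generating the page; a filter like this is mainly useful for auditing or retrofitting existing output.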
Figure 12-2. Standalone and mod_perl-enabled Apache servers
[Diagram: clients send requests for static objects to httpd_docs (Apache,
example.com:80) and requests for dynamic objects to httpd_perl (Apache and
mod_perl, example.com:8000); each server returns its response to the client.]
Choosing the Target Installation Directories Layout
If you’re going to run two Apache servers, you’ll need two complete (and different)
sets of configuration, log, and other files. In this scenario we’ll use a dedicated root
directory for each server, which is a personal choice. You can choose to have both
servers living under the same root, but this may cause problems since it requires a
slightly more complicated configuration. This decision would allow you to share
some directories, such as include (which contains Apache headers), but this can
become a problem later, if you decide to upgrade one server but not the other. You
will have to solve the problem then, so why not avoid it in the first place?
First let’s prepare the sources. We will assume that all the sources go into the /home/
stas/src directory. Since you will probably want to tune each copy of Apache sepa-
rately, it is better to use two separate copies of the Apache source for this configura-
tion. For example, you might want only the httpd_docs server to be built with the
mod_rewrite module.
Having two independent source trees will prove helpful unless you use dynamically
shared objects (covered later in this chapter).
Make two subdirectories:
panic% mkdir /home/stas/src/httpd_docs
panic% mkdir /home/stas/src/httpd_perl
Next, put the Apache source into the /home/stas/src/httpd_docs directory (replace 1.3.x
with the version of Apache that you have downloaded):
panic% cd /home/stas/src/httpd_docs
panic% tar xvzf ~/src/apache_1.3.x.tar.gz
Now prepare the httpd_perl server sources:
panic% cd /home/stas/src/httpd_perl
panic% tar xvzf ~/src/apache_1.3.x.tar.gz
panic% tar xvzf ~/src/mod_perl-1.xx.tar.gz
panic% ls -l
drwxr-xr-x 8 stas stas 2048 Apr 29 17:38 apache_1.3.x/
drwxr-xr-x 8 stas stas 2048 Apr 29 17:38 mod_perl-1.xx/
We are going to use the default Apache directory layout and give each server its
own dedicated directory. The two directories are:
/home/httpd/httpd_perl/
/home/httpd/httpd_docs/
We are using the user httpd, belonging to the group httpd, for the web server. If you
don’t have this user and group created yet, add them and make sure you have the
correct permissions to be able to work in the /home/httpd directory.
Configuration and Compilation of the Sources
Now we proceed to configure and compile the sources using the directory layout we
have just described.
Building the httpd_docs server
The first step is to configure the source:
panic% cd /home/stas/src/httpd_docs/apache_1.3.x
panic% ./configure --prefix=/home/httpd/httpd_docs \
    --enable-module=rewrite --enable-module=proxy
We need the mod_rewrite and mod_proxy modules, as we will see later, so we tell
./configure to build them in.
You might also want to add --layout, to see the resulting directories’ layout without
actually running the configuration process.
Next, compile and install the source:
panic% make
panic# make install
Rename httpd to httpd_docs:
panic% mv /home/httpd/httpd_docs/bin/httpd \
/home/httpd/httpd_docs/bin/httpd_docs
Now modify the apachectl utility to point to the renamed httpd via your favorite text
editor or by using Perl:
panic% perl -pi -e 's|bin/httpd|bin/httpd_docs|' \
/home/httpd/httpd_docs/bin/apachectl
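Note that the substitution above is not idempotent: running it a second time would turn bin/httpd_docs into bin/httpd_docs_docs. A repeatable variant could look like the following shell sketch. The function name point_apachectl_at is hypothetical, and the sketch assumes that the binary path in apachectl always contains bin/httpd, optionally followed by an underscore suffix.

```shell
#!/bin/sh
# Point an apachectl script at a renamed binary; safe to run more than once.
# Usage: point_apachectl_at /path/to/apachectl httpd_docs
point_apachectl_at() {
    ctl="$1"
    name="$2"
    tmp="$ctl.tmp.$$"
    # Match bin/httpd with or without an existing _suffix, so a second run
    # rewrites bin/httpd_docs to bin/httpd_docs instead of appending again.
    sed -E "s#bin/httpd(_[A-Za-z0-9]+)?#bin/${name}#g" "$ctl" > "$tmp" &&
        cat "$tmp" > "$ctl"    # overwrite in place to keep file permissions
    rm -f "$tmp"
}

# Example call, using this chapter's layout:
# point_apachectl_at /home/httpd/httpd_docs/bin/apachectl httpd_docs
```

Writing through a temporary file avoids relying on in-place editing flags, which differ between sed implementations.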
Another approach would be to use the --target option while configuring the source,
which makes the last two commands unnecessary.
panic% ./configure --prefix=/home/httpd/httpd_docs \
    --target=httpd_docs \
    --enable-module=rewrite --enable-module=proxy
panic% make
panic# make install
Since we told ./configure that we want the executable to be called httpd_docs (via
--target=httpd_docs), it performs all the naming adjustments for us.
The only thing that you might find unusual is that apachectl will now be called
httpd_docsctl and the configuration file httpd.conf will now be called httpd_docs.conf.
We will leave the decision making about the preferred configuration and installation
method to the reader. In the rest of this guide we will continue using the regular
names that result from using the standard configuration and the manual executable
name adjustment, as described at the beginning of this section.
Building the httpd_perl server
Now we proceed with the source configuration and installation of the httpd_perl
server.
panic% cd /home/stas/src/httpd_perl/mod_perl-1.xx
panic% perl Makefile.PL \
    APACHE_SRC=../apache_1.3.x/src \
    DO_HTTPD=1 USE_APACI=1 EVERYTHING=1 \
    APACHE_PREFIX=/home/httpd/httpd_perl \
    APACI_ARGS='--prefix=/home/httpd/httpd_perl'
If you need to pass any other configuration options to Apache’s ./configure, add them
after the --prefix option. For example:
    APACI_ARGS='--prefix=/home/httpd/httpd_perl \
        --enable-module=status'
Notice that just like in the httpd_docs configuration, you can use
--target=httpd_perl. Note that this option has to be the very last argument in
APACI_ARGS; otherwise make test tries to run httpd_perl, which fails.
Now build, test, and install httpd_perl.
panic% make && make test
panic# make install
Upon installation, Apache puts a stripped version of httpd at /home/httpd/httpd_perl/
bin/httpd. The original version, which includes debugging symbols (if you need to run
a debugger on this executable), is located at /home/stas/src/httpd_perl/apache_1.3.x/
src/httpd.
Now rename httpd to httpd_perl:
panic% mv /home/httpd/httpd_perl/bin/httpd \
/home/httpd/httpd_perl/bin/httpd_perl
and update the apachectl utility to drive the renamed httpd:
panic% perl -p -i -e 's|bin/httpd|bin/httpd_perl|' \
/home/httpd/httpd_perl/bin/apachectl
Configuration of the Servers
When we have completed the build process, the last stage before running the servers
is to configure them.
Basic httpd_docs server configuration
Configuring the httpd_docs server is a very easy task. Open /home/httpd/httpd_docs/
conf/httpd.conf in your favorite text editor and configure it as you usually would.
Now you can start the server with:
/home/httpd/httpd_docs/bin/apachectl start
Basic httpd_perl server configuration
Now we edit the /home/httpd/httpd_perl/conf/httpd.conf file. The first thing to do is
to set a Port directive. It should be different from that used by the plain Apache
server (Port 80), since we cannot bind two servers to the same port number on the
same IP address. Here we will use 8000. Some developers use port 81, but you can
bind to ports below 1024 only if the server has root permissions. Also, if you are
running on a multiuser machine, there is a chance that someone already uses that
port, or will start using it in the future, which could cause problems. If you are
the only user on your machine, you can pick any unused port number, but be aware
that many organizations use firewalls that may block some of the ports, so port
number choice can be a controversial topic. Popular port numbers include 80, 81,
8000, and 8080. In a two-server scenario, you can hide the nonstandard port number
from firewalls and users by using either mod_proxy’s ProxyPass directive or a proxy
server such as Squid.
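The privileged-port rule can be checked mechanically before settling on a Port value. The following shell sketch is only an illustration; the function name port_needs_root is hypothetical, and it does not test whether a port is already in use or blocked by a firewall.

```shell
#!/bin/sh
# Succeeds (exit status 0) if binding to the given port requires root,
# i.e. the port number is below 1024.
port_needs_root() {
    [ "$1" -lt 1024 ]
}

# Classify the popular port numbers mentioned in the text.
for port in 80 81 8000 8080; do
    if port_needs_root "$port"; then
        echo "$port: privileged, the server must be started as root"
    else
        echo "$port: unprivileged"
    fi
done
```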
Now we proceed to the mod_perl-specific directives. It’s a good idea to add them all
at the end of httpd.conf, since you are going to fiddle with them a lot in the early
stages.
First, you need to specify where all the mod_perl scripts will be located. Add the fol-
lowing configuration directive:
# mod_perl scripts will be called from URIs starting with /perl
Alias /perl /home/httpd/httpd_perl/perl
From now on, all requests for URIs starting with /perl will be executed under mod_
perl and will be mapped to the files in the directory /home/httpd/httpd_perl/perl.
Now configure the /perl location:
PerlModule Apache::Registry
<Location /perl>
#AllowOverride None
SetHandler perl-script
PerlHandler Apache::Registry
Options ExecCGI
PerlSendHeader On
Allow from all
</Location>
This configuration causes any script that is called with a path prefixed with /perl to
be executed under the Apache::Registry module and as a CGI script (hence the
ExecCGI option; if you omit it, the script will be printed to the user’s browser as
plain text or will possibly trigger a “Save As” window).
This is only a very basic configuration. Chapter 4 covers the rest of the details.
Once the configuration is complete, it’s time to start the server with:
/home/httpd/httpd_perl/bin/apachectl start
One Light Non-Apache and One mod_perl-Enabled Apache Server
If the only requirement from the light server is for it to serve static objects, you can
get away with non-Apache servers, which have an even smaller memory footprint
and even better speed. Most of these servers don’t have the configurability and flexi-
bility provided by the Apache web server, but if those aren’t required, you might
consider using one of these alternatives as a server for static objects. To accomplish
this, simply replace the Apache web server that was serving the static objects with
another server of your choice.
Among the fast servers with a small memory footprint, thttpd is one of the best
choices. It runs as a single process (multiplexing connections with non-blocking
I/O rather than forking or using threads) and consumes about 250K of memory. You
can find more information about this server at
http://www.acme.com/software/thttpd/. This site also includes a very interesting
web server performance comparison chart
(http://www.acme.com/software/thttpd/benchmarks.html).
Another good choice is the kHTTPd web server for Linux. kHTTPd is different from
other web servers in that it runs from within the Linux kernel as a module (device-
driver). kHTTPd handles only static (file-based) web pages; it passes all requests for
non-static information to a regular user space web server such as Apache. For more
information, see http://www.fenrus.demon.nl/.
Boa is yet another very fast web server, whose primary design goals are speed and
security. According to http://www.boa.org/, Boa is capable of handling several thou-
sand hits per second on a 300-MHz Pentium and dozens of hits per second on a
lowly 20-MHz 386/SX.
Adding a Proxy Server in httpd Accelerator Mode
We have already presented a solution with two servers: one plain Apache server,
which is very light and configured to serve static objects, and the other with mod_
perl enabled (very heavy) and configured to serve mod_perl scripts and handlers. We
named them httpd_docs and httpd_perl, respectively.
In the dual-server setup presented earlier, the two servers coexist at the same IP
address by listening to different ports: httpd_docs listens to port 80 (e.g., http://www.
example.com/images/test.gif) and httpd_perl listens to port 8000 (e.g., http://www.
example.com:8000/perl/test.pl). Note that we did not write http://www.example.com:80
for the first example, since port 80 is the default port for the HTTP service. Later on,
we will change the configuration of the httpd_docs server to make it listen to port 81.
This section will attempt to convince you that you should really deploy a proxy
server in httpd accelerator mode. This is a special mode that, in addition to provid-
ing the normal caching mechanism, accelerates your CGI and mod_perl scripts by
taking the responsibility of pushing the produced content to the client, thereby free-
ing your mod_perl processes. Figure 12-3 shows a configuration that uses a proxy
server, a standalone Apache server, and a mod_perl-enabled Apache server.
The advantages of using the proxy server in conjunction with mod_perl are:
• You get all the benefits of the usual use of a proxy server that serves static
objects from the proxy’s cache. You get less I/O activity reading static objects
from the disk (the proxy serves the most “popular” objects from RAM—of
course you benefit more if you allow the proxy server to consume more RAM),
and since you do not wait for the I/O to be completed, you can serve static
objects much faster.
• You get the extra functionality provided by httpd accelerator mode, which makes
the proxy server act as a sort of output buffer for the dynamic content. The
mod_perl server sends the entire response to the proxy and is then free to deal
with other requests. The proxy server is responsible for sending the response to
the browser. This means that if the transfer is over a slow link, the mod_perl
server is not waiting around for the data to move.
Figure 12-3. A proxy server, standalone Apache, and mod_perl-enabled Apache
[Diagram: clients exchange requests and responses with a proxy on port 80, which
fetches static objects from httpd_docs (Apache, example.com:80) and dynamic
objects from httpd_perl (Apache and mod_perl, example.com:8000).]
• This technique allows you to hide the details of the server’s implementation.
Users will never see ports in the URLs (more on that topic later). You can have a
few boxes serving the requests and only one serving as a frontend, which spreads
the jobs between the servers in a way that you can control. You can actually shut
down a server without the user even noticing, because the frontend server will
dispatch the jobs to other servers. This is called load balancing—it’s too big an
issue to cover here, but there is plenty of information available on the Internet
(refer to the References section at the end of this chapter).
• For security reasons, using an httpd accelerator (or a proxy in httpd accelerator
mode) is essential because it protects your internal server from being directly
attacked by arbitrary packets. The httpd accelerator and internal server commu-
nicate only expected HTTP requests, and usually only specific URI namespaces
get proxied. For example, you can ensure that only URIs starting with /perl/ will
be proxied to the backend server. Assuming that there are no vulnerabilities that
can be triggered via some resource under /perl, this means that only your public
“bastion” accelerating web server can get hosed in a successful attack—your
backend server will be left intact. Of course, don’t consider your web server to
be impenetrable because it’s accessible only through the proxy. Proxying it
reduces the number of ways a cracker can get to your backend server; it doesn’t
eliminate them all.
Your server will be effectively impenetrable if it listens only on ports on your
localhost (127.0.0.1), which makes it impossible to connect to your backend
machine from the outside. But you don’t need to connect from the outside any-
more, as you will see when you proceed to this technique’s implementation notes.
In addition, if you use some sort of access control, authentication, and authori-
zation at the frontend server, it’s easy to forget that users can still access the
backend server directly, bypassing the frontend protection. By making the back-
end server directly inaccessible you prevent this possibility.
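One way to verify that a backend server is reachable only through the loopback interface is to scan its configuration file. The shell sketch below is a rough check under stated assumptions: the backend is configured with Listen directives (BindAddress and Port are not examined), directive names are written with the capitalization shown, and the function name non_local_listeners is hypothetical.

```shell
#!/bin/sh
# Print every active Listen directive that is not bound to 127.0.0.1.
# Exit status is 0 if at least one such directive was found, nonzero otherwise.
non_local_listeners() {
    grep -E '^[[:space:]]*Listen[[:space:]]' "$1" |
        grep -v -E 'Listen[[:space:]]+127\.0\.0\.1:'
}

# Example call, using this chapter's layout:
# if non_local_listeners /home/httpd/httpd_perl/conf/httpd.conf; then
#     echo "warning: backend accepts connections from outside" >&2
# fi
```

Commented-out directives are skipped because the pattern only matches lines where Listen is the first non-blank word.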
Of course, there are drawbacks. Luckily, these are not functionality drawbacks—
they are more administration hassles. The disadvantages are:
• You have another daemon to worry about, and while proxies are generally sta-
ble, you have to make sure to prepare proper startup and shutdown scripts,
which are run at boot and reboot as appropriate. This is something that you do
once and never come back to again. Also, you might want to set up the crontab
to run a watchdog script that will make sure that the proxy server is running and
restart it if it detects a problem, reporting the problem to the administrator on
the way. Chapter 5 explains how to develop and run such watchdogs.
• Proxy servers can be configured to be light or heavy. The administrator must
decide what gives the highest performance for his application. A proxy server
such as Squid is light in the sense of having only one process serving all requests,
but it can consume a lot of memory when it loads objects into memory for faster
service.
• If you use the default logging mechanism for all requests on the front- and
backend servers, the requests that will be proxied to the backend server will be
logged twice, which makes it tricky to merge the two log files, should you want
to. Therefore, if all accesses to the backend server are done via the frontend
server, it’s best to turn off logging on the backend server.
If the backend server is also accessed directly, bypassing the frontend server, you
want to log only the requests that don’t go through the frontend server. One way
to tell whether a request was proxied or not is to use mod_proxy_add_forward,
presented later in this chapter, which sets the HTTP header X-Forwarded-For for
all proxied requests. So if the default logging is turned off, you can add a custom
PerlLogHandler that logs only requests made directly to the backend server.
If you still decide to log proxied requests at the backend server, they might not
contain all the information you need, since instead of the real remote IP of the
user, you will always get the IP of the frontend server. Again, mod_proxy_add_
forward, presented later, provides a solution to this problem.
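The watchdog mentioned in the list above could be sketched in shell along the following lines. This is not the watchdog developed in Chapter 5: the function names, the PID file location, and the restart command are assumptions for illustration. Note also that kill -0 fails for processes owned by another user unless the script runs as root, so the check is only reliable when run with sufficient privileges.

```shell
#!/bin/sh
# Succeeds if the PID recorded in the given file belongs to a live process.
server_is_running() {
    pidfile="$1"
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Run the given restart command if the server is not running.
# Usage: watch_server PIDFILE COMMAND [ARGS...]
watch_server() {
    pidfile="$1"
    shift
    if ! server_is_running "$pidfile"; then
        # Output from a cron job is typically mailed to the crontab owner.
        echo "server with PID file $pidfile is down, running: $*" >&2
        "$@"
    fi
}

# Example call from a script run by cron (the PID file path is an assumption):
# watch_server /home/httpd/httpd_docs/logs/httpd.pid \
#              /home/httpd/httpd_docs/bin/apachectl start
```

A production watchdog would also probe the server over HTTP, since a process can be alive yet unable to serve requests.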
Let’s look at a real-world scenario that shows the importance of the proxy httpd
accelerator mode for mod_perl.
First let’s explain an abbreviation used in the networking world. If someone claims
to have a 56-kbps connection, it means that the connection is made at 56 kilobits per
second (~56,000 bits/sec). It’s not 56 kilobytes per second, but 7 kilobytes per sec-
ond, because 1 byte equals 8 bits. So don’t let the merchants fool you—your modem
gives you a 7 kilobytes-per-second connection at most, not 56 kilobytes per second,
as one might think.
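The bits-to-bytes conversion is simple enough to express as a one-line shell function. The name kbps_to_kbytes_per_sec is hypothetical, and the sketch uses integer division, so it truncates any fractional result.

```shell
#!/bin/sh
# Convert a link speed in kilobits per second to kilobytes per second
# (1 byte = 8 bits). Integer arithmetic: fractions are truncated.
kbps_to_kbytes_per_sec() {
    echo $(( $1 / 8 ))
}

kbps_to_kbytes_per_sec 56    # prints 7
```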
Another convention used in computer literature is that 10 Kb usually means 10 kilo-
bits and 10 KB means 10 kilobytes. An uppercase B generally refers to bytes, and a
lowercase b refers to bits (K of course means kilo and equals 1,024 or 1,000, depend-
ing on the field in which it’s used). Remember that the latter convention is not fol-
lowed everywhere, so use this knowledge with care.
In the typical scenario (as of this writing), users connect to your site with 56-kbps
modems. This means that the speed of the user’s network link is 56/8 = 7 KB per
second. Let’s assume that an average generated HTML page is 42 KB and that an
average mod_perl script generates this response in 0.5 seconds. How many responses
could this script produce during the time it took for the output to be delivered to
the user? A simple calculation reveals pretty scary numbers:
    42 KB / (0.5 s × 7 KB/s) = 12
Twelve other dynamic requests could be served in the time it takes to deliver one
response, if we could let mod_perl do only what it’s best at: generating responses.
This very simple example shows us that we need only one-twelfth the number of
children running, which means that we will need only one-twelfth of the memory.
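The same calculation can be repeated for other page sizes and link speeds with a small shell sketch. The function name responses_during_delivery and its parameter order are assumptions made for this illustration; the generation time is given in milliseconds so that integer arithmetic suffices.

```shell
#!/bin/sh
# How many responses could a script generate while one response is being
# delivered over a slow link?
# Usage: responses_during_delivery PAGE_KB LINK_KBPS GENERATION_MS
responses_during_delivery() {
    page_kb="$1"
    link_kbps="$2"
    gen_ms="$3"
    # Delivery time in milliseconds: kilobytes * 8 bits / (kilobits per second).
    delivery_ms=$(( page_kb * 8 * 1000 / link_kbps ))
    echo $(( delivery_ms / gen_ms ))
}

# The example from the text: a 42 KB page, a 56-kbps modem, 0.5 s to generate.
responses_during_delivery 42 56 500    # prints 12
```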