Workbook 1. The Apache Web Server

Table of Contents
1. Webserver Basics
    Discussion
        Web Servers
        Installing the Apache Web Server
        Web Server Layout
        The Document Root: /var/www/html/
        Content Types
        Directories
        Web Server Logging: /var/log/httpd/{access,error}_log
        The Anatomy of a Web Request: the HTTP Protocol (Optional, but Interesting)
        The Hyper Text Markup Language (HTML) (Optional)
    Exercises
        Specification
        Deliverables
        Clean Up
    Questions
2. Apache Configuration
    Discussion
        Apache Configuration: /etc/httpd/conf/httpd.conf
        The Global Section
        The Main Section
        The Answer Book: http://localhost/manual
    Exercises
        Specification
        Deliverables
    Questions
3. Apache Configuration: Containers
    Discussion
        Tailoring Customization to Particular Content: Containers
        Common Container Configuration
        Red Hat Enterprise Linux Default Configuration
        Location Containers: server-status and server-info
    Exercises
        Specification
        Deliverables
    Questions
4. Virtual Hosts
    Discussion
        Virtual Hosts
        IP Based Virtual Hosting
        Name Based Virtual Hosts
    Exercises
        Specification
        Deliverables
    Questions
5. The Squid Proxy Server
    Discussion
        Proxy Servers
        The squid Proxy Server
        Squid Configuration: /etc/squid/squid.conf
        The Server's Identity: http_port
        Squid Access Control Lists: acl and http_access
        Configuring Proxies for Web Clients
        Squid Logging: /var/log/squid/access.log
        Finding Out More
    Exercises
        Specification
        Deliverables
        Challenge Exercises
    Questions

rha230-5.0-1-en-2008-01-21T07:12:18-0500 Copyright (c) 2003-2007 Red Hat, Inc. All rights reserved. For use only by a student enrolled in a Red Hat Academy course taught at a Red Hat Academy. Any other use is a violation of U.S. and international copyrights. No part of this publication may be photocopied, duplicated, stored in a retrieval system, or otherwise duplicated whether in electronic or print format without prior written consent of Red Hat, Inc. If you believe Red Hat course materials are being used, copied, or otherwise improperly distributed please email training@redhat.com or phone toll-free (USA) +1 866 626 2994 or +1 (919) 754 3700.


Chapter 1. Webserver Basics


Key Concepts

The web server that ships with Red Hat Enterprise Linux is the Apache web server. In general terms, web servers map URL requests onto files within a local directory, using the Document Root (/var/www/html/) as the base of the translation. The web server associates metadata, such as content types, with requested files. When a client requests a directory instead of a file, Apache serves the file index.html (if it exists), returns a dynamically generated directory listing (if it is allowed to), or returns an access denied error. Web servers and web clients communicate using the HTTP protocol. Often, the information served from a web server is structured using the HTML markup language.

Table 1-1. The Apache Web Server
Packages       httpd (with apr and httpd-suexec dependencies), plus other modules (usually named mod_...), and httpd-manual
Service        httpd
Daemon         /usr/sbin/httpd
Config Files   /etc/httpd/conf/httpd.conf, /etc/httpd/conf.d/*
Logging        /var/log/httpd/{access,error}_log
Ports          80/tcp (http), 443/tcp (https)

Discussion
Web Servers
This lesson focuses on installing and starting the Apache web server, and publishing information using the default configuration. We also introduce some of the basics of the HTTP protocol and the HTML markup language, for those who are interested.

Installing the Apache Web Server


In Red Hat Enterprise Linux, the Apache web server is easy to install and start in its default configuration, using the conventional trio of commands to install the httpd package, start the httpd service, and enable it at boot: yum install ...; service ... start; chkconfig ... on.
[root@station ~]# yum install httpd

...
Dependencies Resolved

=============================================================================
 Package          Arch       Version          Repository        Size
=============================================================================
Installing:
 httpd            i386       2.2.3-6.el5      rha-rhel          1.1 M
...
Installed: httpd.i386 0:2.2.3-6.el5
Complete!

The httpd service can now be started and "chkconfig'ed on".


[root@station ~]# service httpd start
Starting httpd:                                            [  OK  ]
[root@station ~]# chkconfig httpd on

The availability of the web server can be confirmed by using any web browser to reference http://localhost. The following example uses elinks, but the firefox browser could just as easily have been used.
[root@station ~]# elinks -dump http://localhost
   Red Hat Enterprise Linux Test Page

   This page is used to test the proper operation of the Apache HTTP server
   after it has been installed. If you can read this page, it means that the
   Apache HTTP server installed at this site is working properly.
   ...

Web Server Layout


Once the package is installed, an rpm query listing its files (rpm -ql) always serves as a good introduction to the layout of a new product.
[root@station ~]# rpm -ql httpd
/etc/httpd
/etc/httpd/conf
/etc/httpd/conf.d
/etc/httpd/conf.d/README
...

Skimming the output reveals the following relevant files and directories.

Table 1-2. Web Server Filesystem Layout


Directory                  Purpose
/etc/httpd/                Configuration files, including /etc/httpd/conf/httpd.conf.
/usr/lib/httpd/modules/    Dynamically loaded modules.
/var/log/httpd/            Log files, including access_log and error_log.
/var/www/html/             The Web Server Document Root (more on this in a moment).

The Document Root: /var/www/html/


The purpose of the web server is to serve information. Usually, this involves reading a file from the filesystem and transferring it to a web browser, which then displays, or renders, the file. As an arbitrary example, the file /etc/sysctl.conf can be copied to the document root directory (/var/www/html). Any web browser referencing http://localhost/sysctl.conf should then display the contents of the file, just as the cat command would. (Some web browsers may mangle the whitespace within the file, essentially placing its entire contents on one line. This happens because of misguided "Content Type" negotiations, discussed later.)
[root@station ~]# cp /etc/sysctl.conf /var/www/html/
[root@station ~]# elinks http://localhost/sysctl.conf
[root@station ~]# elinks -source http://localhost/sysctl.conf
# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled.  See sysctl(8) and
# sysctl.conf(5) for more details.

# Controls IP packet forwarding
net.ipv4.ip_forward = 0
...

Instead of a single file, entire directory trees can be copied into the /var/www/html directory.
[root@station ~]# cp -a /etc/sysconfig /var/www/html/

Now, by accessing http://localhost/sysconfig with a web browser, the contents of the directory should be visible, with "clickable" file and subdirectory links.


Figure 1-1. Browsing the sysconfig Directory

Notice the shift in perspective. What we would call the directory /var/www/html/sysconfig, the web server refers to as just /sysconfig. This translation is the essence of the term "Document Root". Web browsers request information using "Uniform Resource Locators", more commonly called just "URLs". Web-related URLs are usually composed of a hostname and a file path.
http://hostname/dir1/dir2/filename

The hostname is simply the hostname or IP address of the host running the server, while dir1/dir2/filename is thought of as a path to a particular file on the server. When locating the file, the web server assumes that the root of the "URL Namespace" is the document root directory (/var/www/html). The http portion of the URL is the protocol, which tells the web browser both which port to connect to and what "language" to expect to speak with whoever is listening on that port. For web servers, the default port is 80, and the language is known as the Hypertext Transfer Protocol, or HTTP.

Of course, it's not usually a machine's configuration files that one chooses to publish to the world. We'll move on to more interesting content.
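The translation from URL to file can be modeled with a few lines of bash. This is a simplified sketch, not how Apache itself is implemented: url_to_path is a hypothetical helper name, the document root is assumed to be the Red Hat default (/var/www/html), and any query string after a "?" is discarded. The sketch neither decodes %-escapes (such as %20) nor guards against ".." path components, both of which a real server must handle.

```bash
#!/bin/bash
# Hypothetical sketch: translate a URL into the file the web server would read.
DOCUMENT_ROOT=/var/www/html     # assumed: the Red Hat default document root

url_to_path() {
    local url=$1
    local rest=${url#*://}          # drop the "http://" protocol prefix
    local path
    if [ "$rest" = "${rest#*/}" ]; then
        path=/                      # no path at all, e.g. "http://localhost"
    else
        path=/${rest#*/}            # drop the hostname, keep "dir1/dir2/filename"
    fi
    path=${path%%\?*}               # discard any query string
    echo "${DOCUMENT_ROOT}${path}"
}

url_to_path http://localhost/sysconfig/network
```

The hostname is thrown away because, by the time the server sees the path, the hostname has already done its job of getting the connection to the right machine.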

Content Types
The purpose of the web server is to serve the content of files, but web clients seem to learn not just the content of a file, but how to interpret that content as well. As an example, consider a text file such as /etc/hosts, an HTML file such as /usr/share/doc/samba-version/htmldocs/manpages/net.8.html, and an image file such as /usr/share/backgrounds/tiles/neurons.png, each of which is copied to a web server's document root.
[root@station ~]# mkdir /var/www/html/example
[root@station ~]# cd /var/www/html/example
[root@station example]# cp /etc/hosts .
[root@station example]# cp /usr/share/doc/samba-*/htmldocs/manpages/net.8.html .
[root@station example]# cp /usr/share/backgrounds/tiles/neurons.png .



[root@station example]# ls
hosts  net.8.html  neurons.png

How does a web client handle each of these? If you're sitting at a student workstation, try for yourself. (Of course, you will first need to perform the above commands to put the files in place.)
http://localhost/example/hosts
http://localhost/example/net.8.html
http://localhost/example/neurons.png
Note: Make sure to create or copy files underneath the /var/www/html directory as the root user. Do not move existing files into the directory. If you're having trouble, give it a pass for now, until you read the section "But What Could Go Wrong?" below.

All of the files should have been treated reasonably by the client: the hosts file as a simple text file, the net.8.html file as a marked-up man page, complete with bold titles, italics, and hyperlinks, and neurons.png as a picture of blue blobs. Now let's shake things up a bit.
[root@station example]# cp hosts hosts.html
[root@station example]# cp net.8.html net.8.txt
[root@station example]# cp hosts hosts.png
[root@station example]# cp neurons.png neurons.txt

Again, if you are at a student workstation, try the following.
http://localhost/example/hosts.html
http://localhost/example/net.8.txt
http://localhost/example/hosts.png
http://localhost/example/neurons.txt

For those not able to follow along: hosts.html lost all of its formatting, net.8.txt dumped the raw markup you would see if you catted the file directly, hosts.png caused the browser to complain about a malformed image, and neurons.txt showed a bunch of glyphs representing binary data. There are obviously some expectations on the part of the browser about how to interpret the data it receives: text to dump, marked-up text (HTML) to format, or an image to render. The expectation about what type of data the client is receiving is known as the data's content type.

Apparently, the content type is determined by the file's filename extension. We still don't know whether the extension is interpreted into a content type by the server (before the file's content is transmitted) or by the client (after the content is received). The answer is the server, and the server communicates that content type, along with a lot of other metadata about the transfer, using the HTTP protocol.
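The server's side of this can be pictured as a lookup table keyed on filename extension. The bash sketch below hard-codes a handful of entries for illustration only: content_type_for is a hypothetical name, and the text/plain fallback for unknown or missing extensions is an assumption. A real Apache server reads a much larger table, typically /etc/mime.types on Red Hat Enterprise Linux (named by the TypesConfig directive), and its fallback depends on the server configuration.

```bash
#!/bin/bash
# Hypothetical sketch: choose a Content-Type from a filename extension,
# the way the server does before it sends the file.
content_type_for() {
    local name=$1 ext=
    if [[ $name == *.* ]]; then
        ext=${name##*.}               # text after the last "."
    fi
    case "$ext" in
        html|htm) echo "text/html" ;;
        txt)      echo "text/plain" ;;
        png)      echo "image/png" ;;
        jpg|jpeg) echo "image/jpeg" ;;
        gif)      echo "image/gif" ;;
        *)        echo "text/plain" ;;   # assumed fallback for unknown or missing extensions
    esac
}

for f in hosts hosts.html net.8.txt hosts.png neurons.txt; do
    printf '%-12s %s\n' "$f" "$(content_type_for "$f")"
done
```

Note that hosts.png is labeled image/png even though its contents are plain text. That mismatch is exactly what made the browser complain about a malformed image: the label comes from the name, not from the contents.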

Directories
We've seen how the web server responds when a web client requests a file: it returns the contents of the file to the client. How does the web server handle directories? In general, a web server responds in one of three ways.


First, the web server checks whether an index file (a file named index.html) exists in the directory. If so, the web server returns the contents of that file, as if the request for http://localhost/example were for http://localhost/example/index.html. Second, if no index file exists, the web server checks whether the Indexes option is enabled. If so, it returns a dynamically generated directory listing. Otherwise, the web server returns an error to the client. (How the Indexes option is set will be covered in a following lesson. In Red Hat Enterprise Linux, the option is enabled by default.)

Table 1-3. Web Server Responses to Directory Requests
Configuration                      Response
index.html exists                  Return the contents of index.html
no index.html, Indexes enabled     Return a dynamically generated directory listing
no index.html, Indexes disabled    Return error 403 ("Access Denied")
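The three-way decision in Table 1-3 can be sketched as a bash function. respond_to_directory is a hypothetical name. Passing "on" or "off" as the second argument stands in for the Indexes option, whose real configuration is covered in a later lesson. The sketch also assumes index.html is the only index file name; in Apache the list of names is configurable with the DirectoryIndex directive.

```bash
#!/bin/bash
# Hypothetical sketch of the decision in Table 1-3.
# $1 = directory on disk, $2 = "on" if the Indexes option is enabled.
respond_to_directory() {
    local dir=$1 indexes=$2
    if [ -f "$dir/index.html" ]; then
        echo "200 contents of $dir/index.html"     # index file wins
    elif [ "$indexes" = "on" ]; then
        echo "200 generated listing of $dir"       # fall back to a listing
    else
        echo "403 Access Denied"                   # nothing allowed to show
    fi
}
```

The order of the checks matters: an index file hides the listing even when Indexes is enabled, which is why content "behind" an index file is obscured.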

Assuming you followed along above, create the file /var/www/html/example/index.html with the following content (you should be able to cut and paste directly from the browser).
<h1>Examples</h1>
[<a href="hosts">hosts</a>]
[<a href="net.8.html">net man page</a>]
[<a href="neurons.png">picture of neurons</a>]

What happens when you now view http://localhost/example? You should see the marked-up contents of the index file. Is the effect any different if you view http://localhost/example/index.html directly? (It shouldn't be.)

Figure 1-2. Contents of http://localhost/example

What about the file /var/www/html/example/hosts.html? Is it still available? You should be able to access it by manually entering the URL http://localhost/example/hosts.html, but the index page offers no link to click on to reach it. Content behind an index file that is not referenced directly is obscured, but still available to anyone who knows it's there.


Web Server Logging: /var/log/httpd/{access,error}_log


The Apache web server logs information about every request it handles to the file /var/log/httpd/access_log. A sample of the log file's contents follows.
[root@station ~]# tail -3 /var/log/httpd/access_log

127.0.0.1 - - [13/Jul/2005:06:34:24 -0400] "GET /example/net.8.html HTTP/1.1" 200 26196 "http://localhost/rhasb/curr/rha230/html-instructor-classroom/rha230_httpd_http.html" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050720 Fedora/1.0.6-1.1.fc4 Firefox/1.0.6"
127.0.0.1 - - [13/Jul/2005:06:34:24 -0400] "GET /example/samba.css HTTP/1.1" 404 290 "http://localhost/example/net.8.html" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050720 Fedora/1.0.6-1.1.fc4 Firefox/1.0.6"
127.0.0.1 - - [13/Jul/2005:06:34:25 -0400] "GET /favicon.ico HTTP/1.1" 404 284 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050720 Fedora/1.0.6-1.1.fc4 Firefox/1.0.6"

Each line contains the following information.
The IP address of the client that made the request.
A timestamp of when the request occurred.
The response code associated with the request. A response code of 200 implies success; anything else usually indicates some type of failure.
The length of the content returned, not to be confused with the response code that precedes it.
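Since each access_log entry is a single line of whitespace-separated fields, standard tools can pull out the pieces listed above. The awk sketch below assumes the layout shown in the sample: the timestamp in the 4th field, and the response code and content length in the 9th and 10th. summarize_access_log is a hypothetical name, and a malformed request line containing extra spaces would shift the fields.

```bash
#!/bin/bash
# Sketch: print the client address, timestamp, response code, and content
# length from each access_log line (read from the named files or stdin).
summarize_access_log() {
    awk '{
        ts = substr($4, 2)      # 4th field looks like [13/Jul/2005:06:34:24; drop the bracket
        print $1, ts, $9, $10   # client IP, timestamp, response code, length
    }' "$@"
}

summarize_access_log <<'EOF'
127.0.0.1 - - [13/Jul/2005:06:34:24 -0400] "GET /example/samba.css HTTP/1.1" 404 290 "http://localhost/example/net.8.html" "Mozilla/5.0"
EOF
```

For example, summarize_access_log /var/log/httpd/access_log | awk '$3 != 200' would list only the requests that did not return 200.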

Requests that do not complete successfully, such as requests for files that do not exist, generally also generate an entry in the error_log.
[root@station ~]# tail -3 /var/log/httpd/error_log

[Tue Jul 13 06:34:24 2005] [error] [client 127.0.0.1] File does not exist: /var/www/html/example/samba.css, referer: http://localhost/example/net.8.html
[Tue Jul 13 06:34:25 2005] [error] [client 127.0.0.1] File does not exist: /var/www/html/favicon.ico

The access_log and the error_log are among the first places an administrator should look when trying to figure out why something doesn't seem to be working. The following table itemizes some of the response codes associated with various errors (or successes).

Table 1-4. HTTP return codes
Code   Meaning
200    Success
401    Authorization Required
403    Access Denied
404    File Not Found
500    Internal Server Error

There are many others, but these tend to be the most common. (In general, the HTTP protocol follows a response code convention used by many network services: informational or preliminary responses in the 100s, successes in the 200s, redirections and other incomplete transactions in the 300s, client errors in the 400s, and server errors in the 500s. Watch the output closely the next time you use the simple ftp client, for example.)
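The first-digit convention is simple enough to express directly. In this bash sketch, status_class is a hypothetical name, and the category labels follow the wording above rather than any official list.

```bash
#!/bin/bash
# Sketch: classify an HTTP response code by its first digit.
status_class() {
    case "$1" in
        1[0-9][0-9]) echo "informational" ;;
        2[0-9][0-9]) echo "success" ;;
        3[0-9][0-9]) echo "redirection/incomplete" ;;
        4[0-9][0-9]) echo "client error" ;;
        5[0-9][0-9]) echo "server error" ;;
        *)           echo "unknown" ;;          # not a three-digit code
    esac
}

for code in 200 301 403 404 500; do
    echo "$code: $(status_class "$code")"
done
```

Applied to the response-code column of the access_log, a classification like this quickly separates problems on the client's side (4xx, such as a mistyped URL) from problems on the server's side (5xx).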

But What Could Go Wrong?


In its default configuration, there are really only two things that could cause problems: permissions and SELinux.

First, files must be readable by the system user apache. The httpd process, like any other process, must have the right permissions to access a file. For security reasons, the web server processes that handle requests run as the user apache. Therefore, any file served by the web server must be readable by the user apache.

Second, the Apache web server is one of the services constrained by the Red Hat Enterprise Linux SELinux targeted policy. Therefore, any file served by the Apache web server must have an appropriate SELinux context. For now, the context of the /var/www/html directory (httpd_sys_content_t) will suffice. Any file created in this directory (including its subdirectories) should inherit this context and be fine. The problem occurs when files are created somewhere else and then moved into this directory: they retain their original (inappropriate) SELinux context.

At any rate, whenever the web server complains in its log file that it cannot access a file you think it should be able to, try the following commands to set appropriate permissions and SELinux context.
[root@station ~]# chmod a+r filename
[root@station ~]# chcon --reference /var/www/html filename

or
[root@station ~]# restorecon /var/www/html/filename
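The permissions half of this advice can be checked from the shell. Because root can read any file, testing with [ -r filename ] as root proves nothing about the apache user; the sketch below instead asks whether the file's "other" read bit is set. world_readable is a hypothetical name, and the check assumes the file is not owned by the apache user or group. It ignores the search (x) permission the apache user also needs on every parent directory, and says nothing about SELinux, whose context can be inspected separately with ls -Z.

```bash
#!/bin/bash
# Sketch: succeed only if the file's "other" read permission bit is set.
world_readable() {
    # find prints the name only when every bit in 004 (other-read) is set;
    # -prune keeps find from descending if the argument is a directory
    [ -n "$(find "$1" -prune -perm -004 2>/dev/null)" ]
}

if world_readable /var/www/html/index.html; then
    echo "index.html: other-read bit set"
else
    echo "index.html: missing, or not readable by others"
fi
```

If the check fails, the chmod a+r command above is the fix; if it passes and the log still reports access problems, the SELinux context is the next suspect.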

The Anatomy of a Web Request: the HTTP Protocol (Optional, but Interesting)
This section introduces the HTTP protocol. The intent is not to be thorough, but instead to give students an impression of what is meant when people use terms such as HTTP headers, GET, and Response Code. For those who want more, all of the details can be found at the World Wide Web Consortium (http://www.w3.org) website, at http://www.w3.org/Protocols.

In order to introduce the HTTP protocol, it's easiest to start with an example. The entire conversation between a web client and a web server can be captured using the wireshark network analyzer. If it is not already installed, yum install wireshark-gnome should do the trick. A capture is started by opening wireshark, choosing Capture:Start... from the menu, specifying a capture filter of (in this case) port 80, and clicking "OK". (Enabling "Update list of packets in real time" and "Automatic scroll in live capture" also tends to make things more interesting for small captures.)


Figure 1-3. Specifying a Wireshark Capture Filter

Once Wireshark is capturing packets, any conversations between a web client and a web server which occur on the local machine should be captured. For example, the following displays a conversation between a web client requesting http://station53.rosemont.wlan/example/hosts and a web server providing the answer. Once wireshark has been stopped, the individual IP packets can be browsed from a list.


Figure 1-4. A Wireshark Capture Packet List

More interestingly for our purposes, wireshark can easily reassemble the payload of the individual packets that compose a TCP/IP conversation: right-click on any packet and choose Follow TCP Stream.


Figure 1-5. Viewing a TCP Conversation with Wireshark

The web client, in red, is making a request of the web server, in blue. The "language" the client and server use is the HTTP protocol.

The HTTP Protocol: the Request (Client to Server)


A web request is composed of three parts: a request line, a series of HTTP headers, and the "body" (or content).
Note: In the following, some portions of the text have been replaced with "..." for readability. The same convention is used in many places in the text.

GET /example/hosts HTTP/1.1
Host: station53.rosemont.wlan
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Geck...
Accept: text/xml,...text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive


The entire first line is known as the Request-Line, and contains exactly three pieces of information, separated by spaces, in a specified order.
1. The request method, which for our purposes can be thought of as either GET or POST. With a GET, the client is requesting information. With a POST, the client is submitting information.
2. The URI, or "Uniform Resource Identifier". Think of this as the path portion of a URL. (The server portion has already been used to open the TCP/IP connection.)
3. The exact protocol that the client is speaking. Only two protocols are generally considered, HTTP/1.0 and HTTP/1.1, and any modern client should be using the latter.

The next series of lines, which all have the form header: data, are known as the HTTP headers. These are used to associate metadata with the request. Some HTTP request headers relevant to our discussion are the following.
Host: The host portion of the URL requested by the client.
User-Agent: The User Agent is the client software. In this case, the client is the Firefox web browser, which identifies itself as a variant of Mozilla.
Accept: A list of the content types that the browser is willing to accept, where the optional q= values rank preferences from 0 to 1. This browser prefers to receive text/xml or text/html, but will also handle text/plain. For images, the browser prefers image/png, but in the end, the browser will accept */*, or anything the server will throw at it.

After a blank line, indicating the end of the HTTP headers, the content of the request would follow. For GET requests, such as this one, there is no content.
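A minimal Python sketch of this three-part structure follows. It is not a complete HTTP parser (it ignores repeated headers, body lengths, and other details), the function name parse_request is illustrative, and the sample request uses only headers that appear in full in the capture above. HTTP ends each line with a carriage return and line feed (CRLF), so the blank line that ends the headers appears as CRLF CRLF.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into (method, uri, version), headers, body."""
    # The headers end at the first empty line; anything after it is the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request-Line: exactly three space-separated fields, in order.
    method, uri, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return (method, uri, version), headers, body


raw = (
    b"GET /example/hosts HTTP/1.1\r\n"
    b"Host: station53.rosemont.wlan\r\n"
    b"Accept-Language: en-us,en;q=0.5\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
)
request_line, headers, body = parse_request(raw)
print(request_line)      # ('GET', '/example/hosts', 'HTTP/1.1')
print(headers["host"])   # station53.rosemont.wlan
print(body)              # b'' (a GET request carries no content)
```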

The HTTP Protocol: the Response (Server to Client)


The server responds with the following, which is again composed of three parts: a response line, a set of response HTTP headers, and the response "body" (or content).
HTTP/1.1 200 OK
Date: Sat, 13 Aug 2005 11:09:51 GMT
Server: Apache/2.0.54 (Fedora)
Last-Modified: Sat, 13 Aug 2005 10:26:31 GMT
ETag: "406ee-104-105723c0"
Accept-Ranges: bytes
Content-Length: 260
Connection: close
Content-Type: text/plain; charset=UTF-8

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1.localhost.localdomain.localhost
192.168.218.254.rosemont.
#192.168.218.254.s.
#192.168.218.53.w.
192.168.0.5.s.
192.168.0.6.w.
192.168.201.254 rw


The Response-Line, like the Request-Line, is composed of three ordered parts. In the case of the response, however, the latter two fields are redundant with each other.
1. The exact protocol the server is using.
2. The response code of the transaction, which indicates success or qualifies a type of failure. In this case, the response code 200 indicates success. (More on these later.)
3. A text representation of the response code. This is supplied only for diagnostic (debugging) purposes, as the response code is what's really important. The text OK is associated with the response code 200.

Again, the next series of lines, which all have the form header: data, are known as the HTTP headers. We will only focus on one of the HTTP response headers. Content-Type: The server is providing the client with the type of the content, so the browser can render the data appropriately. For this response, the content type is text/plain, so the browser will display the content "as is", preserving whitespace. Other content types could include image/png, text/html, or application/msword.

After a blank line, indicating the end of the HTTP headers, the content of the response follows. For this response, the content is a simple text file. (In the output above, tabs have been replaced with periods, an artifact of how Wireshark displays non-printing characters.)
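Because the reason phrase is redundant with the code, software can regenerate it from the number alone. As a small illustration (the helper parse_status_line is hypothetical, not part of any library), Python's standard http.HTTPStatus enumeration maps each code to its conventional phrase, and a Content-Type value splits into a media type and optional parameters:

```python
from http import HTTPStatus


def parse_status_line(line: str):
    """Split a Response-Line into (version, code, reason)."""
    # maxsplit=2 keeps multi-word reason phrases such as "Not Found" intact.
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


version, code, reason = parse_status_line("HTTP/1.1 200 OK")
# The reason phrase is informational; the numeric code carries the meaning.
print(HTTPStatus(code).phrase)   # OK
print(HTTPStatus(404).phrase)    # Not Found

# Content-Type may carry parameters after a ';' (here, the character set).
content_type = "text/plain; charset=UTF-8"
media_type, _, params = content_type.partition(";")
print(media_type.strip())        # text/plain
print(params.strip())            # charset=UTF-8
```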

The Hyper Text Markup Language (HTML) (Optional)


This workbook is about managing the Apache web server as a system administrator, not about designing web content. However, during this workbook you will encounter files which use HTML to mark up their content, so a brief introduction will be useful. Again, those who do not get enough can find more at the World Wide Web Consortium's (http://www.w3.org) website (http://www.w3.org/MarkUp).
Fundamentally, HTML provides three things.
1. Structure: HTML allows text to be identified as titles or inlined quotes, or organized into lists and tables.
2. Embedded Media: HTML allows authors to embed media into their text, usually in the form of images, but also as videos and sound.
3. Links: HTML allows authors to easily reference other information, so that anyone reading the text can locate the other information with the click of a mouse.
All three of the above capabilities rely on embedding HTML tags into the text, where a tag is any text enclosed in angle brackets, such as <table>, <img>, or <a>. Because the angle brackets are now considered syntax, there needs to be some other way to represent a literal bracket. This is done using HTML entities, which begin with an ampersand (&) and end with a semicolon. For example, the entity for a left angle bracket is &lt; (for "less than"), and the entity for a right angle bracket is &gt; (for "greater than"). Entities are also used for glyphs not often found on keyboards, such as the copyright symbol (&copy;). Since the ampersand starts entities, there must also be some way of representing it, and the answer is itself an entity: &amp;.
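Python's standard html module performs exactly these substitutions, which makes it a convenient way to see how text looks once escaped. The sample string below is made up for illustration:

```python
import html

# Escape the characters that HTML treats as syntax (& is handled first,
# so the ampersands introduced by &lt; and &gt; are not escaped twice).
source = "Use <table> & <img> tags"
escaped = html.escape(source)
print(escaped)                       # Use &lt;table&gt; &amp; &lt;img&gt; tags

# Unescaping reverses the process, including named entities such as &copy;.
print(html.unescape(escaped))        # Use <table> & <img> tags
print(html.unescape("&copy; 2007"))  # © 2007
```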


Rather than provide a full introduction to HTML in the text, a sample document is provided at http://rha-server/pub/rha/rha230/sample.html. Students are encouraged to examine this document, both as it is rendered by a web browser and as the underlying text (which can usually be viewed in a browser by right-clicking and choosing View Page Source).

Exercises
Lab Exercise
Objective: Install, start, and contribute content to an Apache web site. Estimated Time: 45 mins.

This exercise has you download and install material for your web server, using the web server's default configuration. The material consists of three texts which are not optimally organized for the Apache web server. The lab has you perform some simple renaming and repositioning of the material so that it is more naturally viewed using a web browser.

Specification
1. If the httpd package is not already installed on your machine, install it now.
2. Start the httpd service (if it is not already started), and configure the service to be started by default upon reboots.
3. Download a copy of the file http://rha-server/pub/rha/rha230/readings.tgz, and extract its contents into your web server's document root directory (/var/www/html/). Properly extracting the contents should result in a new /var/www/html/readings directory. (See Note 1.)
4. Using a web browser, browse the http://localhost/readings directory. You should be able to view the HTML files the_god_of_mars.html and war_of_the_worlds appropriately.
5. Correct a misnamed index file.
   a. Again using a web browser, examine the contents of the http://localhost/readings/relat10h/ subdirectory. You should discover the file index.htm. Try examining this file through the web browser: http://localhost/readings/relat10h/index.htm.
   b. Apparently, the intent of the author was that this page should serve as an index page, but the file is named incorrectly for Apache's default configuration. In the /var/www/html/readings/relat10h/ directory, create a link to index.htm named index.html (either hard or soft).
   c. Using a browser, again view the URL http://localhost/readings/relat10h/. You should now see the contents of the index page.
   d. To make life a little easier for anyone browsing your site, in the /var/www/html/readings directory, create a symlink to the relat10h directory called relativity.


   e. Confirm that you may now access the content of the file index.htm using http://localhost/readings/relativity/.
6. Correct a misnamed directory.
   a. If you can stomach the physics (and, in fact, even if you cannot), skim the first appendix to Einstein's theory of relativity, either by following the link from the main page, or by referencing http://localhost/readings/relativity/ap01.htm directly.
   b. You might notice that many of the equations, such as equation 29, equation 30, etc., are missing. Examine the end of /var/log/httpd/access_log, and note the many requested image files which received a 404 ("File Not Found") response code.
   c. Examine the end of the file /var/log/httpd/error_log, and you will discover more helpful messages.
[root@station ~]# tail /var/log/httpd/error_log

[Tue Jul 20 16:53:14 2005] [error] [client 127.0.0.1] File does not exist: /var/www/html/readings/relat10h/pics, referer: http://localhost/readings/relat10h/ap01.htm
...

   d. Examining the log messages closely, you may discover the problem. All of the web pages are expecting images to be in a directory named pics, but this directory does not exist.
   e. Through a simple directory renaming, or perhaps another symlink, solve the problem, so that all of the images of equations are properly displayed.

7. Now that you have completed the hard work, relax a little by deriving the equation for the Lorentz transformation, following the steps in chapter 11. Place your results in a file titled that_was_easy in your academy user's home directory. (Just kidding.)
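Steps 5b and 5d boil down to creating symbolic links. The following sketch rehearses them in a throwaway temporary directory rather than in /var/www/html, and models Apache's index-file lookup with a hypothetical find_index function. It assumes that the default list of index names includes index.html but not index.htm (as step 5b implies), and that symbolic links are followed (Apache's FollowSymLinks option).

```python
import os
import tempfile


def find_index(directory, index_names=("index.html",)):
    """Return the first index file present in directory, or None.

    index_names is an assumed stand-in for Apache's list of index names.
    os.path.isfile() follows symbolic links, like FollowSymLinks.
    """
    for name in index_names:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as docroot:
    # A miniature stand-in for /var/www/html/readings/relat10h.
    readings = os.path.join(docroot, "readings")
    relat10h = os.path.join(readings, "relat10h")
    os.makedirs(relat10h)
    with open(os.path.join(relat10h, "index.htm"), "w") as f:
        f.write("<html><body>Relativity</body></html>\n")

    before = find_index(relat10h)  # index.htm alone does not match

    # Step 5b: a relative soft link index.html -> index.htm
    os.symlink("index.htm", os.path.join(relat10h, "index.html"))
    after = find_index(relat10h)

    # Step 5d: a directory symlink readings/relativity -> relat10h
    os.symlink("relat10h", os.path.join(readings, "relativity"))
    via_alias = find_index(os.path.join(readings, "relativity"))

    print(before, os.path.basename(after), os.path.basename(via_alias))
    # None index.html index.html
```

Relative link targets (index.htm rather than a full path) keep working if the whole readings tree is later moved.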

Deliverables

1. An installed and running httpd service, configured to start by default on bootup.
2. The text of three books, browsable from the URL http://localhost/readings.
3. The table of contents of Einstein's theory of relativity at http://localhost/readings/relat10h.
4. The table of contents of Einstein's theory of relativity, also at http://localhost/readings/relativity.
5. The images of equations in appendix 1 (found at http://localhost/readings/relativity/ap01.htm) are displayed properly.


Clean Up
You will want to leave the /var/www/html/readings directory in place, as you will need it in the next section.

Questions
1. In Red Hat Enterprise Linux 5, which of the following packages provides the Apache web server?
( ) a. httpd
( ) b. apache
( ) c. webserver
( ) d. apr
( ) e. None of the above

2. Which of the following directories serves as the web server's document root?
( ) a. /opt/docroot
( ) b. /var/pub/
( ) c. /var/www/html/
( ) d. /etc/httpd
( ) e. None of the above

After migrating the contents of a web site from one operating system to another, web clients, when viewing the URL http://localhost/zsh.txt, are displaying raw HTML instead of a formatted page:


3. What is the simplest solution to the problem?
( ) a. Install the mod_html package.
( ) b. Create an index.html file to reference this page.
( ) c. Use the txt2html utility to assign the file the HTML file type.
( ) d. Rename the file zsh.html.
( ) e. Use chcon to assign the file the appropriate SELinux context.

Use the output of the following command to answer the next question, assuming the default Red Hat Enterprise Linux configuration of the Apache web server.
[root@station1 ~]# ls /usr/share/backgrounds/*

/usr/share/backgrounds/images:
default.png      ladybugs.jpg     riverstreet_rail.jpg
dewdop_leaf.jpg  leafdrops.jpg    sneaking_branch.jpg
...

/usr/share/backgrounds/tiles:
3dgreen.png              dunes.png    Planning-And-Probing-1.jpg
All-Good-People-1.jpg    fibers.png   plasma.png
...

[root@station ~]# cp -a /usr/share/backgrounds/ /var/www/html/

4. What would you expect to see if you pointed the Firefox web browser to the URL http://localhost/backgrounds/images/?
( ) a. A dynamically generated web page which displays the images as pictures.
( ) b. A "404: File not Found" error page.
( ) c. A "403: Forbidden" error page.
( ) d. A page containing binary data, because the web server tries to interpret the directory as if it were a file.


( ) e. A dynamically generated web page which lists the contents of the directory by filename.

5. If, when the directory above is referenced, you would prefer web clients to see the contents of a file, what should the relevant file be named?
( ) a. README.html
( ) b. HEADER.html
( ) c. index.htm
( ) d. DIR.htm
( ) e. None of the above

6. In what file are all web requests from clients ("hits") logged?
( ) a. /var/log/secure
( ) b. /var/log/httpd/error_log
( ) c. /var/log/messages
( ) d. /var/log/httpd/access_log
( ) e. Both C and D

7. If, when running service httpd start, the web server fails to start, what file might contain helpful debugging messages?
( ) a. /var/log/secure
( ) b. /var/log/xferlog
( ) c. /var/log/httpd/error_log
( ) d. /var/log/httpd/access_log
( ) e. Both B and D

8. In what file are web requests that generate errors logged?
( ) a. /var/log/secure
( ) b. /var/log/httpd/error_log
( ) c. /var/log/messages
( ) d. /var/log/httpd/access_log
( ) e. Both B and D


9. Which is the web server's "well known" port?
( ) a. 8080
( ) b. 22
( ) c. 25
( ) d. 80

10. Apache's dynamically loaded modules are conventionally found in what directory?
( ) a. /usr/lib/httpd/modules
( ) b. /usr/lib/apache
( ) c. /usr/libexec/apache
( ) d. /usr/share/httpd/modules
( ) e. None of the above

Notes
1. An excellent source for public domain texts is the Gutenberg project (http://www.gutenberg.org).


Chapter 2. Apache Configuration


Key Concepts

The Apache server is configured using the /etc/httpd/conf/httpd.conf and /etc/httpd/conf.d/*.conf configuration files. The configuration file is informally divided into the Global, Main, and Virtual Server sections. The Global section defines aspects which pertain to the server as a whole, including client connection dynamics, server pool parameters, binding address, and which modules to load. The Main section defines aspects which may be redefined by any virtual server, such as the document root, logging behavior, and URL namespace remappings. Comprehensive documentation is provided by the httpd-manual package, which, when installed, can be accessed at http://localhost/manual.

Discussion
Apache Configuration: /etc/httpd/conf/httpd.conf
The Apache web server is configured with text configuration files which are read upon startup. The primary configuration file is /etc/httpd/conf/httpd.conf, but the files /etc/httpd/conf.d/*.conf are "slurped up" into the configuration as well.
[root@station ~]# ls /etc/httpd/conf /etc/httpd/conf.d/
/etc/httpd/conf:
httpd.conf  magic

/etc/httpd/conf.d/:
README  welcome.conf
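In httpd.conf, the slurping is done by an Include conf.d/*.conf directive. The sketch below mimics that behavior under simplifying assumptions: relative paths are taken from ServerRoot, wildcard matches are read in alphabetical order, and the included lines are spliced in where the Include directive appeared. The file contents and the expand_includes function are made up for illustration; this is not Apache's own parser.

```python
import glob
import os
import tempfile


def expand_includes(path, server_root):
    """Return the lines of a config file with Include directives expanded in place."""
    lines = []
    with open(path) as f:
        for line in f:
            words = line.split()
            if len(words) == 2 and words[0].lower() == "include":
                pattern = words[1]
                if not os.path.isabs(pattern):
                    pattern = os.path.join(server_root, pattern)  # relative to ServerRoot
                # Assumption: wildcard matches are read in alphabetical order.
                for included in sorted(glob.glob(pattern)):
                    lines.extend(expand_includes(included, server_root))
            else:
                lines.append(line.rstrip("\n"))
    return lines


# A miniature, made-up ServerRoot in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "conf"))
    os.mkdir(os.path.join(root, "conf.d"))
    files = {
        "conf/httpd.conf": "Timeout 120\nInclude conf.d/*.conf\nListen 80\n",
        "conf.d/welcome.conf": "AddDefaultCharset UTF-8\n",
        "conf.d/README": "Not matched by *.conf, so never read.\n",
    }
    for name, text in files.items():
        with open(os.path.join(root, name), "w") as f:
            f.write(text)
    merged = expand_includes(os.path.join(root, "conf", "httpd.conf"), root)

for line in merged:
    print(line)
# Timeout 120
# AddDefaultCharset UTF-8
# Listen 80
```

Note that the README file in conf.d is ignored because it does not match the *.conf pattern.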

The Apache configuration file syntax is straightforward, and tends to be well documented (both as comments in the default configuration file, and in a separate manual to be discussed later). A sample of the configuration file's syntax follows.
#
# DocumentRoot: The directory out of which you will serve your
# documents. By default, all requests are taken from this directory, but
# symbolic links and aliases may be used to point to other locations.
#
DocumentRoot "/var/www/html"

#
# Each directory to which Apache has access can be configured with respect
# to which services and features are allowed and/or disabled in that
# directory (and its subdirectories).
#
...
<Directory />
    Options FollowSymLinks
    AllowOverride None
</Directory>

Empty lines are ignored, and any line which begins with a hash ("#") is considered a comment. Any line which is not a comment generally starts with a keyword referred to as a directive. Directives are not case sensitive, but of course spelling is important. The syntax of the remainder of the line depends on the directive, but all of a directive's arguments must occur on a single line (a long line may be continued onto the next by ending it with a backslash, "\"). The only other way a line can begin is with an XML-like tag, which begins a container. Containers end with an XML-like closing tag. Generally, all directives found within a container only take effect within the scope of the container. We will discuss the effects of different types of containers in a later lesson.
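These rules are simple enough to check mechanically. The hypothetical classify function below is a sketch of them, not Apache's parser: it skips blank and comment lines, reports each directive name in lowercase (since case does not matter), and tracks container nesting, raising an error on an unbalanced tag. Apache also allows a line to be continued with a trailing backslash; this sketch deliberately does not handle that.

```python
import re

OPEN_TAG = re.compile(r"^<([A-Za-z]+)\b[^>]*>$")  # e.g. <Directory />
CLOSE_TAG = re.compile(r"^</([A-Za-z]+)>$")        # e.g. </Directory>


def classify(lines):
    """Yield (kind, name, depth) for each non-comment line of a config file."""
    stack = []  # names of the containers currently open
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        closing = CLOSE_TAG.match(line)
        opening = OPEN_TAG.match(line)
        if closing:
            name = closing.group(1).lower()
            if not stack or stack[-1] != name:
                raise ValueError("unexpected </%s>" % closing.group(1))
            stack.pop()
            yield ("end", name, len(stack))
        elif opening:
            name = opening.group(1).lower()
            yield ("container", name, len(stack))
            stack.append(name)
        else:
            # The first word is the directive; directive names ignore case.
            yield ("directive", line.split()[0].lower(), len(stack))
    if stack:
        raise ValueError("unclosed <%s>" % stack[-1])


sample = """\
# DocumentRoot: The directory out of which you will serve your documents.
DocumentRoot "/var/www/html"

<Directory />
    Options FollowSymLinks
    AllowOverride None
</Directory>
"""
for kind, name, depth in classify(sample.splitlines()):
    print("  " * depth + kind, name)
# directive documentroot
# container directory
#   directive options
#   directive allowoverride
# end directory
```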

The file is thought of as occurring in three sections, although the syntax does not formally enforce them.
1. The Global Section: This section contains configuration which applies to the web server as a whole, including any virtual servers.
2. The Main Section: Configuration which applies to the main server (as opposed to any virtual servers) belongs in this section. Any configuration in this section can be overridden by a virtual server.
3. Virtual Servers: The Apache web server can take on the appearance of being multiple distinct servers. Virtual servers will be discussed in more detail in the next lesson.
We begin by examining configuration relevant to the server as a whole. You might want to open the file /etc/httpd/conf/httpd.conf in a pager or text editor and follow along as you read the following sections. (You should consider setting the editor into a "read only" mode, or making a backup of the file and browsing the backup.)

The Global Section


The Global section of the configuration file includes configuration that affects the server as a whole.
Figure 2-1. /etc/httpd/conf/httpd.conf
### Section 1: Global Environment
#
# The directives in this section affect the overall operation of Apache,
# such as the number of concurrent requests it can handle or where it
# can find its configuration files.

Configuration Context: ServerRoot


The ServerRoot directive establishes a home base for all of the remaining server context, while the second directive is a simple example of making use of this home base.


Figure 2-2. /etc/httpd/conf/httpd.conf
#
# ServerRoot: The top of the directory tree under which the server's
# configuration, error, and log files are kept.
...
#
# Do NOT add a slash at the end of the directory path.
#
ServerRoot "/etc/httpd"
#
# PidFile: The file in which the server should record its process
# identification number when it starts.
#
PidFile run/httpd.pid

The ServerRoot directive establishes context for future file references within the configuration file. Any relative file reference (one that does not begin with a "/") will be relative to the ServerRoot, which in Red Hat Enterprise Linux is /etc/httpd. In Unix, daemons traditionally record the fact that they are running by creating a file in the filesystem which contains their process id, called a pid file. The PidFile directive specifies where this file should be located.

Examining the /etc/httpd directory, we find it is populated with several symbolic links.
[root@station ~]$ ls -l /etc/httpd
total 28
drwxr-xr-x 4 root root 4096 Jul 25 06:33 conf
drwxr-xr-x 2 root root 4096 Jul 25 06:33 conf.d
lrwxrwxrwx 1 root root   19 Jul 25 06:33 logs -> ../../var/log/httpd
lrwxrwxrwx 1 root root   27 Jul 25 06:33 modules -> ../../usr/lib/httpd/modules
lrwxrwxrwx 1 root root   13 Jul 25 06:33 run -> ../../var/run

In the httpd.conf configuration file, file references that begin with logs/, modules/, or run/ are mapped to the relevant directories. Can you convince yourself that the daemon's pid file would be found at /var/run/httpd.pid? It's important to understand the role of the ServerRoot directive, and the use of the symbolic links in the /etc/httpd directory, but there's seldom any reason to change these values.
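One way to check the pid file reasoning is to replay it in code. This sketch does not touch the filesystem: it hard-codes the three symlinks from the listing above (an assumption that they match your system) and resolves a relative httpd.conf path the way the text describes, first against ServerRoot and then through the matching symlink. The resolve function is hypothetical and follows only one level of links.

```python
import posixpath

SERVER_ROOT = "/etc/httpd"

# Symbolic links in /etc/httpd, copied from the ls -l listing above.
SYMLINKS = {
    "/etc/httpd/logs": "../../var/log/httpd",
    "/etc/httpd/modules": "../../usr/lib/httpd/modules",
    "/etc/httpd/run": "../../var/run",
}


def resolve(path):
    """Map an httpd.conf file reference to the real location it names."""
    if not posixpath.isabs(path):
        path = posixpath.join(SERVER_ROOT, path)  # relative to ServerRoot
    for link, target in SYMLINKS.items():
        if path == link or path.startswith(link + "/"):
            # A relative link target is interpreted from the link's directory.
            real_dir = posixpath.normpath(
                posixpath.join(posixpath.dirname(link), target))
            return real_dir + path[len(link):]
    return path


print(resolve("run/httpd.pid"))               # /var/run/httpd.pid
print(resolve("modules/mod_auth_basic.so"))   # /usr/lib/httpd/modules/mod_auth_basic.so
print(resolve("/var/www/html"))               # /var/www/html (absolute, unchanged)
```

On a live system, os.path.realpath() performs the same resolution against the real links.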

Client Connection Dynamics: Timeout and KeepAlive


The following directives control how long the server will wait on badly behaved clients.
Figure 2-3. /etc/httpd/conf/httpd.conf
#
# Timeout: The number of seconds before receives and sends time out.
#
Timeout 120

#
# KeepAlive: Whether or not to allow persistent connections (more than
# one request per connection). Set to "Off" to deactivate.
#
KeepAlive Off

#
# MaxKeepAliveRequests: The maximum number of requests to allow
# during a persistent connection. Set to 0 to allow an unlimited amount.
# We recommend you leave this number high, for maximum performance.
#
MaxKeepAliveRequests 100

#
# KeepAliveTimeout: Number of seconds to wait for the next request from the
# same client on the same connection.
#
KeepAliveTimeout 15

A particular httpd process can only communicate with one client at a time. A badly behaved client, which opens a TCP/IP connection but never uses it, could therefore tie up a server process indefinitely. The Timeout directive specifies how long, in seconds, the server will wait on such a client before terminating the connection.
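To see what a timeout buys the server, here is a minimal Python sketch (not Apache code) in which one end of a socket pair plays the server and the other end plays a client that connects but never sends anything. The half-second TIMEOUT is an arbitrary stand-in for the 120 seconds configured above.

```python
import socket

TIMEOUT = 0.5  # seconds; a short stand-in for "Timeout 120"

# A connected pair of sockets: one end plays the server, the other a client
# that has connected but never sends a request.
server_end, idle_client = socket.socketpair()
server_end.settimeout(TIMEOUT)
try:
    server_end.recv(1024)  # blocks until data arrives or the timeout expires
    outcome = "request received"
except socket.timeout:
    outcome = "timed out; closing connection"
finally:
    server_end.close()
    idle_client.close()

print(outcome)  # timed out; closing connection
```

Without settimeout(), the recv() call would block forever, which is exactly the situation the Timeout directive guards against.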

These directives decide whether the server honors "Keep Alive" requests from a client, how many requests can be made over a "Keep Alive" connection, and how long before an inactive connection should time out. (Note that in the default configuration above, KeepAlive is Off.) The HTTP protocol is termed a "stateless" protocol, meaning that the server doesn't record any information about the client between one request and the next. In the original HTTP/1.0 protocol, clients are required to open a new socket for every request. Downloading a web page with 10 images, therefore, would require the client to open 11 sockets (one for the page, and one for each referenced image). The HTTP/1.1 protocol tried to improve efficiency by allowing a client to leave a single socket open for "follow up" requests. Such a persistent socket is called a "Keep Alive" socket. Clients are more likely to abuse such persistent connections, however, by leaving them open but not making any follow-up requests, so stricter timeout values are usually assigned to such connections (compare KeepAliveTimeout 15 with Timeout 120 above).
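The difference between one socket per request and a persistent socket can be measured with Python's standard library alone. This sketch uses Python's http.server rather than Apache, and the page and image paths are made up; it counts how many TCP connections the server accepts while a client fetches one page and ten images. Keep in mind that with KeepAlive Off, Apache itself would close the connection after each response.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allow persistent ("Keep Alive") connections
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # One handler object is created per accepted TCP connection.
        super().setup()
        with CountingHandler.lock:
            CountingHandler.connections += 1

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demonstration quiet


# One page plus ten images, as in the example above (paths are made up).
PATHS = ["/page.html"] + ["/image%d.png" % i for i in range(10)]


def count_connections(reuse):
    """Fetch PATHS and return how many TCP connections the server accepted."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        conn = None
        for path in PATHS:
            if conn is None or not reuse:
                if conn is not None:
                    conn.close()
                conn = http.client.HTTPConnection("127.0.0.1", port)
            conn.request("GET", path)
            conn.getresponse().read()  # drain the body before the next request
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


print(count_connections(reuse=False))  # 11: one socket per request
print(count_connections(reuse=True))   # 1: every request over one socket
```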

Managing the Server Pool: StartServers, {Min,Max}SpareServers, MaxClients, and MaxRequestsPerChild


Recall that most Unix daemons use a forking model. Upon receiving a new client connection, the server process forks (duplicates itself), dedicating the new child to the newly connected client, while the parent returns to listening for new connections. In order to gain efficiency, the Apache web server takes the uncommon approach of "pre-forking" child daemons to handle client connections before the clients ever arrive. Even on an idle web server, several httpd processes exist. The parent daemon is generally run as the user root, and the pre-forked child daemons as the user apache. The collection of httpd processes is often referred to as the "server pool".



[root@station ~]# ps aux | grep httpd
root      2334  0.0  2.0  19504 10488 ?      Ss   05:57   0:00 /usr/sbin/httpd
apache    2359  0.0  2.0  19504 10624 ?      S    05:57   0:00 /usr/sbin/httpd
apache    2360  0.0  2.0  19504 10624 ?      S    05:57   0:00 /usr/sbin/httpd
apache    5248  0.0  2.0  19504 10628 ?      S    07:04   0:00 /usr/sbin/httpd
root      7636  0.0  0.1   3768   716 pts/5  S+   08:56   0:00 grep httpd

The following directives manage the dynamics of the server pool.
Figure 2-4. /etc/httpd/conf/httpd.conf
# prefork MPM
# StartServers: number of server processes to start
# MinSpareServers: minimum number of server processes which are kept spare
# MaxSpareServers: maximum number of server processes which are kept spare
# ServerLimit: maximum value for MaxClients for the lifetime of the server
# MaxClients: maximum number of server processes allowed to start
# MaxRequestsPerChild: maximum number of requests a server process serves
<IfModule prefork.c>
StartServers       8
MinSpareServers    5
MaxSpareServers   20
ServerLimit      256
MaxClients       256
MaxRequestsPerChild  4000
</IfModule>

StartServers: The initial size of the server pool (in number of processes).

{Min,Max}SpareServers: The server pool scales dynamically. If a web server gets blitzed with many requests, more child daemons will be started. If things go quiet, unused child daemons will be killed. These directives bound the number of idle ("spare") child daemons kept in the pool.

ServerLimit, MaxClients: The number of concurrent requests can be limited. Connection requests above this limit are not handled immediately; instead they wait in a queue (whose length is governed by the ListenBacklog directive) until a child daemon becomes free. The distinction between the ServerLimit and MaxClients directives is subtle, and in practice they are set together to the same value.

MaxRequestsPerChild: In order to improve stability, a given child daemon will only serve so many requests before it kills itself, and a new daemon must be started. (This suicide helps curtail memory leaks in poorly written libraries and CGI executables.)
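The interplay of these bounds can be sketched as a toy model. This is an illustrative simplification, not Apache's actual algorithm (which, among other things, adjusts the pool gradually over time); the function adjust_idle and its one-step behavior are assumptions made for illustration, with default values taken from Figure 2-4.

```python
def adjust_idle(busy, idle, min_spare=5, max_spare=20, max_clients=256):
    """Return the number of idle children after one (toy) adjustment step.

    Too few idle children: start more, up to min_spare, without letting
    busy + idle exceed max_clients. Too many: kill the surplus down to
    max_spare. Otherwise leave the pool alone.
    """
    if idle < min_spare:
        room = max(max_clients - busy, 0)  # idle slots left under MaxClients
        return max(idle, min(min_spare, room))
    if idle > max_spare:
        return max_spare
    return idle


# Defaults from Figure 2-4: MinSpareServers 5, MaxSpareServers 20, MaxClients 256.
for busy, idle in [(0, 8), (10, 2), (0, 40), (253, 1)]:
    print("busy=%3d idle=%2d -> idle=%d" % (busy, idle, adjust_idle(busy, idle)))
```

With these defaults, a quiet server holding 8 idle children (the StartServers value) is left alone, a burst that leaves only 2 idle grows the spare pool to 5, and 40 idle children are trimmed to 20; near the MaxClients ceiling (253 busy), only 3 idle children fit.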

Controlling the Server Address: Listen


Figure 2-5. /etc/httpd/conf/httpd.conf
#
# Listen: Allows you to bind Apache to specific IP addresses and/or
# ports, in addition to the default. See also the <VirtualHost>
# directive.
#

rha230-5.0-1-en-2008-01-21T07:12:18-0500


Copyright (c) 2003-2007 Red Hat, Inc. All rights reserved. For use only by a student enrolled in a Red Hat Academy course taught at a Red Hat Academy. Any other use is a violation of U.S. and international copyrights. No part of this publication may be photocopied, duplicated, stored in a retrieval system, or otherwise duplicated whether in electronic or print format without prior written consent of Red Hat, Inc. If you believe Red Hat course materials are being used, copied, or otherwise improperly distributed please email training@redhat.com or phone toll-free (USA) +1 866 626 2994 or +1 (919) 754 3700.



# Change this to Listen on specific IP addresses as shown below to
# prevent Apache from glomming onto all bound IP addresses (0.0.0.0)
#
#Listen 12.34.56.78:80
Listen 80

The Listen directive controls which addresses and ports the server binds to. In the default configuration (above), the server binds to port 80 on the wildcard address 0.0.0.0, meaning every active interface. Multiple Listen lines can be used to make the daemon bind to multiple ports and/or addresses.
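To make the mapping from a Listen line to a bound socket concrete, here is a minimal Python sketch that parses Listen arguments and opens one listening socket per line. It assumes IPv4 only and ignores the optional protocol argument that Listen also accepts; the function names are hypothetical. Binding to ports below 1024, such as 80, requires root privileges, which is one reason the parent httpd process runs as root.

```python
import socket


def parse_listen(value):
    """Split a Listen argument such as "80" or "127.0.0.1:8888" into (address, port).

    Simplified: IPv4 only; the optional protocol argument is not handled.
    """
    if ":" in value:
        address, port = value.rsplit(":", 1)
    else:
        address, port = "0.0.0.0", value  # bare port: every active interface
    return address, int(port)


def open_listeners(values):
    """Open one listening TCP socket per Listen line."""
    listeners = []
    for value in values:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(parse_listen(value))
        sock.listen(128)
        listeners.append(sock)
    return listeners


print(parse_listen("80"))              # ('0.0.0.0', 80)
print(parse_listen("127.0.0.1:8888"))  # ('127.0.0.1', 8888)

# Port 0 asks the kernel for any free port, so this demo runs without root.
demo = open_listeners(["127.0.0.1:0"])
bound_address = demo[0].getsockname()[0]
print(bound_address)                   # 127.0.0.1
for sock in demo:
    sock.close()
```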

Extending the Web Server: LoadModule


The Apache web server is modular by design. The core web server is actually fairly minimal, with various modules providing much of the interesting behavior. Modules may be either "static", meaning that they're compiled into the core executable and can never be removed, or "dynamic", meaning that an administrator controls whether or not the module is loaded during startup. Apache dynamic modules are located in the /usr/lib/httpd/modules directory, and are loaded using the LoadModule directive.

Figure 2-6. /etc/httpd/conf/httpd.conf
#
# Dynamic Shared Object (DSO) Support
#
# To be able to use the functionality of a module which was built as a DSO you
# have to place corresponding LoadModule lines at this location so the
# directives contained in it are actually available _before_ they are used.
# Statically compiled modules (those listed by httpd -l) do not need
# to be loaded here.
#
# Example:
# LoadModule foo_module modules/mod_foo.so
#
LoadModule auth_basic_module modules/mod_auth_basic.so
LoadModule auth_digest_module modules/mod_auth_digest.so
LoadModule authn_file_module modules/mod_authn_file.so
LoadModule authn_alias_module modules/mod_authn_alias.so
LoadModule authn_anon_module modules/mod_authn_anon.so
...
LoadModule include_module modules/mod_include.so
LoadModule log_config_module modules/mod_log_config.so
LoadModule logio_module modules/mod_logio.so
LoadModule env_module modules/mod_env.so
LoadModule ext_filter_module modules/mod_ext_filter.so
LoadModule mime_magic_module modules/mod_mime_magic.so
...
#
# Load config files from the config directory "/etc/httpd/conf.d".
#
Include conf.d/*.conf


The various modules tend to introduce new configuration directives that modify their behavior. For example, the log_config_module provides the LogFormat directive, which we will encounter later. In the configuration file, a module must be loaded (with LoadModule) before any directive it provides is encountered.

To ease the distribution of modules by a package management system (such as RPM), the Include directive specifies external configuration files to include, either directly or by using pathname expansion (file globbing). A package can therefore drop its own configuration file into /etc/httpd/conf.d/ without editing httpd.conf.
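Both rules can be illustrated with a short Python sketch, written under stated assumptions rather than as a description of httpd's internals. expand_include assumes that a relative Include path is resolved against ServerRoot and that wildcard matches are read in alphabetical order. used_before_loaded flags directives that appear before their module's LoadModule line, which httpd reports at startup as an invalid command; its directive-to-module table is hypothetical and tiny, and statically compiled modules (those listed by httpd -l) are ignored.

```python
import glob
import os
import tempfile


def expand_include(server_root, pattern):
    """Resolve an Include argument: relative to ServerRoot, matches in alphabetical order."""
    if not os.path.isabs(pattern):
        pattern = os.path.join(server_root, pattern)
    return sorted(glob.glob(pattern))


# Hypothetical, deliberately tiny table of which module provides which directive.
PROVIDED_BY = {"LogFormat": "log_config_module", "CustomLog": "log_config_module"}


def used_before_loaded(lines):
    """List directives that appear before the LoadModule line of their module."""
    loaded, problems = set(), []
    for line in lines:
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # blank line or comment
        if words[0] == "LoadModule":
            loaded.add(words[1])  # e.g. log_config_module
        elif words[0] in PROVIDED_BY and PROVIDED_BY[words[0]] not in loaded:
            problems.append(words[0])
    return problems


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "conf.d"))
    for name in ("ssl.conf", "php.conf", "README"):
        open(os.path.join(root, "conf.d", name), "w").close()
    included = [os.path.basename(p) for p in expand_include(root, "conf.d/*.conf")]
print(included)  # ['php.conf', 'ssl.conf']

bad = ['LogFormat "%h %b" common',
       "LoadModule log_config_module modules/mod_log_config.so"]
print(used_before_loaded(bad))                  # ['LogFormat']
print(used_before_loaded(list(reversed(bad))))  # []
```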

The Main Section


The Main section of the configuration file includes configuration that affects the primary server, but directives in this section can be overridden by any virtual host.

Figure 2-7. /etc/httpd/conf/httpd.conf
### Section 2: Main server configuration
#
# The directives in this section set up the values used by the main
# server, which responds to any requests that aren't handled by a
# <VirtualHost> definition. These values also provide defaults for
# any <VirtualHost> containers you may define later in the file.
#
# All of these directives may appear inside <VirtualHost> containers,
# in which case these default settings will be overridden for the
# virtual host being defined.

Server Identity: ServerName and ServerAdmin


The first two directives in the main section help establish the identity of the server.

Figure 2-8. /etc/httpd/conf/httpd.conf
#
# ServerAdmin: Your address, where problems with the server should be
# e-mailed. This address appears on some server-generated pages, such
# as error documents. e.g. admin@your-domain.com
#
ServerAdmin root@localhost

#
# ServerName gives the name and port that the server uses to identify itself.
# This can often be determined automatically, but we recommend you specify
# it explicitly to prevent problems during startup.
...
#ServerName www.example.com:80


The ServerAdmin directive is mainly cosmetic: the email address is listed in the footer of the default error pages. For simple hosts, with a single external interface and therefore a clear concept of a hostname, the ServerName can be determined automatically. If in doubt, however, it should be specified manually. (For example, if the server is bound to multiple interfaces, the preferred name should be configured explicitly.)

Server Content: the DocumentRoot


The DocumentRoot directive, one of the most fundamentally important, identifies where in the filesystem the information to be served is found. Recall that when the file portion of a URL is translated to a file in the filesystem, the document root provides the base of that translation. This directive is probably the one most often overridden by a virtual host. The following default specifies the Red Hat Enterprise Linux document root as /var/www/html.

Figure 2-9. /etc/httpd/conf/httpd.conf
# DocumentRoot: The directory out of which you will serve your
# documents. By default, all requests are taken from this directory, but
# symbolic links and aliases may be used to point to other locations.
#
DocumentRoot "/var/www/html"
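The translation from URL to file can be sketched in a few lines of Python. This is a simplified model under stated assumptions: it ignores Alias, symbolic links, and per-directory configuration, and the function name is hypothetical. It shows two points worth noticing: only the path portion of the URL matters (the query string is dropped), and ".." segments are collapsed so that a request cannot climb above the document root.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def url_to_file(url, document_root="/var/www/html"):
    """Translate the path portion of a URL into a file under DocumentRoot."""
    path = unquote(urlsplit(url).path) or "/"
    # normpath on an absolute path collapses "." and "..", and a leading ".."
    # cannot climb above "/", so the result always stays under document_root.
    relative = posixpath.normpath(path).lstrip("/")
    return posixpath.join(document_root, relative)


print(url_to_file("http://localhost/readings/relat10h/index.html"))
# /var/www/html/readings/relat10h/index.html
print(url_to_file("http://localhost/../../etc/passwd"))
# /var/www/html/etc/passwd
```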

Specifying the Directory Index File: DirectoryIndex


In a previous lesson, we discussed the role of an index file, called index.html. We now see that the name of the file is configurable.

Figure 2-10. /etc/httpd/conf/httpd.conf
#
# DirectoryIndex: sets the file that Apache will serve if a directory
# is requested.
#
# The index.html.var file (a type-map) is used to deliver content-
# negotiated documents. The MultiViews Option can be used for the
# same purpose, but it is much slower.
#
DirectoryIndex index.html index.html.var

Notice that if multiple file names are specified, each is searched for in sequence. Specifying too many alternatives, however, could lead to poor performance. For example, when migrating content from a Microsoft-based server, setting DirectoryIndex to the following would be easier than renaming every file named index.htm to index.html.
DirectoryIndex index.html index.htm
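The in-sequence search can be sketched in Python as follows. This is a minimal model assuming only relative file names in the list (absolute entries such as /cgi-bin/index.cgi are not modeled); the function name is hypothetical. When nothing matches, Apache falls back to a generated directory listing if Options Indexes permits it, or returns an error otherwise.

```python
import os
import tempfile


def find_index(directory, candidates=("index.html", "index.html.var")):
    """Return the first DirectoryIndex candidate present in directory, or None.

    Every name costs a filesystem check on each directory request, which is
    why a long list of alternatives can hurt performance.
    """
    for name in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None


with tempfile.TemporaryDirectory() as relat10h:
    # Only a Microsoft-style index.htm exists, as in the migration example.
    open(os.path.join(relat10h, "index.htm"), "w").close()
    default_result = find_index(relat10h)
    migrated_result = find_index(relat10h, ("index.html", "index.htm"))
    print(default_result)                     # None
    print(os.path.basename(migrated_result))  # index.htm
```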



Tip: Index files can even be specified as an absolute reference. What do you think would be the effect of a configuration such as the following?
DirectoryIndex index.html /cgi-bin/index.cgi

Collecting Client Identities: HostnameLookups


Buried deep within the configuration file is an important directive called HostnameLookups.

Figure 2-11. /etc/httpd/conf/httpd.conf
#
# HostnameLookups: Log the names of clients or just their IP addresses
# e.g., www.apache.org (on) or 204.62.129.132 (off).
# The default is off because it'd be overall better for the net if people
# had to knowingly turn this feature on, since enabling it means that
# each client request will result in AT LEAST one lookup request to the
# nameserver.
#
HostnameLookups Off

The web server can easily determine the IP address of any client making a web request: it's part of the IP header of every packet the client sends. To determine the client's hostname, however, the web server must work harder: it must perform a reverse DNS lookup on the client's IP address. This reverse lookup adds both time and network traffic for every request, so by default it's disabled. As a result, logs record clients by IP address rather than by hostname. (Access control rules written in terms of hostnames, such as Allow from .example.com, still cause Apache to look up the client's name regardless of this setting.) If you want logs to use client hostnames instead of IP addresses, and are willing to pay the price in performance, HostnameLookups can be set to On.
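The cost difference can be sketched in Python: with lookups off, the logged value is simply the address already in hand; with lookups on, every request triggers a reverse (PTR) lookup, with a fallback to the address when none exists. This is an illustration, not Apache's code. socket.gethostbyaddr performs a real reverse lookup; fake_resolver is a hypothetical stand-in so the example runs offline. Apache also offers a HostnameLookups Double setting (a double-reverse lookup), which this sketch does not model.

```python
import socket


def client_label(ip, hostname_lookups=False, resolver=socket.gethostbyaddr):
    """Return what %h would record for a client under a HostnameLookups setting."""
    if not hostname_lookups:
        return ip  # free: the address arrived with the request itself
    try:
        hostname, _aliases, _addresses = resolver(ip)  # one reverse (PTR) lookup
    except OSError:  # socket.herror / socket.gaierror: no usable PTR record
        return ip
    return hostname


def fake_resolver(ip):
    """Stand-in for DNS so the example runs offline (hypothetical data)."""
    if ip == "192.0.2.7":
        return ("client.example.com", [], [ip])
    raise socket.herror(1, "Unknown host")


print(client_label("192.0.2.7"))                          # 192.0.2.7
print(client_label("192.0.2.7", True, fake_resolver))     # client.example.com
print(client_label("198.51.100.9", True, fake_resolver))  # 198.51.100.9
```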

Logging: ErrorLog, LogLevel, LogFormat, and CustomLog


The Apache web server maintains two types of logs: transaction logs and error logs. Transaction logging occurs with every web request ("hit"), and is highly configurable, potentially logging to multiple files. In contrast, each server has only one error log, and only two questions associated with it: where, and how much. We start with the simpler of the two.

Error Logging: ErrorLog and LogLevel

Figure 2-12. /etc/httpd/conf/httpd.conf
#
# ErrorLog: The location of the error log file.
# If you do not specify an ErrorLog directive within a <VirtualHost>
# container, error messages relating to that virtual host will be
# logged here. If you *do* define an error logfile for a <VirtualHost>
# container, that host's errors will be logged there and not here.



#
ErrorLog logs/error_log

#
# LogLevel: Control the number of messages logged to the error_log.
# Possible values include: debug, info, notice, warn, error, crit,
# alert, emerg.
#
LogLevel warn

By default, the web server logs to the file /var/log/httpd/error_log (recall the role of the ServerRoot directive, and the /etc/httpd/logs symlink). For the main server, it's hard to think of a reason to ever change it, though virtual hosts often override it. More interesting is the LogLevel, which determines how much information is logged. The vocabulary draws directly from the syslog service. When troubleshooting, an administrator often ratchets up the logging, for example by setting LogLevel to debug. Of course, more copious logging slows down overall performance, so once a problem has been resolved, LogLevel is returned to a more suitable default.
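LogLevel acts as a severity threshold: a message is logged if it is at least as severe as the configured level, so each setting also includes every more severe level. A minimal Python sketch of that rule, using the level names listed in the configuration comments (the function name is hypothetical):

```python
# Severity order used by LogLevel, from most to least severe (syslog vocabulary).
LEVELS = ["emerg", "alert", "crit", "error", "warn", "notice", "info", "debug"]


def is_logged(message_level, log_level="warn"):
    """True if a message of message_level passes the configured LogLevel."""
    return LEVELS.index(message_level) <= LEVELS.index(log_level)


print([level for level in LEVELS if is_logged(level)])
# ['emerg', 'alert', 'crit', 'error', 'warn']
```

Setting the threshold to debug therefore lets every message through, which is why it is reserved for troubleshooting.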

Transaction Logging: LogFormat and CustomLog

For every web request, there is a large amount of information that an administrator can choose to log (or not). Such transaction logs are often referred to as "access logs". The LogFormat directive allows administrators to assign names (nicknames) to collections of information, so that they are easy to refer to later. This is all LogFormat does, however. To actually use a format, it must be associated with a log file by a CustomLog directive.

Figure 2-13. /etc/httpd/conf/httpd.conf
#
# The following directives define some format nicknames for use with
# a CustomLog directive (see below).
#
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent

# "combinedio" includes actual counts of actual bytes received (%I) and sent (%O); this
# requires the mod_logio module to be loaded.
#LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O" combinedio

The following table illustrates some of the parameters most commonly used in access logs.

Table 2-1. Apache Log Parameters

Parameter   References                               Example
%h          Remote host (IP address or hostname)     127.0.0.1
%u          Remote user (from HTTP authentication)   elvis
%t          Timestamp                                [15/Jul/2005:06:55:44 -0400]
%r          Request line (from the HTTP protocol)    GET /icons/compressed.gif HTTP/1.1
%s          HTTP response status code                200
%b          Response size (in bytes)                 1079
%{name}i    Contents of the request header "name"    (depends on name)
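To see how the common and combined format strings expand into a log line, here is a minimal Python sketch that renders one hit using the example values from Table 2-1. It models only the parameters in these two formats, always writes "-" for %l (see Note 1), and assumes the C locale for the month name; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def format_common(host, user, when, request_line, status, size):
    """Render one hit as: %h %l %u %t "%r" %>s %b (the common format)."""
    stamp = when.strftime("[%d/%b/%Y:%H:%M:%S %z]")  # %b here is strftime's month name
    size_field = str(size) if size else "-"  # Apache's %b logs "-" rather than 0
    return f'{host} - {user or "-"} {stamp} "{request_line}" {status} {size_field}'


def format_combined(referer, user_agent, **hit):
    """Combined = common + "%{Referer}i" "%{User-Agent}i"."""
    return f'{format_common(**hit)} "{referer or "-"}" "{user_agent or "-"}"'


when = datetime(2005, 7, 15, 6, 55, 44, tzinfo=timezone(timedelta(hours=-4)))
hit = dict(host="127.0.0.1", user="elvis", when=when,
           request_line="GET /icons/compressed.gif HTTP/1.1", status=200, size=1079)
print(format_common(**hit))
# 127.0.0.1 - elvis [15/Jul/2005:06:55:44 -0400] "GET /icons/compressed.gif HTTP/1.1" 200 1079
```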

Many more exist as well. As usual, with all of this flexibility comes the need for convention. Two commonly used conventions are the common format and the combined format, which are the first two formats defined above. The common format records the client host, username (if any), timestamp, request line, response status, and number of bytes transferred (see Note 1). The combined format adds the referring page (if any) and the identity of the client application (its User-Agent header). While the combined format is used by default in Red Hat Enterprise Linux, administrators could well choose to drop back to the common format to save space and improve performance. Many external log analysis utilities (such as webalizer) rely on logs being in a standard format, so an administrator should consider the consequences before changing the log format arbitrarily.

Finally, once a format has been decided, it can be associated with a log file using the CustomLog directive.

Figure 2-14. /etc/httpd/conf/httpd.conf
# The location and format of the access logfile (Common Logfile Format).
# If you do not define any access logfiles within a <VirtualHost>
# container, they will be logged here. Contrariwise, if you *do*
# define per-<VirtualHost> access logfiles, transactions will be
# logged therein and *not* in this file.
#
#CustomLog logs/access_log common

#
# If you would like to have separate agent and referer logfiles, uncomment
# the following directives.
#
#CustomLog logs/referer_log referer
#CustomLog logs/agent_log agent

#
# For a single logfile with access, agent, and referer information
# (Combined Logfile Format), use the following directive:
#
CustomLog logs/access_log combined

As the above configuration suggests, multiple log files, each containing different information, could be updated with each hit, though of course performance is a consideration. By default, Red Hat Enterprise Linux updates only the single file /var/log/httpd/access_log, using the combined format.
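Because the combined format is a fixed convention, log analysis tools can split each line back into its fields. The following Python sketch shows one way to do this with a regular expression; it is an illustration, not how webalizer or any particular tool works. It assumes well-formed lines and does not handle the backslash-escaped quotes Apache writes when a request line or header contains a double quote; the names are hypothetical.

```python
import re

# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_combined(line):
    """Return the fields of one combined-format line, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    fields["status"] = int(fields["status"])
    return fields


line = ('127.0.0.1 - elvis [15/Jul/2005:06:55:44 -0400] '
        '"GET /icons/compressed.gif HTTP/1.1" 200 1079 '
        '"http://localhost/" "Mozilla/5.0 (X11)"')
hit = parse_combined(line)
print(hit["user"], hit["status"], hit["size"], hit["agent"])
# elvis 200 1079 Mozilla/5.0 (X11)
```

A line in the common format lacks the two trailing quoted fields, so parse_combined returns None for it, which is one concrete way a format change can break downstream tools.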


Remapping the URL Namespace: Alias


Up until now, we have had a very clean concept of the URL namespace: the file portion of a URL maps directly to a file which exists underneath the document root directory. The Alias directive allows administrators to make arbitrary mappings from a portion of the URL namespace to any directory in the filesystem.

Figure 2-15. /etc/httpd/conf/httpd.conf
# Aliases: Add here as many aliases as you need (with no limit). The format is
# Alias fakename realname
#
# Note that if you include a trailing / on fakename then the server will
# require it to be present in the URL. So "/icons" isn't aliased in this
# example, only "/icons/". If the fakename is slash-terminated, then the
# realname must also be slash terminated, and if the fakename omits the
# trailing slash, the realname must also omit it.
#
# We include the /icons/ alias for FancyIndexed directory listings. If you
# do not use FancyIndexing, you may comment this out.
#
Alias /icons/ "/var/www/icons/"

As an example, the default Red Hat Enterprise Linux configuration aliases http://localhost/icons/ to the directory /var/www/icons/, which is not underneath the document root, but a sibling of it. The remapping should be easy enough to confirm by following the above link and comparing the result with an ls of the icons directory. For better or for worse, we now have a way to expose portions of our filesystem which are not under the document root. Another option is the use of symbolic links, which will be discussed in more detail shortly.

Also, notice the comments about trailing slashes, which have often been a source of confusion. The Apache web server automatically redirects clients that refer to a directory without the trailing slash to an equivalent URL that includes it (watch closely as you access http://localhost/example, and note that the browser ends up showing the trailing slash). As a result, a request for a directory without the trailing slash is processed twice, once as requested and once after the redirect, so directory-related configuration that doesn't account for the missing slash can behave in confusing ways.
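The trailing-slash rule from the configuration comments, and the directory redirect, can be modeled in a short Python sketch. This is a simplified illustration, not mod_alias or mod_dir source: it ignores repeated slashes in the URL, and the real redirect's Location header carries an absolute URL including the host name rather than a bare path. Both function names are hypothetical.

```python
def apply_alias(uri, fakename, realname):
    """Return the path an `Alias fakename realname` maps uri to, or None.

    A slash-terminated fakename requires the slash in the URL; otherwise
    the alias matches only at a path-segment boundary.
    """
    if fakename.endswith("/"):
        matched = uri.startswith(fakename)
    else:
        matched = uri == fakename or uri.startswith(fakename + "/")
    return realname + uri[len(fakename):] if matched else None


def directory_slash_redirect(uri, is_directory):
    """Directory URLs missing a trailing slash get a 301 redirect to the slashed URL."""
    if is_directory and not uri.endswith("/"):
        return 301, uri + "/"
    return None


print(apply_alias("/icons/folder.gif", "/icons/", "/var/www/icons/"))  # /var/www/icons/folder.gif
print(apply_alias("/icons", "/icons/", "/var/www/icons/"))             # None
print(directory_slash_redirect("/example", True))                      # (301, '/example/')
```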

The Answer Book: http://localhost/manual


By now you could well be bewildered by the many different configuration directives, and in many ways we've just touched the tip of the iceberg. This seems a good time to introduce the manual, which in Red Hat Enterprise Linux ships as the separate httpd-manual package. Once installed (and the web server restarted), the manual can be accessed at http://localhost/manual.
[root@station ~]# yum install httpd-manual
...
=============================================================================
 Package                Arch        Version            Repository       Size
=============================================================================



Installing:
 httpd-manual           i386        2.2.3-6.el5        rha-rhel         831 k
...
Installed: httpd-manual.i386 0:2.2.3-6.el5
Complete!
[root@station ~]# service httpd restart
Stopping httpd:                                            [  OK  ]
Starting httpd:                                            [  OK  ]

The manual provides comprehensive documentation, organized by directive name, module name, or topic (such as "Log Files" or "Virtual Hosts"). Anyone wishing to quickly refresh their memory, or to learn more about Apache configuration, should definitely install the manual as well.

Exercises
Lab Exercise
Objective: Configure the Apache web server.
Estimated Time: 45 mins.

Specification
You will probably want to make a backup of the main Apache configuration file (/etc/httpd/conf/httpd.conf) before starting this exercise, so that you can later restore the default configuration.

If you have not already downloaded http://rha-server/pub/rha/rha230/readings.tgz and extracted its contents into the /var/www/html directory (as specified in the previous exercise), do so now.

Edit your Apache configuration so that the server meets the following specifications. The suggested technique is to duplicate the relevant lines of your configuration file, comment out the original configuration, and edit the new line to make your changes. You will probably want to make incremental changes, checking your configuration as you go.

1. Configure the Apache web server so that it accepts HTTP/1.1 KeepAlive requests, but will only wait 3 seconds for a follow-up request before closing the connection. Hint: you can confirm this configuration by capturing a transaction between the Firefox browser and your web server with ethereal, and examining the HTTP headers of both the request and the response.
2. Manage the bounds of the server pool, such that there are always between 2 and 4 (inclusive) child daemons present.
3. The Apache server should be bound to port 8888 (of at least the loopback address), in addition to port 80 (on all interfaces). (Note: you will need to put SELinux into permissive mode in order to allow Apache to bind to a port other than 80 and 443.)
4. Configure the web server such that index.htm is recognized as an index file, as well as index.html. Confirm your configuration by removing the file /var/www/html/readings/relat10h/index.html that you created in the previous exercise, if it exists, leaving the original /var/www/html/readings/relat10h/index.htm, and referencing http://localhost/readings/relat10h/.
5. Configure the server so that clients are logged by hostname (when available) as opposed to IP address. (Hint: You are not expected to need to edit any LogFormat directives.)
6. Set the log level for the error log to debug.
7. In addition to the default logging, have every web request logged to the file /var/log/httpd/common_log, using what is commonly referred to as the common format.
8. In the separate configuration file /etc/httpd/conf.d/rha.conf, establish an Alias, so that the URL http://localhost/images/ refers to the directory /var/www/html/readings/relat10h/pics. (If the relevant directory is still named picts, rename it or symlink it to pics.)

Deliverables

1. A running Apache web server that accepts Keep-Alive requests, but closes connections after 3 seconds of inactivity.
2. The server should maintain a server pool of between 2 and 4 pre-forked child daemons.
3. The server should be bound to port 8888 on the loopback address, in addition to the normal port 80.
4. The server should treat files named index.htm as index files, in addition to the standard index.html.
5. Transaction logging should log clients by hostname, if available.
6. The error log should log all messages with debug and higher priority.
7. In addition to the standard access_log, a transaction log named /var/log/httpd/common_log should be kept, logging in the common format.
8. The URL http://localhost/images/ should resolve to /var/www/html/readings/relat10h/pics, due to an alias established in the /etc/httpd/conf.d/rha.conf configuration file.

Questions
For all of the following questions, assume the default Red Hat Enterprise Linux configuration of the Apache web server, unless the question states otherwise.

1. Which directory serves as the ServerRoot directory (i.e., the directory used as the base for all relative file references in the configuration file)?
( ) a. /var/www/html
( ) b. /var/log/httpd
( ) c. /etc/httpd
( ) d. /etc
( ) e. None of the above

2. Which file(s) is (are) used to configure the Apache web server upon startup?
( ) a. /etc/httpd/conf/httpd.conf
( ) b. /etc/apache.conf
( ) c. /etc/httpd/conf.d/*.conf
( ) d. /etc/sysconfig/apache
( ) e. Both A and C

3. Which of the following directives could be used to improve the performance of a heavily loaded web server?
( ) a. KeepAlive
( ) b. MaxClients
( ) c. MaxSpareServers
( ) d. Timeout
( ) e. All of the above

4. Which of the following directives can be used to defend against memory leaks and other instabilities in poorly written libraries and CGI scripts?
( ) a. MaxClients
( ) b. MaxRequestsPerChild
( ) c. ServerLimit
( ) d. KeepAlive
( ) e. Listen

5. Which of the following best describes the default Apache server model?
( ) a. The server uses a traditional Unix forking model, where a new daemon is forked to handle connections for a particular client.
( ) b. The server uses a pre-forking model, whereby clients are distributed amongst a dynamic pool of pre-existing daemons.
( ) c. The server uses a multi-threaded model, whereby a single process clones multiple threads, each handling a distinct client.
( ) d. The server uses a single process polling model, whereby the single process polls a collection of active connections for activity.

6. Which of the following lines would cause the web server to bind to port 8080 on the loopback address?
( ) a. Bind 127.0.0.1:8080
( ) b. Bind 127.0.0.1 8080
( ) c. Listen 127.0.0.1:8080
( ) d. Listen 127.0.0.1 8080
( ) e. None of the above

7. The Apache manual states that %h is used to log the remote hostname or IP address. Yet, even using this parameter, an administrator finds that the log file records IP addresses instead. Which of the following configurations would allow client hostnames to be logged?
( ) a. DNS /etc/resolv.conf
( ) b. HostnameLookups On
( ) c. LogNames On
( ) d. LogLevel info
( ) e. None of the above

8. Which of the following directives would have the same end effect as cd /var/www/html/data; ln -s ../images images?
( ) a. Alias /data/images/ /var/www/html/images/
( ) b. Symlink /images/ /data/images/
( ) c. Alias /images/ /data/images/
( ) d. View /var/www/html/images/ /data/images/
( ) e. None of the above

9. Assuming the httpd-manual package is installed, where can Apache documentation be found?
( ) a. http://localhost/help
( ) b. http://localhost/guide
( ) c. http://localhost/apache
( ) d. http://localhost/man
( ) e. None of the above

10. After editing an Apache configuration file, what should be done for changes to take effect?
( ) a. chkconfig httpd on
( ) b. service httpd restart
( ) c. chkconfig httpd reload
( ) d. service httpd status
( ) e. No action is required, because the Apache daemon actively monitors its configuration file.

Notes
1. The observant might notice the omission of the second field, invariably a hyphen ("-"). This field used to hold the username as returned by the legacy identd service, which is seldom implemented today.


Chapter 3. Apache Configuration: Containers


Key Concepts

The Apache web server allows context-dependent configuration through the use of Directory, Location, Files, and VirtualHost containers.

Often, the Options directive is used within containers to allow or disallow symbolic link resolution (with FollowSymLinks) and dynamic directory generation (with Indexes), among other parameters.

Often, the Order, Allow from, and Deny from directives are used within containers to implement access control based on the client's IP address or hostname.

The default Red Hat Enterprise Linux configuration allows the resolution of symbolic links almost everywhere, but limits the generation of dynamic indexes to the intended document root directory.

Dynamic information about the Apache web server can be obtained using custom handlers, which are conventionally associated with the /server-status and /server-info locations.

Discussion
Tailoring Customization to Particular Content: Containers
The Apache web server allows configuration to be customized to particular files or directories using containers. Containers start with an XML-like opening tag, such as <Directory ...>, and end with an XML-like closing tag, such as </Directory>. Directives found within a container only affect files which fall under the container's scope. There are essentially four types of scoping containers, which are exemplified below and itemized in the following table.

Figure 3-1. Sample Apache Containers
<Directory "/var/www/icons">
    Options Indexes MultiViews
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>

<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from .example.com
</Location>

# "~" marks the argument as a regular expression (here: names ending in .hide)
<Files ~ "\.hide$">
    Order allow,deny
    Deny from all
</Files>

<VirtualHost *:80>
    ServerAdmin webmaster@dummy-host.example.com
    DocumentRoot /www/docs/dummy-host.example.com
    ServerName dummy-host.example.com
    ErrorLog logs/dummy-host.example.com-error_log
    CustomLog logs/dummy-host.example.com-access_log common
</VirtualHost>

Table 3-1. Apache Scoping Containers

Directive      Scope
Directory      All files which exist in or underneath the specified directory in the filesystem, after URL-to-filename translation occurs.
Location       All files which exist in or underneath the specified location in the URL namespace, before URL-to-filename translation occurs.
Files          All files which match the specified pattern, no matter where they exist in the filesystem or URL namespace.
VirtualHost    All files served by a particular virtual server. Virtual hosts will be covered in detail in a later lesson.

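The argument to the opening tag specifies the relevant file or directory (or, in the case of Location, the URL path; in the case of VirtualHost, the IP address). The filename may either be explicit, or shell-like pathname expansion (file globbing) can be used. Prefixing the argument with a tilde ("~"), as in the Files container above, causes it to be treated as a regular expression instead.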
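To make the "before versus after URL-to-filename translation" distinction concrete, the following Python sketch models how a request might be tested against Directory and Location containers. It is a simplification under stated assumptions: a single DocumentRoot, no Alias directives, and plain prefix matching (no globs or "~" regular expressions). The helper names are hypothetical and are not part of Apache.

```python
import posixpath

# Hypothetical helpers: a simplified model, not Apache's actual matching code.


def url_to_filename(url_path: str, document_root: str) -> str:
    """Translate a URL path by joining it onto DocumentRoot (no Alias support)."""
    # normpath collapses "." and ".." segments and doubled slashes
    return posixpath.normpath(document_root.rstrip("/") + "/" + url_path.lstrip("/"))


def location_matches(location: str, url_path: str) -> bool:
    """Location containers are compared with the URL path, before translation."""
    return url_path.startswith(location)


def directory_matches(directory: str, filename: str) -> bool:
    """Directory containers are compared with the filesystem path, after translation."""
    base = directory.rstrip("/")
    return filename == base or filename.startswith(base + "/")


url = "/server-status"
filename = url_to_filename(url, "/var/www/html")
print(filename)                                       # /var/www/html/server-status
print(location_matches("/server-status", url))        # True
print(directory_matches("/var/www/icons", filename))  # False
```

Nothing in this model requires /var/www/html/server-status to exist on disk: the Location test succeeds on the URL alone, which is why handler-based pages such as server-status are described with Location rather than Directory containers.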

Common Container Configuration


Skimming the containers exemplified above, one finds that container configuration often involves the following three concepts.

1. Options: Various capabilities of the web server are grouped under a general Options directive.
2. ACLs: The web server allows access control lists (or ACLs, informally pronounced "Ack-uls") to specify which clients are allowed to access information, using the Order, Allow, and Deny directives. (Access control can also be based on authenticated users, which is unfortunately a topic beyond the scope of the current course.)
3. Overrides: If allowed with the AllowOverride directive, local configuration files (.htaccess files) intermixed with web server content can dynamically override the startup configuration.

We look at each of these syntaxes in turn.


General Options: Options


The Apache server supports the following options, which are specified as arguments to the Options directive, usually within a scoping container. Of these, the first two are most commonly used.

Table 3-2. Apache Options

Option                    Effect
Indexes                   When a URL references a directory (as opposed to a regular file), no index.html file is present (more on this in a bit), and this option is enabled, the web server returns an automatically generated directory listing. If Indexes is disabled, a 403 (Access Forbidden) error page is returned to the client instead.
FollowSymLinks            This option must be enabled in order for the web server to resolve (follow) a symbolic link.
SymLinksIfOwnerMatch      A qualification of the FollowSymLinks option, where the symlink will only be followed if the owner of the target file is the same as the owner of the link itself.
ExecCGI                   Allow CGI executables to be executed from within this scope. (More on these later.)
Includes, IncludesNOEXEC  Server side includes are allowed (or, in the latter case, mostly allowed) from within this scope. Server side includes are beyond the scope of this course.
MultiViews                If enabled, content negotiation between the client and the server is supported. This allows a server to serve a document in the most appropriate of multiple languages, for example. Further discussion of MultiViews is beyond the scope of this course.
All                       This option refers to all of the previous options collectively, with the exception of MultiViews. Unless otherwise specified, this is the default configuration. (Recall, however, that in Red Hat Enterprise Linux a different policy applies to the root directory, effectively establishing a different default.)

Why not Indexes?

The decision to allow the web server to automatically generate indexes or not is really a matter of control. If indexes are automatically generated, then merely locating a file underneath the document root allows anyone to view it or copy it (often with automated command line clients such as wget), unless an index.html file is created to hide files within a particular directory. In contrast, if indexes are not allowed, files must be explicitly linked from other files (index.html or otherwise) to be easily discovered. Many low-maintenance, public web sites leave indexes on (such as the official Linux kernel repository (http://www.kernel.org/pub/linux)). Other web sites, hoping for a more professional look or more refined control of information, do not.

Why not resolve Symbolic Links?

Again, the decision to allow symlink resolution is basically one of control. If symlinks are not allowed, an administrator has a clear concept of what portions of the filesystem are exposed through the web server (only files underneath the document root). If symlinks are resolved, however, a symlink underneath the document root could expose any other part of the filesystem.

More subtly, the decision to not resolve symlinks can degrade performance. When resolving a path to reference a file, the kernel automatically resolves symlinks. (If you were to cat the file /foo/biz/baz/buzz, you would not need to worry whether the directory biz or baz is actually a symlink.) If symlinks are disabled, however, the web server must make a system call on each of the nodes within a file path, asking "is it a symlink? is it a symlink? is it a symlink?" This degradation is one of the reasons why the default Red Hat Enterprise Linux configuration leaves FollowSymLinks enabled.
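The per-component check described above can be sketched in Python. This illustrates the idea only and is not Apache's implementation; the function name is hypothetical. It calls os.lstat() (which, unlike os.stat(), reports on a link itself rather than following it) on every prefix of an absolute path and counts the calls, which is the kind of extra work a server takes on when it may not let the kernel follow symlinks silently.

```python
import os
import stat


def first_symlink(path: str):
    """Walk an absolute path one component at a time.

    Returns (first prefix that is a symlink, or None; number of lstat() calls).
    Raises FileNotFoundError if a component does not exist.
    """
    current = "/"
    calls = 0
    for part in path.strip("/").split("/"):
        current = os.path.join(current, part)
        calls += 1
        # lstat() does not follow the final link, so S_ISLNK can detect it
        if stat.S_ISLNK(os.lstat(current).st_mode):
            return current, calls
    return None, calls


# A fully resolved path contains no symlinks, so this prints (None, N),
# where N is the number of components checked.
print(first_symlink(os.path.realpath(".")))
```

With FollowSymLinks enabled, none of this checking is needed: the server simply opens the path and lets the kernel resolve it in one step.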

Options Syntax

The Options directive takes effect for the scope specified by its enclosing container. For example, the following container would enable indexes and symlink resolution for all files underneath the directory /var/www/html.
<Directory /var/www/html>
    Options FollowSymLinks Indexes
</Directory>

The following container, however, would enable indexes and server side includes underneath /var/www/html/widgets.
<Directory /var/www/html/widgets>
    Options Indexes Includes
</Directory>

The directory /var/www/html/widgets does not inherit its options from /var/www/html, but instead gets its configuration entirely from the new Options line. Because FollowSymLinks is not mentioned, symlinks underneath /var/www/html/widgets will not be resolved. Alternatively, options can be preceded by a "+" or "-", which means the options are inherited from the enclosing scope, with the named option simply added or removed. Consider rewriting the above container as follows.
<Directory /var/www/html/widgets>
    Options +Includes
</Directory>

In this case, the /var/www/html/widgets directory would have Includes, Indexes, and FollowSymLinks enabled (the latter two inherited from /var/www/html). Similarly, the following container would leave /var/www/html/widgets with only the FollowSymLinks option enabled.

<Directory /var/www/html/widgets>
    Options -Indexes
</Directory>
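The inheritance rules just described can be modeled in a few lines of Python. This is a hypothetical sketch, not Apache code: it applies matching Directory sections from the shallowest to the deepest path, starts from an empty option set (Apache's real built-in default differs), does not expand All, and rejects a single Options line that mixes prefixed and unprefixed keywords, something the Apache documentation warns against.

```python
def merge_options(inherited: set, args: list) -> set:
    """Apply one Options line to the option set inherited from the enclosing scope."""
    prefixed = [arg[:1] in ("+", "-") for arg in args]
    if any(prefixed) and not all(prefixed):
        raise ValueError("mixing +/- and plain options is not supported")
    if not all(prefixed):
        # No prefixes: the listed options replace the inherited set entirely
        return set(args)
    result = set(inherited)
    for arg in args:
        if arg[0] == "+":
            result.add(arg[1:])
        else:
            result.discard(arg[1:])
    return result


def effective_options(path: str, sections: dict) -> set:
    """Apply every <Directory> section covering path, shallowest first."""
    options: set = set()
    # Depth = number of path components; "/" has depth 0
    for directory in sorted(sections, key=lambda d: d.rstrip("/").count("/")):
        base = directory.rstrip("/")
        if path == base or path.startswith(base + "/"):
            options = merge_options(options, sections[directory])
    return options


# Mirrors the root and document-root containers in the default configuration
default_config = {
    "/": ["FollowSymLinks"],
    "/var/www/html": ["Indexes", "FollowSymLinks"],
}
print(sorted(effective_options("/var/www/html/index.html", default_config)))
# ['FollowSymLinks', 'Indexes']
```

Adding "/var/www/html/widgets": ["-Indexes"] to the dictionary reproduces the last container above: files under widgets end up with FollowSymLinks only, while the rest of /var/www/html keeps both options.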


Client Access Control: Order, Allow, Deny


The Apache web server allows an administrator to impose access control restrictions on a directory-by-directory (or even file-by-file) basis using access control lists. These ACLs are composed of the following directives.

The Allow Directive

The Allow directive uses the following syntax to specify which clients are allowed to connect to a given resource.
Allow from client_specification

The client_specification is composed of a whitespace-separated list of any of the following elements.

Table 3-3. Apache ACL client specification

Syntax                          Example                         Meaning
ALL                             ALL                             All clients
Full IP addresses               192.168.0.3                     The specified client
Partial IP addresses            172.63.                         All clients whose IP address begins as specified
Network/Netmask notation        192.168.1.64/255.255.255.192    All clients who belong to the specified subnet
CIDR notation                   192.168.1.64/26                 All clients who belong to the specified subnet (this example is completely equivalent to the preceding example)
A full or partial domain name   .example.com                    All clients whose reverse-lookup domain name ends as specified (this causes Apache to perform a double reverse DNS lookup on the client address, regardless of the HostnameLookups setting)
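The matching rules in Table 3-3 can be approximated with Python's standard ipaddress module. In this hypothetical sketch, partial addresses must end in a dot (as in the examples above), IPv6 is ignored, and no DNS lookups are performed: the caller passes in a hostname that is assumed to have already been resolved and verified. Because ipaddress accepts both the netmask and the CIDR form, it also shows that the two subnet examples describe the same 64 addresses.

```python
import ipaddress


def spec_matches(spec: str, ip: str, hostname: str = "") -> bool:
    """Does one element of an Allow/Deny client_specification match this client?"""
    if spec.lower() == "all":
        return True
    if "/" in spec:
        # ip_network() accepts "addr/255.255.255.192" as well as "addr/26"
        return ipaddress.ip_address(ip) in ipaddress.ip_network(spec)
    if spec[0].isdigit():
        if spec.endswith("."):
            return ip.startswith(spec)  # partial address, e.g. "172.63."
        return ip == spec               # full address
    # Domain name: match whole trailing labels of the (pre-verified) hostname
    domain = spec.lstrip(".").lower()
    name = hostname.lower()
    return bool(name) and (name == domain or name.endswith("." + domain))


netmask = ipaddress.ip_network("192.168.1.64/255.255.255.192")
cidr = ipaddress.ip_network("192.168.1.64/26")
print(netmask == cidr, netmask.num_addresses)  # True 64
```

Only complete domain labels match in this sketch, so .example.com accepts www.example.com but not www.badexample.com.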

The Deny Directive

The Deny directive uses an identical syntax to specify which clients are not allowed to connect to a given resource.
Deny from client_specification

The client_specification is composed of the same elements as for the Allow directive.

The Order Directive

Here's where things get interesting. Whenever client ACLs are specified with the Allow and Deny directives, the order of precedence must be specified with the Order directive. The Order directive usually comes in one of two forms.

Order Allow,Deny

In this case, any clients which are unspecified (not matching any rule) or over-specified (matching both an Allow and a Deny rule) are denied.
Order Deny,Allow

In this case, any clients which are unspecified or over-specified are allowed. Surprisingly, no spaces are allowed around the comma in either case. Some examples are in order.

Example 1
<Directory /some/sensitive/content>
    Order Deny,Allow
    Deny from All
    Allow from 192.168.0.
</Directory>

In this case, only clients from within the 192.168.0.0/255.255.255.0 subnet are allowed to access files underneath /some/sensitive/content.

Example 2
<Directory /keep/them/out>
    Order Allow,Deny
    Allow from 192.168.0.
    Deny from 192.168.0.4
</Directory>

In this case, clients from within the 192.168.0.0/255.255.255.0 subnet are allowed to access files underneath /keep/them/out, except for client 192.168.0.4. All clients outside of the subnet are denied access.

Example 3
<Directory /only/for/example>
    HostNameLookups on
    Order Allow,Deny
    Allow from .example.com
</Directory>

In this case, only clients from within the example.com domain are allowed to access files underneath /only/for/example.

If you are having trouble figuring out how the term "order" applies to the effect of the Order directive, your author sympathizes. One way to remember it: the directive names which group of rules is evaluated first, so the group named last wins any conflict and also determines the default for clients that match nothing. With a little experience, the syntax starts to make sense. Until then, make sure that you confirm any ACLs by actually trying to access the material from the appropriate clients.
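The precedence rules above reduce to a small truth table, modeled here as a hypothetical Python function. It covers only the two Order forms discussed in this section, treats each Allow or Deny rule as a predicate on the client address, compares the Order value case-insensitively, and, in the spirit of the no-spaces rule, rejects any other spelling such as "Allow, Deny".

```python
from typing import Callable, Iterable

Rule = Callable[[str], bool]  # returns True if the rule matches the client IP


def is_allowed(order: str, allow: Iterable[Rule], deny: Iterable[Rule], ip: str) -> bool:
    allowed = any(rule(ip) for rule in allow)
    denied = any(rule(ip) for rule in deny)
    form = order.lower()
    if form == "allow,deny":
        # Deny is applied last and wins; clients matching nothing are denied
        return allowed and not denied
    if form == "deny,allow":
        # Allow is applied last and wins; clients matching nothing are allowed
        return allowed or not denied
    raise ValueError(f"unsupported Order value: {order!r}")


# Example 2 from the text: Allow from 192.168.0. then Deny from 192.168.0.4
allow = [lambda ip: ip.startswith("192.168.0.")]
deny = [lambda ip: ip == "192.168.0.4"]
for client in ("192.168.0.5", "192.168.0.4", "10.1.1.1"):
    print(client, is_allowed("Allow,Deny", allow, deny, client))
# 192.168.0.5 True
# 192.168.0.4 False
# 10.1.1.1 False
```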


Red Hat Enterprise Linux Default Configuration


Now that we know a little about containers, we're ready to examine some of the containers that come in the default Red Hat Enterprise Linux Apache configuration. The first container encountered establishes a fairly paranoid default policy.

Figure 3-2. /etc/httpd/conf/httpd.conf
#
# Each directory to which Apache has access can be configured with respect
# to which services and features are allowed and/or disabled in that
# directory (and its subdirectories).
#
# First, we configure the "default" to be a very restrictive set of
# features.
#
<Directory />
    Options FollowSymLinks
    AllowOverride None
</Directory>

In this case, the "/" in the opening tag is not syntax, but a reference to the root directory. So from the root directory on down (i.e., everywhere), the specified policies apply. Specifically, the only allowed Option is FollowSymLinks, and no overrides are allowed. The next container loosens things up a bit for the directory /var/www/html. (Why was this directory picked for special attention?)

Figure 3-3. /etc/httpd/conf/httpd.conf
<Directory "/var/www/html">

#
# Possible values for the Options directive are "None", "All",
# or any combination of:
#   Indexes Includes FollowSymLinks SymLinksifOwnerMatch ExecCGI MultiViews
#
# Note that "MultiViews" must be named *explicitly* --- "Options All"
# doesn't give it to you.
#
# The Options directive is both complicated and important.  Please see
# http://httpd.apache.org/docs-2.0/mod/core.html#options
# for more information.
#
    Options Indexes FollowSymLinks

#
# AllowOverride controls what directives may be placed in .htaccess files.
# It can be "All", "None", or any combination of the keywords:
#   Options FileInfo AuthConfig Limit
#
    AllowOverride None

#
# Controls who can get stuff from this server.
#
    Order allow,deny
    Allow from all

</Directory>

In answer to the above question, access to content beneath /var/www/html is loosened a bit because that directory is the document root, which contains the content expected to be served by the web server. The container also contains some client access control configuration, but only as an example, as the effect of the configuration is to allow everyone.

Location Containers: server-status and server-info


We find the following two examples of Location containers within the default configuration file, both commented out.

Figure 3-4. /etc/httpd/conf/httpd.conf
#
# Allow server status reports generated by mod_status,
# with the URL of http://servername/server-status
# Change the ".example.com" to match your domain to enable.
#
#<Location /server-status>
#    SetHandler server-status
#    Order deny,allow
#    Deny from all
#    Allow from .example.com
#</Location>

#
# Allow remote server configuration reports, with the URL of
#  http://servername/server-info (requires that mod_info.c be loaded).
# Change the ".example.com" to match your domain to enable.
#
#<Location /server-info>
#    SetHandler server-info
#    Order deny,allow
#    Deny from all
#    Allow from .example.com
#</Location>

Both of these provide examples of virtual locations, in that, if enabled (and customized a bit), the server would respond to requests for http://localhost/server-info and http://localhost/server-status. The URLs do not map to any particular directory on the filesystem, however, so a Directory container would have been inappropriate.

Each of these containers implements a custom handler using the SetHandler directive. A thorough discussion of the concept of a handler is beyond the scope of the current class, but essentially a handler determines how the server responds to a request. The default handler, which returns the contents of the referenced file to the client, is the only handler we've encountered so far. Other handlers allow the web server to respond differently to requests.
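The idea of a handler can be illustrated with a toy dispatcher in Python. This is purely a conceptual model with hypothetical names; Apache does not work this way internally. A URL under a registered location is answered by that location's handler, and anything else falls through to a default handler that returns the contents of the mapped file.

```python
from pathlib import Path
from typing import Callable, Dict

Handler = Callable[[str], str]  # takes a URL path, returns the response body


def make_default_handler(document_root: str) -> Handler:
    """Like the default handler: return the contents of the mapped file."""
    def serve_file(url_path: str) -> str:
        return (Path(document_root) / url_path.lstrip("/")).read_text()
    return serve_file


def dispatch(url_path: str, handlers: Dict[str, Handler], default: Handler) -> str:
    """Use the handler attached to a matching location, else the default handler."""
    for location, handler in handlers.items():
        if url_path.startswith(location):
            return handler(url_path)
    return default(url_path)


# A virtual location: no file named server-status needs to exist on disk
handlers = {"/server-status": lambda url: "<html>status page</html>"}
```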

The server-status Handler


The server-status handler, when invoked, returns a page of status information (formatted as HTML) to the client. The following configuration would attach this handler to the http://localhost/server-status URL, but restrict access to 127.0.0.1.
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>

The Apache web server responds to http://localhost/server-status with a page of status information similar to the following.

Figure 3-5. Apache Web Server Status Page
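For scripted monitoring, mod_status can also return a terse, machine-readable version of this page when ?auto is appended to the URL (http://localhost/server-status?auto). The following Python sketch parses that "Key: value" output. The function names are hypothetical, the fetch only succeeds from a client that the Location ACL allows, and the exact set of keys depends on the Apache version and on whether extended status is enabled.

```python
from urllib.request import urlopen


def parse_status_auto(text: str) -> dict:
    """Parse mod_status "?auto" output, one "Key: value" pair per line."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def fetch_status(url: str = "http://localhost/server-status?auto") -> dict:
    """Fetch and parse the status page (the ACL must permit this client)."""
    with urlopen(url, timeout=5) as response:
        return parse_status_auto(response.read().decode("utf-8", "replace"))


# Usage, on a host allowed by the <Location /server-status> ACL:
# status = fetch_status()
# print(status.get("BusyWorkers"), status.get("IdleWorkers"))
```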


The server-info Handler


Similarly, the server-info handler returns a dynamically generated page which reports the web server's current configuration.
<Location /server-info>
    SetHandler server-info
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>

With this configuration active, the Apache web server responds to http://localhost/server-info with a page of configuration information similar to the following.

Figure 3-6. Apache Web Server Information Page

Exercises
Lab Exercise
Objective: Configure the Apache web server using containers.

Estimated Time: 45 mins.


Specification
If you have not already downloaded http://rha-server/pub/rha/rha230/readings.tgz and extracted its contents into the /var/www/html directory (as specified in the previous exercise), do so now. Also, apply the image directory fix by renaming /var/www/html/readings/relat10h/picts to /var/www/html/readings/relat10h/pics if you have not already resolved the problem some other way.

Edit your Apache configuration so that the server meets the following specifications. Place all of your configuration in the file /etc/httpd/conf.d/rha.conf. You should be starting with a directory structure similar to the following.
[root@station ~]# tree /var/www/html/readings/

/var/www/html/readings/
|-- relat10h
|   |-- ap01.htm
|   |-- ap02.htm
|   |-- ...
|   |-- index.htm
|   |-- index.html -> index.htm
|   |-- pics -> picts/
|   |-- picts
|   |   |-- arrow.gif
|   |   |-- eq01.gif
|   |   |-- ...
|   |-- preface.htm
|   `-- works-blue.css
|-- relativity -> relat10h/
|-- the_god_of_mars.html
`-- war_of_the_worlds.html

1. You decide that the use of symbolic links makes it too difficult to maintain control over a web site. Set options such that symbolic links are disabled everywhere underneath the /var/www/html/readings directory. (Notice that if you solved the image directory name problem with a symbolic link, you will now need to rename the directory instead.)
2. You are willing to allow people to read Einstein's relativity starting from the table of contents, but do not want people browsing the directory structure directly. Disable directory indexes underneath the /var/www/html/readings/relat10h directory.
3. You decide that you would like to restrict access to all of the readings to local clients only. Implement a policy whereby the contents underneath the /var/www/html/readings directory are only available to clients whose IP address starts with 127.0.
4. However, you would like your graphics consultant to be able to review your images. For the directory /var/www/html/readings/relat10h/pics, allow access to all clients whose IP address starts with 127.0., and to the special IP address 127.1.1.1. Also, enable directory indexes for this directory.
5. Because symbolic links are now disabled, you will no longer be able to make use of the relativity symbolic link to access the relat10h directory. Instead, establish an alias such that http://localhost/readings/relativity references the /var/www/html/readings/relat10h directory.

6. Again, because symbolic links are now disabled, you will no longer be able to solve the index.htm problem with a symbolic link. Make sure that index.htm is considered a directory index as well. (Note: you might just need to make sure that your implementation from the previous lesson's exercise is still in place.)
7. You would like to monitor the performance of your web server. In the main /etc/httpd/conf/httpd.conf configuration file, enable the http://localhost/server-status and http://localhost/server-info location containers, so that you may view dynamically generated performance and configuration information.
8. You would like your graphics consultant to be able to monitor the performance as well, so allow both 127.0.0.1 and 127.1.1.1 to access these locations, but only these IP addresses.

Deliverables

1. The web server will not resolve symbolic links underneath the /var/www/html/readings directory.
2. The web server will not generate directory indexes underneath the /var/www/html/readings/relat10h directory.
3. Only clients whose IP address begins with 127.0. may access content under the /var/www/html/readings directory.
4. However, the /var/www/html/readings/relat10h/pics directory allows access to 127.1.1.1 in addition to the 127.0. clients. Dynamically generated indexes are also allowed for this directory.
5. The URL http://localhost/readings/relativity resolves to /var/www/html/readings/relat10h.
6. The URL http://localhost/server-status presents dynamically generated status information, but is only available to 127.0.0.1 and 127.1.1.1.
7. The URL http://localhost/server-info presents dynamically generated configuration information, but is only available to 127.0.0.1 and 127.1.1.1.

Questions
1. Which of the following is not a legitimate keyword for opening an Apache scoping container?
( ) a. Files
( ) b. Directory
( ) c. Location
( ) d. Virtual Host
( ) e. All of these keywords are legitimate.

Use the following excerpt from an Apache configuration file, and the following directory structure, to answer the next 7 questions. You may assume there are no relevant URL aliases, and that all ownerships, permissions, and SELinux contexts are correct.
<Directory /var/www/html/pics>
    Options -Indexes -FollowSymLinks
    Order deny,allow
    deny from 192.168.1.
</Directory>

<Location /ogg>
    Options +Indexes
    Order allow,deny
    allow from 192.168.0.
</Location>
[root@station ~]# tree /var/www/html

/var/www/html/
|-- ogg/
|   |-- 01_track_1.ogg
|   |-- 02_track_2.ogg
|   |-- 03_track_3.ogg
|   `-- _hidden/
|       |-- 04_track_4.ogg
|       `-- 05_track_5.ogg
`-- pics/
    |-- demo/
    |   |-- 00001.jpg
    |   |-- 00004.jpg
    |   |-- 00010.vga.jpg
    |   `-- index.html
    |-- feb/
    |   |-- 15479.vga.jpg
    |   `-- 15491.jpg
    |-- index.html
    |-- mar/
    |   |-- 15651.jpg
    |   `-- 15659.vga.jpg
    `-- spring -> mar

(Note that /var/www/html/pics/spring is a symbolic link to mar.)

2. What would be the result of the client 192.168.0.4 trying to access the URL http://server.example.com/pics/mar/?
( ) a. A dynamically generated index.
( ) b. A 403 "Forbidden" error.
( ) c. A 404 "File Not Found" error.
( ) d. The contents of the file /var/www/html/pics/demo/index.html
( ) e. None of the above


3. What would be the result of the client 192.168.0.4 trying to access the URL http://server.example.com/pics/demo/?
( ) a. A 404 "File Not Found" error.
( ) b. The contents of the file /var/www/html/pics/demo/index.html
( ) c. A 403 "Forbidden" error.
( ) d. A dynamically generated index.
( ) e. None of the above

4. What would be the result of the client 192.168.0.4 trying to access the URL http://server.example.com/pics/spring/?
( ) a. The contents of the file /var/www/html/pics/index.html
( ) b. A dynamically generated index.
( ) c. A 404 "File Not Found" error.
( ) d. A 403 "Forbidden" error.

5. What would be the result of the client 192.168.1.1 trying to access the URL http://server.example.com/ogg/02_track_2.ogg?
( ) a. A 403 "Forbidden" error.
( ) b. The contents of the file /var/www/html/ogg/02_track_2.ogg
( ) c. A dynamically generated index.
( ) d. A 404 "File Not Found" error.
( ) e. None of the above

6. What would be the result of the client 192.168.0.4 trying to access the URL http://server.example.com/ogg/_hidden/05_track_5.ogg?
( ) a. A 404 "File Not Found" error.
( ) b. A 403 "Forbidden" error.
( ) c. The contents of the file /var/www/html/ogg/_hidden/05_track_5.ogg
( ) d. A dynamically generated index.
( ) e. None of the above

rha230-5.0-1-en-2008-01-21T07:12:18-0500 Copyright (c) 2003-2007 Red Hat, Inc. All rights reserved. For use only by a student enrolled in a Red Hat Academy course taught at a Red Hat Academy. Any other use is a violation of U.S. and international copyrights. No part of this publication may be photocopied, duplicated, stored in a retrieval system, or otherwise duplicated whether in electronic or print format without prior written consent of Red Hat, Inc. If you believe Red Hat course materials are being used, copied, or otherwise improperly distributed please email training@redhat.com or phone toll-free (USA) +1 866 626 2994 or +1 (919) 754 3700.


7. What would be the result of the client 192.168.0.4 trying to access the URL http://server.example.com/ogg/*.ogg?
( ) a. The contents of all files matched by the glob.
( ) b. A 403 "Forbidden" error.
( ) c. A 404 "File Not Found" error.
( ) d. A dynamically generated index including all files matched by the glob.
( ) e. None of the above

8. What would be the result of the client 192.168.1.1 trying to access the URL http://server.example.com/ogg/i_dont_exist.ogg?
( ) a. A 403 "Forbidden" error.
( ) b. A 404 "File Not Found" error.
( ) c. The contents of the file /var/www/html/ogg/i_dont_exist.ogg.
( ) d. A dynamically generated index.
( ) e. None of the above

Use the following excerpt from an Apache configuration file to answer the next 2 questions. You may assume there are no other relevant URL aliases, and that all ownerships, permissions, and SELinux contexts are correct.
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>

9. What would be the result of the client 192.168.0.4 trying to access the URL http://server.example.com/server-status?
( ) a. A dynamically generated summary of the state of each process in the Web Server Pool.
( ) b. A 404 "File Not Found" error.
( ) c. A 403 "Forbidden" error.
( ) d. The contents of the file /var/www/html/server-status.
( ) e. None of the above


10. What would be the result of the client 127.0.0.1 trying to access the URL http://localhost/server-status?
( ) a. A dynamically generated summary of the contents of the /var/www/html/server-status/ directory.
( ) b. A 404 "File Not Found" error.
( ) c. A 403 "Forbidden" error.
( ) d. The contents of the file /var/www/html/server-status.
( ) e. None of the above


Chapter 4. Virtual Hosts


Discussion
Virtual Hosts
One of the reasons for the popularity of the Apache web server is that it can easily take on the personality of any of multiple web servers, each of which is referred to as a virtual host. As a prerequisite to virtual hosting, DNS (the Domain Name System) must resolve multiple hostnames to the single machine which is running the Apache web server. As you will discover in the workbook on DNS, this is not difficult to arrange. In our current discussions, however, we will assume that DNS is appropriately configured. There are two approaches to virtual hosting supported by the Apache web server: IP based virtual hosting, and name based virtual hosting. We look at each of these in turn.

IP Based Virtual Hosting


For IP based virtual hosting, the machine running the Apache server must be assigned multiple IP addresses. These addresses could either be a result of multiple Ethernet cards (and thus multiple distinct network interfaces), or the result of a Linux trick called IP aliasing, which assigns multiple IP addresses to a single Ethernet card. For IP based virtual hosting, distinguishing the virtual hosts of the web server is trivial. The web server merely needs to examine the server IP address which is part of the incoming client request TCP/IP packet. Consider the machine which answers to the hostname www.republican.pol, with an IP address of 192.168.0.1, and the hostname www.democrat.pol, with an IP address of 192.168.0.2. (No, there is no top level domain .pol - this is just an example).
<VirtualHost 192.168.0.1>
    ServerAdmin webmaster@republican.pol
    ServerName www.republican.pol
    DocumentRoot /var/www/republican.pol
    ErrorLog logs/republican.pol-error_log
    CustomLog logs/republican.pol-access_log common
</VirtualHost>

<VirtualHost 192.168.0.2>
    ServerAdmin webmaster@democrat.pol
    ServerName www.democrat.pol
    DocumentRoot /var/www/democrat.pol
    ErrorLog logs/democrat.pol-error_log
    CustomLog logs/democrat.pol-access_log common
</VirtualHost>

Now, requests for http://www.republican.pol/propaganda.html would be mapped to the file /var/www/republican.pol/propaganda.html, and similarly, requests for http://www.democrat.pol/propaganda.html would be mapped to the file /var/www/democrat.pol/propaganda.html. The same web server would be serving both web sites, but the client has no way of knowing. To the client, they seem to be completely independent sites.

What configuration can be found within a VirtualHost container? Most of what can be found within the Main section of the configuration file (server-wide settings such as Listen or ServerRoot are exceptions). The example above has the two hosts using distinct document roots and logs. Just as easily, they could add distinct Aliases, Options, and ACLs, and a host of other configuration.
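The point of IP based virtual hosting is that the same URL path yields different files depending only on which server address the client connected to. The following Python sketch models that mapping for the two hosts above; it is an illustration, not Apache's code. The fallback DocumentRoot of /var/www/html for addresses without a virtual host is an assumption about the surrounding configuration, and Alias and other URL translation are ignored.

```python
import posixpath

# Model of the two IP based virtual hosts above: server IP -> DocumentRoot.
IP_VHOSTS = {
    "192.168.0.1": "/var/www/republican.pol",
    "192.168.0.2": "/var/www/democrat.pol",
}
MAIN_DOCUMENT_ROOT = "/var/www/html"  # assumption: the main server's DocumentRoot


def map_request(server_ip: str, url_path: str) -> str:
    """Return the file a request maps to, choosing the virtual host solely
    by the server IP address the client connected to."""
    root = IP_VHOSTS.get(server_ip, MAIN_DOCUMENT_ROOT)
    # Simplified: no Alias handling and no path normalization.
    return posixpath.join(root, url_path.lstrip("/"))


print(map_request("192.168.0.1", "/propaganda.html"))  # /var/www/republican.pol/propaganda.html
print(map_request("192.168.0.2", "/propaganda.html"))  # /var/www/democrat.pol/propaganda.html
```

Note that nothing in the request other than the destination address is consulted, which is why this scheme works even for clients that send no hostname at all.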

Name Based Virtual Hosts


While IP based virtual hosting is simple, it suffers from the fact that each distinct virtual host must be assigned a distinct IP address, and publicly routable IP addresses are often a precious resource. For this reason, name based virtual hosting was developed. With name based virtual hosting, multiple hostnames resolve to the same IP address. For example, the hostnames www.democrat.pol, www.libertarian.pol, and www.green.pol could all resolve to the IP address 192.168.0.2. In this case, however, the web server has a harder time distinguishing the various hosts, because the server IP address in the TCP/IP request packet is the same for each. The solution is for the web server to "dig deeper" into the request's HTTP protocol. Starting with HTTP/1.1, clients are required to supply a Host: HTTP header with every web request, which identifies the hostname of the requested site. The server can then attempt to match the supplied hostname against the ServerName of each virtual host. In order to configure the Apache web server to "dig deeper" into the HTTP protocol in this manner, the NameVirtualHost directive must be used to identify a server IP address as one which is being used for name based virtual hosting. Consider the following extension to the example above.
<VirtualHost 192.168.0.1>
    ServerAdmin webmaster@republican.pol
    ServerName www.republican.pol
    DocumentRoot /var/www/republican.pol
    ErrorLog logs/republican.pol-error_log
    CustomLog logs/republican.pol-access_log common
</VirtualHost>

NameVirtualHost 192.168.0.2

<VirtualHost 192.168.0.2>
    ServerAdmin webmaster@democrat.pol
    ServerName www.democrat.pol
    DocumentRoot /var/www/democrat.pol
    ErrorLog logs/democrat.pol-error_log
    CustomLog logs/democrat.pol-access_log common
</VirtualHost>

<VirtualHost 192.168.0.2>
    ServerAdmin webmaster@libertarian.pol
    ServerName www.libertarian.pol
    DocumentRoot /var/www/libertarian.pol
    ErrorLog logs/libertarian.pol-error_log
    CustomLog logs/libertarian.pol-access_log common
</VirtualHost>

<VirtualHost 192.168.0.2>
    ServerAdmin webmaster@green.pol
    ServerName www.green.pol
    DocumentRoot /var/www/green.pol
    ErrorLog logs/green.pol-error_log
    CustomLog logs/green.pol-access_log common
</VirtualHost>

NameVirtualHost: The IP address 192.168.0.2 has now been identified as an address for which the server is implementing name based virtual hosting. Any request received over this IP address will now have its HTTP headers examined for the name of the server.
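What "examining the HTTP headers" amounts to can be sketched in a few lines of Python. The helper host_from_request below is hypothetical and far simpler than Apache's own parser: it assumes CRLF line endings, drops any :port suffix, and does not handle bracketed IPv6 literals.

```python
from typing import Optional


def host_from_request(raw_request: str) -> Optional[str]:
    """Return the lower-cased hostname named by the Host: header,
    or None when the request carries no Host: header at all."""
    head = raw_request.split("\r\n\r\n", 1)[0]  # headers end at the blank line
    for line in head.split("\r\n")[1:]:         # skip the request line itself
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            # Drop an optional ":port" suffix; hostnames are case-insensitive.
            return value.strip().split(":", 1)[0].lower()
    return None


modern = "GET /propaganda.html HTTP/1.1\r\nHost: www.Democrat.pol:80\r\n\r\n"
legacy = "GET /propaganda.html HTTP/1.0\r\n\r\n"
print(host_from_request(modern))  # www.democrat.pol
print(host_from_request(legacy))  # None
```

The second request shows the problem case: an HTTP/1.0 client that omits the header gives the server no name to match.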

ServerName: The hostname supplied by the HTTP headers will be matched against the ServerName directive of all virtual hosts which share the relevant IP address. The ServerName directive now takes on new importance. What if the same virtual host should answer to more than one hostname (such as www.democrat.pol and just democrat.pol)? The ServerAlias directive can be used to add multiple names to consider when attempting to find a matching virtual host, as in the following example, where the relevant line has been set apart from the rest of the container.
<VirtualHost 192.168.0.2>
    ServerAdmin webmaster@democrat.pol
    ServerName www.democrat.pol

    ServerAlias democrat.pol democrat www.donkey.pol donkey.pol donkey

    DocumentRoot /var/www/democrat.pol
    ErrorLog logs/democrat.pol-error_log
    CustomLog logs/democrat.pol-access_log common
</VirtualHost>

What if, probably due to a misconfiguration, a match is not found amongst the various 192.168.0.2 virtual hosts? The answer is that Apache defaults to the first defined server on that IP address, in this case, www.democrat.pol. Once a virtual host has been defined for a NameVirtualHost IP address, requests over that IP address will never fall through to the main server. Notice that, in the example above, the server is really simultaneously implementing IP based virtual hosting (over IP address 192.168.0.1) and name based virtual hosting (over IP address 192.168.0.2).
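The matching rules just described (ServerName or ServerAlias match, otherwise the first defined host) can be modeled as follows. This Python sketch is an illustration under simplifying assumptions, not Apache's algorithm: it compares names case-insensitively but ignores wildcard ServerAlias patterns. The host list mirrors the 192.168.0.2 containers above, with the ServerAlias line from the democrat example.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class VHost(NamedTuple):
    server_name: str
    aliases: Tuple[str, ...]
    document_root: str


# Name based virtual hosts sharing 192.168.0.2, in definition order.
NAME_VHOSTS = [
    VHost("www.democrat.pol",
          ("democrat.pol", "democrat", "www.donkey.pol", "donkey.pol", "donkey"),
          "/var/www/democrat.pol"),
    VHost("www.libertarian.pol", (), "/var/www/libertarian.pol"),
    VHost("www.green.pol", (), "/var/www/green.pol"),
]


def select_vhost(host: Optional[str], vhosts: Sequence[VHost]) -> VHost:
    """Pick the virtual host whose ServerName or ServerAlias matches host;
    otherwise fall back to the first one defined for the address."""
    if host is not None:
        host = host.lower()
        for vhost in vhosts:
            if host == vhost.server_name or host in vhost.aliases:
                return vhost
    # No Host: header, or no match: the first defined virtual host wins.
    return vhosts[0]


print(select_vhost("donkey", NAME_VHOSTS).document_root)         # /var/www/democrat.pol
print(select_vhost("www.green.pol", NAME_VHOSTS).document_root)  # /var/www/green.pol
print(select_vhost("www.typo.pol", NAME_VHOSTS).document_root)   # /var/www/democrat.pol
```

Because the fallback is simply "the first entry in the list", the order of the VirtualHost containers in the configuration file matters.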

Exercises
Lab Exercise
Objective: Configure Apache virtual hosts
Estimated Time: 45 mins.


Specification
This lab will consist of setting up virtual hosts for four distinct trade organizations which are all sharing a common web server. The various virtual hosts will be bound to variants of the loopback address, so all configuration will be local to your machine. The skills required to configure a "real world" external web server would be nearly identical, however; only the IP addresses would need to change.

1. Create appropriate DNS entries. As a prerequisite, DNS should be configured to resolve all relevant hostnames appropriately. For our purposes, simply adding the following entries to your local /etc/hosts file will suffice.
127.1.1.1   www.peanutbutterisgood.rha
127.1.1.2   www.jellyisgood.rha
127.1.1.2   www.jamisgood.rha
127.1.1.2   www.marmaladeisgood.rha

If you have configured the file correctly, you should be able to individually ping each of the hostnames, and confirm that they resolve correctly. (Don't be concerned that there's not really a top level domain called rha. We'll fix that in an upcoming workbook.)

2. Four advocacy organizations, one each promoting peanut butter, jelly, jam, and marmalade, want to use common infrastructure to support what looks like four independent sites. You are to configure your web server so that it serves four virtual hosts, with the following parameters. In the following table, all document roots are relative to the directory /var/www/vhostlab, represented by "...". You will probably have to create this directory.

Hostname                     IP Address   Type         Document Root
www.peanutbutterisgood.rha   127.1.1.1    IP based     .../pb_root
www.jellyisgood.rha          127.1.1.2    Name based   .../namevhost/jelly_root
www.jamisgood.rha            127.1.1.2    Name based   .../namevhost/jam_root
www.marmaladeisgood.rha      127.1.1.2    Name based   .../namevhost/marmalade_root

The content for the various websites can be found at http://rha-server/pub/rha/rha230/pbandj_website.tgz. Each site consists of a single index.html file found in the relevantly named directory. Each index.html file also references a background image as /images/some_name.jpg.

a. Extract the tar archive, and position the index.html files so that they are located within the appropriate document roots.
b. Within the tar archive, all four images are found in a single images directory. Install this directory on your web server as the directory /var/www/vhostlab/images. Configure your web server so each virtual host can reference images in this directory using a URL of the form http://vhostname/images/some_name.jpg. You may use whatever method you like, as long as the images are not moved (or copied) from the images directory, and you do not modify the index.html files.

If installed correctly, your site should have the following minimum structure. (You may have added some additional links or whatnot to solve the image directory problem.)
/var/www/vhostlab/
|-- images
|   |-- jam.jpg
|   |-- jelly.jpg
|   |-- marmalade.jpg
|   `-- peanutbutter.jpg
|-- namevhost
|   |-- jam_root
|   |   `-- index.html
|   |-- jelly_root
|   |   `-- index.html
|   `-- marmalade_root
|       `-- index.html
`-- pb_root
    `-- index.html

3. Set options such that clients accessing http://www.peanutbutterisgood.rha/images receive a dynamically generated index, but dynamically generated indexes for http://www.jamisgood.rha/images, http://www.jellyisgood.rha/images, and http://www.marmaladeisgood.rha/images are prohibited.

4. The site http://www.peanutbutterisgood.rha should log hits (client accesses) to the file /var/log/httpd/pb_access_log, using the common format. The three name based virtual hosts should all log hits to the file /var/log/httpd/fruity_access_log, again using the common format.

5. Older web clients use the HTTP/1.0 protocol, instead of the HTTP/1.1 protocol, and do not always provide the HTTP Host: header required to resolve name based virtual hosts. As a result, when accessing a site which uses name based virtual hosting, they are always bound to the default (first defined) virtual host. In order to accommodate these older clients, create a new name based virtual host, with a ServerName of DummyPlaceholder, and assign it a document root of /var/www/vhostlab/namevhost. Make sure that its definition occurs before any other virtual host definitions for IP address 127.1.1.2. Create the file /var/www/vhostlab/namevhost/index.html, with the following content.
<p>Which of the following high quality sites are you trying to access?</p>
<ul>
  <li><a href="/jelly_root">www.jellyisgood.rha</a></li>
  <li><a href="/jam_root">www.jamisgood.rha</a></li>
  <li><a href="/marmalade_root">www.marmaladeisgood.rha</a></li>
</ul>


You may confirm your configuration by accessing the web server by IP address, instead of hostname: http://127.1.1.2. Make sure that pages accessed through this new (unnamed) virtual host resolve images correctly.

Deliverables

1. A local DNS configuration which resolves www.peanutbutterisgood.rha to 127.1.1.1, and each of www.jellyisgood.rha, www.jamisgood.rha, and www.marmaladeisgood.rha to 127.1.1.2.
2. An IP based virtual host on 127.1.1.1, with a document root of /var/www/vhostlab/pb_root, with the specified content, which logs hits to /var/log/httpd/pb_access_log using the common format.
3. Three name based virtual hosts (www.jellyisgood.rha, www.jamisgood.rha, and www.marmaladeisgood.rha) which all share the IP address 127.1.1.2, mapped to the document roots /var/www/vhostlab/namevhost/jelly_root, /var/www/vhostlab/namevhost/jam_root, and /var/www/vhostlab/namevhost/marmalade_root, respectively, with the specified content.
4. Each name based host logs hits to the shared log file /var/log/httpd/fruity_access_log using the common format.
5. Requests for all four virtual hosts should resolve the URL /images to the directory /var/www/vhostlab/images.
6. For the IP based virtual host on 127.1.1.1, requests to the URL /images should result in a dynamically generated index. For all name based virtual hosts, dynamic index generation of /images should be disabled.
7. In order to support legacy clients, all requests which resolve to the host 127.1.1.2 which do not directly reference one of the specified name based virtual hosts by name should resolve to the document root /var/www/vhostlab/namevhost, which contains the file index.html with the specified content.
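Besides pinging each name, deliverable 1 can be checked offline by reading the hosts file directly. The helpers parse_hosts and check_hosts below are hypothetical, not part of the lab materials: they parse /etc/hosts-style text and report which lab hostnames are missing or mapped to the wrong address. Keeping the first entry seen for a name is an assumption about resolver behavior, and the resolver order set in /etc/nsswitch.conf is ignored.

```python
from typing import Dict, List

# Required resolution for the lab, taken from the Specification above.
EXPECTED = {
    "www.peanutbutterisgood.rha": "127.1.1.1",
    "www.jellyisgood.rha": "127.1.1.2",
    "www.jamisgood.rha": "127.1.1.2",
    "www.marmaladeisgood.rha": "127.1.1.2",
}


def parse_hosts(text: str) -> Dict[str, str]:
    """Map each hostname in /etc/hosts-style text to its IP address."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Assumption: the first entry seen for a name is the one used.
            mapping.setdefault(name.lower(), address)
    return mapping


def check_hosts(text: str) -> List[str]:
    """Return the lab hostnames that are missing or resolve to the wrong address."""
    mapping = parse_hosts(text)
    return [name for name, ip in EXPECTED.items() if mapping.get(name) != ip]


good = """127.0.0.1   localhost.localdomain localhost
127.1.1.1   www.peanutbutterisgood.rha
127.1.1.2   www.jellyisgood.rha
127.1.1.2   www.jamisgood.rha
127.1.1.2   www.marmaladeisgood.rha
"""
print(check_hosts(good))  # []
```

On the lab machine itself, passing the text of /etc/hosts (for example, open("/etc/hosts").read()) should likewise produce an empty list.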

Questions
1. Which of the following protocols does the Apache webserver use to associate an IP-based virtual host with a client request?
( ) a. TCP/IP
( ) b. DNS
( ) c. ARP
( ) d. HTTP
( ) e. None of the above


2. Which of the following protocols does the Apache webserver use to associate a Name-based virtual host with a client request?
( ) a. TCP/IP
( ) b. DNS
( ) c. ARP
( ) d. HTTP
( ) e. None of the above

3. Which of the following directives would you not be able to override using an Apache virtual host?
( ) a. DocumentRoot
( ) b. ServerName
( ) c. KeepAliveTimeout
( ) d. ErrorLog
( ) e. DirectoryIndex

Use the following excerpt from an Apache web server's main configuration file to answer the following 7 questions.
...
DocumentRoot /var/www/html
...
ErrorLog logs/error_log
CustomLog logs/access_log combined
DirectoryIndex index.html
...

<VirtualHost 192.168.24.32>
    DocumentRoot /var/www/virtual/chipmunk.edu
    ServerName www.chipmunk.edu
    ErrorLog logs/chipmunk-error-log
    CustomLog logs/chipmunk-access-log combined
</VirtualHost>

NameVirtualHost 192.168.24.33

<VirtualHost 192.168.24.33>
    DocumentRoot /var/www/virtual/hamster.edu
    ServerName www.hamster.edu
    ErrorLog logs/hamster-error-log
    CustomLog logs/hamster-access-log custom
    DirectoryIndex nuts.html
    Alias /seeds/ /usr/share/seeds/
</VirtualHost>

<VirtualHost 192.168.24.33>
    DocumentRoot /var/www/virtual/gerbil.edu
    ServerName www.gerbil.edu
    ErrorLog /var/www/virtual/gerbil.edu/.hterrors
    CustomLog logs/gerbil-access-log combined

    <Location /seeds>
        Options -Indexes
    </Location>
</VirtualHost>

You may assume that no omitted configuration affects URL to filename translation, and that an external DNS server appropriately maps the following hostnames.

Hostname            IP Address
www.chipmunk.edu    192.168.24.32
www.rat.edu         192.168.24.32
www.hamster.edu     192.168.24.33
www.gerbil.edu      192.168.24.33
www.lemming.edu     192.168.24.33

4. To what file does the URL http://www.chipmunk.edu/seeds/sunflower.html resolve?
( ) a. /var/www/html/seeds/sunflower.html
( ) b. /var/www/virtual/hamster.edu/seeds/sunflower.html
( ) c. /usr/share/seeds/sunflower.html
( ) d. /var/www/html/sunflower.html
( ) e. /var/www/virtual/chipmunk.edu/seeds/sunflower.html

5. To what file does the URL http://www.hamster.edu/seeds/sunflower.html resolve?
( ) a. /var/www/virtual/hamster.edu/seeds/sunflower.html
( ) b. /var/www/html/sunflower.html
( ) c. /var/www/html/seeds/sunflower.html
( ) d. /var/www/virtual/chipmunk.edu/seeds/sunflower.html
( ) e. /usr/share/seeds/sunflower.html


6. To what file does the URL http://www.lemming.edu/seeds/sunflower.html resolve?
( ) a. /var/www/virtual/chipmunk.edu/seeds/sunflower.html
( ) b. /var/www/html/seeds/sunflower.html
( ) c. /var/www/html/sunflower.html
( ) d. /var/www/virtual/hamster.edu/seeds/sunflower.html
( ) e. /usr/share/seeds/sunflower.html

7. To what file does the URL http://www.rat.edu/seeds/sunflower.html resolve?
( ) a. /var/www/virtual/chipmunk.edu/seeds/sunflower.html
( ) b. /var/www/html/sunflower.html
( ) c. /usr/share/nuts/sunflower.html
( ) d. /var/www/virtual/hamster.edu/seeds/sunflower.html
( ) e. /var/www/html/seeds/sunflower.html

8. When accessing the URL http://www.gerbil.edu/seeds/acorns/, a 403 Access Denied error is generated. Assuming all filesystem ownerships, permissions, and SELinux contexts are correct, which of the following would allow access to the URL?
( ) a. Commenting out the Location directive from the appropriate container.
( ) b. Creating the file /var/www/virtual/gerbil.edu/seeds/acorns/nuts.html
( ) c. Creating the file /var/www/virtual/gerbil.edu/seeds/acorns/index.html
( ) d. Any of the above
( ) e. Either A or C

9. To what file(s) would information about the above (403 Access Denied) transaction be logged?
( ) a. /var/log/httpd/log/gerbil-error-log
( ) b. /var/log/httpd/log/gerbil-access-log
( ) c. /var/www/virtual/gerbil-edu/.hterrors
( ) d. A and B
( ) e. A and C


10. In the standard Red Hat Enterprise Linux configuration, which of the following files could also be used to provide virtual host configuration?
( ) a. /etc/httpd/gerbil.virtual
( ) b. /etc/httpd/conf.d/gerbil.conf
( ) c. /etc/httpd/conf.d/gerbil
( ) d. /var/www/html/.htgerbil
( ) e. B or C


Chapter 5. The Squid Proxy Server


Discussion
Proxy Servers
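A proxy server acts as a middleman between a client and a server, as shown in Figure 5-1.

Figure 5-1. The Role of a Proxy Server
(Diagram: a client machine running mozilla, 192.168.0.254, connects to port 8080 of the proxy server running squid, with addresses 192.168.0.1 and 1.1.1.1, which in turn connects to port 80 of the web server running httpd, 2.2.2.2.)

The use of a proxy server usually involves the following steps.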

1. A client is configured to use the proxy server. This is a one-time configuration, which usually requires the IP address and port of the proxy server.
2. When asked to connect to a service, the client connects to the proxy server instead of connecting directly.
3. The proxy server accepts the request as if it were the server, but sends nothing back to the client immediately. Instead, the proxy server initiates the request to the real service, as if it were the client.
4. The true service receives the connection, and returns a response to the proxy server.
5. The proxy server then relays the response it received from the server to the client, as if it were the server.

Why would anyone want to use such a convoluted scheme? The answer usually involves one of the following.

Access. The client may be on a machine that does not have a direct connection to the Internet, so it needs the services of a proxy server which does. In the scenario diagrammed above, the client is on a 192.168.0.0/24 private subnet, which by convention should not be routed directly to the Internet.

Caching. The proxy server may store the response of the server, as well as returning it to the client. If the client (or another client) asks for the same information again, the proxy server merely needs to ask the real server "has your information changed?" If not, the proxy server can return the local copy, reducing traffic between the proxy server and the true service.

Filtering. The proxy server becomes a single control point for all clients which it serves. Therefore, traffic can be filtered or logged for later auditing at the proxy server.
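The caching behavior can be modeled in a few lines of Python. This is a toy model, not how Squid is implemented: the origin server is represented by two hypothetical callables, one returning the full content with a version number and one answering only "what is your current version?". In real HTTP this question typically takes the form of a conditional request, such as an If-Modified-Since header answered with 304 Not Modified.

```python
from typing import Callable, Dict, Tuple

# Hypothetical origin interface: fetch(url) -> (body, version), and
# version_of(url) -> version, where version changes whenever the content does.
Fetch = Callable[[str], Tuple[str, int]]
VersionOf = Callable[[str], int]


class CachingProxy:
    def __init__(self, fetch: Fetch, version_of: VersionOf) -> None:
        self.fetch = fetch
        self.version_of = version_of
        self.cache: Dict[str, Tuple[str, int]] = {}
        self.full_transfers = 0  # how often the whole body was fetched from the origin

    def get(self, url: str) -> str:
        cached = self.cache.get(url)
        # "Has your information changed?" is a cheap question to the origin.
        if cached is not None and self.version_of(url) == cached[1]:
            return cached[0]  # unchanged: answer from the local copy
        body, version = self.fetch(url)  # miss or changed: full transfer
        self.full_transfers += 1
        self.cache[url] = (body, version)
        return body


origin = {"/index.html": ("hello", 1)}
proxy = CachingProxy(lambda url: origin[url], lambda url: origin[url][1])
proxy.get("/index.html")
proxy.get("/index.html")
print(proxy.full_transfers)  # 1
```

The second request is answered from the cache, so only one full transfer crosses the link between the proxy and the origin.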

Although our figure diagrams a web proxy server, our discussion has been intentionally vague about which client and which service we're talking about, because the idea of a proxy server is a general concept. The service in question could be a web server, an FTP server, or even an LDAP server, and the same concepts would apply.
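The one-time client configuration in step 1 of the list above can be illustrated with Python's standard library, which supports proxies through urllib.request.ProxyHandler. This is a minimal sketch assuming the proxy listens at 192.168.0.1 on port 8080, as in the example network; no request is actually sent.

```python
import urllib.request

PROXY = "http://192.168.0.1:8080"  # assumed proxy address and port

# One-time client configuration: route http and ftp requests via the proxy.
proxy_handler = urllib.request.ProxyHandler({"http": PROXY, "ftp": PROXY})
opener = urllib.request.build_opener(proxy_handler)

# From now on, opener.open("http://2.2.2.2/") would connect to
# 192.168.0.1:8080, and the proxy would contact 2.2.2.2 on the client's behalf.
```

Everything after this point is transparent to the code that uses the opener, which mirrors steps 2 through 5 above.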

The squid Proxy Server


Most often, if people use the term proxy server without elaboration, they are referring to an HTTP (web) proxy server. Red Hat Enterprise Linux ships with a full featured and sophisticated proxy server, known as Squid. Squid supports FTP, gopher, and HTTP requests, SSL encapsulation, robust caching, extensive access controls, and full transaction logging.

As with the Apache web server, a whole course could be devoted to deploying and maintaining the squid proxy server. Like most Red Hat Enterprise Linux packaged products, however, the out-of-the-box configuration makes it fairly easy to set up and use the proxy server in a basic configuration. We will cover how to install the server, define which port it should bind to, and specify which clients are able to connect to the service.

The proxy server is packaged in the squid package, and is managed as the squid service. Therefore, the standard techniques can be used for installing the software and starting the service in its default configuration.
[root@station ~]# yum install squid
...
=============================================================================
 Package        Arch       Version                  Repository        Size
=============================================================================
Installing:
 squid          i386       7:2.6.STABLE6-3.el5      rha-rhel          1.2 M
...
Installed: squid.i386 7:2.6.STABLE6-3.el5
Complete!
[root@station1 ~]# service squid start
init_cache_dir /var/spool/squid... Starting squid: .       [  OK  ]
[root@station1 ~]# chkconfig squid on

The out-of-the-box configuration is not directly useful, however, as the default access control lists only allow clients on the local machine (localhost) to connect.

Squid Configuration: /etc/squid/squid.conf


Upon startup, the squid daemon reads the /etc/squid/squid.conf configuration file for its configuration. The configuration file follows a very traditional Linux (and Unix) syntax.

All blank lines (lines which are empty or contain only white space) are ignored, as are all comment lines that begin with a "#". All other lines begin with a keyword, referred to as a "TAG". The syntax of the arguments after the tag depends on the tag, but all arguments must occur on the same line as the tag.
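These syntax rules are simple enough to express as a short parser. The following Python sketch (parse_squid_conf is a hypothetical helper, not part of Squid) skips blank and comment lines and splits every remaining line into its TAG and arguments. It does not validate tags or interpret any tag-specific argument syntax.

```python
from typing import List, Tuple


def parse_squid_conf(text: str) -> List[Tuple[str, List[str]]]:
    """Return (TAG, arguments) pairs from squid.conf-style text."""
    directives = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comment lines are ignored
        tag, *args = stripped.split()  # the TAG and its arguments share one line
        directives.append((tag, args))
    return directives


sample = """
#  TAG: http_port
# Squid normally listens to port 3128
http_port 3128

http_port 8080
"""
print(parse_squid_conf(sample))  # [('http_port', ['3128']), ('http_port', ['8080'])]
```

Applied to the full default file, this kind of filtering is what reduces thousands of lines of commentary to the handful of active directives.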

Like many Red Hat Enterprise Linux default configuration files, the file attempts to be self documenting and provides copious comments with default configuration values commented out. Usually, changing a value to something other than the default involves uncommenting the default line, and changing its value (perhaps first duplicating the line to preserve documentation of the default value). While the default configuration file is intimidating, weighing in at over 4300 lines, the relevant configuration is a mere 25 lines, as illustrated below.
[root@station ~]# wc /etc/squid/squid.conf
 4325  24616 148129 /etc/squid/squid.conf
[root@station ~]# grep -v \# /etc/squid/squid.conf | sed "/^$/d" | wc
     25      91     756

For our purposes, we are only going to examine three relevant tags: http_port, acl, and http_access.

The server's identity: http_port


Opening the /etc/squid/squid.conf configuration file with any text editor, you should be able to quickly find the first configuration tag, http_port.

Figure 5-2. /etc/squid/squid.conf: http_port
# NETWORK OPTIONS
# -----------------------------------------------------------------------------

#  TAG: http_port
#       Usage:  port
#               hostname:port
#               1.2.3.4:port
#
#       The socket addresses where Squid will listen for HTTP client
#       requests.  You may specify multiple socket addresses.
#       There are three forms: port alone, hostname with port, and
#       IP address with port.  If you specify a hostname or IP
#       address, Squid binds the socket to that specific
#       address.  This replaces the old tcp_incoming_address
#       option.  Most likely, you do not need to bind to a specific
#       address, so you can use the port number alone.
...
# Squid normally listens to port 3128
http_port 3128

By default, squid binds to port 3128, although by convention, HTTP proxy servers usually use port 8000 or 8080. An administrator could well want to add a line such as the following.
http_port 8080

Note that, as the comment says, multiple http_port lines can be added, causing squid to bind to more than one port or interface, if necessary.
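The comment lists three forms of the http_port value: port alone, hostname with port, and IP address with port. These can be told apart by whether a colon is present. The hypothetical helper parse_http_port below sketches only that rule. It ignores IPv6 addresses and anything else squid may accept on the line.

```python
from typing import Optional, Tuple


def parse_http_port(value: str) -> Tuple[Optional[str], int]:
    """Split an http_port value into (bind_address, port).

    bind_address is None for the "port alone" form, meaning squid
    listens on every local address.
    """
    address, sep, port_text = value.rpartition(":")
    # With no colon, rpartition returns ("", "", value): the "port alone" form.
    # Otherwise it is "hostname:port" or "1.2.3.4:port": bind to that address.
    bind_address: Optional[str] = address if sep else None
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return bind_address, port
```

Each extra http_port line in the file would contribute one more (address, port) pair, and therefore one more listening socket.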


Squid Access Control Lists: acl and http_access


More interestingly, we will also explore manipulating the client access control configuration. Finding the access control configuration can be difficult, as it is buried deep within the rather long file. Searching for the term acl, however, and pounding on Find Next about 10 times, you should be able to discover the following.

Figure 5-3. /etc/squid/squid.conf: acl Documentation
# ACCESS CONTROLS
# -----------------------------------------------------------------------------

#  TAG: acl
#       Defining an Access List
#
#       acl aclname acltype string1 ...
#       acl aclname acltype "file" ...
#
#       when using "file", the file should contain one item per line
#
#       acltype is one of the types described below
#
#       By default, regular expressions are CASE-SENSITIVE.  To make
#       them case-insensitive, use the -i option.
#
#       acl aclname src      ip-address/netmask ... (clients IP address)
#       acl aclname src      addr1-addr2/netmask ... (range of addresses)
#       acl aclname dst      ip-address/netmask ... (URL hosts IP address)
#       acl aclname myip     ip-address/netmask ... (local socket IP address)
# ...
#
#       acl aclname srcdomain   .foo.com ...    # reverse lookup, client IP
#       acl aclname dstdomain   .foo.com ...    # Destination server from URL

The acl tag assigns a name to a specification. The tag itself has no observable effect; instead, the name can be referenced by other tags (such as http_access, below). Skimming the comments here and in the file itself, we find that acl specifications can involve a wide range of parameters, including the following.

Table 5-1. Squid acl Specifications

Keyword     Parameter
src         Requesting client's IP address
dst         Real server's IP address
port        Real server's port
myip        squid's IP address
srcdomain   Requesting client's domain name
dstdomain   Real server's domain name
time        Time of day
url_regex   Regular expression matched against the requested URL
proto       Proxied protocol (HTTP, FTP, etc.)
reqheader   Regular expression matched against HTTP request headers
repheader   Regular expression matched against HTTP response headers

And these are only some of the parameters that can be specified. Obviously, squid is highly configurable in terms of who it will let connect and what content it is willing to proxy. We now turn our attention to the default configuration, which consists of the uncommented values found a few lines below.

Figure 5-4. /etc/squid/squid.conf: acl
#Recommended minimum configuration:
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl to_localhost dst 127.0.0.0/8
acl SSL_ports port 443
acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443         # https
acl Safe_ports port 70          # gopher
acl Safe_ports port 210         # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280         # http-mgmt
acl Safe_ports port 488         # gss-http
acl Safe_ports port 591         # filemaker
acl Safe_ports port 777         # multiling http
acl CONNECT method CONNECT

These lines define the following names, which can be referred to later.

Table 5-2. Default squid acl Definitions

Name          Members
all           All requests
manager       squid internal cache management requests
localhost     All requests originating from the loopback address
to_localhost  All requests to the loopback address
Safe_ports    All requests to the well-known ports of services squid is willing to proxy
CONNECT       All requests to initiate an SSL encapsulated connection

As the Safe_ports acl illustrates, a name may be assigned multiple times, and the values are then "or"ed together (i.e., a match on any of the individual values is considered a match on the acl as a whole).

Lastly, an access control policy is defined using multiple http_access tags which reference the acls defined above. For each client request, squid applies a "stop on first match" policy while searching the following list of http_access controls, so order is important. Once squid finds a specification that matches the client request, it stops searching and immediately implements the specified allow or deny policy.

Figure 5-5. /etc/squid/squid.conf: http_access
#  TAG: http_access
#       Allowing or Denying access based on defined access lists
#
#       Access to the HTTP port:
#       http_access allow|deny [!]aclname ...
# ...
#Recommended minimum configuration:
#
# Only allow cachemgr access from localhost
http_access allow manager localhost
http_access deny manager
# Deny requests to unknown ports
http_access deny !Safe_ports
...
# Example rule allowing access from your local networks. Adapt
# to list your (internal) IP networks from where browsing should
# be allowed
#acl our_networks src 192.168.1.0/24 192.168.2.0/24
#http_access allow our_networks

# And finally deny all other access to this proxy
http_access allow localhost
http_access deny all

To the experienced eye, the comments leave little more to add, but we'll walk through these lines just in case. The first argument to the http_access tag is either the keyword allow or deny, followed by one or more acl names, each possibly preceded by a "!". The acl names are effectively "and"ed: all of them must apply to the client request for the http_access policy to apply. The presence of a "!" inverts the meaning of the acl.

1. The first line allows management requests, but only from the loopback address (i.e., from processes running on the proxy server). Notice that both the manager and localhost acls must apply for the policy to take effect.
2. The second line denies management requests from all other sources.
3. Any request for a port other than one for which squid is willing to proxy is denied. (Notice the convenient use of "!" to invert the meaning of the Safe_ports acl.)
4. The commented-out our_networks example marks where the good guys are defined. More on this in a second.
5. Any requests from the loopback interface are considered good.
6. Any request not meeting the above policies is prohibited by deny all.
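The evaluation rules just walked through can be modelled in a short Python sketch. These rules are: "or" across repeated acl definitions, "and" across the names on one http_access line, "!" for inversion, and stop on first match. Everything here is a hypothetical illustration, not squid code. The Request class carries only the client address, destination port and protocol, and only the src, port and proto acl types are modelled. The fallback used when no line matches follows squid.conf's own http_access comments: the opposite of the last line's action, or deny when there are no lines. Treat that fallback as something to confirm against your squid version.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Request:
    """Hypothetical, minimal view of a client request."""
    src: str             # client IP address
    port: int            # port on the real server
    proto: str = "http"  # protocol; cache management uses "cache_object"


Matcher = Callable[[Request], bool]


def src(spec: str) -> Matcher:
    # Accepts both 192.168.0.0/255.255.255.0 and 192.168.0.0/24.
    net = ipaddress.ip_network(spec, strict=False)
    return lambda req: ipaddress.ip_address(req.src) in net


def port(spec: str) -> Matcher:
    low, _, high = spec.partition("-")  # "80" or a range like "1025-65535"
    lo, hi = int(low), int(high or low)
    return lambda req: lo <= req.port <= hi


def proto(name: str) -> Matcher:
    return lambda req: req.proto == name


class AccessPolicy:
    def __init__(self) -> None:
        self.acls: Dict[str, List[Matcher]] = {}
        self.rules: List[Tuple[str, List[str]]] = []

    def acl(self, name: str, matcher: Matcher) -> None:
        # Repeating a name adds alternatives: its values are "or"ed together.
        self.acls.setdefault(name, []).append(matcher)

    def http_access(self, action: str, *names: str) -> None:
        self.rules.append((action, list(names)))

    def _holds(self, name: str, req: Request) -> bool:
        inverted = name.startswith("!")
        matched = any(m(req) for m in self.acls[name.lstrip("!")])
        return matched != inverted

    def check(self, req: Request) -> str:
        for action, names in self.rules:
            # Every acl named on the line must hold ("and"); first match wins.
            if all(self._holds(n, req) for n in names):
                return action
        # No line matched: the opposite of the last line's action, as
        # described in squid.conf's comments; with no lines at all, deny.
        if self.rules and self.rules[-1][0] == "deny":
            return "allow"
        return "deny"


def default_policy() -> AccessPolicy:
    """The relevant default acl and http_access lines from Figures 5-4 and 5-5."""
    p = AccessPolicy()
    p.acl("all", src("0.0.0.0/0.0.0.0"))
    p.acl("manager", proto("cache_object"))
    p.acl("localhost", src("127.0.0.1/255.255.255.255"))
    for spec in ["80", "21", "443", "70", "210", "1025-65535",
                 "280", "488", "591", "777"]:
        p.acl("Safe_ports", port(spec))
    p.http_access("allow", "manager", "localhost")
    p.http_access("deny", "manager")
    p.http_access("deny", "!Safe_ports")
    p.http_access("allow", "localhost")
    p.http_access("deny", "all")
    return p
```

With default_policy(), only requests from 127.0.0.1 to a safe port are allowed. Defining our_networks with src("192.168.0.0/24") and inserting ("allow", ["our_networks"]) ahead of the "allow localhost" rule expresses, in this model, the two-line change discussed next.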

Once we work our way through the default configuration, we realize that it only allows connections from the loopback address! If the proxy server is to be useful, the identities of the intended clients need to be specified. How to do so should be evident from the comments. First, define the our_networks acl to match requests from the clients for whom squid should be willing to proxy. Second, add an http_access rule allowing connections that match the our_networks acl. (Of course, some name other than our_networks could have been used.) Order is important: the matching rule should occur after requests for bad ports are filtered out, but before the deny all sledgehammer. For example, to allow clients to connect from the 192.168.0.0/24 subnet, we could add the following lines just beneath the our_networks comments.
acl our_networks src 192.168.0.0/255.255.255.0
http_access allow our_networks

Alternatively, the equivalent CIDR notation, 192.168.0.0/24, could have been used. Of course, after modifying the configuration file, the squid service should be restarted.
[root@station ~]# service squid restart
Stopping squid: .                                          [  OK  ]
Starting squid: .                                          [  OK  ]

It took a while to understand why, but in the end, configuring squid to allow clients involves only a two-line edit, and both lines can be easily deduced from the existing comments: one defines who the good guys are, and the other modifies the access control list chain to let them in.
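Because the edit is always the same two lines, it can also be scripted. The function allow_clients below is a hypothetical sketch that works on the file's text. It inserts the pair directly beneath the commented-out "#http_access allow our_networks" example if that line is present, and otherwise immediately before "http_access deny all". It does not back up the file or restart squid; both remain manual steps.

```python
def allow_clients(conf_text: str, network: str, name: str = "our_networks") -> str:
    """Return conf_text with "acl NAME src NETWORK" and "http_access allow NAME" added."""
    lines = conf_text.splitlines()
    new_lines = [f"acl {name} src {network}", f"http_access allow {name}"]
    # Preferred spot: right after the commented-out example (offset 1).
    # Fallback: just before the final "deny all" (offset 0).
    for marker, offset in (("#http_access allow our_networks", 1),
                           ("http_access deny all", 0)):
        for i, line in enumerate(lines):
            if line.strip() == marker:
                at = i + offset
                return "\n".join(lines[:at] + new_lines + lines[at:]) + "\n"
    raise ValueError("no suitable insertion point found")
```

For example, allow_clients(text, "192.168.0.0/255.255.255.0") produces exactly the two lines shown earlier, placed after the bad-port filter and ahead of deny all.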

Configuring Proxies for Web Clients


Once a proxy server is up and running, clients must be configured to use it. The details vary from client to client, but the essence is the same: somehow, the client needs to be configured with the IP address and port number of the proxy server.

Configuring Firefox
The firefox web browser's proxy configuration is found by choosing the Connection Settings... button from the Preferences dialog, which is opened by choosing the Edit:Preferences menu item.


Figure 5-6. Firefox Proxy Configuration

Once open, the dialog allows you to specify an independent proxy server for each of several protocols, or, conveniently, to set all protocols to use the same server. You can also specify a list of domains and IP addresses for which the client should not use the proxy. This is very useful for maintaining access to servers the proxy server might not be aware of (such as localhost or rha-server).

Configuring curl
Command line web clients are often configured to use proxy servers through command line switches or environment variables. For example, if you open the curl man page and search for proxy, you can (eventually) find the following.
-x/--proxy <proxyhost[:port]>
       Use specified HTTP proxy. If the port number is not specified, it is
       assumed at port 1080.

       This option overrides existing environment variables that sets proxy
       to use. If there's an environment variable setting a proxy, you can
       set proxy to "" to override it.

And, a little further down, the following.


ENVIRONMENT
       http_proxy [protocol://]<host>[:port]
              Sets proxy server to use for HTTP.

       HTTPS_PROXY [protocol://]<host>[:port]
              Sets proxy server to use for HTTPS.
       ...
       NO_PROXY <comma-separated list of hosts>
              list of host names that shouldn't go through any proxy. If set
              to an asterisk '*' only, it matches all hosts.

For example, to download the Red Hat home page using the proxy server defined above, either of the following techniques could be used.
[root@station ~]# curl -x http://station:8080 http://www.redhat.com

[root@station ~]# export http_proxy=http://station:8080
[root@station ~]# curl http://www.redhat.com
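The precedence described in the man page excerpts can be summarised as a small decision function. choose_proxy below is a hypothetical, simplified model rather than curl's actual logic. It makes three assumptions. First, NO_PROXY is consulted before anything else, for -x proxies as well as environment ones. Second, NO_PROXY entries match a host exactly or as a domain suffix. Third, the default port of 1080 is ignored.

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit


def choose_proxy(url: str, cli_proxy: Optional[str] = None,
                 env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the proxy URL to use for url, or None to connect directly.

    cli_proxy models the -x/--proxy argument; env models the environment.
    """
    env = {} if env is None else env
    parts = urlsplit(url)
    host = parts.hostname or ""

    # NO_PROXY: a comma-separated host list; "*" on its own matches every host.
    no_proxy = env.get("NO_PROXY", env.get("no_proxy", "")).strip()
    if no_proxy == "*":
        return None
    for entry in no_proxy.split(","):
        entry = entry.strip().lstrip(".")
        if entry and (host == entry or host.endswith("." + entry)):
            return None

    # -x/--proxy overrides any environment variable; -x "" means no proxy.
    if cli_proxy is not None:
        return cli_proxy or None

    # Otherwise fall back to the per-protocol environment variables.
    if parts.scheme == "https":
        return env.get("HTTPS_PROXY") or env.get("https_proxy")
    return env.get("http_proxy")
```

With env={"http_proxy": "http://station:8080"}, both curl invocations above correspond to choose_proxy returning "http://station:8080" for http://www.redhat.com.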

Squid Logging: /var/log/squid/access.log


Like the Apache web server, squid maintains a transaction log, found at /var/log/squid/access.log. Squid uses its own log format, which displays details more pertinent to a proxy server than the standard common format used by web servers. The emulate_httpd_log directive can be set to use the traditional common format instead, though some information will be lost.

Table 5-3. Squid Log Format

Position  Example                 Content
1         1124596159.068          A Unix standard timestamp (see note a)
2         60355                   Request duration, in milliseconds
3         192.168.0.25            Client IP address
4         TCP_MISS/200            Squid result code
5         1381                    Number of bytes transferred to the client
6         GET                     The request method
7         http://www.redhat.com   The requested URL
8-10      ...                     Parameters relevant to internal cache

Notes:
a. The Unix world (including Linux) conventionally records timestamps internally as "seconds since the epoch", the epoch being January 1st, 1970 (UTC). Stored in a signed 32-bit integer, this conveniently covers times from late 1901 until early 2038. The Unix world was not concerned about "Y2K" problems, but instead worries about "Y2038" problems. Your author feels this would be the perfect time to come out of retirement and consult for legacy Linux systems.

Three sample log messages are found below.
1124596032.120      2 192.168.1.1 TCP_DENIED/403 1355 GET http://www.redhat.com/ - NONE/- te
1124596159.068  60355 192.168.0.25 TCP_MISS/200 1381 GET http://www.redhat.com/ - DIRECT/209
1124596167.650      1 192.168.0.25 TCP_HIT/200 12115 GET http://www.redhat.com/ - NONE/- tex

The first is from a client that was not accepted by the client access control configuration, and so received a TCP_DENIED. The second is a request from a client for data not already in the cache, a TCP_MISS. The third is a follow-up request (perhaps from a reload of the same page) whose data was already cached locally, generating a TCP_HIT. Notice that the only request which took a significant amount of time to fulfill was the cache miss. It took 60355 milliseconds (about a minute), compared with 1 or 2 milliseconds for the other two.
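The fields in Table 5-3 can be pulled apart with a simple whitespace split. The sketch below assumes squid's native log format, that is, with emulate_httpd_log left off. LogEntry, parse_line and result_counts are hypothetical names. Splitting field 4 at the "/" separates squid's result code from the HTTP status code. Converting field 1 with datetime.fromtimestamp shows that the samples above were logged on August 21st, 2005 (UTC).

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, NamedTuple


class LogEntry(NamedTuple):
    timestamp: datetime  # field 1: seconds since the epoch, as UTC
    elapsed_ms: int      # field 2: request duration
    client: str          # field 3: client IP address
    result: str          # field 4 before "/": e.g. TCP_MISS
    status: int          # field 4 after "/": the HTTP status code
    size: int            # field 5: bytes transferred to the client
    method: str          # field 6
    url: str             # field 7


def parse_line(line: str) -> LogEntry:
    f = line.split()
    result, _, status = f[3].partition("/")
    return LogEntry(
        datetime.fromtimestamp(float(f[0]), tz=timezone.utc),
        int(f[1]), f[2], result, int(status), int(f[4]), f[5], f[6],
    )


def result_counts(lines: Iterable[str]) -> Counter:
    """Tally result codes such as TCP_HIT, TCP_MISS and TCP_DENIED."""
    return Counter(parse_line(line).result for line in lines if line.strip())
```

The same conversion illustrates note a: datetime.fromtimestamp(2**31 - 1, tz=timezone.utc) is 2038-01-19 03:14:07 UTC, the last second a signed 32-bit counter can hold.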

Finding Out More


We have only touched upon a few of Squid's basics. Those interested in more, such as using squid as a transparent proxy server (or "accelerator"), can consult the FAQ (which reads almost like a manual) found at /usr/share/doc/squid-version/FAQ-html, or the Squid home page (http://www.squid-cache.org/).

Exercises
Lab Exercise
Objective: Configure the Squid Proxy Server
Estimated Time: 10 mins.

Specification
This lab will have you install, configure, and use the squid proxy server. A "real world" use of squid would require 3 machines: one to host the web server, one to host the proxy server, and of course the client machine running a web browser.

Figure 5-7. Standard Squid Proxy Server Configuration
[Figure: httpd on www.widgets.org (118.23.53.1, port 80); squid on proxy.example.com (port 8080; addresses 192.168.0.254 and 165.23.84.5); firefox on station1.example.com (192.168.0.1); ephemeral port 43523.]

The machine hosting the web server would need a publicly accessible IP address, as would the proxy server. The client machine, however, could well lack one, with squid running on a multi-homed host. In that setup, squid would receive requests from clients over a private IP address and forward them to the Internet through its public IP address. For our lab, we will instead run the web server, the client, and the squid proxy server on the same machine. The concepts map directly to the real world scenario. In the following diagram, 192.168.0.1 should be replaced with your eth0 IP address.

Figure 5-8. Lab Squid Proxy Server Configuration
[Figure: a single student station running httpd (127.1.1.2:80), squid (192.168.0.1:8080), and firefox (127.1.1.2:35476).]

1. Configure the squid proxy server.
   a. Ensure that the squid package is installed.
   b. As a precaution, make a backup of the file /etc/squid/squid.conf, copying it to /etc/squid/squid.conf.orig, for example.
   c. In the file /etc/squid/squid.conf, search for the http_port option, around line 54. Set the http_port to 8080.
   d. In the file /etc/squid/squid.conf, search for the term our_network, around line 1860(!). Administrators are expected to set local access control policies at this location. Following the commented-out examples, define an acl named our_networks which matches all requests sourced from your eth0 interface's network, then add an http_access directive which allows the acl. For example, if ifconfig eth0 reports your IP address as 192.168.0.5 and your network mask as 255.255.255.0, then the following lines would be appropriate. (If in doubt, you can specify your IP address directly, with a mask of 255.255.255.255.)
acl our_networks src 192.168.0.0/255.255.255.0
http_access allow our_networks

   e. Use the standard service and chkconfig commands to start the squid service and to enable it to start automatically on reboot. You might want to use the netstat command to confirm that squid is LISTENing for connections on port 8080.

2. Monitor squid and httpd requests. In two separate windows (or two separate virtual consoles), use less to open the files /var/log/httpd/access_log and /var/log/squid/access.log, respectively. Within less, press SHIFT-F to enter "follow" mode. As new requests are made to each service, you should see a log line generated within the respective file. (Pressing CTRL-C will return less to normal browsing mode.)

3. Configure firefox to use the proxy server. Using the firefox browser, open the Edit:Preferences dialog, and follow the path to General and Connection Settings.... In the resulting dialog, choose Manual proxy configuration, and set the HTTP Proxy to be your eth0 IP address, port 8080. Also, remove any text from the No Proxy For text entry. OK your way out of the various dialogs.

4. Browse your webserver. Now use firefox to browse the content of your webserver. If some of your previous labs are still in place, you may try http://localhost/relativity, http://www.peanutbutterisgood.rha, or http://www.jamisgood.rha. Otherwise, simply create a file in your document root directory, and reference it. With each request, you should see a line similar to the following in your /var/log/squid/access.log file.
1132416737.269    699 192.168.0.1 TCP_MISS/304 200 GET http://localhost/readings/the_god_of_mars.html - DIRECT/127.0.0.1 -

If not, make sure you reload the page from within the browser. If the page is in the browser's cache, viewing it will not actually generate a request.
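Step 1d asks you to turn the address and mask reported by ifconfig eth0 into a network specification. That amounts to a bitwise AND of the address and the mask, which Python's standard ipaddress module performs for you. network_spec is a hypothetical helper name. Its output is meant to follow "src" in the our_networks acl line.

```python
import ipaddress


def network_spec(address: str, netmask: str) -> str:
    """Return "network/netmask" for the subnet containing address."""
    # IPv4Interface keeps the host address but exposes the enclosing
    # network, whose host bits are cleared (address AND netmask).
    net = ipaddress.IPv4Interface(f"{address}/{netmask}").network
    return f"{net.network_address}/{net.netmask}"
```

For the example in step 1d, network_spec("192.168.0.5", "255.255.255.0") gives 192.168.0.0/255.255.255.0. The equivalent CIDR form, 192.168.0.0/24, is available from the network object's with_prefixlen attribute.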

Deliverables

1. A running squid server, bound to port 8080, which allows requests over the IP address assigned to the eth0 interface.
2. The squid service is configured to start automatically upon reboot.
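Step 1e suggests netstat for confirming the first deliverable. A complementary check, usable from any machine, is simply to try opening a TCP connection to the proxy port. is_listening is a hypothetical helper. A successful connection only shows that something accepts connections on that port, not that it is squid or that your acls admit you.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: nothing accepted the connection.
        return False
```

For example, is_listening("192.168.0.1", 8080) checks the placeholder eth0 address from Figure 5-8; substitute your own eth0 address.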

Challenge Exercises
1. Assuming your neighbors have set their access control configurations appropriately, you should be able to use your proxy server to browse a neighbor's website, a neighbor's proxy server to browse your website, or a neighbor's proxy server to browse another neighbor's website. Explore.
2. Configure your access control specifications so that one particular neighbor may access your squid proxy server, but another may not.
3. Notice the following line in the /etc/squid/squid.conf configuration file.
# We strongly recommend the following be uncommented to protect innocent
# web applications running on the proxy server who think the only
# one who can access services on "localhost" is a local user
#http_access deny to_localhost

What concern is this addressing? To convince yourself that denying the to_localhost acl is a good idea, enable the /server-status location within your Apache web server, but take the precaution of only allowing requests from the loopback address 127.0.0.1. Then have a neighbor use your proxy server to access http://localhost/server-status from their machine. (Realize, of course, that fixing this security hole by denying requests matching the to_localhost acl would break the original lab.)


Questions
Use the /etc/squid/squid.conf excerpts below to answer the next 6 questions:
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl public_terminal src 192.168.0.100/255.255.255.255
acl public_hours time M-F 09:00-17:00
acl intranet src 192.168.0.0/24
acl vpn src 10.0.1.0/24
acl media_files url_regex \.mp3$
acl media_files url_regex \.avi$
acl media_files url_regex \.mpeg$
acl media_files url_regex \.wma$
acl media_files url_regex \.wmv$
acl hostile dstdomain cracker.org
acl to_localhost dst 127.0.0.0/8
acl SSL_ports port 443 563
acl Safe_ports port 80
acl Safe_ports port 21
acl Safe_ports port 443 563
acl Safe_ports port 70
acl Safe_ports port 210
acl Safe_ports port 1025-65535
acl Safe_ports port 280
acl Safe_ports port 488
acl Safe_ports port 591
acl Safe_ports port 777
acl CONNECT method CONNECT
...
http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localhost
http_access deny media_files
http_access deny public_terminal !public_hours
http_access allow intranet
http_access deny all
http_access allow vpn

1. What is the likely purpose of the "media_files" acl and associated http_access rule?
( ) a. To speed up access to music and video files by caching them
( ) b. To make it impossible to download music and video through the proxy
( ) c. To make downloading music and video through the proxy more difficult by blocking common file extensions.
( ) d. To stop external systems from retrieving audio and video files from internal systems
( ) e. None of the above


2. What would happen if a request for http://www.somesite.com/files/hit_song.mp3 was sent to the proxy server from 127.0.0.1?
( ) a. Not enough information to tell
( ) b. Access would be granted
( ) c. Access would be denied, but other URLs might work
( ) d. Access would be denied unless the file was already cached
( ) e. Access would be denied for any destination

3. What would happen if a request for http://www.somesite.com/files/hit_song.mp3 was sent to the proxy server from 192.168.0.5?
( ) a. Not enough information to tell
( ) b. Access would be granted
( ) c. Access would be denied, but other URLs might work
( ) d. Access would be denied unless the file was already cached
( ) e. Access would be denied for any destination

4. What would happen if a request for http://www.somesite.com/files/hit_song.mp3 was sent to the proxy server from 209.132.177.60?
( ) a. Not enough information to tell
( ) b. Access would be granted
( ) c. Access would be denied, but other URLs might work
( ) d. Access would be denied unless the file was already cached
( ) e. Access would be denied for any destination

5. To what extent could the system with IP address 10.0.1.5 use this proxy server?
( ) a. It would be able to access any URL
( ) b. It would be able to access some URLs
( ) c. It would not be able to use the proxy at all
( ) d. Not enough information to tell
( ) e. None of the above


6. To what extent could the system with IP address 192.168.0.100 use this proxy server?
( ) a. It would be able to access any URL
( ) b. It would be able to access some URLs
( ) c. It would not be able to use the proxy at all
( ) d. Not enough information to tell
( ) e. None of the above

7. What port does squid listen on by default?
( ) a. 8080
( ) b. 443
( ) c. 4400
( ) d. 8139
( ) e. None of the above

8. What is another common port for proxy servers to use?
( ) a. 8080
( ) b. 443
( ) c. 4400
( ) d. 8139
( ) e. None of the above

9. How does one configure proxy settings in the Firefox web browser?
( ) a. Tools:Proxies
( ) b. Edit:Preferences:Web Features:Proxies
( ) c. Tools:Settings:Connection Settings
( ) d. File:Use Proxy
( ) e. Edit:Preferences:General:Connection Settings

Use the following excerpt from a squid log to answer the next question:
1137789430.068 50405 192.168.0.50 TCP_MISS/200 1381 GET http://academy.redhat.com/ - DIRECT/209.132


10. What does this log message indicate?
( ) a. The client was denied access to the requested URL
( ) b. The client's request is being "held" pending approval by an administrator
( ) c. The client was granted access to the URL, which was found in cache
( ) d. The client was granted access to the URL, which was not in cache and will be retrieved by the proxy
( ) e. None of the above
