Understanding how to configure Apache properly

For the most part, there are two important things we need to consider when configuring Apache, but first a basic overview of how configuring Apache works.

Apache configuration

Apache uses two types of "configuration files": the main httpd.conf file, which is in the Apache /conf directory, and individual .htaccess files, which sit in individual folders on your website and tell Apache specific behaviours it should use when serving pages from that directory.

Before we go any further:

Do not use .htaccess files

Everything you want to configure Apache to do, you can do within the main httpd.conf file; .htaccess files are not needed unless you do not have access to change the Apache configuration (on a shared hosting setup, for example). 

There are important reasons you should not use .htaccess files:

  • They are bad for performance. When .htaccess files are enabled, Apache must look for one in every directory on the path to the requested file, on every request, in case it overrides the main configuration. Directives in httpd.conf, by contrast, are read once when Apache starts.

  • They introduce security vulnerabilities. Often, .htaccess files are used to deny access to particular folders in a website with sensitive information. The problem with this is that you have to constantly remember to make sure all the .htaccess files are correct; if sensitive files are moved from one place to another, you need to update the .htaccess file in the new directory, or you have a security breach. Using .htaccess files in this way is a sign of bad security practices.
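
As a rough sketch of the alternative, an access rule that would otherwise live in a .htaccess file can be written once in httpd.conf. The directory path below is hypothetical, and the Require syntax assumes Apache 2.4 (2.2 used Order/Deny instead):

```apache
# Replaces a .htaccess file inside the "private" folder.
# The path is an example only; use your own directory.
<Directory "/var/www/example/public/private">
    # Refuse every request for anything under this directory.
    Require all denied
</Directory>
```

Because the rule sits in the main configuration, it is loaded at startup and can be reviewed in one place alongside every other access rule.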

Understanding the httpd.conf file

Explaining every feature of it would take forever (and be unnecessary), so this focuses on some important things you should know about.

The DocumentRoot directive

This tells Apache what the "root folder" of your publicly visible website is. It is therefore a security best practice to store source code, credentials and any other information you never want to be publicly visible outside of the DocumentRoot, so there is never a risk of Apache inadvertently displaying it to someone.
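
A minimal sketch of that layout, assuming a hypothetical site stored under /var/www/example and Apache 2.4 syntax:

```apache
# Hypothetical layout (these paths are assumptions, not requirements):
#   /var/www/example/public   <- served to visitors
#   /var/www/example/src      <- application code, not reachable by URL
#   /var/www/example/config   <- credentials, not reachable by URL
DocumentRoot "/var/www/example/public"

<Directory "/var/www/example/public">
    # Allow visitors to request files under the DocumentRoot.
    Require all granted
</Directory>
```

Only the public folder is mapped to URLs, so a misconfiguration elsewhere is less likely to expose the other two.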

The <Directory> tag

Within these tags, you define the behaviour Apache should use for a given directory (you can define one for the DocumentRoot, and one normally is by default). Within the directory tag, there are some important directives to know about.

AllowOverride (None or All)

This tells Apache whether .htaccess files in this directory and its subdirectories can override the main configuration. When set to None, Apache will not look for .htaccess files and will not honour any directives contained in them. Besides None and All, it also accepts a list of specific directive groups (such as AuthConfig or FileInfo) if you need finer control. It is best practice to have this set to None unless you are very sure what you are doing.
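
For illustration, this is roughly how the directive looks inside a directory block. The path is a made-up example:

```apache
<Directory "/var/www/example/public">
    # Ignore .htaccess files in this directory and everything below it.
    AllowOverride None

    # A narrower alternative, if some overrides really are needed:
    # AllowOverride AuthConfig
</Directory>
```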

Options (Indexes, FollowSymLinks)

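This controls which optional server features are enabled for the directory. Two keywords you will meet often are Indexes, covered below, and FollowSymLinks, which lets Apache follow symbolic links found in that directory.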

The Indexes keyword tells Apache whether a user should be shown a list of folder contents when browsing to a URL that points to a folder (e.g. yourwebsite.com/folder/) and that folder contains no index file such as index.html. Unless you specifically want it to do this, you should usually use Options -Indexes, which tells Apache not to show indexes and instead return a 403 Forbidden error.
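
A small sketch of a typical setting, again using a hypothetical path. Note that both keywords carry a sign:

```apache
<Directory "/var/www/example/public">
    # - turns a feature off, + turns it on. Signed options are merged
    # with whatever Options are already in force for this directory.
    Options -Indexes +FollowSymLinks
</Directory>
```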

Potential pitfall: if you specify a - or + sign next to the Indexes or FollowSymLinks parameters, then you must specify a sign for both, or your configuration will be invalid and Apache will refuse to start.

# This line will cause Apache to fail. Either omit "FollowSymLinks",
# or add a - or + sign in front of it.
Options -Indexes FollowSymLinks

DirectorySlash (On/Off)

This tells Apache if you want it to add a trailing slash to requests that point to a folder, but don't have a trailing slash in them. For instance, say a user requests "yourwebsite.com/myfolder", and there is a folder "myfolder" in your website at that location; when DirectorySlash is on, Apache will redirect the browser to "yourwebsite.com/myfolder/".

Whether or not you want this behaviour is dependent on your use case. It is On by default.
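
As a sketch, the directive can be set per directory like this (hypothetical path; the directive is provided by mod_dir, which is normally loaded):

```apache
<Directory "/var/www/example/public">
    # The default, written out explicitly:
    # a request for /myfolder is redirected to /myfolder/
    DirectorySlash On
</Directory>
```

If you are considering turning it off, check the mod_dir documentation first, since it warns that doing so can have side effects in combination with directory indexes.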

The mod_rewrite engine

Apache's mod_rewrite engine is extremely powerful, and also extremely difficult to get to grips with, as the syntax and commands can be quite confusing to get your head around and debug.

The directive below enables the mod_rewrite engine, often for a particular directory:

RewriteEngine on

The rewrite engine lets you make your website easier to use and nicer to look at. For example, you can remove the file extension from a URL (ever been to a website and seen ".html" or ".php" at the end of every URL? Apache rewrite helps get rid of that). It does much more, but getting it working correctly can be very difficult.
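
To make the "particular directory" case concrete, here is a minimal sketch. The module path and the directory path are assumptions; packaged installs (Debian's a2enmod, for example) load the module in their own way:

```apache
# Load the module once, at the top level of httpd.conf.
# This relative path is typical of a stock build; yours may differ.
LoadModule rewrite_module modules/mod_rewrite.so

<Directory "/var/www/example/public">
    # Rewriting inside a <Directory> block needs FollowSymLinks
    # (or SymLinksIfOwnerMatch) to be enabled.
    Options +FollowSymLinks
    RewriteEngine On
</Directory>
```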

The primary way in which you use the rewrite engine is by creating a series of statements, each of which consists of zero or more RewriteCond lines (conditions), followed by a RewriteRule line (something to do if the conditions are met). Here is an example that removes the ".php" extension from URLs:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\ (.*)\.php [NC]
RewriteRule ^ %1 [R=301,L]

To a non-tech person this looks like some forgotten horror language, and even to tech people, this can look very confusing without some proper explanation. Here is a breakdown of what this condition and rule do.

THE_REQUEST

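This is a server variable; THE_REQUEST holds the full HTTP request line exactly as the browser sent it, for example "GET /page.php HTTP/1.1", including the method, any query string and the protocol version. This is very important to note in the context of the related variable, REQUEST_URI, which holds only the path part of the URL and reflects any rewrites already made by the rewrite engine.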

REQUEST_URI and THE_REQUEST can have very different values!

^[A-Z]{3,}\ (.*)\.php

Apache's mod_rewrite engine uses Perl compatible regular expressions. This one says the following:

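"If the request line begins with 3 or more letters (the HTTP method, such as GET), followed by a space, then any set of characters, followed by the string ".php", then this condition is met." The set of characters in the brackets is captured for later use, and because the pattern is not anchored at the end, anything may follow ".php".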
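
To see the pattern at work, here are some sample request lines and what the captured group would contain. The URLs are invented for illustration:

```apache
# Request line sent by the browser        Matches?   Captured group (%1)
# GET /about.php HTTP/1.1                 yes        /about
# GET /blog/post.php HTTP/1.1             yes        /blog/post
# GET /about.php?id=7 HTTP/1.1            yes        /about
# GET /about HTTP/1.1                     no         (none)
#
# Caveat: (.*) is greedy and the pattern is not anchored at the end, so
# ".php" appearing anywhere in the request line, including the query
# string, is enough to satisfy the condition.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\ (.*)\.php [NC]
```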

[NC]

This flag at the end of the line means that this condition is not case sensitive.

RewriteRule ^ %1

This says that, when the condition is met, the request should be rewritten. The pattern ^ only anchors the start of the URL, so it matches any request. The request is rewritten to %1, which is a back-reference to the first captured group (the part in brackets) of the last matched RewriteCond.

[R=301,L]

This is actually two flags in one. The first one, R=301, tells Apache that it should redirect the user's browser to the new rewritten URL with a 301 (permanent) redirect (normally, Apache rewrites are internal, and are not seen in the user's browser). This means the new rewritten URL will appear in the user's browser.

The second flag, L, says that if this rule was used (i.e. the conditions were matched), Apache should stop evaluating subsequent rules for this pass and end rewriting here.

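The new URL the user was sent to will be subject to the rewrite engine as well. If one rule redirects a URL and another rule (or the same one) turns it back into the original, the browser bounces between the two until it gives up with a "Too Many Redirects" error. Testing THE_REQUEST rather than REQUEST_URI, as in the example above, helps avoid this, because THE_REQUEST is not changed by internal rewrites.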
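
Putting the pieces together, the sketch below pairs the redirect rule from this article with a second, internal rewrite, which is not part of the original example but is commonly needed so that the extensionless URL still reaches the .php file. The directory path is hypothetical, and the second rule is an assumption about a typical setup rather than the only way to do it:

```apache
<Directory "/var/www/example/public">
    Options -Indexes +FollowSymLinks
    AllowOverride None
    RewriteEngine On

    # 1. External redirect: the browser asked for /page.php, so send it
    #    to /page. THE_REQUEST is the original request line, so this rule
    #    does not fire again after the internal rewrite in step 2.
    RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\ (.*)\.php [NC]
    RewriteRule ^ %1 [R=301,L]

    # 2. Internal rewrite (assumed addition): if /page has a matching
    #    page.php file on disk, serve that file without changing the URL
    #    shown in the browser.
    RewriteCond %{REQUEST_FILENAME}.php -f
    RewriteRule ^ %{REQUEST_URI}.php [L]
</Directory>
```

With both rules in place, a request for /page.php is redirected once to /page, and /page is then served from page.php internally, so the two rules should not chase each other in a loop.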