I ran into a situation where I was asked whether Apache could redirect a URL containing a referrer to a different relative location. At first, I thought it’d be easy: use Regular Expressions (regex) to match the referrer, specify the relative location, and you’re done. This Techbit covers why it’s not as simple as it seems.

Anatomy of an Apache Rewrite

I’ve honorably mentioned Apache in my CentOS Server and CentOS 7 AMI Web and Mail Server with DDNS projects.

For more details, Apache is the #1 software serving the web. It launched in 1995 and has been the Internet’s most popular web server since April 1996!

It’s secure and extensible but also quite resource-intensive. Red Hat-based distributions call the Apache binary httpd, while Debian-based ones, confusingly, call it apache2.

One of the extensible features of Apache is rewrites using the mod_rewrite module. It provides a method of URL manipulation using the Perl Compatible Regular Expression vocabulary.

Apache 2.4 includes new functionality, like per-module and per-directory LogLevel configuration, which completely replaces the RewriteLog and RewriteLogLevel directives. Furthermore, new logging levels trace1 to trace8 were added above the debug log level, which we’ll make use of here.

A RewriteRule has three arguments separated by spaces:

  1. Pattern: regex matching incoming URLs to modify.
  2. Substitution: where to send matching requests.
  3. [flags]: options for the rewritten request.

For example, for a folder moving locations, an Apache Rewrite can redirect URLs from the old folder to the new one:

RewriteRule ^(.*)/old_folder/(.*)?$ $1/new_folder$2 [L]

In the above case, each set of parentheses in the Pattern is a capture group. Apache assigns each group to a numbered variable, starting with $1, that we can use in the Substitution.
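
To get a feel for how the captured groups feed the Substitution, the rule can be approximated outside Apache with sed. This is only a sketch under a few assumptions: sed -E uses POSIX extended regular expressions rather than PCRE, sed writes groups as \1 and \2 instead of $1 and $2, and the rewrite helper and sample paths are made up for illustration.

```shell
# Approximate the rule's Pattern and Substitution with sed -E.
# Hypothetical helper: "rewrite" is not an Apache command.
rewrite() {
  printf '%s\n' "$1" | sed -E 's#^(.*)/old_folder/(.*)?$#\1/new_folder\2#'
}

# Group 1 captures everything before /old_folder/; group 2 captures what follows it.
rewrite 'html/old_folder/'           # prints html/new_folder
rewrite 'html/old_folder/page.html'  # prints html/new_folderpage.html
```

The second call shows a side effect of the Substitution as written: the slash after old_folder sits outside the second group, so it is dropped unless the Substitution adds it back, for example with $1/new_folder/$2.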

Note

Using RewriteBase would simplify and secure rewrites by allowing relative paths for substitutions, but it’s not used by default.

Well, that sounds great, but what does it look like?

Ubuntu Server Quickstart

For demonstration’s sake, I recommend creating an Ubuntu Server 22.04 LTS virtual machine in VirtualBox. I’ve mentioned setting up Ubuntu Desktop in VirtualBox in my ClashCallerBot project, but the steps for Ubuntu Server are similar. If unsure, the default VirtualBox values should work.

Tip

With the Network Address Translation (NAT) network type, use VirtualBox port forwarding to forward requests from the Host to the Guest. The default 127.0.0.1 IP will work for the Host, but the Guest’s IP with NAT is 10.0.2.15:

VirtualBox Port Forwarding Rules window with http, SSH, and MySQL ports forwarded.

This way, going to 127.0.0.1 in the Host’s browser will go to the Guest’s Apache Server.

As I mentioned in my CentOS 7 AMI Web and Mail Server with DDNS project, I prefer PuTTY/KiTTY when connecting from Windows operating systems. With this configuration, we only need to input 127.0.0.1 in the Host Name (or IP address) field.

After Ubuntu Server is installed in VirtualBox, installing apache2 is straightforward:

  1. Install:

    sudo apt install apache2
    
  2. Enable the mod_rewrite module and restart:

    sudo a2enmod rewrite
    sudo systemctl restart apache2.service
    

We can add rewrites by editing the main configuration file called apache2.conf:

sudo vim /etc/apache2/apache2.conf

Put any rewrites in the relevant directory:

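<Directory /var/www/>
      Options Indexes FollowSymLinks
      AllowOverride None
      Require all granted
      # Turn on mod_rewrite for this directory
      RewriteEngine On
      # Send requests under old_folder to new_folder
      RewriteRule ^(.*)/old_folder/(.*)?$ $1/new_folder$2 [L]
      # Log mod_rewrite at trace3 and everything else at alert
      LogLevel alert rewrite:trace3
</Directory>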

Note

The LogLevel directive defines the log granularity. We must use the new trace1 to trace8 logging levels because mod_rewrite’s logging doesn’t log any actions up to the debug level.

Tip

We can tail the log output file with the following:

tail -f /var/log/apache2/error.log | grep -F '[rewrite:'

In this case, the -f flag keeps tail running to “follow” the end of the file. Use Ctrl + C to exit.

The output looks something like this. The client address, 10.0.2.2, is the default address VirtualBox’s NAT gives the Host:

[Wed Feb 28 06:39:03.340344 2024] [rewrite:trace3] [pid 4094:tid 139976146650688] mod_rewrite.c(480): [client 10.0.2.2:56954] 10.0.2.2 - - [127.0.0.1/sid#7f4ec7032d20][rid#7f4ec47010a0/subreq] [perdir /var/www/] applying pattern '^(.*)/old_folder/(.*)?$' to uri 'html/new_folder/index.html'
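
Trace lines are long, so when scanning many of them it can help to keep only the pattern and the URI it was tested against. Here is a minimal sketch with sed, using the log line above as sample input. The summarize helper and its output format are assumptions made for illustration, not part of Apache.

```shell
# Hypothetical helper: reduce "applying pattern" trace lines to "pattern -> uri".
summarize() {
  sed -n "s/.*applying pattern '\(.*\)' to uri '\(.*\)'\$/\1 -> \2/p"
}

# Sample input copied from the log line above.
sample=$(cat <<'EOF'
[Wed Feb 28 06:39:03.340344 2024] [rewrite:trace3] [pid 4094:tid 139976146650688] mod_rewrite.c(480): [client 10.0.2.2:56954] 10.0.2.2 - - [127.0.0.1/sid#7f4ec7032d20][rid#7f4ec47010a0/subreq] [perdir /var/www/] applying pattern '^(.*)/old_folder/(.*)?$' to uri 'html/new_folder/index.html'
EOF
)

# prints ^(.*)/old_folder/(.*)?$ -> html/new_folder/index.html
printf '%s\n' "$sample" | summarize
```

The same helper can sit at the end of the tail pipeline in place of the filter on '[rewrite:', since it only prints lines that contain an "applying pattern" message.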

Tip

Deleting the error.log file would change its permissions when we recreate it, so use truncate to clear it:

sudo truncate -s 0 /var/log/apache2/error.log

Restart apache2 to apply the rewrites:

sudo systemctl restart apache2.service

With the above, if we navigate to http://127.0.0.1/old_folder in the Host’s browser, Apache will redirect us to http://127.0.0.1/new_folder:

Chrome browser with Network tab displaying a 302 redirect from old_folder to new_folder.

Note

To test the above rewrites, you’ll have to manually create each example folder under /var/www/html and put an index.html file in the final directory. Otherwise, Apache will output errors that the location doesn’t exist. A plain index.html file like this would work:

<!DOCTYPE html>
<html lang="en">
<head>
<title>Old Folder</title>
</head>
<body>
<p>This is the old folder</p>
</body>
</html>
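
Creating the folders by hand gets tedious, so here is a small sketch that builds them in one go. It assumes the folder names used in this post (old_folder, new_folder, and path/to/something). The make_example_dirs helper is a made-up name, and the placeholder page it writes is simpler than the one above.

```shell
# Hypothetical helper: create the example folders used in this post, each with
# a placeholder index.html, under the document root passed as the first argument.
make_example_dirs() {
  docroot=$1
  for dir in old_folder new_folder path/to/something; do
    mkdir -p "$docroot/$dir" || return 1
    printf '<!DOCTYPE html>\n<html lang="en">\n<head>\n<title>%s</title>\n</head>\n<body>\n<p>This is %s</p>\n</body>\n</html>\n' \
      "$dir" "$dir" > "$docroot/$dir/index.html" || return 1
  done
}

# Example call (needs write access to the document root, e.g. from a root shell):
# make_example_dirs /var/www/html
```

Taking the document root as an argument makes it possible to try the helper in a scratch directory before pointing it at /var/www/html.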

Now that we have a test bench, how easy is it to rewrite a referrer parameter?

RewriteRule Cannot Parse a Referrer

The first part of cracking this puzzle is understanding RewriteRule’s limitations. Let’s say we want to redirect this URL with a referrer parameter back to the web document root:

http://127.0.0.1/path/to/something?p1=param1&p2=param2&referrer=google.com

Based on the previous example, we’d expect something like this to work:

RewriteRule ^.*/path/to/something.*&referrer=.*$ / [R=301,L]

Tip

Use an external regular expression tool for testing.

In this case, R=301 indicates the rewrite should use a permanent redirect, but no redirect happens:

Chrome browser displaying the URL with a referrer and no redirect.

Our logs confirm why: the pattern is only evaluated against the path, never against any of the URL’s parameters:

[Thu Feb 29 02:52:34.336668 2024] [rewrite:trace3] [pid 1136:tid 139813843879488] mod_rewrite.c(480): [client 10.0.2.2:62553] 10.0.2.2 - - [127.0.0.1/sid#7f28fa582d20][rid#7f28f84520a0/initial] [perdir /var/www/] strip per-dir prefix: /var/www/html/path/to/something/ -> html/path/to/something/
[Thu Feb 29 02:52:34.336762 2024] [rewrite:trace3] [pid 1136:tid 139813843879488] mod_rewrite.c(480): [client 10.0.2.2:62553] 10.0.2.2 - - [127.0.0.1/sid#7f28fa582d20][rid#7f28f84520a0/initial] [perdir /var/www/] applying pattern '^.*/path/to/something.*&referrer=.*$' to uri 'html/path/to/something/'
[Thu Feb 29 02:52:34.336785 2024] [rewrite:trace1] [pid 1136:tid 139813843879488] mod_rewrite.c(480): [client 10.0.2.2:62553] 10.0.2.2 - - [127.0.0.1/sid#7f28fa582d20][rid#7f28f84520a0/initial] [perdir /var/www/] pass through /var/www/html/path/to/something/
[Thu Feb 29 02:52:34.343440 2024] [rewrite:trace1] [pid 1136:tid 139813843879488] mod_rewrite.c(480): [client 10.0.2.2:62553] 10.0.2.2 - - [127.0.0.1/sid#7f28fa582d20][rid#7f28f844e0a0/subreq] [perdir /var/www/] pass through /var/www/html/path/to/something/index.html
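
The mismatch in the log can be reproduced without Apache by testing the pattern against both strings. This is a sketch using grep -E, which uses POSIX extended regular expressions rather than PCRE (the two agree for this pattern). The matches helper and the path_with_query string are made up for illustration.

```shell
pattern='^.*/path/to/something.*&referrer=.*$'

# Hypothetical helper: succeed when the string ($1) matches the extended regex ($2).
matches() {
  printf '%s\n' "$1" | grep -Eq -- "$2"
}

# What RewriteRule actually receives in this per-directory context (from the log).
logged_uri='html/path/to/something/'
# What the rule was written for: the path with the query string still attached.
path_with_query='html/path/to/something/?p1=param1&p2=param2&referrer=google.com'

# prints logged uri: no match
matches "$logged_uri" "$pattern" && echo "logged uri: match" || echo "logged uri: no match"
# prints path with query: match
matches "$path_with_query" "$pattern" && echo "path with query: match" || echo "path with query: no match"
```

The regex itself is fine. It just never gets to see the part of the URL it was written for.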

So, how do we get it to match parameters?

RewriteCond Parses Referrer

As is often the case, the RewriteRule Directive documentation provides the solution:

What is matched?

  • If you wish to match against the hostname, port, or query string, use a RewriteCond with the %{HTTP_HOST}, %{SERVER_PORT}, or %{QUERY_STRING} variables respectively.

It’s not explicitly mentioned in the Apache docs, but the query string is the part of the URL after the first ?, where values are assigned to parameters separated by & delimiters. For example, the p1=param1 of our original URL is part of the query string.

That’s why RewriteRule cannot match the referrer parameter: parameters are part of the query string, which the RewriteRule Pattern never sees. Matching it takes a RewriteCond.
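
To make the split concrete, here is a sketch that cuts the example URL at the first ? using plain shell parameter expansion. The variable names are arbitrary, and this only illustrates where the query string starts, not how Apache parses requests.

```shell
url='http://127.0.0.1/path/to/something?p1=param1&p2=param2&referrer=google.com'

# Everything before the first "?" (scheme, host, and path).
before_query=${url%%\?*}
# Everything after the first "?" is the query string.
query_string=${url#*\?}

echo "$before_query"   # prints http://127.0.0.1/path/to/something
echo "$query_string"   # prints p1=param1&p2=param2&referrer=google.com

# One parameter per line, splitting on the & delimiter.
printf '%s\n' "$query_string" | tr '&' '\n'
```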

So, something like this should work because the RewriteCond matches the referrer parameter while the RewriteRule matches the URL path:

RewriteCond %{QUERY_STRING} ^.*&referrer=.*$
RewriteRule ^.*/path/to/something.*$ /? [R=301,L]

Note

In this case, we’re using ? in the /? Substitution to discard the query string.
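
The way the condition and the rule work together can be modelled roughly in the shell. This is a sketch, not how Apache is implemented: the decide helper and its output strings are made up, grep -E stands in for PCRE, and the path argument is the per-directory form seen in the logs.

```shell
cond_pattern='^.*&referrer=.*$'
rule_pattern='^.*/path/to/something.*$'

# Hypothetical helper: print the redirect target when both patterns match,
# or "pass through" otherwise. Arguments: per-directory path, query string.
decide() {
  # mod_rewrite tests the RewriteRule Pattern first, then its conditions.
  if printf '%s\n' "$1" | grep -Eq -- "$rule_pattern" &&
     printf '%s\n' "$2" | grep -Eq -- "$cond_pattern"; then
    echo "301 -> /"
  else
    echo "pass through"
  fi
}

decide 'html/path/to/something/' 'p1=param1&p2=param2&referrer=google.com'  # prints 301 -> /
decide 'html/path/to/something/' 'p1=param1&p2=param2'                      # prints pass through
decide 'html/path/to/something/' 'referrer=google.com'                      # prints pass through
```

The last call shows a limit of the condition as written: it expects referrer to follow another parameter. A pattern such as (^|&)referrer= would likely also cover the case where referrer comes first.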

The Host’s browser’s network tab can confirm there is a 301 redirect to the web document root:

Chrome browser displaying a 301 redirect to web document root without the referrer parameter.

Conclusions

As noted in the mod_rewrite docs, it’s a powerful and complex tool with edge-case scenarios. This is just the tip of the rewrite iceberg, but turning on rewrite logs and checking documentation helps identify how to navigate around it.