Canonical URLs with mod_rewrite

11 May 08

Consider the many ways you could arrive at my blog:

  • http://www.ericw.ca/blog
  • http://www.ericw.ca/blog/
  • http://ericw.ca/blog
  • http://ericw.ca/blog/

All of them are valid, but only the last URL is canonical. The canonical URL is the "standard" or "authoritative" pointer to the content. But so what if multiple URLs reference the same thing? Well, there are two issues:

  1. As Jeff Atwood has noted with wild abandon, it's simply bad software engineering.
  2. I'm hurting my PageRank. Google's PageRank is calculated per URL, so I am potentially diluting the links to my blog by a factor of four! Other search engines work similarly, because in theory each URL could refer to different content. So it's bad for search-engine optimization.

I use Apache's mod_rewrite module to rewrite and redirect incoming requests to the canonical URL. I thought I'd share the two rules I use to construct my canonical URLs.

Require no www

I redirect all requests for www.ericw.ca to ericw.ca.

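RewriteEngine On
# Skip requests already addressed to ericw.ca (any case)
RewriteCond %{HTTP_HOST} !^ericw\.ca [NC]
# Skip requests that sent no Host header (avoids a redirect loop)
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^/+(.*) http://ericw.ca/$1 [R=301,L]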

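The first condition matches any request whose hostname does not begin with "ericw.ca" (for example, www.ericw.ca or a bare IP address). The second condition skips requests that send no Host header at all, as some old HTTP/1.0 clients do; such a client can never send the corrected hostname, so redirecting it would loop forever. The rule then reconstructs the URL on the bare domain and uses a 301 Moved Permanently redirect to point to the correct location.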
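To see exactly which requests this rule catches, here is a rough Python model of its matching logic (not of mod_rewrite itself). It assumes the rules live in server or virtual-host context, where the matched path begins with a slash, and it represents a missing Host header as an empty string; the function name `no_www_redirect` is hypothetical.

```python
import re
from typing import Optional


def no_www_redirect(host: str, path: str) -> Optional[str]:
    """Model of the no-www rule: return the 301 target, or None if it doesn't fire."""
    # Condition 1: !^ericw\.ca [NC] means skip hosts that already begin with ericw.ca
    if re.match(r"ericw\.ca", host, re.IGNORECASE):
        return None
    # Condition 2: !^$ means skip requests that carried no Host header
    if host == "":
        return None
    # Rule: ^/+(.*) -> http://ericw.ca/$1 (strip leading slashes, rebuild the URL)
    m = re.match(r"/+(.*)", path)
    if m is None:
        return None
    return "http://ericw.ca/" + m.group(1)


print(no_www_redirect("www.ericw.ca", "/blog"))  # http://ericw.ca/blog
print(no_www_redirect("ericw.ca", "/blog/"))     # None
```

Because the host pattern is anchored only at the start, a Host header that includes a port, such as ericw.ca:80, also passes through untouched. The query string is not modeled; mod_rewrite carries the original one over to the redirect target by default.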

Require a trailing slash

I redirect all requests for "non-files" to the same URL with a trailing slash.

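# Skip paths whose last segment looks like a file (name.extension)
RewriteCond %{REQUEST_URI} !/[^/]+\.[^/]+$
# Skip paths that already end in a slash
RewriteCond %{REQUEST_URI} !(.*)/$
# Append the slash, redirect permanently, and stop processing further rules
RewriteRule ^/(.*) http://ericw.ca/$1/ [R=301,L]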

The first condition ignores requests for files: URLs matching, roughly, /.../file.extension; on these we do not want a slash. The second condition ignores requests that already contain a trailing slash.
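A similar Python model of the trailing-slash rule shows which paths get a slash appended. Again, it is only a sketch of the matching logic, assuming server or virtual-host context; `trailing_slash_redirect` is a hypothetical name, and its input corresponds to %{REQUEST_URI}, which in mod_rewrite excludes the query string.

```python
import re
from typing import Optional

# Condition 1 pattern: the last path segment looks like name.extension
FILE_LIKE = re.compile(r"/[^/]+\.[^/]+$")


def trailing_slash_redirect(request_uri: str) -> Optional[str]:
    """Model of the trailing-slash rule: return the 301 target, or None."""
    # Condition 1 (negated): leave file-like paths such as /img/logo.png alone
    if FILE_LIKE.search(request_uri):
        return None
    # Condition 2 (negated, pattern (.*)/$): leave paths already ending in "/"
    if request_uri.endswith("/"):
        return None
    # Rule: ^/(.*) -> http://ericw.ca/$1/ (re-add the path with a slash appended)
    m = re.match(r"/(.*)", request_uri)
    if m is None:
        return None
    return "http://ericw.ca/" + m.group(1) + "/"


print(trailing_slash_redirect("/blog"))          # http://ericw.ca/blog/
print(trailing_slash_redirect("/img/logo.png"))  # None
```

The "roughly" matters: a directory whose name contains a dot, such as a hypothetical /v1.2, looks like a file to the first condition and never gets a slash. Also, assuming the rules appear in the order shown, a request for http://www.ericw.ca/blog takes two hops to reach the canonical URL: the no-www rule sends it to http://ericw.ca/blog, and the follow-up request is redirected to http://ericw.ca/blog/.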

Are your URLs canonical?
