Is your database-driven website indexed by every search engine? Are your URLs impossible for your visitors to remember because they look like this:
http://www.mydomain.com/articles.php?category=39&article=27
This is a problem with large websites as they are database-driven and parameters have to be sent to a script for processing. Search engines are much better at indexing static pages. Even though Google doesn’t seem to be afraid of dynamic pages, other search engines such as Yahoo! and Live will probably index just a few pages as they don’t want to be caught in a loop.
Fortunately, there are at least three solutions to this: using PATH_INFO, ForceType and mod_rewrite.
There’s another method which involves 404 pages but I wouldn’t recommend it since you won’t be able to submit your sitemap to Google. When you add your website to Google’s Webmaster Tools, 404 pages MUST return a 404 error code but when you use that method to write friendly URLs, you change that 404 code to 200 (which means the page exists even if it doesn’t).
Friendly URLs Using PATH_INFO
This is probably the easiest method although it’s not my favorite (I’ll explain why later). Suppose you have this URL:
http://www.mydomain.com/articles.php?category=39&article=27
Using PATH_INFO will give you a URL similar to this:
http://www.mydomain.com/articles.php/39/27
When Apache receives a request for this URL, it will first look for a file or folder named “27”. As it doesn’t exist, it will look for a file or folder named “39”. As that doesn’t exist either, it will look for a file named “articles.php”, which exists, and will call that script. The part of the path that follows the script name, /39/27, is made available to the script as PATH_INFO. In PHP you read it from $_SERVER['PATH_INFO'] (older setups with register_globals enabled also exposed it as a global variable called $PATH_INFO).
At this point, what you want to do is split that value into an array like this:
<?php $myvars = explode('/', $_SERVER['PATH_INFO']); ?>
This way, you will be able to access each value like this:
<?php
echo $myvars[0]; // empty string, because PATH_INFO starts with a slash
echo $myvars[1]; // 39
echo $myvars[2]; // 27
?>
Now simply rename your variables to what your PHP script expects to process and you’re done.
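As a rough sketch of that renaming step, assuming the first path segment is the category ID and the second is the article ID, something like the following could work. The function name parsePathInfo() is a hypothetical helper chosen for this example, not something provided by PHP or Apache.

```php
<?php
// Hypothetical helper: turn a PATH_INFO string such as "/39/27"
// into the named values the script expects.
function parsePathInfo(string $pathInfo): array
{
    // trim() drops the leading slash so the first segment is not empty.
    $segments = explode('/', trim($pathInfo, '/'));

    return [
        // Cast to int because these IDs are likely to end up in a query.
        'category' => (isset($segments[0]) && $segments[0] !== '') ? (int) $segments[0] : null,
        'article'  => isset($segments[1]) ? (int) $segments[1] : null,
    ];
}

// Fall back to an empty string when the server sends no PATH_INFO.
$params = parsePathInfo($_SERVER['PATH_INFO'] ?? '');
```

A null value for either key means the segment was missing from the URL, which is a good moment to show an error page instead of running the query.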
If $_SERVER['PATH_INFO'] (or $PATH_INFO on older setups) is empty, this may be because you need to turn on this feature in your .htaccess file:
AcceptPathInfo On
To do this, you must also ensure that your web hosting provider allows you to override Apache’s main configuration.
The reason this is not my favorite method is that the URL still has to name the script, so it doesn’t look as nice as a static page URL:
http://www.mydomain.com/articles.php/39/27
Friendly URLs using ForceType
This method allows you to hide your script’s extension (.php). To make a long story short, Apache is normally configured to parse only .php files as PHP (and maybe .php3 and .php4 too). What you’re going to do is name your script “articles”, with no extension, and tell Apache through your .htaccess file to parse that file as PHP:
<Files articles>
ForceType application/x-httpd-php
</Files>
This way you’re going to be able to write URLs like this:
http://www.mydomain.com/articles/39/27
All that’s left to do is to get the rest of the path through PATH_INFO, like you’ve seen in the previous method.
Friendly URLs with mod_rewrite
This is my favorite method although it’s probably the most complex one. mod_rewrite is an Apache module which lets you rewrite URLs through regular expressions in a .htaccess file. In my opinion, this is the most powerful and flexible way to use friendly URLs but novices might not be comfortable with it.
In order to use mod_rewrite, you must make sure that your web hosting provider has enabled this feature in Apache’s configuration file.
So let’s pretend we want to transform this dynamic URL into a friendly URL:
http://www.mydomain.com/articles.php?category=39&article=27
The first thing to do would be to add a field to both our “article” and “category” tables in our database. That new field will contain a “friendly” identifier for each record and must be unique. So instead of identifying our article by ID #27 and the category by ID #39, we would identify them by more friendly terms so that our URL could look something like:
http://www.mydomain.com/articles/tutorials/dreamweaver.html
So here our category is identified by “tutorials” and “dreamweaver” would be the article identifier. Got it? Great.
So now we need to rewrite the above URL so that we can pass our identifiers to our PHP script.
Here’s what our .htaccess file may look like:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^articles/([^/]+)/([^/]+)\.html$ articles.php?categ=$1&article=$2 [L]
</IfModule>
Let’s break down these instructions:
- IfModule mod_rewrite.c : the directives inside this block are only applied if the mod_rewrite module is loaded, so a server without it simply ignores them.
- RewriteEngine On : turns on the rewrite engine (duh!).
- RewriteBase : explicitly sets the base URL for per-directory rewrites.
The RewriteRule line is the most crucial one so here’s a detailed breakdown:
- ^ : this anchors the match at the start of the requested path. In a .htaccess file, that path is relative to the directory containing the file and has no leading slash.
- articles/ : means that the URL we want to rewrite must start with “articles/” (i.e. http://www.mydomain.com/articles/tutorials/dreamweaver.html)
- ([^/]+)/ : this is our first variable. [^/]+ matches one or more characters that are not a slash, so everything that comes before the next slash is put into a variable named $1. That means our category identifier will be stored into $1.
- ([^/]+)\.html : this is our second variable. Everything until “.html” will be stored into another variable named $2. The backslash makes the dot match a literal “.” instead of any character. Our article identifier will be stored into $2.
- $ : this means that nothing must come after “.html” or else the URL will not be processed.
- articles.php?categ=$1&article=$2 : this is where we call our PHP script. Both the $1 and $2 variables are going to be replaced by their respective values (see above).
- [L] : the [L] flag makes sure that the rewrite engine stops processing further rules for this URL after this one. [L] means “Last”.
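To see what the two captured variables hold, the same kind of pattern can be tried out in PHP with preg_match(). This is only an illustration: the real matching happens inside Apache, and rewriteArticleUrl() is a hypothetical name made up for this example. The pattern is written with [^/]+ and an escaped dot so that each variable stops at a slash and a literal “.html” is required.

```php
<?php
// Illustration only: mod_rewrite performs this matching inside Apache.
// rewriteArticleUrl() is a hypothetical helper, not an Apache or PHP API.
function rewriteArticleUrl(string $path): ?string
{
    // Same shape as the RewriteRule pattern; '#' delimiters avoid escaping '/'.
    $pattern = '#^articles/([^/]+)/([^/]+)\.html$#';

    if (preg_match($pattern, $path, $m) !== 1) {
        return null; // no match: the rule would not fire for this path
    }

    // $m[1] and $m[2] play the role of $1 and $2 in the RewriteRule.
    return 'articles.php?categ=' . $m[1] . '&article=' . $m[2];
}

echo rewriteArticleUrl('articles/tutorials/dreamweaver.html'), "\n";
// prints: articles.php?categ=tutorials&article=dreamweaver
```

A path that does not start with “articles/” or does not end in “.html” returns null here, which corresponds to Apache leaving the URL untouched.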
Now all that’s left to do is to query your database according to the values passed to your script through URL rewriting.
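Here is a minimal sketch of what that lookup might look like with PDO. The table and column names (article, category, slug, category_id) are assumptions made for this example, as is the function name findArticle(); adapt them to your own schema.

```php
<?php
// Sketch only: table and column names (article, category, slug, category_id)
// are assumptions for this example, not a real schema.
function findArticle(PDO $db, string $categ, string $article): ?array
{
    // Prepared statement: values from the URL are bound as parameters,
    // never concatenated into the SQL string.
    $stmt = $db->prepare(
        'SELECT a.* FROM article a
         JOIN category c ON c.id = a.category_id
         WHERE c.slug = :categ AND a.slug = :article'
    );
    $stmt->execute([':categ' => $categ, ':article' => $article]);

    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    // fetch() returns false when nothing matches; report that as null.
    return $row === false ? null : $row;
}
```

In articles.php you would call it with the rewritten parameters, for example findArticle($db, $_GET['categ'] ?? '', $_GET['article'] ?? ''), and send a real 404 response when it returns null.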
There is a lot more to mod_rewrite and regular expressions than what is covered here.
And of course, you can also ask me your questions.






