FuRom
09-25-2007, 03:07 AM
Have you ever wondered how a website's code can be organized when every file in your browser's address bar looks like "topic-2084.html", or something that appears to have been created by hand? Well, I have too! I knew it was something simple, probably to do with .htaccess, but I never really looked into it until today, when OwnManAtt was looking for a full tutorial on it. This is not a full tutorial on mod_rewrite, but it will help you make your website easy for any spider to archive. At least, I would assume this helps spider crawling; if I'm wrong, please correct me.
You might be asking "what's a spider?" The answer is simple. A spider is a search engine's bot. Search engines like Google have bots that browse your website and archive its information to make it more accessible to the public. You might ask "Do I really want a bot on my site?" Commonly the answer would be no, but this is a good bot. Not a bad bot. It'll help your site grow.
I'm going to show you my home brewed version and explain it a little bit. This is not a newbie's guide. It is in no way newb-friendly, nor do I intend to make it beginner friendly. This is specifically for those who have achieved enlightenment in the art of web programming. This is also only for Apache web servers that have mod_rewrite enabled. You can google how to enable it if you're running your own personal server. If you're on cPanel hosting, I don't know what to tell ya, just test it.
Now, this is my home brew. I'll explain a little about it after you've copied it. I'm not going into detail, since you should be experienced with this type of stuff; I'll just say how I handle it personally, and you might get some ideas.
Filename: .htaccess
File Location: folderloc/
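# Turn on mod_rewrite for this directory.
RewriteEngine On
RewriteBase /
# filename/var/~var2.html -> /folderloc/filename.php?idx=var/~var2
RewriteRule ^([^*/]+)\/(.*).html?(.*)$ folderloc/$1.php?idx=$2 [L]
# filename.html -> /folderloc/filename.php
RewriteRule ^([^*/]+)\.html?(.*)$ folderloc/$1.php [L]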
Filename: index.php
File Location: folderloc/
<?php
// This is important! It is my way of handling variables.
// You can't just put ?blah=def at the end with this and
// I'm too lazy to figure out how when I can just do this.
// idx is missing when the page is requested as plain filename.html.
$idx = isset($_GET['idx']) ? $_GET['idx'] : '';
$_GET = ($idx === '') ? array() : explode("/~", $idx);
// This is how I display information. I got magic quotes
// on... not like it's important for you to know though...
// Each value is escaped, since it comes straight from the URL.
foreach ($_GET as $value) {
    echo htmlspecialchars($value).'<br>';
}
?>
The rewrite module uses a language known as "regex", or regular expressions. It's highly advanced and a pain in the neck to understand, which is why I call it a language of its own.
Now, you'll notice I have two rewrite rules going on here:
RewriteRule ^([^*/]+)\/(.*).html?(.*)$ folderloc/$1.php?idx=$2 [L]
RewriteRule ^([^*/]+)\.html?(.*)$ folderloc/$1.php [L]
One is to handle a URL like "blah.com/folderloc/filename/var/~var2/~var3.html" and the other is just for something like "blah.com/folderloc/filename.html". The first passes variables along to your script through $_GET, and the other just gives you the file.
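If you want to see what the two patterns actually capture without touching a server, here is a rough sketch in plain PHP. The function name rewrite_target is made up for this example, and it only imitates the matching with preg_match; it is not how Apache runs the rules internally. It assumes the path has already had "folderloc/" stripped off the front, which is how rules in a directory's .htaccess see it, and it returns the substitution as written in the rule (RewriteBase / then puts a leading slash on it).

```php
<?php
// Hypothetical helper: imitates the two RewriteRule lines on a path that
// is already relative to folderloc/.
function rewrite_target($path)
{
    // Rule 1: filename/values.html -> folderloc/filename.php?idx=values
    if (preg_match('#^([^*/]+)\/(.*).html?(.*)$#', $path, $m)) {
        return 'folderloc/' . $m[1] . '.php?idx=' . $m[2];
    }
    // Rule 2: filename.html -> folderloc/filename.php
    if (preg_match('#^([^*/]+)\.html?(.*)$#', $path, $m)) {
        return 'folderloc/' . $m[1] . '.php';
    }
    // Neither pattern matched, so the request would be left alone.
    return null;
}

// Prints: folderloc/filename.php?idx=var/~var2/~var3
echo rewrite_target('filename/var/~var2/~var3.html'), "\n";
// Prints: folderloc/filename.php
echo rewrite_target('filename.html'), "\n";
```

The first capture group is everything up to the first slash (the script name), and the second is everything between that slash and ".html", which is the string that ends up in idx.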
The one that handles variables ["blah.com/folderloc/filename/var/~var2/~var3.html"] lets you get the variables into your page through the $_GET array, like $_GET[0], $_GET[1], and so on. I handle it like this because the rule sets its own query string ("?idx=..."), so a "?var1=blah" at the end of the URL gets dropped, and I'm just too lazy to figure out how to handle that. The values come out in the order they appear in the URL. You can access "index.php" with something like "blah.com/folderloc/index.html" or "blah.com/folderloc/index/var/~var2/~var3.html".
Well, good luck understanding this. I really did want to write a full explanatory tutorial on this, but it's just one of those things where you have to learn other things first to really understand it. I can try to help you understand it, but I'm not going to give you step by step instructions on how to make this specific script work, nor any other. It's just not that productive for me and it doesn't help you learn.
More technical information about mod_rewrite can be found here:
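To make the ordering concrete, here is a small sketch of the splitting step on its own. The function name split_idx is just something I made up for the example; it does the same explode("/~", ...) call and adds a guard for an empty string, which is my own assumption about how you'd want the no-variables case to behave.

```php
<?php
// Hypothetical helper: split the idx string handed over by the rewrite
// rule into positional values, using "/~" as the separator.
function split_idx($idx)
{
    // An empty idx means no values were passed in the URL.
    if ($idx === '') {
        return array();
    }
    return explode('/~', $idx);
}

// For blah.com/folderloc/filename/var/~var2/~var3.html the rule passes
// idx=var/~var2/~var3, which splits into var, var2 and var3 in URL order.
print_r(split_idx('var/~var2/~var3'));
```

Notice the first value has no "~" in front of it in the URL. The separator is "/~", so a leading "~" on the first value would stay stuck to it.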
http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html