This is a nice article from JavaScript Kit on the robots.txt file:
There is a hidden, relentless force that permeates the web and its billions
of web pages and files, unbeknownst to the majority of us sentient beings. I'm
talking about search engine crawlers and robots here. Every day hundreds of them
go out and scour the web, whether it's Google trying to index the entire web, or
a spam bot collecting any email address it can find for less than honorable
intentions. As site owners, what little control we have over what robots are
allowed to do when they visit our sites exists in a magical little file called
"robots.txt."
"Robots.txt" is a regular text file that, through its name, has special
meaning to the majority of "honorable" robots on the web. By defining a few
rules in this text file, you can instruct robots not to crawl and index
certain files or directories within your site, or the site at all. For example, you may
not want Google to crawl the /images directory of your site, as it's both
meaningless to you and a waste of your site's bandwidth. "Robots.txt" lets
you tell Google just that.
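
To make the /images example concrete, here is a minimal sketch of what such a rule might look like and how a well-behaved robot could interpret it. It uses Python's standard urllib.robotparser module. The rule text, the "Googlebot" user-agent name, the www.example.com URLs, and the is_allowed helper are assumptions made for illustration; none of them come from the article itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: ask Google's crawler to stay out of /images/.
# This text is an illustrative assumption, not a rule quoted from the article.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /images/
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "http://www.example.com/images/logo.png"),
        ("Googlebot", "http://www.example.com/index.html"),
        ("OtherBot", "http://www.example.com/images/logo.png"),
    ]:
        print(agent, url, is_allowed(agent, url))
```

Note that the file is only a request. An "honorable" robot checks it before fetching, as the sketch does, but nothing in it can stop a spam bot that chooses to ignore it.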
http://www.javascriptkit.com/howto/robots.shtml
