Robots.txt – The Often-Forgotten File
This is one of those files that almost any web bot will look for on your server. If it is there, it can be useful. If it isn’t, your log file will fill up with 404 errors. So, what is it? Well, it is literally a file on your server called robots.txt: a plain text file placed in the root directory of your website. When a search engine crawls your site, it looks for the robots.txt file for instructions.
What’s it good for?
The file is used to control what search engine bots do when they come to your site. One of the common uses is to block a bot from crawling certain directories on the website, such as an images folder or a folder containing scripts that you may not want indexed. The file must use correct syntax; otherwise, it could adversely affect the way a bot interacts with your site. Each rule names a bot on a User-agent line (or uses * to mean every bot) and then lists what is off limits on Disallow lines. So, for example, to block all bots from your entire site, you use the syntax:
User-agent: *
Disallow: /
To block just one particular bot, name it on the User-agent line instead. Let’s say you want to keep Google’s Image Search from indexing your site’s images. You would use the following in your robots.txt file:
User-agent: Googlebot-Image
Disallow: /
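If you want to sanity-check a rule like this before uploading it, one option is Python's standard-library urllib.robotparser module. Treat the following as a rough sketch rather than the final word: the example.com URL and the bot name SomeOtherBot are made up for illustration, and real crawlers may read a rule a little differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The rule from the article: keep Google's image bot out of the whole site.
RULES = """\
User-agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical URL, used only for illustration.
IMAGE_URL = "https://example.com/images/photo.jpg"

# The named bot should be refused; a bot with no matching rule is allowed.
image_bot_allowed = parser.can_fetch("Googlebot-Image", IMAGE_URL)
other_bot_allowed = parser.can_fetch("SomeOtherBot", IMAGE_URL)

print("Googlebot-Image allowed:", image_bot_allowed)
print("SomeOtherBot allowed:", other_bot_allowed)
```

Because no rule mentions SomeOtherBot and there is no catch-all * entry, the parser treats that bot as free to crawl.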
Published lists of search bots can help you know what to specify for User-agent, and there are also lists of unsafe bots (one catalogs 135 of them). To disallow ALL bots from certain folders on your site, do something like the following:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/
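Under the hood, each Disallow value is treated as a path prefix: a bot that honors the file skips any URL whose path starts with it. Here is a deliberately simplified Python sketch of that matching. The function name is_blocked and the sample paths are my own inventions, and a real crawler does more than this (URL decoding, Allow lines, and so on).

```python
# Directory prefixes taken from the Disallow lines above.
DISALLOWED = ["/cgi-bin/", "/tmp/", "/private/"]


def is_blocked(path, disallowed=DISALLOWED):
    """Return True if the URL path starts with any disallowed prefix."""
    return any(path.startswith(prefix) for prefix in disallowed)


# Hypothetical paths, used only for illustration.
for path in ["/private/report.html", "/tmp/cache.dat", "/blog/post.html"]:
    print(path, "->", "blocked" if is_blocked(path) else "allowed")
```

Note that the trailing slash matters: /private/ blocks everything inside that folder, but not a page that merely starts with the same letters, such as /privateer.html.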
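If, for some reason, you wanted to ban bots from your whole site, you would use the same two lines as in the first example (note that the colon after Disallow is required):
User-agent: *
Disallow: /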
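Under the original robots.txt standard, you can’t use a wildcard in the Disallow field; the * wildcard is only valid in the User-agent field. Some major crawlers, Googlebot among them, do accept * and $ patterns in Disallow as an extension, but don’t count on every bot understanding them.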
Do you need a robots.txt file? No, but like I said, not having one will lead to a bunch of 404 errors for it in your server log file. That said, your site will function just fine without one. Without a robots.txt file, a search bot will simply assume that it is OK to crawl everything on your site.
The other rules to keep in mind are:
- One command per line (you can’t stack user-agents or disallows on a single line)
- Only one robots.txt per domain, located in the site’s root
- File must be called robots.txt (all lowercase)
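Since the file always lives at the root and always has the same lowercase name, a bot can work out where to look from any page address on the site. A small Python sketch of that lookup, using the standard urllib.parse module (the function name robots_url and the example.com address are just placeholders I picked):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page address, used only for illustration.
print(robots_url("https://example.com/blog/2008/some-post.html?page=2"))
```

However deep the page is, the answer is always the same root-level file for that host.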
If you don’t want to mess around with manually creating this file (as if it’s that hard), you can use a variety of online tools, free or paid, to create one for you, such as:
- Yellowpipe robots.txt Generator
- Advanced Robots.txt Generator (a paid program)
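To show how little a generator really has to do, here is a minimal sketch in Python. It is not how either of the tools above works (I have no idea what is inside them); build_robots_txt and its input format are simply something I made up for illustration.

```python
def build_robots_txt(rules):
    """Build robots.txt text from {user_agent: [disallowed paths]}.

    An empty list of paths produces an empty Disallow line,
    which means that bot may crawl everything.
    """
    lines = []
    for agent, paths in rules.items():
        lines.append(f"User-agent: {agent}")
        if paths:
            # One Disallow per line, as the rules require.
            for path in paths:
                lines.append(f"Disallow: {path}")
        else:
            lines.append("Disallow:")
        lines.append("")  # blank line between records
    return "\n".join(lines).rstrip("\n") + "\n"


# Rules taken from the examples in this article.
robots_text = build_robots_txt({
    "Googlebot-Image": ["/"],
    "*": ["/cgi-bin/", "/tmp/", "/private/"],
})
print(robots_text)
```

Save the printed text as robots.txt in your site's root directory and you are done.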
Why would you want one?
- Block unwanted bots, like image search
- Steer certain bots toward certain content. For example, you might want to control which bots crawl foreign-language content, or point bots from specialized engines at the content that targets them.
- Prevent unwanted bots from overworking your server
Remember to validate your robots.txt with an online validator before you rely on it.
Hopefully that takes any potential mystery out of this little file. I recommend putting up one if you haven’t already. Even if you simply put it up there to allow all bots, at least its presence will spare your log files from all the 404 errors.
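For the allow-everything case, the usual convention is a catch-all User-agent line followed by an empty Disallow. As a quick check, this Python sketch runs such a file through the standard urllib.robotparser module; the bot name AnyBot and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow value means nothing is off limits.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

# Hypothetical URLs, used only for illustration.
URLS = [
    "https://example.com/",
    "https://example.com/private/report.html",
    "https://example.com/images/photo.jpg",
]

results = [parser.can_fetch("AnyBot", url) for url in URLS]
print("Everything allowed:", all(results))
```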

I'm David Risley. I've been making my living as a blogger for over a decade. Blogging is my business and how I support my family. With this blog, I'm just gettin' REAL and telling you how this business works.









