ION's Search Engine Optimization Training: Instructions for creating a Robots.txt file
SECTION EIGHT - How to create a Robots.txt file
(or: 'How to boss your robots around.')
Robots, spiders, crawlers, user-agents: whatever you
want to call them, they are your website's best friends. Actually, they are more like your
website's publishing agent. You want spiders to visit your site as often as possible so they can
bring any new content you've published to the users who are searching for it. Making their job
easier is clearly in your best interest, and the Robots.txt file lets you do just that by telling
all of your 8-legged visitors where they may look across your whole domain. It also lets you tell
spiders to stay away from the pages you want to keep hidden (for whatever purpose you may have),
which in turn keeps them from following the links on those pages. There is a META Tag that does this
task too, but it only works one page at a time, and it isn't honored by all Search Engines.
Clearly, the best 'spider guider' is a Robots.txt file that you make for your web site.
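For the curious, here is roughly what that META Tag looks like and how a spider might read it. This is only a sketch in Python: the sample page and the names RobotsMetaParser and robots_meta_directives are made up for illustration, and real spiders may interpret the tag differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags can carry the robots directives.
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            for item in attr_map.get("content", "").split(","):
                item = item.strip().lower()
                if item:
                    self.directives.add(item)


def robots_meta_directives(html_text):
    """Return the set of robots directives found in one page's HTML."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


# Hypothetical page, used only for illustration.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Under construction</title>
</head><body></body></html>"""

if __name__ == "__main__":
    print(robots_meta_directives(SAMPLE_PAGE))
```

Notice that the tag lives inside a single page, so every page you want to hide needs its own copy. A Robots.txt file covers the whole site from one place.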
How to make sure your spiders see exactly what you want them to:
- First we have to decide what kind of Robots.txt file you need. Do you have any content on your site
that you want to hide from them? Temporarily or permanently? Perhaps you have some pages still under
construction that you don't want them to see YET? Think for a moment about what exactly you want your
spiders to see, and which links you'd like them to follow.
- If you simply want your entire site open to all spiders like ours is, go ahead and
download our robots.txt file, which is a simple .TXT file with nothing but 2 lines of simple
code inside it.
Copy this file to the root folder of your server, named "robots.txt" (without the quotes, and all
in lowercase). In case you don't know what I mean by your "root" folder, it's the folder on your
server that has your index.html file in it. Sometimes servers call it the 'Public HTML' folder.
- If you DO have some pages or links to hide, or even just want to tell particular user-agents (spiders)
not to index your site, then it gets more difficult, because the file needs more than those 2 lines.
Rather than teach you the whole process here, we suggest the handy, free tool from Submit Corner,
which will custom build the file for you.
Again, once you have your finished text file, save it to the root or 'public html' folder of your web
server, in the same directory as your index.html file.
- Well-behaved spiders will now go only where you want them to. (The file is a request rather than a
lock, but the major Search Engines honor it.) Need to make sure it works like you designed it? Try
SearchEngineWorld's "Robots.txt file validator" tool to verify that it conforms to all Search
Engine Standards. (Don't forget to type in the whole URL including the file name!)
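If you'd like to see what such files look like, or double-check one yourself before uploading it, here is a small Python sketch. Two assumptions to keep in mind: the two-line file shown is the conventional "allow everything" form (ION's own file may be worded differently), and the HIDE_SOME rules, the example.com addresses, and the can_crawl helper are made up for illustration. It uses Python's built-in urllib.robotparser module to read the rules the way a well-behaved spider would.

```python
from urllib.robotparser import RobotFileParser

# Assumed two-line "allow everything" file: an empty Disallow blocks nothing.
ALLOW_ALL = """User-agent: *
Disallow:
"""

# Hypothetical file that hides one directory and one page from every spider.
HIDE_SOME = """User-agent: *
Disallow: /under-construction/
Disallow: /private.html
"""


def can_crawl(robots_txt, url, user_agent="*"):
    """Return True if the rules in robots_txt let user_agent visit url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain, not a real site to test against.
    for page in ("/index.html", "/private.html", "/under-construction/new.html"):
        url = "http://www.example.com" + page
        print(page, can_crawl(ALLOW_ALL, url), can_crawl(HIDE_SOME, url))
```

For example, can_crawl(HIDE_SOME, "http://www.example.com/private.html") comes back False, while the same address checked against ALLOW_ALL comes back True.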
Why spider guiding is relevant to SEO
Many people who are still building parts of their
website use the Robots.txt file to HIDE their new pages until they are complete, by instructing the
spiders to stay away from that directory or file and then changing the file back to normal
afterwards. I've also heard of people who create a links directory for link-swapping purposes (we'll
tackle that in step #10) and then put a line in their Robots.txt file that tells spiders to stay away
from that page, so the links on it are never followed. Don't get caught doing this! People who do
so are not only cheating the people they swapped links with, but are also likely to be caught by
the Search Engines themselves.
It's dishonest, and Search Engines don't like it. (It makes their reporting harder.) In essence, such
people are promising to "vote for" the popularity of another website, and then they turn around behind the
search engines' backs and recall their votes before they can help anyone.
If you have traded links with someone and they do
this to the page your link is on, Search Engines won't see your 'vote,' and so they have in
effect cheated you. Don't just assume that they've done it on purpose to recall their votes,
though. You can check first by viewing their Robots.txt file. Just open a browser and type:
http://www.theirdomain.com/robots.txt <---Just like yours.
(Replacing "theirdomain.com" with their
actual domain, of course.) If you see that they've told spiders to stay away from your link's
page on their site, then perhaps a polite reminder by way of email would be in order. If they still
don't link to you after that, however, simply remove your link to them. You can do better. Someone who
uses this tactic obviously knows what they are doing and how unethical it is.
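If you'd rather not read their file by eye, the same check can be sketched in a few lines of Python. Treat it as an illustration only: the spider names, the PARTNER_ROBOTS sample, the /links.html page, and the helper names robots_url, blocked_spiders, and check_partner are all assumptions, not anything a Search Engine publishes.

```python
from urllib.robotparser import RobotFileParser

# Illustrative spider names; "*" stands for any spider without its own rules.
SPIDERS = ("*", "Googlebot", "Slurp")


def robots_url(domain):
    """Build the robots.txt address for a domain such as 'www.theirdomain.com'."""
    return "http://" + domain + "/robots.txt"


def blocked_spiders(robots_txt, page_url, spiders=SPIDERS):
    """Return the spiders that the given robots.txt text keeps away from page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [name for name in spiders if not parser.can_fetch(name, page_url)]


def check_partner(domain, page_path):
    """Download the partner's live robots.txt and run the same check (needs network)."""
    parser = RobotFileParser(robots_url(domain))
    parser.read()
    page_url = "http://" + domain + page_path
    return [name for name in SPIDERS if not parser.can_fetch(name, page_url)]


# Hypothetical partner file that hides its links page from every spider.
PARTNER_ROBOTS = """User-agent: *
Disallow: /links.html
"""

if __name__ == "__main__":
    print(blocked_spiders(PARTNER_ROBOTS, "http://www.theirdomain.com/links.html"))
```

An empty list means no spider in the list is being turned away from the page your link is on. Any names in the list are the spiders that will never see your 'vote.'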
In general, however, many people find that spiders
just aren't "smart enough" to find all of their site's contents on their own. For that reason, it makes
sense to tell any visiting spiders to index all pages and follow all links. It "gives them no excuse" to
miss valuable links and pages.
Got your robot roadsigns in place? Great. Now it's time to submit your pride and joy to the Search Engines
and Directories themselves! (Like ringing a web-spinner dinner bell!) Our next step will show you how to
do it effectively, thoroughly, and without the risk of over-submitting. And of course, all for free, once again.
This page and all contents are the property of ION. Although we grant permission for reprinting this
page, we do not grant permission for mass distribution. If you would like to make a suggestion or point out
a possible mistake for correction, please send an email to us at:
Library@InternetOptimization.net, and we will be
happy to consider your comments.
© Copyright 2004-2008 Internet Optimization Network - All Rights Reserved.