Web Stats Show Lots Of 404 Errors
Posted 30 January 2007 - 02:42 PM
"All of the 404 errors are for the robots.txt file. This is a special file that you can create for your account to give instructions to the crawling search engine bots. When a bot crawls your page it first checks if you have a robots.txt file. Since you do not have such a file it generates a 404 error message and this is completely normal. Please note that this is not a problem at all. The web site is indexed correctly by the search engine bots. They do not need it to be present."
Could this mean the bots are only allowed access to my home page? Will they automatically crawl all pages? Is there a way to confirm that bots aren't being denied access to subpages? Am I being paranoid?
Thanks in advance. Your forum is wonderful!
Posted 30 January 2007 - 02:55 PM
As robots.txt is an exclusion protocol, the lack of a robots.txt file is taken as implied permission to crawl and index everything the bots can find links to.
If you want to get rid of the errors, just upload a blank file named robots.txt to the root of the site; it will have the same effect on compliant bots.
Posted 31 January 2007 - 12:27 AM
If there is no robots.txt file, the spiders will consider any page on the site to be OK to fetch.
I would recommend creating a robots.txt file, even if you don't mind where the spiders travel within your site. The reason is that the 404 errors generated by requests for this file might cause you to overlook a more important missing file.
Errors on the server _should_ be monitored and corrected. That way, when a new error crops up (for example, if an important page on the site is deleted or renamed by mistake, or if a link on a newly minted page contains a typo), you will notice and correct it more quickly.
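To illustrate why the robots.txt noise matters, here is a rough Python sketch that tallies 404 responses per requested path. It assumes the server writes access logs in the Common Log Format; the function name count_404s and the sample lines are invented for the example, so adjust the pattern to whatever your host actually produces.

```python
import re
from collections import Counter

# Pulls the requested path and the status code out of a Common Log Format line.
# Assumption: the request field looks like "GET /path HTTP/1.1".
LOG_LINE = re.compile(r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')


def count_404s(lines):
    """Return a Counter mapping each requested path to its number of 404 responses."""
    missing = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            missing[match.group("path")] += 1
    return missing


# Hypothetical log lines, made up for illustration only.
SAMPLE_LOG = [
    '66.249.0.1 - - [30/Jan/2007:14:42:00 +0000] "GET /robots.txt HTTP/1.1" 404 208',
    '66.249.0.1 - - [30/Jan/2007:14:42:01 +0000] "GET / HTTP/1.1" 200 5120',
    '10.0.0.7 - - [30/Jan/2007:14:43:10 +0000] "GET /abuot.html HTTP/1.1" 404 208',
    '66.249.0.2 - - [30/Jan/2007:15:01:00 +0000] "GET /robots.txt HTTP/1.1" 404 208',
]

counts = count_404s(SAMPLE_LOG)
for path, hits in counts.most_common():
    print(f"{hits:3d}  {path}")
```

In this sample the robots.txt requests outnumber the one genuinely broken link (the mistyped /abuot.html), which is the kind of entry that is easy to miss until the robots.txt errors are gone.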
You can learn more about the standard by reading the Standard for Robot Exclusion.
The short answer is to make explicit what is currently implied: namely that all robots are welcome anywhere on the site. You could accomplish that by adding the following to the robots.txt file in the domain root:
# all robots welcome
User-agent: *
Disallow:
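If you would like to confirm for yourself that subpages are not being blocked, one option is to run the rules through a robots.txt parser. The sketch below uses Python's standard urllib.robotparser module to compare three cases: a blank file, the allow-all rules (a "User-agent: *" line followed by an empty "Disallow:"), and a block-all file for contrast. The helper name, the "ExampleBot" user agent, and the example.com URLs are placeholders, and the result only shows how a parser that follows the standard reads the rules, not what any particular search engine does.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt, url, user_agent="ExampleBot"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Three robots.txt variants to compare.
RULES = {
    "blank file": "",
    "allow all": "# all robots welcome\nUser-agent: *\nDisallow:\n",
    "block all": "User-agent: *\nDisallow: /\n",
}

# Placeholder URLs: a home page and a subpage.
PAGES = [
    "http://www.example.com/",
    "http://www.example.com/products/widgets.html",
]

results = {
    name: [can_fetch(text, page) for page in PAGES]
    for name, text in RULES.items()
}

for name, allowed in results.items():
    print(f"{name:10s} -> {allowed}")
```

The blank file and the allow-all rules both come out as permitted for the home page and the subpage, while the block-all file denies both. To test your own site, paste the contents of your real robots.txt and your own page URLs in place of the placeholders.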