I noticed the other day that a website had a robots.txt in the root of the domain as well as one in the root of the blog directory. I'm wondering how many other people out there do this. Do you find the bots honoring both of them properly?
My feeling is that you only need the one in the website root, and you direct the bot to do what you want from there.
Why use two of them?

Double Robots.txt?
Started by incrediblehelp, Mar 31 2009 04:08 PM
5 replies to this topic
#1
Posted 31 March 2009 - 04:08 PM
#2
Posted 31 March 2009 - 04:36 PM
A robots.txt anywhere but the root level will be ignored by the spiders. In fact, it would surprise me if it's ever even queried. robots.txt is not like .htaccess, where you can control things on a per-directory level.
The only way a subdirectory robots.txt might be valid is the rare case where someone has a domain name parked on a subdirectory of another domain. Or possibly if the subdirectory is really a subdomain, though that one is questionable in my mind too, and isn't something I've tested to see if spiders look for a robots.txt for each subdomain.
Maybe Alan knows the answer to that one?
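The root-only behavior described above can be sketched with Python's standard-library robots.txt parser. The host name and rules below are hypothetical, but they show the key point: the parser answers per-path questions against a single file, and crawlers only ever fetch that file from the host root.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, as they would appear in the root-level robots.txt.
rules = """\
User-agent: Googlebot
Disallow: /blog/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Crawlers only request the file at the host root
# (e.g. https://example.com/robots.txt); a copy sitting at
# /blog/robots.txt is simply never asked for.
print(rp.can_fetch("Googlebot", "https://example.com/blog/post.html"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/about.html"))      # True
```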
#3
Posted 31 March 2009 - 05:07 PM
Actually that is what I figured Randy. Thanks for the feedback.
I have heard of using different robots.txt files for the https and http versions of a site before.
#4
Posted 31 March 2009 - 06:22 PM
From ./htdocs/robots.txt:
User-agent: Googlebot
Disallow: /blog/
#5
Posted 01 April 2009 - 12:07 AM
QUOTE
Or possibly if the subdirectory is really a subdomain, though that one too is questionable in my mind and isn't something I've tested to see if spiders look for a robots.txt for each subdomain.
They do, Randy. They do.
FWIW, I almost always back up a file before modifying it. My ex-wife always said I had trust issues. At any rate, I probably have a few copies of robots.txt lying around on more than a few sites. I don't worry about it because, as you pointed out, the only one that counts is in the root.
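For what it's worth, the per-host scoping also covers the http/https point raised earlier: a robots.txt applies to one scheme-and-host combination, so each subdomain (and each protocol) gets its own file. A small sketch with hypothetical URLs:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return where a crawler looks for the robots.txt governing page_url:
    always /robots.txt at the root of the same scheme and host."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Subdirectories share the root file; subdomains and schemes do not.
print(robots_url("https://example.com/blog/post.html"))  # https://example.com/robots.txt
print(robots_url("http://blog.example.com/some/page"))   # http://blog.example.com/robots.txt
```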
#6
Posted 01 April 2009 - 06:26 AM
QUOTE
They do, Randy. They do.
FWIW, I almost always back up a file before modifying it. My ex-wife always said I had trust issues. At any rate, I probably have a few copies of robots.txt lying around on more than a few sites. I don't worry about it because, as you pointed out, the only one that counts is in the root.
Or even better, source control everything; and I mean EVERYTHING (OK, maybe not the wife).