Rss Turned Off For Google Bot
Posted 22 April 2008 - 08:09 AM
On one of the sites I'm working on, the RSS files are set to disallow for Googlebot. I have no idea why the previous person did this, and I can't find anything online about it. Does anyone have any ideas? Should that rule be there at all?
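For anyone wondering what that looks like, here's a sketch of a robots.txt along the lines described. The feed paths are just illustrative guesses, not taken from the actual site:

```
# Hypothetical example -- the paths below are assumptions
User-agent: Googlebot
Disallow: /rss/
Disallow: /feed.xml
```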
Posted 22 April 2008 - 09:14 AM
If it resides in the root and is called sitemap.xml.
Posted 23 April 2008 - 05:29 PM
The SE indexing systems know exactly how to read the sitemap XML document tree.
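For reference, a minimal sitemap document tree following the sitemaps.org protocol looks like this (example.com is a placeholder):

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2008-04-01</lastmod>
  </url>
</urlset>
```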
Posted 24 April 2008 - 07:09 AM
It would be a good question for the folks at Google et al. If you're declaring an XML sitemap in your robots.txt, effectively giving them a pointer to where it's located, but further down in the same robots.txt you're excluding all .xml files, which directive will the spiders follow? To put it another way, would the sitemap pointer be treated as if it were an Allow for that specific XML file? Or is the sitemap bot completely different from the normal Googlebot, and therefore doesn't read robots.txt at all?
I'm honestly not sure what approach they would take. And I could make a logical argument for either outcome.
As a temporary measure, I'd say the best solution in this situation is to add a specific Allow statement just below the /*.xml Disallow to put your XML sitemap back in play.
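Something along these lines; the file name and placement just mirror the suggestion above, so adjust to your own paths:

```
# Illustrative only -- sitemap location is an assumption
User-agent: *
Disallow: /*.xml$
Allow: /sitemap.xml

Sitemap: http://www.example.com/sitemap.xml
```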
Posted 24 April 2008 - 09:53 AM
You can also add the Allow directive for the sitemap file both before and after the disallow-all-.xml directive, since we don't know whether Google stops at the first match or uses the last match. Isn't there an RFC on robots.txt, though? Surely Google would follow the standard protocol?
Posted 24 April 2008 - 10:51 AM
But Allow: isn't part of the original standard, if memory serves.