RSS Turned Off for Googlebot
Started by cheethebee, Apr 22 2008 08:09 AM
7 replies to this topic
#1
Posted 22 April 2008 - 08:09 AM
Hi all,
On one of the sites I am working on, robots.txt disallows the RSS files for Googlebot. I have no idea why the previous person did this, and I can't find anything online about it. Does anyone have any ideas? Should it not be there?
User-agent: Googlebot
Disallow: /*.xml
Thanks,
Chee.
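For readers unfamiliar with the pattern: the `*` in `Disallow: /*.xml` is a wildcard extension that Googlebot supports; it was not part of the original robots.txt convention. The minimal Python sketch below assumes Google-style semantics (`*` matches any run of characters, a trailing `$` anchors the end of the path, and otherwise a pattern only has to match a prefix of the path) and shows which paths such a rule catches. The helper name `rule_matches` is hypothetical.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Hypothetical helper: does a robots.txt path pattern match a URL path?

    Assumes Google-style wildcard semantics: '*' matches any run of
    characters, a trailing '$' anchors the end of the path, and otherwise
    the pattern only has to match a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every character literally, except '*', which becomes '.*'.
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    if anchored:
        regex += r"\Z"  # the path must end exactly here
    # re.match anchors only at the start, which gives prefix-match behaviour.
    return re.match(regex, path) is not None


if __name__ == "__main__":
    for path in ["/rss/feed.xml", "/sitemap.xml", "/index.html", "/feed.xml?page=2"]:
        print(f"{path:20} blocked by /*.xml: {rule_matches('/*.xml', path)}")
```

Under these assumptions the rule matches every path containing `.xml` after the leading slash, including `/sitemap.xml` and `/feed.xml?page=2`, while `/index.html` is untouched. Writing `/*.xml$` instead would limit the rule to paths that end in `.xml`.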
#2
Posted 22 April 2008 - 08:49 AM
.xml and .rss files aren't always the best thing for Google to index, because they come out as plain text and are sometimes jumbled. It's usually better for Google to index the actual articles rather than the RSS file, which is probably why it was disallowed.
#3
Posted 22 April 2008 - 09:14 AM
Does that also mean your sitemap doesn't work, if it resides in the root and is called sitemap.XML?
#4
Posted 23 April 2008 - 05:29 PM
Nope, different thing entirely.
The search engine (SE) indexing systems know exactly how to read the sitemap XML document tree.
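To illustrate the idea that a sitemap is read as structured data rather than as a page of text, here is a minimal Python sketch that pulls the `<loc>` URLs out of a sitemap with the standard library's `xml.etree.ElementTree`, using the sitemaps.org 0.9 namespace. The sample sitemap, its `example.com` URLs, and the function name `sitemap_urls` are made up for the example; this is not a description of how any search engine's own parser works.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical two-entry sitemap used only for this example.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url>
    <loc>http://www.example.com/articles/rss-and-robots.html</loc>
    <lastmod>2008-04-22</lastmod>
  </url>
</urlset>"""


def sitemap_urls(xml_text: str) -> list[str]:
    """Return the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    # Walk <urlset>/<url>/<loc>, resolving the 'sm' prefix via SITEMAP_NS.
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


if __name__ == "__main__":
    for url in sitemap_urls(SAMPLE_SITEMAP):
        print(url)
```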
#5
Posted 24 April 2008 - 04:05 AM
So
QUOTE
User-agent: Googlebot
Disallow: /*.xml
doesn't stop Googlebot from accessing the sitemap, it just stops it from indexing it. Is that right?
#6
Posted 24 April 2008 - 07:09 AM
Hmm... that's a good question, 1dmf. I hadn't read your question that way the first time.
It would be a good question for the folks at Google et al. If you're declaring an XML sitemap in your robots.txt, effectively giving them the pointer to where it's located, but farther down in your robots.txt you're excluding all .xml files, which direction will the spiders follow? To put it another way, would the sitemap pointer be treated as if it were an Allow for a specific XML file? Or is the sitemap bot completely different from the normal Googlebot, and thus doesn't read robots.txt at all?
I'm honestly not sure what approach they would take. And I could make a logical argument for either outcome.
As a temporary measure, I'd say the best solution in this situation would be to add a specific Allow statement just below the /*.xml Disallow to put your XML sitemap back in play.
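Google's current robots.txt documentation resolves the ordering question with a precedence rule: among the rules that match a path, the one with the longest path wins, and when an Allow and a Disallow rule are equally specific, the less restrictive Allow is used. Under that rule the position of the Allow line does not matter. The Python sketch below assumes those semantics plus Google-style `*` and `$`; the function names `rule_matches` and `is_allowed_longest_match` are hypothetical, and rules are passed as (directive, pattern) pairs rather than parsed from a file.

```python
from __future__ import annotations

import re


def rule_matches(pattern: str, path: str) -> bool:
    """Google-style match: '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.match(regex + (r"\Z" if anchored else ""), path) is not None


def is_allowed_longest_match(rules: list[tuple[str, str]], path: str) -> bool:
    """Pick the matching rule with the longest pattern; Allow wins ties.

    `rules` is a list of (directive, pattern) pairs such as
    ("Disallow", "/*.xml"). If no rule matches, the path is allowed.
    """
    best = None  # (pattern length, is_allow) of the winning rule so far
    for directive, pattern in rules:
        if not rule_matches(pattern, path):
            continue
        candidate = (len(pattern), directive.lower() == "allow")
        # Tuples compare length first; on equal length, True (Allow) > False.
        if best is None or candidate > best:
            best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    allow_below = [("Disallow", "/*.xml"), ("Allow", "/sitemap.xml")]
    allow_above = [("Allow", "/sitemap.xml"), ("Disallow", "/*.xml")]
    for rules in (allow_below, allow_above):
        print(is_allowed_longest_match(rules, "/sitemap.xml"),
              is_allowed_longest_match(rules, "/rss/feed.xml"))
```

With either ordering, `/sitemap.xml` is allowed because `/sitemap.xml` is a longer pattern than `/*.xml`, while `/rss/feed.xml` stays blocked.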
#7
Posted 24 April 2008 - 09:53 AM
QUOTE
As a temporary measure I'd say the best solution in this situation would be to provide a specific Allow statement just below the /*.xml Disallow to put your xml sitemap back in play.
You can also add the Allow directive for the sitemap file both before and after the directive that disallows all .xml files:
User-agent: Googlebot
Allow: /sitemap.xml
Disallow: /*.xml
Allow: /sitemap.xml
Since we don't know whether Google stops at the first match or uses the last match, this covers both cases. Now, isn't there an RFC on robots.txt? Surely Google would follow the standard protocol?
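The reasoning behind sandwiching the Disallow between two Allow lines can be checked mechanically. The Python sketch below evaluates the same three rules under two hypothetical strategies, "first matching rule wins" and "last matching rule wins", assuming Google-style `*` wildcards. The function name `verdict` and the `SANDWICH` rule list are illustrative, and neither strategy is claimed to be what Googlebot actually does.

```python
from __future__ import annotations

import re


def rule_matches(pattern: str, path: str) -> bool:
    """'*' is a wildcard, otherwise a prefix match ('$' is not handled here)."""
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.match(regex, path) is not None


def verdict(rules: list[tuple[str, str]], path: str, strategy: str) -> bool:
    """Return True if `path` is allowed under 'first' or 'last' match."""
    matching = [directive for directive, pattern in rules if rule_matches(pattern, path)]
    if not matching:
        return True  # no rule applies, so crawling is allowed
    chosen = matching[0] if strategy == "first" else matching[-1]
    return chosen.lower() == "allow"


# The rule order suggested above, written as (directive, pattern) pairs.
SANDWICH = [
    ("Allow", "/sitemap.xml"),
    ("Disallow", "/*.xml"),
    ("Allow", "/sitemap.xml"),
]

if __name__ == "__main__":
    for strategy in ("first", "last"):
        print(strategy,
              verdict(SANDWICH, "/sitemap.xml", strategy),
              verdict(SANDWICH, "/rss/feed.xml", strategy))
```

Under both strategies `/sitemap.xml` ends up allowed, because an Allow line is both the first and the last rule matching it, while `/rss/feed.xml` matches only the Disallow and stays blocked.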
#8
Posted 24 April 2008 - 10:51 AM
Well, yes, there are robots.txt standards.
But Allow: isn't part of the standards if memory serves.
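To see what that would mean in practice, here is a minimal Python sketch of a crawler that follows only the original 1994 robots.txt convention: Disallow values are literal prefixes with no `*` wildcard, and unrecognised fields such as Allow are ignored. The reader `strict_1994_allowed` is hypothetical and written for illustration only; it is not the behaviour of Googlebot or of any particular crawler, and for brevity it ignores User-agent grouping.

```python
def strict_1994_allowed(robots_txt: str, path: str) -> bool:
    """Hypothetical strict reader of the original robots.txt convention.

    Only 'Disallow' lines are honoured; each value is a literal prefix
    (no wildcards). Other fields, including 'Allow', are ignored. For
    brevity this ignores User-agent grouping and treats every Disallow
    line in the file as applying to the crawler.
    """
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "disallow":
            continue  # unrecognised or non-Disallow field: ignored
        prefix = value.strip()
        # An empty Disallow value blocks nothing, so skip it.
        if prefix and path.startswith(prefix):
            return False
    return True


ROBOTS = """User-agent: Googlebot
Allow: /sitemap.xml
Disallow: /*.xml
Allow: /sitemap.xml
"""

if __name__ == "__main__":
    for path in ("/sitemap.xml", "/rss/feed.xml", "/*.xml"):
        print(path, strict_1994_allowed(ROBOTS, path))
```

A reader this strict treats `/*.xml` as a literal prefix, so it blocks only paths that actually begin with the characters `/*.xml`; the RSS feed and the sitemap both stay crawlable, and the Allow lines have no effect either way.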