Rss Turned Off For Google Bot


7 replies to this topic

#1 cheethebee

cheethebee

    HR 3

  • Active Members
  • 75 posts

Posted 22 April 2008 - 08:09 AM

Hi all,

On one of the sites I am working on, the RSS files are disallowed for Googlebot in robots.txt. I have no idea why the previous person did this, and I can't find anything online about it. Does anyone have any ideas? Should it not be there?

User-agent: Googlebot
Disallow: /*.xml

Thanks,

Chee.



#2 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,325 posts

Posted 22 April 2008 - 08:49 AM

.xml and .rss files aren't always the best thing for Google to index as they come out all text and sometimes jumbled. It's better for Google to index the actual articles (usually) rather than the rss file, which is probably why it was disallowed.

#3 1dmf

1dmf

    Keep Asking, Keep Questioning, Keep Learning

  • Active Members
  • 2,154 posts
  • Location:Worthing - England

Posted 22 April 2008 - 09:14 AM

Does that also mean your sitemap doesn't work, if it resides in the root and is called sitemap.xml?

#4 chrishirst

chrishirst

    A not so moderate moderator.

  • Moderator
  • 5,886 posts
  • Location:Blackpool UK

Posted 23 April 2008 - 05:29 PM

Nope, different thing entirely.

The SE indexing systems know exactly how to read the sitemap XML document tree.

#5 1dmf

1dmf

    Keep Asking, Keep Questioning, Keep Learning

  • Active Members
  • 2,154 posts
  • Location:Worthing - England

Posted 24 April 2008 - 04:05 AM

So this:

User-agent: Googlebot
Disallow: /*.xml

doesn't stop it from accessing the sitemap, it just stops it from indexing it. Is that right?

#6 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 24 April 2008 - 07:09 AM

Hmm... that's a good question, 1dmf. I hadn't read your question that way the first time.

It would be a good question for the folks at Google et al. If you're declaring an XML sitemap in your robots.txt, effectively giving them the pointer to where it's located, but further down in your robots.txt you're excluding all .xml files, which direction will the spiders follow? Or to put it another way, would the sitemap pointer be treated as if it were an Allow for a specific XML file? Or is the sitemap bot completely different from the normal Googlebot, and thus doesn't read robots.txt?

I'm honestly not sure what approach they would take. And I could make a logical argument for either outcome.

As a temporary measure I'd say the best solution in this situation would be to provide a specific Allow statement just below the /*.xml Disallow to put your xml sitemap back in play.

#7 don h

don h

    HR 4

  • Active Members
  • 188 posts

Posted 24 April 2008 - 09:53 AM

QUOTE (Randy @ Apr 24 2008, 08:09 AM)
As a temporary measure I'd say the best solution in this situation would be to provide a specific Allow statement just below the /*.xml Disallow to put your xml sitemap back in play.



You can also add the Allow directive for the sitemap file both before and after the directive that disallows all .xml files:

Allow: /sitemap.xml
Disallow: /*.xml
Allow: /sitemap.xml

That covers both cases, since we don't know whether Google stops at the first match or uses the last match. Isn't there an RFC on robots.txt? Surely Google would follow standard protocol?
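
To make the first-match versus last-match question concrete, here is a rough Python sketch that evaluates the rule ordering above under three possible precedence strategies. It is only an illustration: the function names and the "longest" strategy (most specific pattern wins) are assumptions added for comparison, the `*` wildcard is treated as a crawler extension rather than part of the original standard, and nothing here claims to reproduce how Googlebot actually resolves conflicts.

```python
import re

def pattern_to_regex(pattern):
    """Turn a robots.txt path pattern into a regex ('*' = any run of characters,
    trailing '$' = end of path). Wildcards are an extension, assumed here."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def matching_rules(rules, path):
    """Return the (kind, pattern) rules whose pattern matches from the start of path."""
    return [(kind, pat) for kind, pat in rules if pattern_to_regex(pat).match(path)]

def is_allowed(rules, path, strategy):
    """Decide access under a hypothetical precedence strategy:
    'first' = first matching rule wins, 'last' = last matching rule wins,
    'longest' = longest pattern wins, with Allow preferred on a tie."""
    hits = matching_rules(rules, path)
    if not hits:
        return True  # no rule matches, so the path is not blocked
    if strategy == "first":
        kind = hits[0][0]
    elif strategy == "last":
        kind = hits[-1][0]
    elif strategy == "longest":
        kind = max(hits, key=lambda h: (len(h[1]), h[0] == "allow"))[0]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return kind == "allow"

# The belt-and-braces ordering: Allow both before and after the Disallow.
SANDWICH = [
    ("allow", "/sitemap.xml"),
    ("disallow", "/*.xml"),
    ("allow", "/sitemap.xml"),
]

# Allow placed only below the Disallow.
ALLOW_BELOW = [
    ("disallow", "/*.xml"),
    ("allow", "/sitemap.xml"),
]

if __name__ == "__main__":
    for strategy in ("first", "last", "longest"):
        print(strategy,
              is_allowed(SANDWICH, "/sitemap.xml", strategy),
              is_allowed(ALLOW_BELOW, "/sitemap.xml", strategy))
```

In this sketch the sandwiched ordering keeps /sitemap.xml reachable whichever of the three strategies is assumed, while an Allow placed only below the Disallow would lose under a strict first-match reading. Other .xml files, such as a hypothetical /feed.xml, stay blocked in every case.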

#8 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 24 April 2008 - 10:51 AM

Well, yes, there are robots.txt standards.

But Allow: isn't part of the standards, if memory serves.