Am I Right About This Robot.txt Instruction?


2 replies to this topic

#1 Say Yebo

Posted 20 August 2009 - 08:45 PM

It looks to me like my client's website is blocking the search engines with its robots.txt file.

It says this:

# No robot are allow to crawl this web app.
User-agent: *
Disallow: /

The URL also has a 302 redirect on it which redirects from a simple URL to a long complicated one - further discouraging the search engines.

The URL starts off as
www.clientsite.groupname.com
and redirects to
www.groupname.com/clientsite/showClinicInformation.do?discriminator=LOCATION_AND_HOURS&parentId=1722&itemId=2362&

The robots.txt file turns up at www.groupname.com/clientsite/robots.txt

To complicate matters, it's a site generated by a content management system designed to generate many such sites - all with the same issues.

My questions are: based on the info in the robots.txt file, what exactly is being blocked? The entire site including the home page, or only the pages reached after the redirect from the home page?

And does it matter that "No robot are allow to crawl this web app" is bad English?

Any other tips about fixing this mess would also be welcomed ;-)

Thanks everyone.


#2 qwerty

Posted 20 August 2009 - 09:26 PM

The search engines look for the robots.txt file at the root of the host they are crawling. If the search engine considers the default address of the site to be www.groupname.com/ (with clientsite as a subdirectory of that domain), then it's going to look for the file at www.groupname.com/robots.txt. If it considers the default address to be www.clientsite.groupname.com, then it's going to look for it at www.clientsite.groupname.com/robots.txt, and probably also at www.groupname.com/robots.txt once it follows the redirect. Anyplace else, including www.groupname.com/clientsite/robots.txt, and they're just not going to find it.
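
The lookup rule described above can be sketched in a few lines of Python. This is only an illustration: the function name robots_txt_url is made up for this example, and the http:// scheme is an assumption, since the URLs in this thread were posted without one.

```python
from urllib.parse import urlsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for page_url.

    Hypothetical helper for illustration; not part of any library.
    """
    # The URLs in the thread have no scheme, so assume http:// when missing.
    if "://" not in page_url:
        page_url = "http://" + page_url
    parts = urlsplit(page_url)
    # Only the scheme and host matter; the path and query string are ignored.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


# The short address and the redirect target live on different hosts,
# so each one gets its own robots.txt lookup.
print(robots_txt_url("www.clientsite.groupname.com"))
print(robots_txt_url("www.groupname.com/clientsite/robots.txt"))
```

Note that the second call returns the root file on www.groupname.com, not the file sitting in the /clientsite/ subdirectory, which is why a robots.txt stored there would go unread.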

But for what it's worth, the code you posted would (if a spider found it) say not to crawl any pages on the site.
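
One way to check this yourself is with Python's standard urllib.robotparser module, feeding it the three lines from the original post. This is a rough sketch: the agent name "ExampleBot" and the helper allowed are placeholders invented for the example. The line starting with # is a comment and is skipped by the parser, so its wording has no effect on the rules.

```python
from urllib.robotparser import RobotFileParser

# The file contents exactly as quoted in the first post.
RULES = """\
# No robot are allow to crawl this web app.
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
# parse() takes an iterable of lines; comment lines are ignored.
parser.parse(RULES.splitlines())


def allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if the (placeholder) agent may fetch path under RULES."""
    return parser.can_fetch(agent, path)


# Both the home page and a deep page are covered by "Disallow: /".
print(allowed("/"))
print(allowed("/clientsite/showClinicInformation.do"))
```

Both calls report that fetching is not allowed, since every path on the host starts with "/".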

What the 302 is doing, in addition to redirecting all requests, is giving the search engine the instruction to keep checking www.clientsite.groupname.com, because the redirect is only temporary.
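
As a very simplified model of that difference (real search engines weigh many more signals, and nothing here is taken from any search engine's actual code), a crawler could choose which address to keep on record based on the redirect's status code. The function url_to_keep is hypothetical.

```python
from http import HTTPStatus

# Temporary redirects: the original address is still the "real" one.
TEMPORARY = {HTTPStatus.FOUND, HTTPStatus.TEMPORARY_REDIRECT}  # 302, 307
# Permanent redirects: the target replaces the original address.
PERMANENT = {HTTPStatus.MOVED_PERMANENTLY, HTTPStatus.PERMANENT_REDIRECT}  # 301, 308


def url_to_keep(status: int, original: str, target: str) -> str:
    """Pick the URL a simplified crawler would keep checking (illustrative only)."""
    if status in PERMANENT:
        return target
    if status in TEMPORARY:
        return original
    raise ValueError(f"{status} is not a redirect status handled by this sketch")


original = "www.clientsite.groupname.com"
target = "www.groupname.com/clientsite/showClinicInformation.do"

print(url_to_keep(302, original, target))  # keeps the short address
print(url_to_keep(301, original, target))  # switches to the long address
```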

#3 Say Yebo

Posted 21 August 2009 - 09:45 AM

Thanks for that clarity, qwerty.



