
Google Crawling Locations Disallowed In Robots.txt?


4 replies to this topic

#1 SERPico

SERPico

    HR 4

  • Active Members
  • 249 posts

Posted 21 February 2010 - 07:49 AM

Okay, here's the deal. This is what is disallowed in my robots.txt:

User-agent: *

Disallow: /wp-*
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins/


And lo and behold, this is what I see reported by Crawl Rate Tracker within WordPress:



/wp-admin/ 2 crawls Links
/wp-login.php?redirect_to=http%3A%2F%2Fwww.mydomain... 1 crawls Links
/wp-login.php 1 crawls Links
/wp-admin/index-extra.php?jax=dashboard_incoming_l... 1 crawls Links
/wp-admin/index-extra.php?jax=dashboard_primary 1 crawls Links
/wp-admin/index-extra.php?jax=dashboard_secondary 1 crawls Links
/wp-admin/index-extra.php?jax=dashboard_plugins 1 crawls Links
/wp-admin/options-general.php?page=robots-meta 1 crawls Links
/wp-login.php?action=logout&_wpnonce=932793d5bb 1 crawls Links
/wp-login.php?loggedout=true 1 crawls Links

What the hell, Google!

Is there something wrong with my robots.txt?

#2 qwerty

qwerty

    HR 10

  • Moderator
  • 8,608 posts
  • Location:Somerville, MA

Posted 21 February 2010 - 10:29 AM

Is that tool checking for crawls by Googlebot, or just any spider? There are plenty of bots out there that don't bother with the Robots Exclusion Protocol (REP).
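One way to settle the "is it really Googlebot?" question is the reverse-DNS check Google describes: look up the hostname for the requesting IP, confirm it ends in googlebot.com or google.com, then resolve that hostname forward and confirm it maps back to the same IP. Here's a minimal Python sketch of that idea. The function name `is_real_googlebot` and the two injectable lookup parameters are my own, added so the logic can be tried without touching the network. It's an illustration, not what Crawl Rate Tracker does.

```python
import socket

# Hostname suffixes Google documents for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a Google hostname
    that resolves forward to the same IP again.

    reverse_lookup / forward_lookup are hypothetical hooks for testing;
    by default they use the system resolver via the socket module.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        # No PTR record or resolver failure: cannot confirm.
        return False

    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    try:
        # The forward lookup must include the original IP,
        # otherwise the PTR record could have been spoofed.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

If the IPs behind those /wp-admin/ hits fail this check, the visitor was only claiming to be Googlebot in its user-agent string.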

#3 SERPico

SERPico

    HR 4

  • Active Members
  • 249 posts

Posted 21 February 2010 - 06:29 PM

Googlebot specifically. I've now also blocked those locations with Robots Meta, the plugin by Joost de Valk (Yoast.com).
I usually do that, but forgot to this time. Even so, the rules in the robots.txt in my root directory should prevent those locations from being crawled.

The same robots.txt is also showing up in Google Webmaster Central, so it's really odd.

#4 qwerty

qwerty

    HR 10

  • Moderator
  • 8,608 posts
  • Location:Somerville, MA

Posted 21 February 2010 - 06:46 PM

QUOTE
the rules in the robots.txt in my root directory should prevent those locations from being crawled.

Yeah, it definitely should. I don't think I've ever seen Googlebot request a page that was disallowed in robots.txt.

What happens if you test one of the disallowed URLs in the Crawler Access section of WMT? Does it report back that it's blocked?
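For a rough local version of the same test, Python's standard-library `urllib.robotparser` can be fed the rules from the first post. This is only a sketch under a few assumptions. It is not Google's parser, and it does plain prefix matching, so it does not treat the `*` in `/wp-*` as a wildcard the way Googlebot does. The blank line after `User-agent: *` in the post is left out here on the assumption that it's a formatting artifact, since this parser treats a blank line as the end of a record. The helper name `is_blocked` is made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the first post, minus the blank line
# after the User-agent line (assumed to be a paste artifact).
RULES = """\
User-agent: *
Disallow: /wp-*
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins/
"""


def is_blocked(path, agent="Googlebot"):
    """Return True if `path` is disallowed for `agent` under RULES."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return not parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/wp-admin/", "/wp-login.php?loggedout=true", "/"):
        print(path, "blocked" if is_blocked(path) else "allowed")
```

Even with the wildcard line ignored, the explicit `/wp-admin` and `/wp-login.php` rules are enough to block the URLs in the crawl report, so this should agree with what Webmaster Tools reports.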

#5 SERPico

SERPico

    HR 4

  • Active Members
  • 249 posts

Posted 22 February 2010 - 04:47 AM

Yep, it reports it's blocked.

I guess Googlebot just doesn't always adhere to robots.txt.



