
Googlebot Keeps Asking For 404 Pages

4 replies to this topic

#1 jeppeb1


    HR 1

  • Members
  • 2 posts

Posted 30 June 2012 - 07:15 AM

More than three years ago, my PR7 site was hacked and filled with spammy pages. Everything was cleaned up, and there has been no hack since. The "product pages" generated by the hack (maybe 10,000 to 100,000 pages) have returned a 404 header for more than three years, and they no longer appear in the SERPs (helped along by some manual use of the removal tool). But thousands of inbound links to them still exist on boards and guestbooks all over the web, so Googlebot keeps asking for these pages. From the web log I can see that the majority of all Googlebot visits to my site are requests for these pages. That's ridiculous three years after the hack, and it definitely can't be good.

I know that I can simply block these crawl attempts via robots.txt. But if I do that, the URLs of the non-existent pages will appear in the SERPs (without a snippet) when doing a site:mydomain.com search.

What is the best practice to get Google to understand that these pages don't exist? 404 responses, robots.txt, or something else? Please share your experiences on this matter.

#2 torka


    Vintage Babe

  • Moderator
  • 4,825 posts
  • Location:Triangle area, NC, USA, Earth (usually)

Posted 30 June 2012 - 10:44 AM

Honestly, I don't know for sure... but can you generate a 410 "Gone" response instead of the 404? A 410 response is supposed to indicate a resource that was deliberately removed and will not be coming back. Perhaps that will let Google know they're gone for good?
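For anyone wondering what the 410 approach looks like at the application level, here is a minimal sketch, assuming a Python WSGI app; the "/products/" path prefix is a hypothetical stand-in for whatever pattern the hacked URLs actually share:

```python
# Minimal WSGI sketch: answer 410 Gone for the old hacked URLs
# and 404 Not Found for everything else that doesn't exist.
# "/products/" is a hypothetical prefix for the spammy paths.

def application(environ, start_response):
    path = environ.get("PATH_INFO", "")
    if path.startswith("/products/"):
        # Deliberately removed and never coming back -> 410
        start_response("410 Gone", [("Content-Type", "text/plain")])
        return [b"This page was removed and will not return."]
    # Any other unknown URL -> plain 404
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found."]
```

In a real deployment you would more likely set this in the web server config (e.g. an Apache `Gone` directive) than in application code, but the status-code distinction is the same either way.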

--Torka :propeller:

#3 jeppeb1


    HR 1

  • Members
  • 2 posts

Posted 01 July 2012 - 04:12 PM

Thanks, torka. :)

I'll try the 410. I'm worried about all the links to the spam pages, now that Google is penalizing sites that have these bad links (the Penguin update), and I'm not sure the 410 will change that... I have definitely seen a drop in rankings since Penguin.

#4 Michael Martinez


    HR 10

  • Active Members
  • 5,325 posts
  • Location:Georgia

Posted 05 July 2012 - 01:53 PM

The 410 won't help. Google disclosed earlier this year that they are treating 410 and 404 about the same.

You should probably just redirect all the bad URLs to a single page that uses a "noindex,follow" robots meta tag. The page should have a link pointing to your HTML sitemap (or your root URL).

I would NOT simply redirect the bad URLs to an important page.
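To illustrate the approach, here is a sketch, again as a Python WSGI app: the bad URLs get a 301 to one shared catch-all page carrying the "noindex,follow" meta tag. The "/products/" prefix and the "/removed.html" target are both hypothetical names, not anything from this thread:

```python
# Sketch of the redirect-to-a-noindex-page approach described above.
# "/products/" and "/removed.html" are hypothetical names.

NOINDEX_PAGE = b"""<!DOCTYPE html>
<html><head>
  <meta name="robots" content="noindex,follow">
  <title>Page removed</title>
</head><body>
  <p>This page no longer exists.</p>
  <p><a href="/sitemap.html">Browse the site map</a></p>
</body></html>"""

def application(environ, start_response):
    path = environ.get("PATH_INFO", "")
    if path.startswith("/products/"):
        # Send every bad URL to one shared "gone" page,
        # NOT to the homepage or another important page.
        start_response("301 Moved Permanently",
                       [("Location", "/removed.html")])
        return [b""]
    if path == "/removed.html":
        # The catch-all page: noindex keeps it out of the SERPs,
        # follow lets the bot continue on to the sitemap link.
        start_response("200 OK", [("Content-Type", "text/html")])
        return [NOINDEX_PAGE]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found."]
```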

#5 chrishirst


    A not so moderate moderator.

  • Moderator
  • 7,718 posts
  • Location:Blackpool UK

Posted 06 July 2012 - 04:18 AM

Michael Martinez, on 05 July 2012 - 01:53 PM, said:

The 410 won't help. Google disclosed earlier this year that they are treating 410 and 404 about the same.

What John Mueller (John Mu) posted in May 2012 was this

Which hasn't really changed from this (2009 quote from Google Groups), where a 410 response was being handled more or less correctly (according to RFC 2616 and the W3C status code definitions).

However, I think he is a bit 'off' on the time-scale difference; we have probably all seen that a URL returning a 404 response often hangs around for months, though the crawl frequency does start to tail off after a few weeks.

