
GWMTs On Steroids


5 replies to this topic

#1 ttw

    HR 5

  • Active Members
  • 378 posts
  • Location:San Mateo, California
Posted 20 May 2014 - 02:41 PM

On a client site, GWMT started producing thousands of 404 errors on April 19, 2014, and we're now up to 16,000 404 errors being reported.

 

Many of these errors are for legitimate pages/files from a very old version of the site, dating back 5-8 years. Obviously the pages and files were not 301'd at the time. For at least the links we tested, we can see that they still appear on other sites, and that's clearly how Google found them (but why report them only now?).

 

It's not possible to redirect such a large number of old links, and Google itself says, "Generally, 404s don't harm your site's performance in search...."

 

Then there is the "Priority" column, about which GWMT's blog says:

 

One thing we’re really excited about in this new version of the Crawl errors feature is that you can really focus on fixing what’s most important first. We’ve ranked the errors so that those at the top of the priority list will be ones where there’s something you can do, whether that’s fixing broken links on your own site, fixing bugs in your server software, updating your Sitemaps to prune dead URLs, or adding a 301 redirect to get users to the “real” page. We determine this based on a multitude of factors, including whether or not you included the URL in a Sitemap, how many places it’s linked from (and if any of those are also on your site), and whether the URL has gotten any traffic recently from search.

 
We could easily fix the items marked Priority 1-25, but that cutoff would be pulled out of a hat, because we have no idea whether going beyond Priority 1, 2, 3 is important.
 
To me, the best thing we can do is make sure the current XML sitemaps do not contain any of these 404 URLs and fix any obvious ones.
 
We understand that any link juice pointing at those URLs will be lost from the pages containing those links, but I don't think that warrants trying to track down such a high number of 404 errors that only started to show up last month.
 
There was no sudden drop in traffic when the 404's started to pile up in April.
 
Anything else you think should be done?  

 

Thank you

 

Rosemary


Edited by ttw, 20 May 2014 - 03:51 PM.


#2 chrishirst

    A not so moderate moderator.

  • Moderator
  • 7,018 posts
  • Location:Blackpool UK

Posted 20 May 2014 - 05:57 PM

Nothing really. Google have probably pulled a database server back into service, so they are refreshing "stale" data and checking that the URLs still return a 404 response. I merged three of my sites well over a year ago, redirected what I wanted to keep live, and let the rest return a 404 response. Those URLs are never going to see the light of day again, and eventually Google, Yahoo!/Bing et al. will catch up with the fact that they are definitely deceased, but until then I have MANY HUNDREDS of "missing URLs" being reported by webmaster tools. It seems to be one of the few things it gets right.



#3 Michael Martinez

    HR 10

  • Active Members
  • 5,145 posts
  • Location:Georgia

Posted 21 May 2014 - 10:52 AM

Google allocates only so much crawl for each site. Normally 404 errors don't detract much from the so-called "crawl budget" (which you will never see), but if you're seeing THOUSANDS EACH MONTH, that is a problem you do want to address. Depending on how the old URLs were structured, you may indeed be able to redirect or block large numbers of them with relatively few directives.

The amount of time a search engine spends crawling your site is important for several reasons:
  • It directly impacts how fast their index reflects changes on your site
  • It adds to the server's performance load
  • It uses up some of your very limited user connections
  • For every broken URL that is fetched a live one is NOT fetched
If the old URLs are in folders that the client isn't using, you should try blocking those old folders in "robots.txt". That at least reduces the unnecessary crawling.
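A minimal sketch of what that might look like, assuming (purely as placeholders) that the old content lived under /old-site/ and /archive/ folders:

# robots.txt -- the folder names below are hypothetical; substitute the client's real retired paths
User-agent: *
Disallow: /old-site/
Disallow: /archive/

Keep in mind that Disallow only stops the crawling; it doesn't by itself remove URLs Google already knows about from the GWMT report.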

If the Old Links Create Doubt or Concern
If the client site is hosted on UNIX/Linux using Apache Webserver then they may have the option of using the RedirectMatch directive in their .htaccess file.

Assuming the old URLs are located in folders the client site is not using any more, the old URLs can be redirected to a new page that explains that the site has changed, or to some offsite destination (such as example.com).
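For example, a minimal .htaccess sketch along those lines, where /old-site/ and the "site has changed" page are hypothetical placeholders:

# .htaccess -- requires Apache mod_alias
# Permanently redirect anything under the retired folder to an explanation page
# (/old-site/ and /site-changed.html are made-up names for illustration)
RedirectMatch 301 ^/old-site/ http://www.example.com/site-changed.html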

Redirects themselves don't reduce crawl (in fact, they add to it) but over time the search engines will try to normalize the URLs by skipping the redirects. I asked Matt Cutts if redirecting old URLs that had spammy links pointing to them (links which could not be removed) to example.com would be okay and he said that in that extreme case it would be.

Alternatively, you can return a "403 Forbidden" response for all those pages (or a "410 Gone").
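A sketch of the "410 Gone" option, again using Apache's mod_alias with made-up paths:

# .htaccess -- return 410 Gone for retired URLs (paths are hypothetical)
Redirect gone /old-site/some-retired-page.html
# or for everything under a retired folder:
RedirectMatch gone ^/old-site/

A 410 tells search engines the URLs are intentionally gone, which they generally treat as a somewhat stronger signal than a plain 404.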

Edited by Michael Martinez, 21 May 2014 - 10:53 AM.


#4 ttw

    HR 5

  • Active Members
  • 378 posts
  • Location:San Mateo, California

Posted 30 May 2014 - 05:46 PM

Drilling down further into the list of websites pointing to URLs that return a 404, we can see that some of the linking sites are very spammy sites the client never had anything to do with (they never had anyone build links for them, and I believe them).

When we evaluate the spam sites linking to very old client pages, we see what appear to be scraper sites taking old client URLs plus a small bit of text from those pages and adding content completely unrelated to my client's line of business. The content is clearly there to bolster those pages. Example: (look under relevant web pages)

Our concern is that these spammy links pointing to the client's site may create a Google penalty, even though there isn't one now. Since this spike in 404s only started a month ago, I'm trying to get ahead of any penalty issue for the client.



#5 chrishirst

    A not so moderate moderator.

  • Moderator
  • 7,018 posts
  • Location:Blackpool UK

Posted 01 June 2014 - 02:08 PM

How are 'old' links that return a 404 response going to be a "penalty issue"?



#6 Michael Martinez

    HR 10

  • Active Members
  • 5,145 posts
  • Location:Georgia

Posted 02 June 2014 - 08:56 AM

Matt Cutts once told me that spammy links pointing to non-existing URLs won't be counted against a Website.  Although I can't promise that Google won't change its mind about that, the reason seems clear to me: a non-existing URL (returning a real 404 status code) cannot pass PageRank to the rest of the site.


  • Jill likes this



