
Outdated Google Cache


42 replies to this topic

#31 google_bot

google_bot

    HR 1

  • Members
  • 5 posts

Posted 18 February 2006 - 04:47 PM

Well, I am new here, but I also have a similar problem. I noticed one of my pages is still showing the Google cache from almost 8 months ago! But Yahoo is right on top of things and re-cached it about a week ago or less.

Hmm! I had always assumed Google was more active, but I may have been wrong. I agree, the content has changed, but very little over these months, so that may have something to do with it. I am making more changes to it now, and we shall see how things go.

#32 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,315 posts

Posted 19 February 2006 - 12:40 PM

Google's doing some database changes right now; I'm sure it will straighten itself out shortly.

#33 nedguy

nedguy

    HR 4

  • Active Members
  • 240 posts
  • Location:London, UK

Posted 19 February 2006 - 06:46 PM

Yeah, I've got a bunch of pages I suddenly noticed yesterday indexed among the results of a 'site: search' that date back to early July. Those pages haven't existed since Aug and requests for them have been individually 301 redirected since early November.

Like Jill says, it's all messed up at the moment (ever known a time when it wasn't?) and we'll have to wait (as usual) for it to settle down (which is a state we never recognise until we've passed through it). ;)

BUT...

You have to wonder what Google is doing with data that old anywhere on its system (other, of course, than to comply with whichever government demands it!). In internet terms 7/8 months is ancient history. Why not mix in some cached pages from 2003 or 1999? They'd be just as relevant.

It's always been one of dmoz's handicaps, that you have to remember when you are looking at dmoz it is like looking into space. You are looking at parts of the internet, not as they are now, but as they were 2-4 light years ago.

Well, dmoz has an excuse. It is human-edited. Google doesn't!! It is a machine, and seemingly not a very efficient one. When you think about it, it is ludicrous that Google's view of the internet should be anything more than a few days or weeks out of date. Any Google product based on data older than that is, well "not fit for purpose".

#34 nedguy

nedguy

    HR 4

  • Active Members
  • 240 posts
  • Location:London, UK

Posted 20 February 2006 - 07:39 AM

LMAO, oh oh oh...

sorry to be updating my own last post on an old thread, but I've just found an even better example!

http://72.14.207.104...&ct=clnk&cd=318

A page which Google keeps in its index of pages from my site, having cached it at lunchtime on 23 Jan 2005... OVER A YEAR AGO!!!!!!!

A page which has not existed in that form (or with that suffix) since June/July, but which no doubt is even now devaluing the current version of that page as partial duplicate content.

My question remains. What is Google doing with data this old?

Would anyone care to suggest what a max limit/expiry date for cached material should/might be... in numerical terms (days/weeks/months)? I'm curious.

#35 shimlad

shimlad

    HR 4

  • Active Members
  • 243 posts
  • Location:UK

Posted 20 February 2006 - 08:21 AM

Sounds to me like Google found your site and then decided it didn't like it. I had a similar problem and fixed issues such as the non-www versus www problem.

Once I moved to a new domain, Google loved the site and indexed it all properly, up to date, etc.

It was as if it had given the original domain a black spot next to its name.

#36 Michael Martinez

Michael Martinez

    HR 9

  • Active Members
  • 4,805 posts
  • Location:Georgia

Posted 20 February 2006 - 09:00 AM

QUOTE(nedguy @ Feb 20 2006, 06:39 AM)
My question remains. What is google doing with data this old?


Google's cache archive goes back several years. The older page images are usually treated as supplemental results and they become most visible during major updates to the Google database (such as is happening right now).

Supposedly, if you recreate the old page and include a ROBOTS meta tag in it that says "noindex,nofollow,nocache" Google will (when or if it recrawls the page) remove the old image.

I have seen other people get Google to remove old images. Your mileage may vary.
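
For anyone wanting to check what a recreated page actually tells crawlers, here is a minimal Python sketch that lists the directives in a page's ROBOTS meta tag. The `page` string is a made-up example, and one caveat applies: the directive Google documents for suppressing the cached copy is `noarchive`, so treat `nocache` as an assumption that Google may not honour.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                part = part.strip().lower()
                if part:
                    self.directives.add(part)


def robots_directives(html):
    """Return the set of robots directives declared in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page source, for illustration only.
page = '<html><head><meta name="robots" content="noindex, nofollow, noarchive"></head></html>'
print(sorted(robots_directives(page)))  # ['noarchive', 'nofollow', 'noindex']
```

The tag only has an effect if the crawler can fetch the page and read it, which is why the old URL has to respond with a real page for this approach to work.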

#37 nedguy

nedguy

    HR 4

  • Active Members
  • 240 posts
  • Location:London, UK

Posted 20 February 2006 - 10:02 AM

Thanks Michael. I was hoping you'd have some input.

Re my pages. The noindex, nocache recreated page idea is an interesting one. Presumably, even easier, I could just add it to robots.txt? All the same, you'd think 301 redirects and a comprehensive google sitemap trawled every day would allow Google to form a better idea of what is actually there and what isn't!
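
A caution on the robots.txt shortcut: a Disallow rule stops a compliant crawler from fetching the page at all, so it never gets to read a noindex tag placed on that page, and the old entry may simply linger. A rough Python sketch using the standard library's `urllib.robotparser` shows the effect; the rule and the `www.example.com` URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content blocking one retired page.
rules = """\
User-agent: *
Disallow: /old-page.html
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)


def crawler_may_fetch(url, agent="Googlebot"):
    """True if the parsed rules allow this user agent to request the URL."""
    return parser.can_fetch(agent, url)


# The blocked page cannot be fetched, so any meta tag on it goes unread.
print(crawler_may_fetch("http://www.example.com/old-page.html"))  # False
print(crawler_may_fetch("http://www.example.com/new-page.html"))  # True
```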

But on the bigger question..

I still don't understand why.

I mean they must have a statistical strategy...

E.g. 'We can't cyclically trawl and index the entire internet in less than x weeks/months/years. Therefore we'll aim to re-index the 15% that seems to change often (blogs, newspapers, e-commerce) once every month, the next 15% within two months, and the remaining 70% of mostly static pages we'll re-update within two years. That's because we think it is better to have all the internet indexed at medium accuracy than to have a high quality up-to-date index of just part of the internet.'

I mean, what are the debates that go on in the Googleplex about this? Are there any clues why they feel the need to have cached pages older than six months or a year?

#38 Michael Martinez

Michael Martinez

    HR 9

  • Active Members
  • 4,805 posts
  • Location:Georgia

Posted 20 February 2006 - 10:28 AM

QUOTE(nedguy @ Feb 20 2006, 09:02 AM)
Re my pages. The noindex, nocache recreated page idea is an interesting one. Presumably, even easier, I could just add it to robots.txt? All the same, you'd think 301 redirects and a comprehensive google sitemap trawled every day would allow Google to form a better idea of what is actually there and what isn't!


I'm not really sure on how other sites get their old pages removed from what I call archival cache. I'm not fully convinced Google actually dumps the data.

But what you see in the normal search results is usually an indication that Google knows what you have now. I mean, the older pages that are no longer linked to aren't supposed to be served up in their results, but they hang around in the historical cache for reasons Google doesn't disclose. When they update, those older cache files get served more often than at other times.

That is one of the primary reasons why I think Google takes shards offline one at a time. When a primary shard goes offline, a secondary shard -- which is more likely to have archived cache data -- takes over and then you start to see funky results.

QUOTE
But on the bigger question..

I still don't understand why.


For all we know, the guys at Google don't understand why, although you'd never see them admit to that.

I've presented my hypothesis, although if you don't know what a shard is (and outside of Google, people only have a vague idea that it represents a portion of their database) I suppose my hypothesis won't make much sense.

Even if you do know what a shard is, my hypothesis only makes minimal sense (in my opinion). But that's what we get for basing my best guess on my ignorance.

#39 nedguy

nedguy

    HR 4

  • Active Members
  • 240 posts
  • Location:London, UK

Posted 20 February 2006 - 10:46 AM

Yes, I remember your shard theory very well from last October.

Sounds about right.

So to paraphrase: when Google needs to take a shard offline during an update they plug the hole with old junk that they keep out the back for just that purpose! lol

At least now I can see a good reason.

thanks

NG

#40 Michael Martinez

Michael Martinez

    HR 9

  • Active Members
  • 4,805 posts
  • Location:Georgia

Posted 20 February 2006 - 12:00 PM

While checking to see if Google has crawled a new page I uploaded last week, I noticed a few minutes ago that they are showing a page I had up for only a short time last year. The one common feature between my page and the URL you showed earlier in the thread is that they both link to themselves in the Google cached image.

I don't know if that is significant. I cannot imagine why it should be. If it does play a role, then I would guess it's a minor bug.

I still favor the idea of a supplemental cache containing historical data.

#41 leadings

leadings

    HR 1

  • Members
  • 1 posts
  • Location:Salzburg / Austria

Posted 18 March 2006 - 05:30 AM

Hello,

We've got the same problem with our sites.
At the end of 2004 we changed the internal linking structure,
and since March 2005 the majority of our sites are displayed on Google result pages only as a link (no title, no snippet), and often enough some pages are completely ignored by Google.
I'm now pretty sure this is a duplicate content problem, because since 2004 search engines have been ignoring the 301 Moved Permanently redirect. So Google is caching the content twice, under the old AND under the new address: duplicate successfully generated :-(.
Because Google is marking the new addresses as duplicates, they are penalized and their cache seems never to be updated.
Lots of our pages at the new addresses have completely renewed and unique content, but Google is still displaying caches of them dating back to the end of 2004 / beginning of 2005.
Google visits our sites approx. twice a week, and nearly every day in the morning (Salzburg is GMT+1) Google displays our sites properly (with a current cache!), but a few hours later (~10:00 - 11:00) one third of the results vanish and the second third is displayed only as a URL. When asked for the cache, Google again displays the outdated 15-month-old content.
So it's definitely NOT a problem of lacking links to the sites or a technical problem. It must be a bug in the Google algorithms or, more likely, simply a penalty for having published "duplicates" (accidentally, by using a 301 redirect which is completely misinterpreted by Google).
When we asked Google for a solution we got an answer (there are more things between heaven and earth... :-) but this answer contained only commonplaces, like a doctor advising a cancer patient not to smoke, to avoid alcohol and to do lots of sport.

Perhaps anybody here is able to give us (and of course Jupiter) some useful advice?

Thank you very much in advance and kindest regards from Salzburg
Frank
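
Before blaming the search engine, it can help to confirm what an old address really returns. The Python sketch below is only an illustration under assumptions: `fetch_status` and `describe_redirect` are hypothetical helper names, and the labels they return are arbitrary. It issues one HEAD request without following redirects and reports whether the answer is a permanent redirect to the address you expect.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url):
    """Issue one HEAD request (no redirect following); return (status, Location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe_redirect(status, location, expected):
    """Classify a response against the address the old URL should point to."""
    if status in (301, 308):
        if location == expected:
            return "permanent redirect to expected target"
        return "permanent redirect to unexpected target"
    if status in (302, 303, 307):
        return "temporary redirect"
    return "no redirect"
```

Typical use would be `describe_redirect(*fetch_status(old_url), expected=new_url)`, with `old_url` and `new_url` being your own addresses. A temporary redirect, or a permanent one pointing somewhere other than the intended page, would be worth fixing before drawing conclusions about a penalty.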

#42 chrishirst

chrishirst

    A not so moderate moderator.

  • Moderator
  • 5,881 posts
  • Location:Blackpool UK

Posted 18 March 2006 - 05:54 AM

Welcome to HR, Frank.

Sounds pretty much like a canonicalisation issue. Different results are returned for a site: search with and without the www on your URL.
Probably caused by your redirect from the non-www version, which is going to www.sitename.tld/index.html rather than to www.sitename.tld

The issue with PIPs ("Partially Indexed Pages") is often a complex one and may not be what it seems.
If this is on a site: search, add a keyword after the URL to force a text snippet, e.g.:
site:www.theluxuryhotels.info hotel
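
To make the canonicalisation point concrete, here is a small Python sketch of the mapping a redirect rule would need to implement: every variant of an address collapses to a single www form without the index file name. The function name `canonical_url`, the choice of www as the preferred host, and the list of index file names are all assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url, index_names=("index.html", "index.htm")):
    """Map non-www and /index.html variants onto one preferred URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host  # assumption: the www host is the preferred one
    path = parts.path or "/"
    head, _, tail = path.rpartition("/")
    if tail in index_names:
        path = head + "/"  # drop the index file, keep the directory
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


print(canonical_url("http://sitename.tld/index.html"))  # http://www.sitename.tld/
print(canonical_url("http://www.sitename.tld"))         # http://www.sitename.tld/
```

If both calls give the same result, the two addresses are duplicates of one page, and the non-www redirect should point straight at that result rather than at the index.html variant.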

#43 jedweb

jedweb

    HR 1

  • Members
  • 1 posts

Posted 18 May 2007 - 08:49 AM

This is weird. I have the same problem with only the old cached version of my site showing up. Unfortunately the cached version is only a single page that announces the upcoming launch of the site. It's not just Google actually, but Yahoo and Alltheweb have the same thing; only my single splash page. Their search result obviously leads to my full site though.

I submitted a Google sitemap and in the google webtools it says it has indexed my site in the past day but the crawl page says it hasn't 'successfully accessed' my 'home page' (the Home Page Crawl) since before the launch of the full site.

Frustrating.



