Outdated Google Cache
Posted 18 February 2006 - 04:47 PM
Hmm! I had always assumed Google was more active, but I may have been wrong. I agree, the content has changed, but very little over these months, so that may have something to do with it. I am doing more changes to it now, and we shall see how things go.
Posted 19 February 2006 - 12:40 PM
Posted 19 February 2006 - 06:46 PM
Like Jill says, it's all messed up at the moment (ever known a time when it wasn't?) and we'll have to wait (as usual) for it to settle down (which is a state we never recognise until we've passed through it).
You have to wonder what Google is doing with data that old anywhere on its system (other, of course, than to comply with whichever government demands it!). In internet terms, 7-8 months is ancient history. Why not mix in some cached pages from 2003 or 1999? They'd be just as relevant.
It's always been one of dmoz's handicaps, that you have to remember when you are looking at dmoz it is like looking into space. You are looking at parts of the internet, not as they are now, but as they were 2-4 light years ago.
Well, dmoz has an excuse. It is human-edited. Google doesn't!! It is a machine, and seemingly not a very efficient one. When you think about it, it is ludicrous that Google's view of the internet should be anything more than a few days or weeks out of date. Any Google product based on data older than that is, well "not fit for purpose".
Posted 20 February 2006 - 07:39 AM
Sorry to be updating my own last post on an old thread, but I've just found an even better example!
A page which Google keeps in its index of pages from my site, having cached it at lunchtime on 23 Jan 2005... OVER A YEAR AGO!!!!!!!
A page which has not existed in that form (or with that suffix) since June/July, but which no doubt is even now devaluing the current version of that page as partial duplicate content.
My question remains: what is Google doing with data this old?
Would anyone care to suggest what a max limit/expiry date for cached material should/might be... in numerical terms (days/weeks/months)? I'm curious.
Posted 20 February 2006 - 08:21 AM
After moving to a new domain, Google loved the site and indexed it all properly, up to date etc.
It was as if it had put a black spot next to the original domain's name.
Posted 20 February 2006 - 09:00 AM
Google's cache archive goes back several years. The older page images are usually treated as supplemental results and they become most visible during major updates to the Google database (such as is happening right now).
Supposedly, if you recreate the old page and include a ROBOTS meta tag in it that says "noindex,nofollow,noarchive", Google will (when or if it recrawls the page) remove the old image.
I have seen other people get Google to remove old images. Your mileage may vary.
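A minimal sketch of the recreated-page approach described above (the path in the comment is hypothetical). One caveat: the widely documented robots directive for suppressing Google's cached copy is "noarchive" - "nocache" is not a standard value:

```html
<!-- Stub page recreated at the old URL (e.g. /old-page.html, hypothetical). -->
<!-- "noindex,nofollow" keeps the stub itself out of the index; -->
<!-- "noarchive" asks Google not to keep/serve a cached copy. -->
<html>
<head>
  <meta name="robots" content="noindex,nofollow,noarchive">
  <title>Gone</title>
</head>
<body></body>
</html>
```

Whether this actually flushes a year-old image from the archival cache is, as noted above, a "your mileage may vary" matter.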
Posted 20 February 2006 - 10:02 AM
Re my pages. The noindex/noarchive recreated-page idea is an interesting one. Presumably, even easier, I could just add it to robots.txt? All the same, you'd think 301 redirects and a comprehensive Google sitemap crawled every day would allow Google to form a better idea of what is actually there and what isn't!
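For what it's worth, the robots.txt route would look something like this (the path is a hypothetical stand-in). The caveat usually reported is that a Disallow only stops Googlebot from recrawling the URL - it doesn't by itself purge a copy that is already cached, for which Google's URL removal tool was the documented route:

```
# robots.txt - hypothetical old URL used for illustration
User-agent: Googlebot
Disallow: /old-page.html
```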
But on the bigger question..
I still don't understand why.
I mean they must have a statistical strategy...
E.g. 'We can't cyclically crawl and index the entire internet in less than x weeks/months/years. Therefore we'll aim to re-index the 15% that seems to change often (blogs, newspapers, e-commerce) once every month, the next 15% within two months, and the remaining 70% of mostly static pages within two years. That's because we think it is better to have all of the internet indexed at medium accuracy than to have a high-quality, up-to-date index of just part of the internet.'
I mean what are the debates that go on in Googleplex about this? Are there any clues why they feel the need to have cached pages older than six months or a year?
Posted 20 February 2006 - 10:28 AM
I'm not really sure how other sites get their old pages removed from what I call the archival cache. I'm not fully convinced Google actually dumps the data.
But what you see in the normal search results is usually an indication that Google knows what you have now. I mean, the older pages that are no longer linked to aren't supposed to be served up in their results, but they hang around in the historical cache for reasons Google doesn't disclose. When they update, those older cache files get served more often than at other times.
That is one of the primary reasons why I think Google takes shards offline one at a time. When a primary shard goes offline, a secondary shard -- which is more likely to hold archived cache data -- takes over, and then you start to see funky results.
I still don't understand why.
For all we know, the guys at Google don't understand why, although you'd never see them admit to that.
I've presented my hypothesis, although if you don't know what a shard is (and outside of Google, people have only a vague idea that it represents a portion of their database), I suppose my hypothesis won't make much sense.
Even if you do know what a shard is, my hypothesis only makes minimal sense (in my opinion). But that's what you get for a best guess based on ignorance.
Posted 20 February 2006 - 10:46 AM
Sounds about right.
So to paraphrase: when Google needs to take a shard offline during an update, they plug the hole with old junk that they keep out the back for just that purpose!
At least now I can see a good reason.
Posted 20 February 2006 - 12:00 PM
I don't know if that is significant. I cannot imagine why it should be. If it does play a role, then I would guess it's a minor bug.
I still favor the idea of a supplemental cache containing historical data.
Posted 18 March 2006 - 05:30 AM
With our sites we've got the same problem:
At the end of 2004 we changed the internal linking structure -
and since March 2005 the majority of our sites are displayed on Google result pages only as a link (no title, no snippet) - and often enough some pages are completely ignored by Google.
I'm now pretty sure this is a duplicate content problem - because since 2004 the search engines have been ignoring the 301 Moved Permanently redirect. So Google is caching the content twice - under the old AND under the new address - duplicate successfully generated :-(.
Because Google marks the new addresses as duplicates, they are penalized, and their cache never seems to be updated.
Lots of our re-addressed pages have completely renewed and unique content - but Google is still displaying caches of them dating back to the end of 2004 / beginning of 2005.
Google visits our sites approx. twice a week, and nearly every morning (Salzburg is GMT+1) Google displays our sites properly (with a current cache!) - but a few hours later (~10:00 - 11:00hrs) one third of the results vanishes and a second third is displayed with the URL only - and when asked for the cache, Google again displays the outdated, 15-month-old content.
So it's definitely NOT a problem of lacking links to the sites, nor a technical one - it must be a bug in the Google algorithms, or more likely simply a penalty for having published "duplicates" (accidentally, via a 301 redirect which Google completely misinterprets).
When we asked Google for a solution we got an answer (there are more things between heaven and earth... :-), but it contained only platitudes - like a doctor advising a cancer patient not to smoke, to avoid alcohol and to get lots of exercise.
Perhaps somebody here is able to give us (and of course Jupiter) some useful advice?
Thank you very much in advance and kindest regards from Salzburg
Posted 18 March 2006 - 05:54 AM
Sounds pretty much like a canonicalisation issue: different results are returned for a site: search with and without the www on your URL.
Probably caused by your redirect from the non-www version, which is going to www.sitename.tld/index.html rather than to www.sitename.tld.
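Assuming an Apache server with mod_rewrite (and the hypothetical sitename.tld from above), one common fix is a host-level 301 that preserves the requested path, so each non-www URL redirects to its equivalent www page rather than always to the home page:

```apache
# .htaccess - collapse non-www onto www, keeping the path (hypothetical domain)
RewriteEngine On
RewriteCond %{HTTP_HOST} ^sitename\.tld$ [NC]
RewriteRule ^(.*)$ http://www.sitename.tld/$1 [R=301,L]
```

With the path preserved, both hostname variants should eventually consolidate onto the www versions in the index.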
The issue with PIPs ("Partially Indexed Pages") is often a complex one and may not be what it seems.
If this is on a site: search, add a keyword after the URL to force a text snippet, e.g.:
Posted 18 May 2007 - 08:49 AM
I submitted a Google sitemap, and in Google Webmaster Tools it says it has indexed my site in the past day - but the crawl page says it hasn't 'successfully accessed' my 'home page' (the Home Page Crawl) since before the launch of the full site.
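For reference, a minimal sitemap under the sitemaps.org 0.9 protocol looks like this (example.com and the date are placeholders). Note that an entry here only invites a crawl - it doesn't guarantee that Googlebot will fetch the page:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2007-05-18</lastmod>
  </url>
</urlset>
```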