Cache:www.mysite.com - Inconsistent With Indexed Content


7 replies to this topic

#1 bobmeetin

    HR 6

  • Active Members
  • 549 posts
  • Location:Colorado

Posted 04 June 2009 - 11:09 AM

I'm once again baffled. I "thought" that going to google and inputting "cache:www.mysite.com" would reliably tell me the last time the website was indexed.

Last night I added an article link on the home page of a website. I did the cache: thing last night and this morning as:

CODE
cache:mysite.com
cache:www.mysite.com
cache:www.mysite.com/index.php
cache:http://www.mysite.com


All report: "It is a snapshot of the page as it appeared on 31 May 2009" - 4 days ago, yet if I go to google and enter the article title I find it in the sixth slot of Google Page 1.

Aside from checking the Apache log files, is there another mechanism for verifying the last index?
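For the Apache log route mentioned above, here is a minimal sketch in Python. It assumes the server writes the standard Combined Log Format and that matching "Googlebot" in the user-agent string is good enough (user agents can be spoofed, so treat a hit as a hint, not proof). The function name last_googlebot_fetch and the sample log lines are invented for illustration; they are not taken from the real site's logs.

```python
import re
from datetime import datetime

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def last_googlebot_fetch(lines, path="/"):
    """Return the time of the most recent Googlebot request for `path`, or None."""
    latest = None
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # not a Combined Log Format line
        if m.group("path") != path or "Googlebot" not in m.group("agent"):
            continue
        ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        if latest is None or ts > latest:
            latest = ts
    return latest


# Made-up sample lines (hypothetical IPs, sizes, and timestamps).
sample = [
    '66.249.71.1 - - [31/May/2009:03:45:47 +0000] "GET / HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.71.1 - - [03/Jun/2009:22:10:05 +0000] "GET / HTTP/1.1" 200 5340 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.5 - - [04/Jun/2009:09:00:00 +0000] "GET / HTTP/1.1" 200 5340 '
    '"-" "Mozilla/5.0 (Windows; U)"',
]

print(last_googlebot_fetch(sample))  # 2009-06-03 22:10:05+00:00
```

Against a real log you would pass an open file instead of the sample list, and the home page may be logged as "/" or as "/index.php", so it is worth checking both paths.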

#2 adibranch

    HR 5

  • Active Members
  • 332 posts

Posted 04 June 2009 - 11:46 AM

If the article page is returned in the index, what's the date of the cache of that page? It's unclear which page you're referring to with the cache date. To get the cache date of any single page, just click the 'cache' button underneath it in the index.

Anyway, if you submitted to some article URL sites such as digg etc, Google can index that page within hours, sometimes even minutes. I haven't seen it very often, but I have seen it.

#3 bobmeetin

    HR 6

  • Active Members
  • 549 posts
  • Location:Colorado

Posted 04 June 2009 - 11:59 AM

I added the article to the website yesterday, June 3, in the morning, I think. It 'might' have been very late on June 2, but no earlier. I did not submit it anywhere else, so no complications there.

The google cache says the website and home page where the article title lives has a cache date of May 31. When I google the article title google is displaying the home page for the result, not the article detail page.

What google is saying for cache date is inconsistent with what is found in google search results, 3 or so days off. The home page had to have been indexed sometime yesterday or possibly this morning in order for results to display.

Cache: says I ain't been there, but results say otherwise. Perry Mason is no dummy - he would make sure that the judge would rule in favor of the results.

Me? I just want to know if "Cache:" can be trusted.

#4 adibranch

    HR 5

  • Active Members
  • 332 posts

Posted 04 June 2009 - 12:24 PM

Very odd.. normally the cache date is pretty accurate, but it could be a glitch. Look in the cache snapshot of the stored page and see what it looks like (using the cache button, not the cache: command). If the link is in there, the date from the cache: command is wrong.

#5 Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 04 June 2009 - 12:33 PM

Remember that you could be dealing with completely different datacenters. Not only for the normal index (you could be connecting to any of several) but also for the cache.

Each of these DCs is constantly updating, so depending upon which one you happen to be connecting to at any given moment you may get slightly different information, which can explain these types of smaller discrepancies.

#6 bobmeetin

    HR 6

  • Active Members
  • 549 posts
  • Location:Colorado

Posted 04 June 2009 - 12:46 PM

The cache message says:

This is Google's cache of http://www.mysite.com/. It is a snapshot of the page as it appeared on 31 May 2009 03:45:47 GMT. The current page could have changed in the meantime. Learn more

This is for whatever cache: version I use. This version does not display the title nor link to the article.
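To make the gap concrete, here is a small Python sketch that pulls the timestamp out of a banner like the one quoted above and compares it with the time of the edit. The helper name snapshot_time is hypothetical, and the edit time is an assumption: the thread only says "June 3, in the morning", so the hour and time zone below are made up.

```python
import re
from datetime import datetime, timezone

BANNER = (
    "This is Google's cache of http://www.mysite.com/. It is a snapshot of "
    "the page as it appeared on 31 May 2009 03:45:47 GMT."
)


def snapshot_time(banner):
    """Extract the snapshot timestamp from a cache banner, or return None."""
    m = re.search(
        r"as it appeared on (\d{1,2} \w+ \d{4} \d{2}:\d{2}:\d{2}) GMT", banner
    )
    if m is None:
        return None
    # Accept both abbreviated ("Jun") and full ("June") month names.
    for fmt in ("%d %b %Y %H:%M:%S", "%d %B %Y %H:%M:%S"):
        try:
            parsed = datetime.strptime(m.group(1), fmt)
        except ValueError:
            continue
        return parsed.replace(tzinfo=timezone.utc)
    return None


snap = snapshot_time(BANNER)

# ASSUMPTION: exact edit time is unknown; 09:00 UTC on June 3 is a placeholder.
edited = datetime(2009, 6, 3, 9, 0, tzinfo=timezone.utc)

print(snap < edited)          # True: the snapshot predates the edit
print((edited - snap).days)   # 3
```

A snapshot taken before the edit cannot contain the new link, which matches what the cached copy shows here.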

Randy, are you suggesting that the different datacenters own a different piece of the puzzle? Thus perhaps when I google the title from my l'il ole PC I am googling and obtaining results from Datacenter A whereas when I do the cache: from the same l'il ole PC google is grabbing that result from a different DC, Datacenter Z? And of course the 2 datacenters may not have tied the knot?



#7 adibranch

    HR 5

  • Active Members
  • 332 posts

Posted 04 June 2009 - 01:02 PM

Although you could hit different datacenters, I've never seen a cache date differ from an indexed page ever... all a bit odd.

#8 Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 04 June 2009 - 01:06 PM

That's exactly what I'm suggesting.

The data and the datacenters that contain the data are in a constant state of flux, constantly being updated here but not there, then being updated there but not here. And from the outside you have no real way to control which dataset you see for any specific query. Eventually, if the doc stays static for a long enough period of time, the important stuff gets all sync'd up. But if a document is new or changes with regularity you'll see all sorts of things that look a bit wonky on the surface.

re: The Cache date specifically, I've often wondered if that's the date the document was grabbed from the server or the date it was processed by that particular DC. I can make an argument for either way being the right approach. But I have yet to find the right person --presumably someone on the crawl team-- who can or will answer the question with any level of certainty. In the grand scheme of things I suppose it's not all that important one way or another. But being the curious sort, that's never stopped me from asking.
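The "updated here but not there" idea can be illustrated with a toy model in Python. This is only a sketch of the explanation given in this thread, not a description of how Google's systems actually work; the Datacenter class, the names A and Z, and the version numbers are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Datacenter:
    name: str
    page_version: int = 0  # which version of the home page this copy holds


def answer(dc, kind):
    """Describe what a query of the given kind sees at this datacenter."""
    return f"{kind} answered by datacenter {dc.name}: page version {dc.page_version}"


# Version 1 = home page before the article link, version 2 = after it.
dc_a = Datacenter("A", page_version=1)
dc_z = Datacenter("Z", page_version=1)

# Only A has picked up the change so far.
dc_a.page_version = 2

search_view = answer(dc_a, "search")   # the title search happens to hit A
cache_view = answer(dc_z, "cache:")    # the cache: lookup happens to hit Z
print(search_view)  # search answered by datacenter A: page version 2
print(cache_view)   # cache: answered by datacenter Z: page version 1

# Later, Z catches up and the two views agree again.
dc_z.page_version = 2
print(answer(dc_z, "cache:"))  # cache: answered by datacenter Z: page version 2
```

In this model the same PC sees the new link in search results and the old page in the cache at the same moment, simply because the two requests were served from copies at different stages of updating.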



