How Did Google Find This Url?!


15 replies to this topic

#1 Merkere

Merkere

    HR 1

  • Members
  • 3 posts

Posted 22 September 2003 - 12:16 PM

Hello all. I am totally stumped by something and am hoping someone can help me understand this. We have a URL that is currently only partially indexed by Google (www.ni.com/matrixx). However, instead of indexing and crawling this URL, Google has chosen to index and crawl www.natinst.com/matrixx, a URL we do not actively use. (Since our site, years ago, was www.natinst.com, we still maintain those URLs, but they all redirect to ni.com URLs.)

Here's the research I've done:
1) I know ni.com/matrixx is only partially indexed because I've done the allinurl:ni.com/matrixx search on Google and see only the URL and not the description.
2) According to a Google link search (link:ni.com/matrixx) 11 pages link to the ni.com URL.
3) The same Google link search for natinst.com/matrixx shows the same 11 pages, though all of them truly link to ni.com/matrixx. No page links to the natinst.com URL.

Now, my understanding is that Google finds URLs by crawling pages and following those links. How on earth could Google then have found the natinst.com URL if no one links to it, and not the ni.com URL, which is linked to by several pages?

Any theories or insight would be greatly appreciated.

Many thanks!
Merkere

#2 OldWelshGuy

OldWelshGuy

    Work is Fun

  • Moderator
  • 4,713 posts
  • Location:Neath, South Wales, UK

Posted 22 September 2003 - 12:24 PM

Hi there,

I have been a bit overwhelmed by the power of Google to follow non-existent links.

I have decided that Google can spider your actual web files, as there is no way on God's earth it could otherwise have found some of the pages it has spidered on my server.

For example, when I am building a site I allocate a folder on my server for the client to 'view' the ongoing work (I find this stops them telephoning me to find out how it is going). Google has regularly pulled these pages into its index. There is NO LINK to these pages from anywhere, as only the customer and I know the URL; the only link would have been in the email I send the customer. So unless Googlebot can somehow get to that, I am assuming that it can traverse my server files (a bit worrying lol).

I have even had test files indexed :-( where not even an email link has been sent out.

#3 SearchRank

SearchRank

    HR 7

  • Active Members
  • 2,333 posts
  • Location:Phoenix, AZ

Posted 22 September 2003 - 12:38 PM

Hi Merkere. Welcome to the forum. :thumbup:

Let me ask you this - is ni.com a new site?

Also, you say natinst.com has been around for a while, so could it be that it has already been in Google's index for a while and that is why it is still showing?

Furthermore, I noticed that natinst.com redirects to ni.com. What kind of redirect do you have set up? I ask because the two domains point to different IP addresses.

ni.com - 130.164.140.26
natinst.com - 130.164.140.14

Google would see these as different sites, since they have different IP addresses yet serve mirrored content. Maybe if natinst.com were pointed to 130.164.140.26 it would eventually be dropped from Google's index?
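
For anyone who wants to repeat the IP comparison above, here is a minimal Python sketch. The helper names (`resolve`, `same_address`) are made up for illustration, and the addresses you get back today may well differ from the ones quoted in this post.

```python
import socket
from typing import Callable


def resolve(host: str) -> str:
    """Return the first IPv4 address the local resolver reports for host."""
    return socket.gethostbyname(host)


def same_address(host_a: str, host_b: str,
                 resolver: Callable[[str], str] = resolve) -> bool:
    """True when both hostnames resolve to the same IPv4 address.

    `resolver` is injectable so the comparison can be tried against a
    fixed lookup table instead of live DNS.
    """
    return resolver(host_a) == resolver(host_b)
```

Calling `same_address("ni.com", "natinst.com")` does a live DNS lookup for each name. Passing a dictionary's `__getitem__` as `resolver` lets you try the comparison with the two addresses listed above without touching the network.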

Anyone else have an opinion?

#4 Scottie

Scottie

    Psycho Mom

  • Admin
  • 6,294 posts
  • Location:Columbia, SC

Posted 22 September 2003 - 02:19 PM

OldWelshGuy- are you sure your stats are password protected? Sounds like Google may be finding your web stats and crawling them...

#5 Merkere

Merkere

    HR 1

  • Members
  • 3 posts

Posted 22 September 2003 - 02:21 PM

David Wallace asked:

Let me ask you this - is ni.com a new site?


No it has been around for several years, but we had natinst.com before then.

Also you say natinst.com has been around for awhile so could it be that it has already been in Google's index for awhile and that is why it is still showing?


natinst.com was the original site, but we changed to ni.com when that URL became available.

Furthermore, I noticed that natinst.com redirects to ni.com. What kind of a redirect do you have set up as they both point to different IP addresses.


I thought it was a permanent (301) redirect, but I checked to be sure, and, well, it's a 302! That could be part of the problem. However, ni.com/matrixx and natinst.com/matrixx are very, very new URLs. We just picked up the MATRIXx product a few months ago, so that URL did not exist when Google would have indexed the natinst.com site.
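
One way to check which kind of redirect a server sends is to make a single request and read the status code without following the redirect. The sketch below is only an illustration: `describe_status` and `fetch_status` are hypothetical helper names, and it assumes the old host answers over plain HTTP.

```python
import http.client
from typing import Optional, Tuple

# Common redirect status codes and whether they signal a permanent move.
REDIRECT_KINDS = {
    301: "permanent",
    302: "temporary",
    303: "see other",
    307: "temporary",
    308: "permanent",
}


def describe_status(status: int) -> str:
    """Classify an HTTP status code as a kind of redirect."""
    return REDIRECT_KINDS.get(status, "not a redirect")


def fetch_status(host: str, path: str = "/") -> Tuple[int, Optional[str]]:
    """Send one HEAD request and return (status, Location header).

    http.client never follows redirects on its own, so the status
    returned here is the one the first server actually sent.
    """
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

For example, `fetch_status("www.natinst.com", "/matrixx")` would return the status code and target URL, and `describe_status` turns a 302 into "temporary" and a 301 into "permanent".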

#6 Tenyque

Tenyque

    HR 2

  • Active Members
  • 42 posts
  • Location:Suttons Bay, MI

Posted 22 September 2003 - 02:30 PM

Hello Merkere,

Here's my pet theory-

Since you mentioned an old domain, I checked out how Google sees your homepage. When doing a search for:

ni.com
www.ni.com
www.natinst.com

I get the same result, all pointing appropriately at www.ni.com. Whereas when I search on this:

natinst.com

I get something different: a (very) small fragment/mirror site that Google currently thinks exists when it really doesn't. You can verify that Google thinks it has at least two copies of your site by clicking on the cache link for each of these four searches and looking at the URL. The first three show a DocId of "bOjJYLMF7rQJ", while the last one has "fuW-jigHpFIJ", not to mention that it points at natinst.com.

This is where the theory gets full of holes. I'm guessing that a spider came by to update the (fragment) natinst.com site and picked up the link from one of the pages on the www.ni.com site, which Googlebot thinks is really on the natinst.com site. This happened in spite of the 302 redirect.

I would think that converting your redirects from 302s to 301s would eradicate the imaginary fragment site Google thinks exists and probably get rid of this behavior. Once in a great while Google believes links more than server codes and internal absolute links, especially if there's a DMOZ entry involved, though this can be fixed with 301s and 404s.
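
To make the 301 idea concrete, here is a rough sketch of a handler that answers every request on the old domain with a permanent redirect to the same path on the new one. It uses Python's built-in `http.server` purely for illustration; the thread does not say what web server the site really runs, and `NEW_HOST`, `new_location`, and `PermanentRedirect` are invented names.

```python
from http.server import BaseHTTPRequestHandler

# Assumed target host, taken from the URLs discussed in this thread.
NEW_HOST = "www.ni.com"


def new_location(path: str, new_host: str = NEW_HOST) -> str:
    """Build the URL on the new domain that matches the requested path."""
    return f"http://{new_host}{path}"


class PermanentRedirect(BaseHTTPRequestHandler):
    """Answer GET and HEAD requests with a 301 pointing at the new domain."""

    def do_GET(self):
        self.send_response(301)  # 301 = moved permanently (302 would be temporary)
        self.send_header("Location", new_location(self.path))
        self.end_headers()

    # HEAD requests get the same status line and headers.
    do_HEAD = do_GET
```

To try it locally you could serve it with `http.server.HTTPServer(("", 8080), PermanentRedirect).serve_forever()` and request any path. On a production server the same effect is normally achieved through the server's own redirect configuration instead.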

#7 OldWelshGuy

OldWelshGuy

    Work is Fun

  • Moderator
  • 4,713 posts
  • Location:Neath, South Wales, UK

Posted 22 September 2003 - 02:39 PM

Cheers Scottie, maybe that's what's happening. I'd better go look :-)

I am not having the best of days today lol

In fact, today has been everything you would want it to be for the bloke that ran over your pet :-(

But being Welsh, I just pick myself up and get on with it :-)

#8 Merkere

Merkere

    HR 1

  • Members
  • 3 posts

Posted 22 September 2003 - 02:44 PM

I can't thank you all enough for your time and expertise! The more I think about it, the more the redirect issue makes sense. Google already has www.natinst.com (and, apparently, natinst.com) indexed and, since we were only using a temporary redirect, Google found ni.com/matrixx but interpreted it as the natinst.com url.

I'm working with my team to have this redirect changed and will let you know if it works.

Cheers to you all! ;)

#9 dragonlady7

dragonlady7

    HR 6

  • Active Members
  • 618 posts
  • Location:Buffalo, NY

Posted 22 September 2003 - 03:49 PM

I heard something about Google spidering pages that aren't linked to from anywhere, having found the URL by people using its toolbar. Is that a possibility as well?
Just wondering, as I don't really understand how the toolbar works.

#10 Ron Carnell

Ron Carnell

    HR 6

  • Moderator
  • 959 posts
  • Location:Michigan USA

Posted 22 September 2003 - 04:00 PM

Just as an aside, I hope a reminder about the robots.txt file doesn't sound too condescending to anyone? It's the only sure-fire way to keep Googlebot at bay.

I often have several "test" sites floating around, and because most are interactive, there are often physical footprints when a visitor accidentally stumbles into one. And that happens ALL the time. No way can I assume one of those visitors won't leave an errant link somewhere and lead a spider to my door. The robots.txt file keeps me out of trouble.
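
As a rough illustration of the robots.txt approach, the sketch below feeds a sample file to Python's standard `urllib.robotparser` and asks whether a crawler that obeys the rules may fetch a given URL. The directory names `/clients/` and `/test/` and the helper `is_allowed` are made up for the example, and example.com stands in for your own domain.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all compliant crawlers out of two folders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /clients/
Disallow: /test/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the sample file, `is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/test/page.html")` comes back False, while a URL outside the two listed folders comes back True. This only checks what the rules say; it relies on the crawler choosing to honor them.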

#11 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,326 posts

Posted 22 September 2003 - 07:47 PM

Welcome, Merkere! :embarrassed:

Looks like you're getting some good advice, so I'll just slip off to the next thread...

Jill

#12 OldWelshGuy

OldWelshGuy

    Work is Fun

  • Moderator
  • 4,713 posts
  • Location:Neath, South Wales, UK

Posted 23 September 2003 - 02:51 AM

Well said, Ron. I decided that robots.txt was the only way to control this and went down that route.

But good point, dragonlady. Yes, I DO have the Google Toolbar, and yes, I do have the advanced features turned on, so it reports back.

(I am not paranoid about Google/Microsoft etc.) Well, not Google anyhow lol

So maybe that is how Googlebot 'found' my pages.

#13 Tenyque

Tenyque

    HR 2

  • Active Members
  • 42 posts
  • Location:Suttons Bay, MI

Posted 23 September 2003 - 08:21 AM

There have also been rumors that Google spiders through its own referrer logs, though I also second the idea that the Toolbar may be leaking the URL.

#14 dragonlady7

dragonlady7

    HR 6

  • Active Members
  • 618 posts
  • Location:Buffalo, NY

Posted 24 September 2003 - 08:25 AM

I did notice in my own logs an interesting phenomenon-- I would get a searcher from Google, and then immediately afterward Google would spider that page. I remarked upon it, but have since been assured that Googlebot can't actually follow visitors, so it must be coincidence. And I looked through my logs again and indeed, it only happens twice out of hundreds of spider visits. So, it must be coincidence.
Still, it made me wonder how Google chooses to spider a page.
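
If anyone wants to run the same check on their own logs, something along these lines would count how often a spider visit closely follows a Google-referred visitor on the same page. It is only a sketch: the `Hit` record, the five-minute window, and the simple substring tests for "Googlebot" and "google." are all assumptions, and real log lines would need to be parsed into this shape first.

```python
from datetime import datetime, timedelta
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    """One already-parsed access-log entry (hypothetical shape)."""
    when: datetime
    path: str
    user_agent: str
    referrer: str


def spider_followups(hits: List[Hit],
                     window: timedelta = timedelta(minutes=5)) -> int:
    """Count Googlebot hits that arrive within `window` after a
    Google-referred visit to the same path."""
    last_referral: Dict[str, datetime] = {}  # path -> latest Google-referred visit
    count = 0
    for hit in sorted(hits, key=lambda h: h.when):
        if "Googlebot" in hit.user_agent:
            seen = last_referral.get(hit.path)
            if seen is not None and hit.when - seen <= window:
                count += 1
        elif "google." in hit.referrer:
            last_referral[hit.path] = hit.when
    return count
```

Comparing that count with the total number of spider visits gives a quick sense of whether the pattern is more than coincidence.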

#15 Leann_Pass

Leann_Pass

    Internet Marketing Consultant

  • Active Members
  • 671 posts
  • Location:Birmingham Alabama

Posted 24 September 2003 - 10:26 AM

I have noticed this same thing happening. Even a little photo site I put together strictly for my mother to see was spidered and actually showed up in google results!

I never looked at it by going through the google toolbar, and my mother does not even know what a toolbar is, so I know she didn't.

They will find ya no matter what! (Unless, of course, you are in a hurry for them to find you =) What a crazy thing!

What really bugged me about it was that a Google 'image' search turned up pictures from this site, which included my daughter in a swimsuit on the beach! EEEEEK!!

Robots.txt is the only way to go, IMO, if you want to keep google paws off something.



