How Did Google Find This Url?!
#1
Posted 22 September 2003 - 12:16 PM
Here's the research I've done:
1) I know ni.com/matrixx is only partially indexed because I've done the allinurl:ni.com/matrixx search on Google and see only the URL and not the description.
2) According to a Google link search (link:ni.com/matrixx) 11 pages link to the ni.com URL.
3) The same Google link search for natinst.com/matrixx shows the same 11 pages, though all of them truly link to ni.com/matrixx. No page links to the natinst.com URL.
Now, my understanding is that Google finds URLs by crawling pages and following those links. How on earth could Google then have found the natinst.com URL if no one links to it, and not the ni.com URL, which is linked to by several pages?
Any theories or insight would be greatly appreciated.
Many thanks!
Merkere
#2
Posted 22 September 2003 - 12:24 PM
I have been a bit overwhelmed by Google's ability to follow non-existent links.
I have decided that Google can spider your actual web files, as there is no way on God's earth it could otherwise have found some of the pages it has spidered on my server.
Example: when I am building a site I allocate a folder on my server for the client to 'view' the ongoing work (I find this stops them telephoning me to find out how it is going). Google has regularly pulled these pages into its index. There is NO LINK to these pages from anywhere, as only the customer and I know the URL; the only link would have been in the email I send the customer. So unless Googlebot can somehow get to that, I am assuming that it can traverse my server files (a bit worrying lol).
I have even had test files indexed :-( where not even an email link has been sent out.
#3
Posted 22 September 2003 - 12:38 PM
Let me ask you this - is ni.com a new site?
Also, you say natinst.com has been around for a while, so could it be that it has already been in Google's index for a while and that is why it is still showing?
Furthermore, I noticed that natinst.com redirects to ni.com. What kind of redirect do you have set up, given that the two domains point to different IP addresses?
ni.com - 130.164.140.26
natinst.com - 130.164.140.14
Google would see these as different sites, as they have different IP addresses yet mirrored content. Maybe if natinst.com were pointed to 130.164.140.26 it would eventually be dropped from Google's index?
Anyone else have an opinion?
#4
Posted 22 September 2003 - 02:19 PM
#5
Posted 22 September 2003 - 02:21 PM
"Let me ask you this - is ni.com a new site?"
No, it has been around for several years, but we had natinst.com before then.
"Also you say natinst.com has been around for a while so could it be that it has already been in Google's index for a while and that is why it is still showing?"
natinst.com was the original site, but we changed to ni.com when that domain became available.
"Furthermore, I noticed that natinst.com redirects to ni.com. What kind of redirect do you have set up, given that the two domains point to different IP addresses?"
I thought it was a permanent (301) redirect, but I checked to be sure, and, well, it's a 302! That could be part of the problem. However, ni.com/matrixx and natinst.com/matrixx are very, very new URLs. We just picked up the MATRIXx product a few months ago, so that URL did not exist when Google would have indexed the natinst.com site.
#6
Posted 22 September 2003 - 02:30 PM
Here's my pet theory-
Since you mentioned having an old domain, I checked how Google sees your homepage. When doing a search for:
ni.com
www.ni.com
www.natinst.com
I get the same result, all pointing appropriately at www.ni.com. Whereas when I search on this:
natinst.com
I get something different: a (very) small fragment/mirror site that Google currently thinks exists, when it really doesn't. You can verify that Google thinks it has at least two copies of your site by clicking on the cache for each of these four searches and looking at the URL. The first three show a DocId of "bOjJYLMF7rQJ", while the last one has "fuW-jigHpFIJ", not to mention that it points at natinst.com.
This is where the theory gets full of holes. I'm guessing that a spider came by, looking to update the (fragment) natinst.com site, and picked up the link from one of the pages on the www.ni.com site, which Googlebot thinks is really on the natinst.com site. This happened in spite of the 302 redirect.
I would think that converting your redirects from 302s to 301s would eradicate the imaginary fragment site Google thinks exists and probably get rid of this behavior. Once in a great while Google believes links more than server codes and internal absolute links, especially if there's a DMOZ entry involved, though this can be fixed with 301s and 404s.
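For anyone who wants to confirm which status code their redirect actually returns before and after the change, here is a minimal Python sketch. The function names `describe_redirect` and `check_redirect` are made up for this illustration, and the sketch only looks at the first response rather than following the whole redirect chain.

```python
import http.client

# Redirect status codes and whether they signal a permanent or temporary move.
REDIRECT_KINDS = {
    301: "permanent",
    302: "temporary",
    307: "temporary",
    308: "permanent",
}


def describe_redirect(status, location=None):
    """Summarize a status code and Location header as a short string."""
    kind = REDIRECT_KINDS.get(status)
    if kind is None:
        return f"{status}: not a redirect"
    target = location or "(no Location header)"
    return f"{status}: {kind} redirect to {target}"


def check_redirect(host, path="/", timeout=10):
    """Send a HEAD request over plain HTTP and describe the first response.

    This makes a live network request and does not follow the redirect.
    """
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return describe_redirect(response.status, response.getheader("Location"))
    finally:
        conn.close()
```

Calling `check_redirect("natinst.com", "/matrixx")` would query the live server, so what it reports depends on how that server is configured at the time.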
#7
Posted 22 September 2003 - 02:39 PM
I am not having the best of days today lol
In fact, today has been everything you would want it to be for the bloke that ran over your pet :-(
But being Welsh I just pick myself up and get on with it :-)
#8
Posted 22 September 2003 - 02:44 PM
I'm working with my team to have this redirect changed and will let you know if it works.
Cheers to you all!
#9
Posted 22 September 2003 - 03:49 PM
Just wondering, as I don't really understand how the toolbar works.
#10
Posted 22 September 2003 - 04:00 PM
I often have several "test" sites floating around, and because most are interactive, there are often physical footprints when a visitor accidentally stumbles into one. And that happens ALL the time. No way can I assume one of those visitors won't leave an errant link somewhere and lead a spider to my door. The robots.txt file keeps me out of trouble.
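As a rough illustration of that approach, the Python sketch below holds a small robots.txt and uses the standard library's `urllib.robotparser` to check what a compliant crawler would be allowed to fetch. The folder names `/client-preview/` and `/test/` and the `www.example.com` host are hypothetical, and `is_allowed` is just a helper name chosen for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks all crawlers to stay out of two work folders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /client-preview/
Disallow: /test/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/client-preview/index.html")` comes back false, while the site root stays fetchable. Keep in mind that robots.txt is only a request that well-behaved spiders honor; it does not password-protect anything.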
#11
Posted 22 September 2003 - 07:47 PM
Looks like you're getting some good advice, so I'll just slip off to the next thread...
Jill
#12
Posted 23 September 2003 - 02:51 AM
But good point, Dragon lady. Yes, I DO have the Google toolbar, and yes, I do have the advanced features turned on so it reports back.
(I am not paranoid about Google/Microsoft etc.) Well, not Google anyhow lol
So maybe that is how Googlebot 'found' my pages.
#13
Posted 23 September 2003 - 08:21 AM
#14
Posted 24 September 2003 - 08:25 AM
Still, it made me wonder how Google chooses to spider a page.
#15
Posted 24 September 2003 - 10:26 AM
I never looked at it through the Google toolbar, and my mother does not even know what a toolbar is, so I know she didn't.
They will find ya no matter what! (Unless, of course, you are in a hurry for them to find you =) What a crazy thing!
What really bugged me was that a Google image search turned up pictures from this site, including one of my daughter in a swimsuit on the beach! EEEEEK!!
Robots.txt is the only way to go, IMO, if you want to keep Google's paws off something.