
How Search Engines Find Your Site


8 replies to this topic

#1 t49

    HR 4

  • Active Members
  • 128 posts
  • Location:Right of Centre

Posted 08 December 2003 - 08:25 PM

How do search engines find your site, even on its first day online and with NO incoming links?

Someone asked a question similar to this a while back, and received only a partial answer.
I was too busy/lazy to respond.

So why now? Kinda funny actually. 5 sites that I had only just put online on Saturday were browsed by 3 SEs on Sunday night. I scratched my head for a while. Then I looked at the logs for another site, where I had put links to the new sites. No bots had visited in several days. Scratched some more ... I really should shower more often :lol: ... Then I remembered ... finally, like I've only known this for 10 years or so ... SEs don't need to know your domain name. If the server has been online for a while, the SEs know it's there and revisit once in a while to see what's new.

So, how do they find YOUR site, if they don't know its name?
Easy. All websites are stored in directories/sub-directories of a directory that is usually named www or html. The SE bot/spider searches through all of the directories below that. It does not need to know the name of the directory.

Huh! ... OK, pretend that your website is in a big apartment building, only there are no names or numbers on the doors ... the doors represent directories ... the bot/spider knocks on every door ... if someone opens the door and hands it a web page, it makes a copy of the web page for later analysis, and makes a note that there is a website there.

gonna stop typing now ... gotta scratch ...

Tom

#2 Ron Carnell

    HR 6

  • Moderator
  • 959 posts
  • Location:Michigan USA

Posted 08 December 2003 - 08:58 PM

Sorry, Tom, but I just don't believe that's correct. A spider is just a specialized web browser, and on a well configured machine, a browser should never have access to a directory listing. It will either get the default page returned or a Forbidden Access code. So maybe it only happens on poorly configured servers? Even if the server allowed such access, most directories would have a default page that would be returned instead. What you are describing is akin to FTP access and usually requires a username and password.

Robots follow links. If you have the Google toolbar loaded, there is some speculation that the robot might also follow YOU. And there's always the crazy chance that someone out there just typed in a domain name and got lucky.
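
As a rough illustration of "robots follow links", here is a minimal Python sketch of the link-discovery step: it parses a page's HTML and collects the targets of its href attributes. The names LinkCollector and extract_links and the example.com/example.org URLs are made up for this example; real search engine spiders are far more involved, and nothing here reflects how any particular engine is actually implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return every link target found in the given HTML, in page order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# A made-up page standing in for one a spider has already fetched.
page = '<p><a href="/new-site/">New site</a> <a href="http://example.org/">Other</a></p>'
print(extract_links("http://example.com/index.html", page))
```

A spider built this way only ever learns about URLs that appear on some page it has already fetched, which is the point being made above.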

#3 t49

Posted 08 December 2003 - 10:46 PM

"Sorry, Tom, but I just don't believe that's correct. A spider is just a specialized web browser, and on a well configured machine, a browser should never have access to a directory listing."

O.K., I simplified things a bit.
Yes, bots are browsers, and they do NOT have access to any directories above the main web site directory. Browsers/bots/spiders are directed to the main/default web site directory by the web server, e.g. Apache.

"It will either get the default page returned or a Forbidden Access code. So maybe it only happens on poorly configured servers? Even if the server allowed such access, most directories would have a default page that would be returned instead."

All web site directories should have an index.html page, even if it is blank. This is one way of keeping people from reading your directory listings, e.g. the list of pictures in your pic directory.
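
A rough sketch of the behaviour being discussed, written as a toy model in Python rather than real web server code: when a directory is requested, the server returns the default page if one exists, and otherwise either a directory listing or a 403 Forbidden, depending on how it is configured. The function name, its parameters and the list of default page names are assumptions for illustration only; this is not Apache's actual logic or configuration.

```python
def respond_to_directory_request(files, listing_allowed=False,
                                 default_pages=("index.html", "index.htm")):
    """Toy model of what a web server returns when a directory is requested.

    files: names of the files present in the directory.
    listing_allowed: whether the (hypothetical) server is configured to
        generate directory listings.
    """
    for page in default_pages:
        if page in files:
            # A default page exists, so the visitor never sees the listing.
            return 200, page
    if listing_allowed:
        # No default page and listings enabled: the contents are exposed.
        return 200, sorted(files)
    # No default page and listings disabled: access is refused.
    return 403, "Forbidden"


print(respond_to_directory_request(["index.html", "cat.jpg"], listing_allowed=True))
print(respond_to_directory_request(["cat.jpg", "dog.jpg"], listing_allowed=True))
print(respond_to_directory_request(["cat.jpg", "dog.jpg"]))
```

In this model a blank index.html hides the listing even when listings are enabled, while a directory with no default page is only exposed if the server allows listings.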


"Robots follow links."

Robots do whatever they are programmed to do, and can easily go up or down through any directory that is not [properly] restricted.

"If you have the Google toolbar loaded, there is some speculation that the robot might also follow YOU."

Prefer not to use microcrap software, and toolbars are generally only available for crappy browsers.

"And there's always the crazy chance that someone out there just typed in a domain name and got lucky."

All 5 sites are on the same machine, and were spidered in alphabetical order [directory-wise], not in the order in which the links appeared on another site. Also, the links, which were on another site, on another machine, in another country, were created before all of the directories, and there were no error codes in the logs. In other words, the bots did not look for non-existent directories. Since they hadn't visited the site that had a page with links to the 5 new sites, they were not looking for them.

If your webserver [machine] has been online for a while, and is constantly adding new sites, the bots know this and will visit more often. ... maybe! ...

Or maybe I just got lucky and 'moved in' [apt. bldg analogy again] just before the bots came by to see what's new.

Tom

#4 Ron Carnell

Posted 08 December 2003 - 11:12 PM

"Robots do whatever they are programmed to do, and can easily go up or down through any directory that is not [properly] restricted."

So, uh, are yours properly restricted? :lol:

#5 t49

Posted 08 December 2003 - 11:26 PM

"So, uh, are yours properly restricted? :cheers:"

You don't really think that I'm going to answer that on a public forum, do you? :lol:
Tom

#6 powerofeyes

    HR 7

  • Active Members
  • 1,123 posts
  • Location:INDIA

Posted 09 December 2003 - 12:24 AM

Hello Tom,

"If your webserver [machine] has been online for a while, and is constantly adding new sites, the bots know this, and will visit more often."

This is an impossible task. Usually Google jumps from page to page using the HREF and other hyperlink elements.
A small quote from the Google webmaster info:

"Google's robots jump from page to page on the Web via hyperlinks"

It is always possible to think that we don't have a link from any site. I even had one site that was brand new and not even fully hosted, but the very next day Googlebot had all 30 pages of the site in its index. I was really puzzled how it got in, and finally saw a link to the site on an online coding site where it had been posted for review.
So it happens, but we cannot say Google will index a server as a whole; it needs hyperlinks to follow.

VIJAY

#7 t49

Posted 09 December 2003 - 03:52 AM

Hi Vijay!
Maybe that's the way Google does it, or lets us believe that, but I stand by what I said. A bot does what it is programmed to do, and does NOT need links to follow. All it needs is the IP address or domain name of a machine, and away it goes. If you're not careful, it will go on happily crashing servers around the world until your hard drive is full ... or your ISP shuts you down and hands you a HUGE bill. Better find out how much they charge per gigabyte before you start playing with those things.
There are quite a few scripts, and full-blown search programs, out there that you can download and run yourself. It helps if you know a little Perl or C, but it isn't necessary. Let's keep our fingers crossed that the flakes on this forum don't download and run them, or there will be servers crashing all over the place. Flakes take note: SE bots leave a trail that will lead right back to your machine. Why am I warning them? The people that set these things loose on the net are too stupid to understand or care.
... and while they are being led away in cuffs and chains, whimper 'but I said I was sorry.'

Sorry! I get carried away sometimes. :lol:

Tom

#8 OldWelshGuy

    Work is Fun

  • Moderator
  • 4,713 posts
  • Location:Neath, South Wales, UK

Posted 09 December 2003 - 04:26 AM

Tom, I have had similar experiences, and it came down to my not password protecting my server log file :lol:

I was flummoxed, as the pages it indexed were unfinished and had only been posted to my work-in-progress customer viewing section (not password protected). I saw no need, as they are in such an obscure directory that no one would have been able to find them.

In all cases I had not mentioned the domain name online, only by email between the customer and myself. I did, however, have the Google toolbar installed.

So it could have found them either way.

I now use passwords on unfinished WIP pages :halo:

#9 OldWelshGuy

Posted 09 December 2003 - 04:31 AM

You're right, Tom. I have a CGI search engine that will go off crawling as many links deep as I tell it per site, either ignoring or following external links, etc. I am using it to build a specialist search resource, so we currently have it on a leash: internal links only, one site at a time, manually editing the results on each page.

But someone could crash a few servers with it if they turned up the dial to 'go anywhere'.
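
To make the "leash" idea concrete, here is a minimal Python sketch of a breadth-first crawl with a depth limit and an internal-links-only switch. It is not the CGI search engine mentioned above, whose internals are not described in this thread; the function crawl, its parameters max_depth and follow_external, the get_links callback and the example.com link graph are all made up for illustration, and no network access is performed.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(start_url, get_links, max_depth=2, follow_external=False):
    """Breadth-first crawl from start_url, at most max_depth links deep.

    get_links: callable returning the list of URLs linked from a page
        (hypothetical; a real crawler would fetch and parse the page here).
    """
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # the "leash": do not go any deeper
        for link in get_links(url):
            is_external = urlparse(link).netloc != start_host
            if link in seen or (is_external and not follow_external):
                continue
            seen.add(link)
            queue.append((link, depth + 1))
    return visited


# A made-up link graph standing in for real pages.
site = {
    "http://example.com/": ["http://example.com/a", "http://example.org/"],
    "http://example.com/a": ["http://example.com/b"],
    "http://example.com/b": ["http://example.com/c"],
}
print(crawl("http://example.com/", lambda url: site.get(url, []), max_depth=2))
```

Here max_depth plays the role of "as many links deep as I tell it" and follow_external the choice between ignoring and following external links. With both turned up and no limit on the number of hosts, a crawl like this can grow very quickly, which is the risk described above.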



