Maybe A Robots.txt Problem?


10 replies to this topic

#1 Katok

Katok

    HR 1

  • Members
  • Pip
  • 9 posts

Posted 30 November 2004 - 04:22 PM

Hi everybody.

I have a fresh site that is not indexed yet, and I use a robots.txt file to keep the directory with my cgi-bin scripts from being indexed at all. Yesterday I looked through the server logs with awstats and, lo and behold, Googlebot had come along.

The only thing that troubles me is that the bot asked only for the home page and robots.txt. No other page has been crawled so far, although I have a link to the site map on every page.

Is that normal for the G-bot? Or am I doing something terribly wrong? I just need an early warning. I'm starting to think that maybe I shouldn't have used robots.txt at all... Is it theoretically possible that, because cgi-bin is not accessible, G-bot cannot follow the links from the home page? I previously used robots.txt in a similar situation successfully.

My robots.txt reads as follows:

User-agent: *
Disallow: /cgi-bin/

Thanks,
Katok

#2 Connie

Connie

    HR 5

  • Active Members
  • PipPipPipPipPip
  • 428 posts
  • Location:The Hills of Missouri

Posted 30 November 2004 - 04:38 PM

Your robots.txt file is not hurting you. I think what you described is fairly normal for a new site. My understanding is that Google uses one bot to find sites and others to actually crawl them.
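
For anyone who wants to double-check a rule set like the one posted above, here is a minimal sketch using Python's standard-library robots.txt parser. The robots.txt text is copied from the first post; the paths being tested (`/sitemap.html`, `/cgi-bin/form.pl`) are made-up examples, not pages known to exist on Katok's site. It only shows how a standards-following parser reads the rules; it does not prove how Googlebot itself behaves.

```python
from urllib.robotparser import RobotFileParser

# Rules exactly as posted in the first message of this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def allowed_paths(robots_txt, user_agent, paths):
    """Return {path: True/False} telling whether user_agent may fetch each path."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(user_agent, path) for path in paths}


if __name__ == "__main__":
    # Hypothetical paths, chosen only for illustration.
    sample_paths = ["/", "/sitemap.html", "/cgi-bin/form.pl"]
    for path, ok in allowed_paths(ROBOTS_TXT, "Googlebot", sample_paths).items():
        print(path, "allowed" if ok else "blocked")
```

With these two lines of robots.txt, only paths under /cgi-bin/ come back as blocked, which is consistent with the file not getting in the way of the home page or the site map links.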

#3 Renagade Master

Renagade Master

    HR 4

  • Active Members
  • PipPipPipPip
  • 137 posts
  • Location:London, UK

Posted 30 November 2004 - 04:43 PM

The way I understand it, a bot finds a site and reports it back to the mothership(!) for parsing later. So maybe you will see more activity soon.

#4 Katok

Katok

    HR 1

  • Members
  • Pip
  • 9 posts

Posted 30 November 2004 - 05:35 PM

Thanks guys ;)

So I'll just sit there and wait for Google to come index it all.

#5 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 30 November 2004 - 08:21 PM

What you're seeing is completely normal.

Keep building new content, or if that's already in good shape, keep getting more links pointing to your site. Now is the time to do either or both of those, as it will pay big dividends in a month or two.

#6 Scottie

Scottie

    Psycho Mom

  • Admin
  • 6,293 posts
  • Location:Columbia, SC

Posted 30 November 2004 - 08:58 PM

Once it is indexed, expect a 4-6 month delay before it ranks well for anything. There seems to be a time delay now built in to Google's algo. I'd say it is to combat the mini-nets and other link pop scams.

#7 lorax

lorax

    HR 2

  • Members
  • PipPip
  • 19 posts
  • Location:Vermont

Posted 30 November 2004 - 09:23 PM

>> There seems to be a time delay now built in to Google's algo

aka The Sandbox

I've seen Googlebot query a site's robots.txt file and/or index file a dozen times or more before it finally crawled something else.

#8 Scottie

Scottie

    Psycho Mom

  • Admin
  • 6,293 posts
  • Location:Columbia, SC

Posted 30 November 2004 - 09:32 PM

Not the same thing as the sandbox, IMO.

The sandbox is where the hundreds of links you've acquired for a new site are sent for a period of time, not adding anything to your PR.

The time delay filter affects all new sites as far as I can tell, no matter how many or how few links there are.

#9 lorax

lorax

    HR 2

  • Members
  • PipPip
  • 19 posts
  • Location:Vermont

Posted 30 November 2004 - 10:28 PM

I believe you are right in saying that the value of links 'placed on probation' is part of the Sandbox. But so too is the delay or disappearance of a site from the SERPs.

QUOTE
Some programmers have viewed Google as uncomfortable to rank newer websites until they have proven their viability to exist for more than a period of "x" months. Thus the term "Sandbox Effect" applies to the idea that all new websites have their ratings placed in a holding tank until such time is deemed appropriate before a ranking can commence.
http://www.webpronew...dboxEffect.html

The fact that the site is new is a key ingredient as is the number of backlinks acquired and the time over which they were acquired. In rereading the original post I don't think Katok's site is in the Sandbox. Sounds more like it just hasn't been indexed yet.

FWIW, I've listened to Matt Cutts (engineer) of Google on several occasions, and the underlying message I hear is that if it looks like spam, smells like spam, and acts like spam, Google will treat it as spam until they're sure it isn't. So if your website is relatively new and has meteoric success with links and SERP position, you could very well be in for a time-out.

#10 Scottie

Scottie

    Psycho Mom

  • Admin
  • 6,293 posts
  • Location:Columbia, SC

Posted 30 November 2004 - 10:44 PM

I don't like lumping them together, regardless of what the venerable WPN has to say on the matter. ;)

The sandbox is discussed as a penalty sort of situation; from what I've seen, the time delay for new sites is unavoidable no matter how you promote the site.

I think there are ways to avoid having your links sandboxed, but as far as I know there is no magic bullet to avoid the rankings delay for a new site.

#11 Alan Perkins

Alan Perkins

    Token male admin

  • Admin
  • 1,559 posts
  • Location:UK

Posted 01 December 2004 - 08:41 AM

QUOTE(Katok @ Nov 30 2004, 09:22 PM)
Yesterday I looked through the server logs with awstats and, lo and behold, Googlebot had come along.

The only thing that troubles me is that the bot asked only for the home page and robots.txt. No other page has been crawled so far, although I have a link to the site map on every page.

Is that normal for the G-bot?

As others have said, it's normal. To be extra safe, though, make sure that the HTTP response to Googlebot is 200 (OK), rather than something nasty like a 3xx, 4xx or 5xx series response.
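
If the raw access log is available, the status check can be scripted. Below is a rough sketch, assuming the server writes Apache-style "combined" log lines (request, status, size, referer, user agent); adjust the pattern if your log format differs. The sample lines, IP addresses, and paths are invented for illustration and are not taken from Katok's logs.

```python
import re
from collections import Counter

# Assumed layout: ... "METHOD /path PROTO" STATUS SIZE "referer" "user agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def googlebot_statuses(lines):
    """Count (path, status) pairs for requests whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # line does not look like the assumed format
        agent = match.group("agent") or ""
        if "googlebot" in agent.lower():
            counts[(match.group("path"), int(match.group("status")))] += 1
    return counts


def non_ok(counts):
    """Keep only the entries whose status is something other than 200."""
    return {key: n for key, n in counts.items() if key[1] != 200}


# Hypothetical log lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [29/Nov/2004:10:00:00 +0000] "GET /robots.txt HTTP/1.0" 200 45 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.10 - - [29/Nov/2004:10:00:05 +0000] "GET / HTTP/1.0" 302 0 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.77 - - [29/Nov/2004:10:01:00 +0000] "GET /sitemap.html HTTP/1.1" 404 210 "-" "Mozilla/4.0"',
]

if __name__ == "__main__":
    for (path, status), n in non_ok(googlebot_statuses(SAMPLE_LOG)).items():
        print(f"Googlebot got {status} for {path} ({n} request(s))")
```

In the made-up sample, the home page answers Googlebot with a 302, which is the kind of thing worth tracking down. Keep in mind that matching on the user agent string alone is only a rough filter, since any client can claim to be Googlebot.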



