Maybe A Robots.txt Problem?
#1
Posted 30 November 2004 - 04:22 PM
I have a fresh site that is not indexed yet, and I use a robots.txt file to keep the directory with cgi-bin scripts from being indexed at all. Yesterday I looked through the server logs with AWStats and, lo and behold, Googlebot had come along.
The only thing that troubles me is that the bot asked for just the home page and robots.txt. No other page has been crawled so far, although I have a link to the site map on every page.
Is that normal for Googlebot? Or am I doing something terribly wrong? I just need an early warning. I'm starting to think that maybe I shouldn't have used robots.txt at all... Is it theoretically possible that, because cgi-bin is not accessible, Googlebot cannot follow the links from the home page? I previously used robots.txt successfully in a similar situation.
My robots.txt is as follows:
User-agent: *
Disallow: /cgi-bin/
Thanks,
Katok
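The rules above can be sanity-checked offline. This is only a minimal sketch, assuming Python's standard `urllib.robotparser` module, which follows the original robots exclusion rules and is not guaranteed to match Googlebot's own parser in every detail. The page paths are hypothetical examples, not taken from the actual site.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted in the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""

def allowed_paths(robots_txt, user_agent, paths):
    """Return {path: True/False} saying whether user_agent may fetch each path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(user_agent, path) for path in paths}

# Hypothetical paths, for illustration only.
PATHS = ["/", "/sitemap.html", "/cgi-bin/form.pl"]

result = allowed_paths(ROBOTS_TXT, "Googlebot", PATHS)
for path, ok in result.items():
    print(path, "allowed" if ok else "blocked")
```

Under these rules only URLs beginning with /cgi-bin/ are blocked, so links from the home page to ordinary pages remain crawlable.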
#2
Posted 30 November 2004 - 04:38 PM
#3
Posted 30 November 2004 - 04:43 PM
#4
Posted 30 November 2004 - 05:35 PM
So I'll just sit there and wait for Google to come index it all.
#5
Posted 30 November 2004 - 08:21 PM
Keep building new content, or if that's already in good shape, keep getting more links pointing to your site. Now is the time to do either or both of those, as it will pay big dividends in a month or two.
#6
Posted 30 November 2004 - 08:58 PM
#7
Posted 30 November 2004 - 09:23 PM
aka The Sandbox
I've seen Googlebot request a site's robots.txt file and/or index file a dozen times or more before it finally crawled something else.
#8
Posted 30 November 2004 - 09:32 PM
The sandbox is where the hundreds of links you've acquired for a new site are sent for a period of time, not adding anything to your PR.
The time delay filter affects all new sites as far as I can tell, no matter how many or how few links there are.
#9
Posted 30 November 2004 - 10:28 PM
The fact that the site is new is a key ingredient, as are the number of backlinks acquired and the time over which they were acquired. In rereading the original post, I don't think Katok's site is in the Sandbox. It sounds more like it just hasn't been indexed yet.
FWIW, I've listened to Matt Cutts (engineer) of Google on several occasions, and the underlying message I hear is that if it looks like spam, smells like spam, and acts like spam, Google will treat it as spam until they're sure it isn't. So if your website is relatively new and has meteoric success with links and SERP position, you could very well be in for a time-out.
#10
Posted 30 November 2004 - 10:44 PM
The sandbox is discussed as a sort of penalty; from what I've seen, the time delay for new sites is unavoidable no matter how you promote them.
I think there are ways to avoid having your links sandboxed, but as far as I know there is no magic bullet to avoid the rankings delay for a new site.
#11
Posted 01 December 2004 - 08:41 AM
The only thing that troubles me is that the bot asked for just the home page and robots.txt. No other page has been crawled so far, although I have a link to the site map on every page.
Is that normal for Googlebot?
As others have said, it's normal. To be extra safe, though, make sure that the HTTP response to Googlebot is 200 (OK), rather than something nasty like a 3xx, 4xx or 5xx series response.
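One way to check that response code is to request the page without following redirects and look at the raw status. A rough sketch, assuming Python's standard `http.client`; the helper names `fetch_status` and `classify_status` are made up for illustration, and the User-Agent string is an assumption about how Googlebot identifies itself, only relevant if the server varies its response by agent.

```python
import http.client
from urllib.parse import urlsplit

# Assumed Googlebot-style User-Agent string (illustrative only).
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"

def classify_status(code):
    """Map an HTTP status code to the categories mentioned in the post."""
    if code == 200:
        return "ok"
    if 300 <= code < 400:
        return "redirect (3xx)"
    if 400 <= code < 500:
        return "client error (4xx)"
    if 500 <= code < 600:
        return "server error (5xx)"
    return "other"

def fetch_status(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Return the raw status code for url; http.client does not follow redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("GET", target, headers={"User-Agent": user_agent})
        return conn.getresponse().status
    finally:
        conn.close()

# Offline demonstration of the classification on a few sample codes.
for sample in (200, 302, 404, 500):
    print(sample, classify_status(sample))
```

To test a live site, call `classify_status(fetch_status(url))` with the home page URL and again with the robots.txt URL. The server's raw access log remains the authoritative record of what Googlebot actually received.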