Possible Robot Text File Problem
#1
Posted 19 February 2004 - 08:48 PM
I wanted to keep the bots out of the nascar_races dir except for the two php pages below. I have not optimized the rest of the pages, and there are a considerable number of them in there.
Google paid a visit a week ago and indexed approx 50 of these pages. Not only that, but these 50 pages appear to have replaced 50 pages that had already been indexed, some of which were my top keyword pages.
Where should I go from here? Are the lines below correct, or should I just have a single line: Disallow: /nascar_races/
Google has indexed approx 350 pages. I thought it would read and index them all, not start replacing pages that had already been indexed.
There are no tricks on the site, no hidden text etc.
Disallow: /nascar_races/driver-list-2003-finishes.php4
Disallow: /nascar_races/nascar-winston-nextel-cup-results-2003.php4
Thanks
donbmjr
#2
Posted 19 February 2004 - 09:18 PM
#3
Posted 19 February 2004 - 11:14 PM
If I'm reading your question correctly, the two files you mentioned are the only ones in the directory you want to be spidered. Correct?
If so, your robots.txt is doing exactly the opposite of what you want it to. It's Disallowing those two files, but allowing everything else in that directory.
Try this one instead:
User-agent: *
Allow: /nascar_races/driver-list-2003-finishes.php4
Allow: /nascar_races/nascar-winston-nextel-cup-results-2003.php4
Disallow: /nascar_races/
The above will tell all spiders that they're allowed to grab those two specific pages in the nascar_races directory, but everything else is off limits.
HTH
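One way to sanity-check the difference between the two versions is to run them through a parser. The sketch below uses Python's standard-library urllib.robotparser, which does understand Allow lines and applies the first rule that matches. That only shows how this one parser reads the file; whether a given spider treats Allow the same way is a separate question. The path in OTHER is a made-up page name for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules as originally posted (assuming a User-agent line above them).
ORIGINAL = """\
User-agent: *
Disallow: /nascar_races/driver-list-2003-finishes.php4
Disallow: /nascar_races/nascar-winston-nextel-cup-results-2003.php4
"""

# The rules suggested in this reply.
SUGGESTED = """\
User-agent: *
Allow: /nascar_races/driver-list-2003-finishes.php4
Allow: /nascar_races/nascar-winston-nextel-cup-results-2003.php4
Disallow: /nascar_races/
"""


def allowed(robots_txt, path, agent="*"):
    """Return True if this parser would let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


KEEP = "/nascar_races/driver-list-2003-finishes.php4"
OTHER = "/nascar_races/some-other-page.php4"  # hypothetical page name

for name, rules in (("original", ORIGINAL), ("suggested", SUGGESTED)):
    print(name, "| keep page:", allowed(rules, KEEP),
          "| other page:", allowed(rules, OTHER))
```

With the original rules the page you want spidered is the one that gets blocked, and with the suggested rules it is the other way around.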
#4
Posted 20 February 2004 - 03:27 AM
User-agent: *
Allow: /nascar_races/driver-list-2003-finishes.php4
Allow: /nascar_races/nascar-winston-nextel-cup-results-2003.php4
Disallow: /nascar_races/
Unfortunately, the Allow: line mentioned above is not a part of the standard. The only recognized lines (which may appear multiple times) are the "User-agent" and "Disallow" lines.
FYI, this site is the official source for the standard: www.robotstxt.org
As far as the example you gave, I assume that there is a User-Agent: * line above the Disallow lines. At least one User-Agent line is required. As configured, Randy is right, you are excluding only the two pages you mentioned, rather than allowing the two pages you mentioned. Note also that the robots.txt file must be placed in the domain root (i.e. http://somesite.com/robots.txt). There is no support for placing a robots.txt file in a subdir, having multiple robots.txt files, etc. There can be only one.
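To illustrate the "domain root" point: whatever page a robot is about to fetch, the robots.txt it consults is the one at the root of that same host. A minimal sketch in Python (robots_url is a helper name made up for this example):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Map any page URL to the robots.txt location at its domain root."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://somesite.com/nascar_races/driver-list-2003-finishes.php4"))
# prints http://somesite.com/robots.txt
```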
I would recommend that you move (leaving a 301 redirect behind) the two pages to which you want to allow access into a different subdir and implement the Disallow: line that searchrank suggested.
searchrank's suggestion to directly submit the two pages has some merit, but if the robot is abiding by the standard, it will request the robots.txt file first, and still not spider the directly-submitted pages.
Another option is to leave access to all the files open, and then add a robots META tag to each page to be excluded. I don't know which engines support this (SearchEngineWatch says "Most major engines support the meta robots tag"), but here is the format:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
This would need to be added to every page to be excluded (unless already excluded by the robots.txt file).
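Since the tag has to go on every page, it is easy to miss one. Here is a rough Python sketch, using only the standard library, that pulls the robots directives out of a page's HTML so you can verify the tag is really there. The class and function names are made up for this example, and the sample page is just a stub.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, not values.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html):
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


page = ('<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'
        '</head><body></body></html>')
print(sorted(robots_directives(page)))
# prints ['nofollow', 'noindex']
```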
#5
Posted 20 February 2004 - 04:02 AM
Expanding on searchrank's reply, one further option is open. That option is to exclude each file in the nascar_races directory that you don't want to be indexed, leaving the files that you do.
So instead of
Disallow: /nascar_races/
Use this:
Disallow: /nascar_races/file1.php
Disallow: /nascar_races/file2.php
Disallow: /nascar_races/file3.php
Making sure that you don't have these lines:
Disallow: /nascar_races/driver-list-2003-finishes.php4
Disallow: /nascar_races/nascar-winston-nextel-cup-results-2003.php4
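With a considerable number of files in the directory, typing those lines by hand gets tedious. A small Python sketch that builds them from a list of file names, skipping the ones you want left open, might look like this. The function name is made up, and file1.php to file3.php are the same placeholder names as above; in practice the list could come from a directory listing.

```python
def build_robots_txt(directory, filenames, keep):
    """Emit one Disallow line per file in `directory`, except those in `keep`."""
    lines = ["User-agent: *"]
    for name in sorted(filenames):
        if name not in keep:
            lines.append(f"Disallow: /{directory}/{name}")
    return "\n".join(lines) + "\n"


# Placeholder file names plus the two pages that should stay spiderable.
files = [
    "file1.php",
    "file2.php",
    "file3.php",
    "driver-list-2003-finishes.php4",
    "nascar-winston-nextel-cup-results-2003.php4",
]
keep = {
    "driver-list-2003-finishes.php4",
    "nascar-winston-nextel-cup-results-2003.php4",
}

print(build_robots_txt("nascar_races", files, keep))
```

Remember to regenerate the file whenever a new page is added to the directory, otherwise the new page is open to spiders by default.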
#6
Posted 20 February 2004 - 08:18 AM
#7
Posted 20 February 2004 - 08:34 AM
Easiest solution here is to take those two files out of the directory you want everything else banned from and put them somewhere else.
Then disallow that directory with a single line and go have a beer.
G.
#8
Posted 20 February 2004 - 09:09 AM
I believe for the time being I will go with the single line as Grumpus suggested
except for the beer.
Disallow: /nascar_races/
I do have a User-agent: * at the top
Just to be sure, am I correct in assuming that even if there is a link to a file in this directory, it will not get indexed using just the single line above?
One other comment that has been made a few times before.
One of my best-ranked pages, a well-written, no-funny-stuff page that had been running at #3 for a few months, dropped to #103 after the last Google run. Most of the pages ahead of it are terrible.
Thanks again
donbmjr
#9
Posted 20 February 2004 - 12:10 PM
"Just to be sure, am I correct in assuming that even if there is a link to a file in this directory it will not get indexed, using just the single line above."
Short answer: yes.
More complete answer: Once a URL is protected by robots.txt then it shouldn't be read. The URL may be indexed now and again, but the content won't be.