Robots.txt


7 replies to this topic

#1 Gatorhardware

Gatorhardware

    Web Hoster

  • Active Members
  • 315 posts

Posted 16 April 2004 - 01:06 PM

I have been going over our server logs for the last few days and have found that Google, bCentral (MSN?), and another crawler first ask for the robots.txt file. I have read a bit about it but never saw a need for one. Should I make one and leave it blank, or leave it out?

#2 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,312 posts

Posted 16 April 2004 - 01:10 PM

It's nice to have one, but it shouldn't hurt you not to have it.

Jill

#3 qwerty

qwerty

    HR 10

  • Moderator
  • 8,287 posts
  • Location:Somerville, MA

Posted 16 April 2004 - 01:13 PM

If you don't need to keep spiders out of any areas of the site but you upload a blank robots.txt file anyway, it at least means fewer 404 errors. I think it's worth having just for that, especially if you use a free stats program that only shows the top ten files for any given category. If you've got 11 URLs that are causing a 404 error and robots.txt is one of the top 10, you don't get to see what the 11th one was.
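To make the top-ten point concrete, here is a minimal sketch that tallies 404 responses per URL. It assumes an Apache-style access log; the sample lines, addresses, and the `top_404s` helper are made up for illustration, not taken from anyone's real logs.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common Log Format line,
# e.g. "GET /robots.txt HTTP/1.0" 404
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+) [^"]*" (\d{3})\b')

def top_404s(log_lines, n=10):
    """Return the n most frequently requested URLs that returned a 404."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "404":
            counts[match.group(1)] += 1
    return counts.most_common(n)

# Hypothetical sample lines (documentation-range IP addresses).
SAMPLE_LOG = [
    '192.0.2.1 - - [16/Apr/2004:13:00:01 -0500] "GET /robots.txt HTTP/1.0" 404 208',
    '192.0.2.2 - - [16/Apr/2004:13:00:05 -0500] "GET /robots.txt HTTP/1.0" 404 208',
    '192.0.2.3 - - [16/Apr/2004:13:01:10 -0500] "GET /old-page.html HTTP/1.1" 404 208',
    '192.0.2.3 - - [16/Apr/2004:13:01:12 -0500] "GET /index.html HTTP/1.1" 200 5120',
]

for url, hits in top_404s(SAMPLE_LOG):
    print(hits, url)
```

With a blank robots.txt in place, crawler requests for it would no longer return a 404, so the file would drop out of this list and leave room for the URLs that are actually broken.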

#4 Gatorhardware

Gatorhardware

    Web Hoster

  • Active Members
  • 315 posts

Posted 16 April 2004 - 01:17 PM

Okay, sounds good. Just wondered if the crawlers were affected by the 404 error.

#5 Gatorhardware

Gatorhardware

    Web Hoster

  • Active Members
  • 315 posts

Posted 16 April 2004 - 01:18 PM

Also, is it common to get crawled by several search engines every day? Not that I am throwing a fit. :rant:

#6 qwerty

qwerty

    HR 10

  • Moderator
  • 8,287 posts
  • Location:Somerville, MA

Posted 16 April 2004 - 01:46 PM

If you've got a lot of pages and/or have given the spiders reason to come back to the same page often (such as regularly updating it) then yes, it's quite common.

#7 rohgan03

rohgan03

    HR 6

  • Active Members
  • 944 posts

Posted 16 April 2004 - 02:08 PM

You can have a robots.txt that permits all pages to be indexed.
Robots.txt is generally used to prevent certain pages from being crawled and indexed.
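As a rough illustration, an allow-all robots.txt is a `User-agent: *` line followed by an empty `Disallow:` line. The sketch below checks that with Python's standard `urllib.robotparser`, and also shows that a blank file is treated the same way by that parser. The `is_allowed` helper and the example.com URL are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# An allow-all robots.txt: applies to every crawler, disallows nothing.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

def is_allowed(robots_txt, url, agent="*"):
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

page = "http://www.example.com/index.html"  # placeholder URL
print(is_allowed(ALLOW_ALL, page))  # explicit allow-all file
print(is_allowed("", page))         # blank file, no rules at all
```

Both calls report that the page may be fetched, so with this parser the blank file and the explicit allow-all file behave the same.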

#8 kristof

kristof

    HR 2

  • Active Members
  • 34 posts
  • Location:New England

Posted 17 April 2004 - 12:26 AM

Well if you go to http://www.robotstxt.org ... the instructions seem to imply that yes, you should have a robots.txt even if it is just blank.

However, it is standard practice to exclude your picture files. This is because robots crawling your pictures may consume a great deal of bandwidth at your expense, and all you get out of it is the possibility that what the robots collect is used by some "people" to build huge databases of images to steal at will.

Or at least, that was what I was taught at the vBulletin forum. Anyway, I exclude my picture files, and that is all I can see any need to exclude.

Oh, another reason is if you have PDF files or the like that are intended only for download and that duplicate the standard HTML content of your site. You definitely want to exclude those so they are not mistaken for "mirror pages."

(This of course does not apply to PDF files that do not have duplicate content and that you WANT to be spidered.)
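A sketch of what such exclusions might look like, assuming the pictures and the duplicate PDFs each live in their own directory. The `/pictures/` and `/pdf/` names and the example.com host are hypothetical. The original robots.txt convention matches on path prefixes, which is why grouping files by directory keeps the rules simple. The rules are checked here with Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: images under /pictures/, duplicate PDFs under /pdf/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pictures/
Disallow: /pdf/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(path):
    """True if a generic crawler may fetch `path` on the placeholder host."""
    return parser.can_fetch("*", "http://www.example.com" + path)

for path in ("/index.html", "/pictures/logo.gif", "/pdf/brochure.pdf"):
    print(path, "allowed" if allowed(path) else "excluded")
```

PDFs that you do want spidered would simply be kept outside the excluded directory.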

Edited by kristof, 17 April 2004 - 12:38 AM.




