How To Format Robots.txt


6 replies to this topic

#1 ladybird

ladybird

    HR 3

  • Active Members
  • 90 posts

Posted 20 December 2004 - 10:37 AM

Hi there,

I've been reading on this forum that robots.txt files are a good way to stop search engines from spidering certain files on a web site.

If I want to stop ALL spiders from crawling, say, XYZ.htm and ABC.htm, do I set up the code like this:

User-Agent: *
Disallow: XYZ.htm

User-Agent: *
Disallow: ABC.htm

OR like this:

User-Agent: *
Disallow: XYZ.htm
Disallow: ABC.htm

Thanks in advance


#2 chrishirst

chrishirst

    A not so moderate moderator.

  • Moderator
  • 5,882 posts
  • Location:Blackpool UK

Posted 20 December 2004 - 10:48 AM

The format should be:

User-Agent: *
Disallow: /XYZ.htm
Disallow: /ABC.htm


The leading "/" is important, but either of your layouts is correct provided the "/" is added.
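
For anyone who wants to check a rule set before uploading it, here is a minimal sketch using Python's standard-library urllib.robotparser. The host www.example.com and the page /index.htm are placeholders that do not come from this thread, and this parser is only one implementation, so individual spiders may behave differently.

```python
from urllib.robotparser import RobotFileParser

# The rule set from the reply above, with the leading "/" on each path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /XYZ.htm
Disallow: /ABC.htm
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from a string, no network access

# www.example.com and /index.htm are hypothetical, used only for illustration.
for page in ("/XYZ.htm", "/ABC.htm", "/index.htm"):
    allowed = parser.can_fetch("*", "http://www.example.com" + page)
    print(page, "allowed" if allowed else "blocked")
```

With this parser the two listed pages come back as blocked and /index.htm as allowed.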

#3 Tom Philo

Tom Philo

    Photographer

  • Active Members
  • 507 posts
  • Location:Beaverton, Oregon

Posted 20 December 2004 - 11:31 AM

robots.txt lives in a DOS-oriented world: the / tells the spider to start at the root of the web server and match the pattern from the root level of the site, using whatever comes after the / as the pattern to be excluded.

/images/*

would exclude all items in the images directory from being indexed

/_vti*

would exclude all files (and directories) that start with _vti from being indexed anywhere in the site

/file.htm would only exclude that single file.htm from being indexed

#4 Alan Perkins

Alan Perkins

    Token male admin

  • Admin
  • 1,559 posts
  • Location:UK

Posted 20 December 2004 - 11:48 AM

QUOTE(taphilo @ Dec 20 2004, 04:31 PM)
/images/*

would exclude all items in the images directory from being indexed

The * is a Google-specific extension and not part of the original robots.txt standard.

In fact, just

Disallow: /images/

would exclude all items in the /images/ directory. Similarly,

Disallow: /_vti*

is non-standard, and

Disallow: /_vti

would exclude all files (and directories) that start with _vti from being indexed, not anywhere in the site but at the root level of the site. So /blah/_vti_xyz would still be indexed.

QUOTE
/file.htm would only exclude that single file.htm from being indexed

Technically, it would prevent any file or directory whose URL began "/file.htm" from being indexed. So, for example, this would also prevent "/file.html" being indexed.
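
The prefix matching described in this post can be sketched in a few lines of Python. This is only an illustration of the original-standard behaviour (no wildcards); the function name is_disallowed and the sample paths other than /blah/_vti_xyz and /file.html are made up for the example, and real spiders may add their own extensions.

```python
def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    """Return True if the path starts with any non-empty Disallow value."""
    # Plain prefix comparison from the root of the site: no "*" handling.
    return any(rule != "" and path.startswith(rule) for rule in disallow_rules)


# The three Disallow values discussed in the post.
rules = ["/images/", "/_vti", "/file.htm"]

# Hypothetical sample paths, chosen to show each case.
samples = [
    "/images/logo.gif",    # inside /images/
    "/_vti_cnf/page.htm",  # starts with /_vti at the root level
    "/blah/_vti_xyz",      # _vti is not at the root level
    "/file.htm",           # exact match
    "/file.html",          # also begins with /file.htm
    "/other.htm",          # matches nothing
]

for path in samples:
    print(path, "blocked" if is_disallowed(path, rules) else "allowed")
```

Only /blah/_vti_xyz and /other.htm come out as allowed, which matches the points made above: /_vti applies at the root level only, and /file.htm also catches /file.html.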

#5 Tom Philo

Tom Philo

    Photographer

  • Active Members
  • 507 posts
  • Location:Beaverton, Oregon

Posted 20 December 2004 - 07:29 PM

Got caught in my DOS world by adding in the * ...

#6 Alan Perkins

Alan Perkins

    Token male admin

  • Admin
  • 1,559 posts
  • Location:UK

Posted 21 December 2004 - 01:34 PM

(thumbs up)

#7 ladybird

ladybird

    HR 3

  • Active Members
  • 90 posts

Posted 22 December 2004 - 11:28 AM

Thanks guys, you're the best!



