Controlling Robots.txt To Ignore Imaginary Folders


3 replies to this topic

#1 MADTIES

MADTIES

    HR 2

  • Members
  • 17 posts

Posted 25 March 2004 - 09:21 AM

Our site is structured so that partners/affiliates get imaginary (virtual) folders, i.e. url/partner/page.htm, where our system automatically identifies the partner and sets various session variables.

What I need to know is whether the robots.txt file can be used to stop spiders from recording links in these partner folders. Ultimately the page redirects to just url/page.htm, and that could trigger duplicate content warnings, because the partner page would have the same content as our standard page.

Would Disallow: /partner/ in the robots.txt file work in this situation, or only for real folders?

Thanks

MadTies

#2 Alan Perkins

Alan Perkins

    Token male admin

  • Admin
  • 1,559 posts
  • Location:UK

Posted 25 March 2004 - 09:31 AM

MADTIES wrote: "Would Disallow: /partner/ in the robots.txt file work in this situation, or only for real folders?"

It would work in this situation, assuming that when you say "url/partner/page.htm" you mean "domain/partner/page.htm", i.e. that "partner" is a top-level directory.
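The reason it works is that robots.txt rules are matched against the URL path as a prefix, so it makes no difference whether the folder exists on disk. As a rough sketch, this can be checked with Python's standard-library urllib.robotparser. The domain example.com and the helper name allowed are made up for illustration, and a real spider may apply extra rules of its own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; example.com stands in for the real domain.
ROBOTS_TXT = """\
User-agent: *
Disallow: /partner/
"""

def allowed(path, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if a crawler obeying robots_txt may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "http://example.com" + path)

print(allowed("/partner/page.htm"))  # virtual partner URL -> False
print(allowed("/page.htm"))          # standard page -> True
```

The parser only ever sees the URL string, which is why a virtual folder is treated exactly like a real one.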

#3 bobsledbob

bobsledbob

    HR 3

  • Active Members
  • 102 posts
  • Location:Ogden, Utah, USA

Posted 25 March 2004 - 12:48 PM

I think what he means is that "partner" would be replaced by the actual partner identifier, such that it's different for each affiliate.

Disallowing /url/ would work.

If you can't restrict that far up the directory structure, then you'll need to move the partner level further down, i.e.

/url/partners/<partner>/page.htm

Then disallow /url/partners/ in your robots.txt
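To illustrate why the shared parent folder matters, here is a small sketch using Python's standard-library urllib.robotparser. It assumes "url" is a literal top-level path segment as written above. The partner identifiers acme and globex, the domain example.com, and the helper is_blocked are all invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Assumption: "url" is a literal path segment, as in the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /url/partners/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def is_blocked(path):
    """Return True if the path is disallowed for all user agents."""
    return not parser.can_fetch("*", "http://example.com" + path)

# "acme" and "globex" are made-up partner identifiers.
partner_pages = [f"/url/partners/{p}/page.htm" for p in ("acme", "globex")]
print([is_blocked(p) for p in partner_pages])  # one rule covers every partner
print(is_blocked("/url/page.htm"))             # the standard page stays crawlable
print(is_blocked("/url/acme/page.htm"))        # old layout is not covered by this rule
```

With the partner identifier directly under /url/, each partner would need its own Disallow line, whereas the common /url/partners/ prefix needs only one.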

#4 domokun

domokun

    Web jockey

  • Active Members
  • 249 posts

Posted 26 March 2004 - 03:09 AM

An alternative would be to have

url/external/partner/page

and then disallow robots access to

url/external
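One detail worth checking with this approach is the trailing slash, since rules are matched as plain path prefixes. The sketch below uses Python's standard-library urllib.robotparser and assumes "url" here means the domain root. The page /external-links.htm, the partner name acme, the domain example.com, and the helper blocked are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def blocked(rule_path, url_path):
    """Return True if 'Disallow: rule_path' blocks url_path for all agents."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule_path}"])
    return not parser.can_fetch("*", "http://example.com" + url_path)

# Without a trailing slash the rule also catches unrelated pages
# whose path merely starts with "/external".
print(blocked("/external", "/external/acme/page"))   # True
print(blocked("/external", "/external-links.htm"))   # True

# With a trailing slash only the folder's contents are blocked.
print(blocked("/external/", "/external/acme/page"))  # True
print(blocked("/external/", "/external-links.htm"))  # False
```

So Disallow: /external/ is probably the safer form if other pages on the site could begin with the same letters.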
