
Want To Disallow Subfolders But Not Pages



#1 EddyGonzalez

    HR 3

  • Active Members
  • 86 posts

Posted 21 July 2010 - 04:14 PM

I have a set of hundreds of subfolders in the following format:

/laptops/acer/netbook/red/
/laptops/acer/netbook/blue/
...

In robots.txt I want to disallow all these subfolders, but I want the pages, e.g.

/laptops/acer/netbook/page1.html to be crawled

How can I do this without having to do:
User-agent: *
Disallow: /laptops/acer/netbook/black
Disallow: /laptops/acer/netbook/white
Disallow: /laptops/acer/netbook/fast
Disallow: /laptops/acer/netbook/slow
etc. ... hundreds of times?

Thanks,
Eddy

#2 Alan Perkins

    Token male admin

  • Admin
  • 1,642 posts
  • Location: UK

Posted 22 July 2010 - 05:36 AM

The best way to achieve what you want is to structure the URLs a little differently in order to make disallowing easier. Examples:
  • /ni/laptops/acer/netbook/red/ => Disallow: /ni/
  • /laptops/acer/netbook/red/?noindex => Disallow: /*noindex*

Failing that, I would do the hundreds of disallows, or use something else, such as a combination of nofollow on the links to the pages and noindex on the pages themselves.
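
To illustrate that second option, here's a minimal sketch (the markup and link text are hypothetical, not taken from Eddy's site): rel="nofollow" on the links that point into the colour folders, and a robots meta tag in the <head> of the folder pages themselves.

CODE
<!-- On pages linking into a colour folder: hint to crawlers not to follow the link -->
<a href="/laptops/acer/netbook/red/" rel="nofollow">Red netbooks</a>

<!-- In the <head> of each colour-folder page: keep the page out of the index -->
<meta name="robots" content="noindex">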

There is a risky way to do what you want, which is:

CODE
User-agent: Googlebot
Disallow: /laptops/acer/netbook/
Allow: /laptops/acer/netbook/*.html$


However, I can't vouch for this ... you'd have to try it and see.

#3 Michael Martinez

    HR 10

  • Active Members
  • 5,094 posts
  • Location: Georgia

Posted 22 July 2010 - 01:07 PM

QUOTE(EddyGonzalez @ Jul 21 2010, 02:14 PM)
I have a set of hundreds of subfolders in the following format

/laptops/acer/netbook/red/
/laptops/acer/netbook/blue/
...

In robots.txt I want to disallow all these subfolders, but I want the pages, e.g.

/laptops/acer/netbook/page1.html to be crawled


If you don't have index pages in those folders, you may be exposing your server at some level. At the very least, the user experience won't be very productive. I typically handle this by putting HTTP meta refresh tags in blank index pages for the folders, rather than fiddle with .htaccess or robots.txt. That gives me the flexibility to add a full content index page later on without risking fudging up the vital crawl/access-management files for the whole website.
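
For what that might look like, here's a minimal sketch of such a blank folder index page; the redirect target is an assumed example, not something Michael specified.

CODE
<!-- Blank placeholder index page for a colour folder.
     Redirects visitors immediately; the target URL below is illustrative only. -->
<html>
<head>
<title>Redirecting...</title>
<meta http-equiv="refresh" content="0; url=/laptops/acer/netbook/">
</head>
<body></body>
</html>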

It's purely a matter of personal site management style. In my opinion, there is no one right way to handle the situation. But you definitely want to keep in mind what incidental visitors may see if they hit those URLs.

#4 EddyGonzalez

    HR 3

  • Active Members
  • 86 posts

Posted 02 August 2010 - 10:50 AM

Thanks for the tips, guys.

I tried:

CODE
User-agent: Googlebot
Disallow: /laptops/acer/netbook/
Allow: /laptops/acer/netbook/*.html$

and it seems to work well, but I appreciate your caution, Michael.

Eddy




