I have a set of hundreds of subfolders in the following format
/laptops/acer/netbook/red/
/laptops/acer/netbook/blue/
...
In robots.txt I want to disallow all these subfolders, but I want the pages, e.g.
/laptops/acer/netbook/page1.html, to be crawled.
How can I do this without having to do:
User-agent: *
Disallow: /laptops/acer/netbook/black
Disallow: /laptops/acer/netbook/white
Disallow: /laptops/acer/netbook/fast
Disallow: /laptops/acer/netbook/slow
etc... hundreds of times?
Thanks,
Eddy
Want To Disallow Subfolders But Not Pages
Started by EddyGonzalez, Jul 21 2010 04:14 PM
3 replies to this topic
#1
Posted 21 July 2010 - 04:14 PM
#2
Posted 22 July 2010 - 05:36 AM
The best way to achieve what you want is to structure the URLs a little differently in order to make disallowing easier. Examples:
- /ni/laptops/acer/netbook/red/ => Disallow: /ni/
- /laptops/acer/netbook/red/?noindex => Disallow: /*noindex*
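A quick way to sanity-check the first variant is Python's standard-library `urllib.robotparser`. This is only a sketch: the `/ni/` prefix is the one from the example above, the page URL is the one from the question, and as far as I know this parser treats rule paths as plain prefixes, so it is not a good fit for checking the wildcard variant.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt using the restructured /ni/ prefix from the example above.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /ni/",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)

# Subfolders moved under /ni/ should be blocked by the single prefix rule.
blocked = not parser.can_fetch("*", "/ni/laptops/acer/netbook/red/")
# Pages left outside /ni/ should remain crawlable.
crawlable = parser.can_fetch("*", "/laptops/acer/netbook/page1.html")

print(blocked, crawlable)
```

The point of the restructuring is that one prefix rule covers every subfolder, however many there are.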
Failing that, I would write out the hundreds of Disallow lines, or use something else, such as a combination of nofollow on links to the pages and noindex on the pages themselves.
There is a risky way to do what you want, which is:
CODE
User-agent: Googlebot
Disallow: /laptops/acer/netbook/
Allow: /laptops/acer/netbook/*.html$
However, I can't vouch for this ... you'd have to try it and see.
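One way to try it before relying on a live crawl is to model the matching yourself. The sketch below assumes the crawler follows the longest-match convention that Google documents (the longest matching pattern wins, and Allow wins a tie), with `*` as a wildcard and a trailing `$` as an end anchor. The helper names `pattern_to_regex` and `is_allowed` are made up for illustration, and crawlers that read rules in order or ignore wildcards may behave differently.

```python
import re

def pattern_to_regex(pattern):
    # '*' matches any run of characters; a trailing '$' anchors the end of the path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, path):
    # rules: list of (directive, pattern) pairs, directive is "allow" or "disallow".
    best = None  # (pattern length, is_allow) of the strongest matching rule so far
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            # Longer pattern wins; on equal length, allow (True) beats disallow (False).
            if best is None or candidate > best:
                best = candidate
    # No matching rule means the path is crawlable.
    return True if best is None else best[1]

# The two rules from the robots.txt snippet above.
RULES = [
    ("disallow", "/laptops/acer/netbook/"),
    ("allow", "/laptops/acer/netbook/*.html$"),
]

for path in [
    "/laptops/acer/netbook/page1.html",
    "/laptops/acer/netbook/red/",
    "/laptops/acer/netbook/red/page.html",
]:
    print(path, is_allowed(RULES, path))
```

Under these assumptions the page is allowed and the subfolder URL is blocked, but note that the Allow pattern also matches `.html` pages inside the subfolders, since `*` can span a `/`.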
#3
Posted 22 July 2010 - 01:07 PM
I have a set of hundreds of subfolders in the following format
/laptops/acer/netbook/red/
/laptops/acer/netbook/blue/
...
In Robots.txt I want to disallow all these subfolders but I want the pages i.e.
/laptops/acer/netbook/page1.html to be crawled
If you don't have index pages in those folders, you may be leaving your server exposed at some level. At the very least, the user experience won't be very productive. I typically handle this by using HTML meta refresh tags in blank index pages for the folders, rather than fiddling with .htaccess or robots.txt. That gives me the flexibility to add a full-content index page later on without risking a mistake in the vital crawl/access management files for the whole website.
It's purely a matter of personal site management style. In my opinion, there is no one right way to handle the situation. But you definitely want to remember what incidental visitors may see if they hit those URLs.
#4
Posted 02 August 2010 - 10:50 AM
Thanks for the tips, guys.
I tried:
User-agent: Googlebot
Disallow: /laptops/acer/netbook/
Allow: /laptops/acer/netbook/*.html$
and it seems to work well, but I appreciate your caution, Michael.
Eddy