I need your help: how do I block search engines from crawling a particular directory within a directory?
For instance:
We have a microsite inside our main site, so the entire microsite operates from our main domain:
example.com/abc
example.com/abc/index.php
"abc" is the directory to access the microsite.
Now I have a directory named newsite where we will put our mock-up design.
It looks like this:
example.com/abc/newsite
I don't want search engines to crawl and index this particular path. How do I do that?
I heard that robots.txt only works at the root directory level, so I created a robots.txt and put it in the root directory.
My question is, should I use
User-agent: *
Disallow: /newsite/
or
User-agent: *
Disallow: abc /newsite/
?
Please help
Thank you so much!!!!
Please Help With Robots.txt?
Started by macmachi, Mar 06 2012 06:37 AM
6 replies to this topic
#1
Posted 06 March 2012 - 06:37 AM
#2
Posted 06 March 2012 - 07:33 AM
Specify the full path starting from the site root.
#4
Posted 06 March 2012 - 09:33 AM
Disallow: /abc/newsite/
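For anyone who wants to sanity-check the rule before uploading it, here is a minimal sketch using Python's standard `urllib.robotparser`. The `is_allowed` helper and the sample URLs (such as `mockup.html`) are made up for illustration, and real crawlers may match paths somewhat differently than this parser does, so treat it as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# The rule suggested above, as it would appear in example.com/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /abc/newsite/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Hypothetical helper: True if `agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Illustrative URLs only; the file names are assumptions.
    for url in (
        "http://example.com/abc/index.php",
        "http://example.com/abc/newsite/mockup.html",
    ):
        print(url, "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked")
```

Under this parser, a rule of `Disallow: /newsite/` would not block anything under `/abc/newsite/`, because rules are matched as prefixes of the path starting at the site root.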
#6
Posted 06 March 2012 - 11:14 AM
How do visitors access your microsite?
If one gets to your microsite with a URL like http://example.com/abc/ then . . . you already have your answer. Good luck.
However, when you use words like "microsite" it raises questions. In situations like you've described, it's not uncommon to see URLs more along the lines of http://abc.example.com or even http://abc.com/ being used. Modern hosting services frequently allow subdomains to run as part of the main site, and some even allow other domains to do so. It saves paying for additional hosting packages.
If this is what you're doing, then you've only got a partial solution.
Search engine spiders visiting example.com won't go into the /abc folder. However, spiders that come to visit abc.example.com or abc.com won't even see your robots.txt file or the directive within it to disallow the /abc folder. Why? Because the file is in the root directory level of a different domain, the parent domain.
When you create a subdomain or domain as a folder within another domain, you need to put a robots.txt file in both root directories. The higher level robots.txt will be found when a spider visits example.com; it should include a directive to disallow the entire /abc folder, i.e., Disallow: /abc/. You want to ALWAYS do this to avoid duplicate content. If you don't, the content will potentially be indexed under both example.com and abc.example.com.
The secondary level robots.txt will sit in /abc/ (which is the root directory level of abc.example.com). This file, unlike the higher level one, will only be found by spiders visiting abc.example.com. It should include any specific folders within /abc/ that you want to disallow. Using your own example, this file should contain the line Disallow: /newsite/ to keep spiders from crawling that folder.
Again, if you're not using a subdomain to access your so-called microsite you can safely ignore everything I've said. Chris and Jill already gave you the right answer. If, however, you're getting creative at the DNS level you'll have to be equally creative at the Robot Exclusion level. Essentially, you need a robots.txt file for EACH domain that will be involved.
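To make the two-file setup concrete, here is a rough Python sketch that models one robots.txt per hostname and checks URLs against whichever file that host would serve. The `ROBOTS_BY_HOST` mapping and the `is_allowed` helper are hypothetical names, and the file contents simply mirror the directives described above; nothing is fetched over the network.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Assumed contents of each host's robots.txt, following the advice above:
# the parent domain hides the whole /abc/ folder, while the subdomain
# (whose root is that same folder) hides only /newsite/.
ROBOTS_BY_HOST = {
    "example.com": "User-agent: *\nDisallow: /abc/\n",
    "abc.example.com": "User-agent: *\nDisallow: /newsite/\n",
}


def is_allowed(url: str, agent: str = "*") -> bool:
    """Hypothetical helper: apply the robots.txt belonging to the URL's host."""
    host = urlsplit(url).hostname
    # A host with no robots.txt is treated here as having no restrictions.
    robots_txt = ROBOTS_BY_HOST.get(host, "")
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.com/abc/index.php",
        "http://abc.example.com/index.php",
        "http://abc.example.com/newsite/",
    ):
        print(url, "allowed" if is_allowed(url) else "blocked")
```

The point of the sketch is that each hostname is judged only by its own file, which is why the same folder needs a different Disallow path depending on which domain the spider arrives through.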
#7
Posted 06 March 2012 - 11:57 AM
@Ron Carnell
That is the kind of knowledge I am happy to get from an experienced person like yourself!
Thank you for taking your time to explain these things clearly. I was trying to rep you, but I couldn't find any link to do so.
Once again, thank you so much!
BTW, it is not a subdomain but a directory.
Best,