Please Help With Robots.txt?


6 replies to this topic

#1 macmachi

macmachi

    HR 1

  • Members
  • Pip
  • 5 posts

Posted 06 March 2012 - 06:37 AM

I need your help: how do I block search engines from crawling a particular directory within another directory?
For instance, our microsite lives inside our main site, so the entire microsite operates from our main domain:
example.com/abc
example.com/abc/index.php
"abc" is the directory used to access the microsite.

Now I have a directory named newsite where we will put our mockup design.

Its path looks like this:
example.com/abc/newsite

I don't want search engines to crawl and index this particular path. How do I do that?

I heard that robots.txt only works at the root directory level, so I created a robots.txt and put it in the root directory.

My question is, should I use
User-agent: *
Disallow: /newsite/

or
User-agent: *
Disallow: abc /newsite/

?

Please help

Thank you so much!!!!

#2 chrishirst

chrishirst

    A not so moderate moderator.

  • Moderator
  • 7,100 posts
  • Location:Blackpool UK

Posted 06 March 2012 - 07:33 AM

Specify the full path starting from the site root.

#3 macmachi

Posted 06 March 2012 - 09:26 AM

QUOTE(chrishirst @ Mar 6 2012, 07:33 AM)
Specify the full path starting from the site root.


Thank you so much for replying!!!

Which is the right way to specify the directory?

Disallow: abc/newsite/
or
Disallow: /abc/newsite/
or
Disallow: abc/newsite

?

Thanks again!

#4 Jill

Jill

    Recovering SEO

  • Admin
  • 33,003 posts

Posted 06 March 2012 - 09:33 AM

Disallow: /abc/newsite/
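
To see why the path has to start from the site root, the rule can be checked with Python's standard urllib.robotparser module. This is only a sketch: the two robots.txt bodies and the test URLs are illustrative, built from the example.com paths used in this thread, and real crawlers may differ in detail from this parser.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Full path from the site root, as recommended in this thread.
FULL_PATH = """\
User-agent: *
Disallow: /abc/newsite/
"""

# Path that leaves out the parent directory (illustrative counter-example).
MISSING_PARENT = """\
User-agent: *
Disallow: /newsite/
"""

mockup_url = "http://example.com/abc/newsite/index.php"
live_url = "http://example.com/abc/index.php"

full_path = build_parser(FULL_PATH)
missing_parent = build_parser(MISSING_PARENT)

print(full_path.can_fetch("*", mockup_url))       # False: mockup is blocked
print(full_path.can_fetch("*", live_url))         # True: microsite stays crawlable
print(missing_parent.can_fetch("*", mockup_url))  # True: rule never matches
```

Rules are matched as prefixes of the URL path, so /newsite/ only matches a folder sitting directly under the site root, not one nested inside /abc/.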

#5 macmachi

Posted 06 March 2012 - 10:20 AM

Thank you so much Jill!!!!

It was really helpful. I was almost fed up with trying to find an answer to my question.

Cheers!



QUOTE(Jill @ Mar 6 2012, 09:33 AM)
Disallow: /abc/newsite/



#6 Ron Carnell

Ron Carnell

    HR 6

  • Moderator
  • 966 posts
  • Location:Michigan USA

Posted 06 March 2012 - 11:14 AM

How do visitors access your microsite?

If one gets to your microsite with a URL like http://example.com/abc/ then . . . you already have your answer. Good luck.

However, when you use words like "microsite" it raises questions. In situations like you've described, it's not uncommon to see URLs more along the lines of http://abc.example.com or even http://abc.com/ being used. Modern hosting services frequently allow subdomains to run as part of the main site, and some even allow other domains to do so. It saves paying for additional hosting packages.

If this is what you're doing, then you've only got a partial solution.

Search engine spiders visiting example.com won't go into the /abc folder. However, spiders that come to visit abc.example.com or abc.com won't even see your robots.txt file or the directive within it to disallow the /abc folder. Why? Because the file is in the root directory level of a different domain, the parent domain.
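
As a rough illustration of that point, a crawler looks for robots.txt at the root of whichever host it is requesting, regardless of how the folders are laid out on disk. The helper name robots_txt_url is made up for this sketch; it only uses Python's standard urllib.parse module and the example hosts from this post.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that applies to page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, swap the path for /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


for url in (
    "http://example.com/abc/newsite/",
    "http://abc.example.com/newsite/",
    "http://abc.com/newsite/",
):
    print(url, "->", robots_txt_url(url))
# http://example.com/abc/newsite/ -> http://example.com/robots.txt
# http://abc.example.com/newsite/ -> http://abc.example.com/robots.txt
# http://abc.com/newsite/ -> http://abc.com/robots.txt
```

Three different hosts give three different robots.txt locations, even if all three are served from the same folder on the server.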

When you create a subdomain or domain as a folder within another domain, you need to put a robots.txt file in both root directories. The higher level robots.txt will be found when a spider visits example.com; it should include a directive to disallow the entire /abc folder, i.e., Disallow: /abc/. You want to ALWAYS do this to avoid duplicate content. If you don't, the content will potentially be indexed under both example.com and abc.example.com.

The secondary level robots.txt will sit in /abc/ (which is the root directory level of abc.example.com). This file, unlike the higher level one, will only be found by spiders visiting abc.example.com. It should include any specific folders within /abc/ that you want to disallow. Using your own example, this file should contain the line Disallow: /newsite/ to prevent indexing of that folder.

Again, if you're not using a subdomain to access your so-called microsite you can safely ignore everything I've said. Chris and Jill already gave you the right answer. If, however, you're getting creative at the DNS level you'll have to be equally creative at the Robot Exclusion level. Essentially, you need a robots.txt file for EACH domain that will be involved.
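
The two-file layout described in this post can be sketched as follows, for the subdomain case only. ROBOTS_BY_HOST and is_allowed are hypothetical names for this illustration, and the file contents are simply the directives mentioned above. A real crawler would fetch each file over HTTP instead of reading it from a dictionary.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Assumed contents of the two robots.txt files, one per host.
ROBOTS_BY_HOST = {
    # Served from the root of the parent domain.
    "example.com": "User-agent: *\nDisallow: /abc/\n",
    # Served from /abc/ on disk, which is the root of the subdomain.
    "abc.example.com": "User-agent: *\nDisallow: /newsite/\n",
}


def is_allowed(url: str, user_agent: str = "*") -> bool:
    """Check url against the robots.txt that belongs to its own host."""
    host = urlsplit(url).netloc
    parser = RobotFileParser()
    parser.parse(ROBOTS_BY_HOST[host].splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("http://example.com/abc/index.php"))          # False: duplicate copy blocked
print(is_allowed("http://abc.example.com/index.php"))          # True: microsite is crawlable
print(is_allowed("http://abc.example.com/newsite/index.php"))  # False: mockup blocked
```

Each host is judged only by its own file, which is why the parent domain blocks /abc/ while the subdomain blocks /newsite/.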

#7 macmachi

Posted 06 March 2012 - 11:57 AM

@Ron Carnell

That is the kind of knowledge I am happy to get from an experienced person like yourself!

Thank you for taking your time to explain these things clearly. I was trying to rep you, but I couldn't find any link to do so :)

Once again, thank you so much!

BTW, it is not a subdomain but a directory.


Best,