Hi everyone, I'm new here and this looks like a great resource. So glad I found the site. On to my question and dilemma. It's a bit complex, so please bear with me.
I'm working on a client's site that has multiple domains (I know... a topic that's been beaten to death before).
Anyway, in this situation the client has a specific reason for wanting to keep multiple domains, and each one will have region-specific content.
My question tonight is not about the pros/cons of running separate sites. Rather, they have a platform set up (not indexed yet) that houses the entire system, i.e. all the regional areas, plus individual domains that will each display only one regional area. I need to come up with a solution to have a unique robots.txt file for each site.
So for example: site1 = all-inclusive content from all regional areas
A. I know I want to noindex this all-inclusive domain (site1) because I don't want it to rank for any material. It is set up, technology-wise, to house the main system, and I don't want it indexed at all.
1. Do I noindex and nofollow the all-inclusive domain (site1)?
B. The other domains actually run off of the main website, but are set up to deliver only region-specific content. These I want indexed and followed, and there are no links from these sites back to the main site, as each one is set up to deliver only links specific to that particular site.
2. How can I deliver a separate robots.txt file for the other websites when technically they are running the same code, but displaying different content based upon the domain requested (so in theory all content is different)?
I was thinking I could probably do this with mod_rewrite, but I'm not very good at that piece. Any other suggestions? Or, if mod_rewrite is the solution, could someone please help me with some sample code and some suggestions on how to set up the different robots.txt files so they are delivered appropriately?
Appreciate any tips you all can share to help me with this situation.
Thanks again.
Jessica
Request For Advice On Multiple Robots.txt For Subsites
Started by JessicaK, Jul 21 2008 07:47 PM
3 replies to this topic
#1
Posted 21 July 2008 - 07:47 PM
#2
Posted 21 July 2008 - 09:19 PM
Hello again all.
I think I found a solution but I may need to ask for some help with the rewrite rules for my particular situation.
Ok, here goes. I found some code on another site that I think will work just fine.
Here is the code. I may need to tweak it, and I have no skill in mod_rewrite.
CODE
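<IfModule mod_rewrite.c>
RewriteEngine On
# Apply the rule only when the requested host is NOT the main site
RewriteCond %{HTTP_HOST} !^(www\.)?yourmainsite\.com$ [NC]
# Internally serve robots_blocked.txt in place of robots.txt
RewriteRule ^robots\.txt$ /robots_blocked.txt [NC,L]
</IfModule>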
The person who posted this code said: "This basically ignores any requests for robots.txt via your main site, instead passing them to the real robots.txt file as they should be. If you try to access robots.txt via any other domain (or subdomain) the rule is activated, and you're served robots_blocked.txt instead."
Since I want to block everything on the client's main domain, I have set up a regular robots.txt to disallow everything, plus a second file, robots_allow.txt. So I modified the RewriteRule line above for my situation to say
CODE
RewriteRule ^robots\.txt$ /robots_allow.txt [NC,L]
Since I obviously don't want to cause a problem on the client's site, do you all think that this syntax is correct for what I'm trying to do?
I.e., my regular robots.txt file blocks everything, whereas my robots_allow.txt allows for normal indexing.
If a bot accesses a subdomain of the main domain, or a separate domain that has been parked on the main domain, will either of these be served robots_allow.txt?
Thanks everyone, and I apologize for having such a technical first post.
Edited by Randy, 22 July 2008 - 06:47 AM.
Added code tags
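For readers following along, the setup described in this post can be sketched as one .htaccess fragment. This is only a sketch under stated assumptions: all domains share a single document root (and therefore one .htaccess), mod_rewrite is enabled, and yourmainsite.com stands in for the real main domain. The contents shown for the two robots files are the usual "block everything" and "allow everything" forms, not files taken from the client's site.

```apache
# robots.txt (physical file, what the main domain receives): block all crawling
#   User-agent: *
#   Disallow: /
#
# robots_allow.txt (what every other host receives): allow all crawling
#   User-agent: *
#   Disallow:

<IfModule mod_rewrite.c>
RewriteEngine On
# True for any host other than yourmainsite.com / www.yourmainsite.com
RewriteCond %{HTTP_HOST} !^(www\.)?yourmainsite\.com$ [NC]
# For those hosts, answer /robots.txt with the contents of robots_allow.txt
RewriteRule ^robots\.txt$ /robots_allow.txt [NC,L]
</IfModule>
```

Because the condition is negated and anchored at both ends, a subdomain such as sub.yourmainsite.com and a parked domain would both satisfy it, so under these assumptions both would be served robots_allow.txt. Requesting /robots.txt on each hostname is a quick way to confirm which file actually comes back.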
#3
Posted 22 July 2008 - 06:56 AM
Welcome, Jessica!
You're on the right track, assuming of course your site is hosted on a Unix/Linux server that has mod_rewrite enabled. As usual, there are many ways to accomplish what you're looking to do, and the way you laid it out with the change to the RewriteRule is certainly one way to do it. I'd personally do it a little differently, but it's a personal choice that simply makes more sense to me.
I'd use
CODE
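Options +FollowSymLinks
RewriteEngine on
# Apply the rule only when the requested host IS the main site
RewriteCond %{HTTP_HOST} ^(www\.)?yourmainsite\.com [NC]
# Internally serve robots_blocked.txt in place of robots.txt
RewriteRule ^robots\.txt$ /robots_blocked.txt [NC,L]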
The only real difference between my version and your original one is that I removed the ! (the regex "not" operator) from the beginning of the rewrite condition's pattern.
The reason I'd do it this way is that the rewrite condition would only test true and fire when the domain being requested was yourmainsite.com, whereas in your tweaked example it would have to fire for every domain except the main domain. So in theory your rule would probably have to fire many times more often than mine would, increasing server load.
In most cases it wouldn't make a huge difference, but I see no reason to add extra load to the server if one can avoid it.
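One point worth spelling out: with the condition no longer negated, the two files swap roles compared with the setup described earlier in the thread, where the regular robots.txt blocks everything. Here the physical robots.txt is what the regional domains receive, so it has to be the permissive file, and robots_blocked.txt carries the disallow rule for the main domain. A minimal sketch, assuming a shared document root and the placeholder yourmainsite.com; the trailing $ on the host pattern is an optional tightening added here, not part of the code posted above.

```apache
# robots.txt (physical file, served to every regional domain): allow all crawling
#   User-agent: *
#   Disallow:
#
# robots_blocked.txt (served only to the main domain): block all crawling
#   User-agent: *
#   Disallow: /

Options +FollowSymLinks
RewriteEngine On
# True only for yourmainsite.com / www.yourmainsite.com
RewriteCond %{HTTP_HOST} ^(www\.)?yourmainsite\.com$ [NC]
# For the main domain, answer /robots.txt with the contents of robots_blocked.txt
RewriteRule ^robots\.txt$ /robots_blocked.txt [NC,L]
```

If the file contents are left as in the earlier setup (robots.txt disallowing everything), this rule would end up blocking crawlers on every domain, so it is worth checking /robots.txt on each hostname after the change.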
#4
Posted 30 July 2008 - 11:13 AM
Hi Randy,
Thanks so much for taking the time to post some improved code. I'm all for efficiency--so I'll use yours instead of the other version.
Appreciate the help!
Best,
Jessica