Multiple Web Farm Subdomains
#1
Posted 17 May 2007 - 06:14 AM
We have a fairly large website that we have split into two different physical locations for redundancy. Both are accessible through a www subdomain. We spread the load across the physical locations through a DNS round-robin system.
However, for certain areas of the site we need to make sure users will go back to the same farm so we redirect them to either ww0 or ww3.example.co.uk.
This has resulted in pages being indexed on www, ww0 and ww3 and we're being told that Google may not be giving us the same page ranking because the content is spread across domains and due to duplication (although reading other posts here I understand duplication shouldn't be a problem).
Any idea if this is likely to be a real issue we should look at?
Many thanks!
Carlos Baez
#2
Posted 17 May 2007 - 09:20 PM
Are you on a Unix/Linux system or IIS?
If on a *nix and if it has mod_rewrite available to you I would simply use a bit of .htaccess magic to produce a different robots.txt file for the ww0 and ww3 requests. It won't affect visitors since it's just the robots.txt, but will tell the spiders to leave those two subs alone. I've used it before with similar round robin issues and it seems to work quite well.
On the off chance you have a *nix system and mod_rewrite is available to you, the .htaccess would look something like:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^(ww0|ww3)\.yourdomain\.co\.uk [NC]
RewriteRule ^robots\.txt$ deny_robots.txt [L]
Leave your normal robots.txt file exactly like it is.
Create a new file called deny_robots.txt and upload it to the root level of the site(s) with the following:
User-agent: *
Disallow: /
What the above does in something resembling English:
Since the engines treat each sub as a separate site, they'll query a robots.txt file for each of the three subs. A query for www.yourdomain.co.uk gets the good file that lets them crawl everything. However, when they query ww0.yourdomain.co.uk and ww3.yourdomain.co.uk they'll get the contents of the deny_robots.txt file instead, which tells them to leave all pages on those subs alone.
If you need to account for more subs, or for more subs being added down the line, simply change the rewrite condition to a negative match so that it only fires when the query is not for the www or non-www version, e.g.
RewriteCond %{HTTP_HOST} !^(www\.)?yourdomain\.co\.uk [NC]
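To sanity-check which hostnames each condition would catch before deploying it, the two patterns can be tried offline. This is only a rough sketch in Python, assuming patterns with every dot escaped (`\.co\.uk`) and `yourdomain.co.uk` as a placeholder; the function name `serves_deny_file` is made up for illustration, and Python's `re` is close to, but not identical to, the regex engine mod_rewrite uses.

```python
import re

# Positive match: fires only for the two farm-specific hosts.
FARM_HOSTS = re.compile(r"^(ww0|ww3)\.yourdomain\.co\.uk", re.IGNORECASE)
# Negative match: the leading "!" in the RewriteCond means
# "fire when this pattern does NOT match".
MAIN_HOSTS = re.compile(r"^(www\.)?yourdomain\.co\.uk", re.IGNORECASE)


def serves_deny_file(host, negative=False):
    """Return True if a robots.txt request for `host` would be rewritten."""
    if negative:
        return MAIN_HOSTS.search(host) is None
    return FARM_HOSTS.search(host) is not None


for host in ["www.yourdomain.co.uk", "yourdomain.co.uk",
             "ww0.yourdomain.co.uk", "ww7.yourdomain.co.uk"]:
    print(host, serves_deny_file(host), serves_deny_file(host, negative=True))
```

A host such as ww7 is missed by the positive pattern but caught by the negative one, which is the point of switching.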
Make sense?
#3
Posted 21 May 2007 - 05:46 AM
Are you on a Unix/Linux system or IIS?
[...]
Make sense?
Thanks Randy! We're actually on IIS rather than Unix/Linux. I'll try to find out whether there is something like that we can use on IIS, but it doesn't sound familiar, at least not anything as simple...
But will leaving it as it is actually cause us any harm at all?
#4
Posted 21 May 2007 - 10:52 AM
"I'll be back"
#5
Posted 23 May 2007 - 06:31 AM
' Read in a file and stream it out to the browser
Sub StreamText(TextFileName)
    Dim objFSO, objTextFile
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objTextFile = objFSO.OpenTextFile(Server.MapPath(TextFileName))
    Do While Not objTextFile.AtEndOfStream
        Response.Write objTextFile.ReadLine & vbCrLf
    Loop
    objTextFile.Close
    Set objTextFile = Nothing
    Set objFSO = Nothing
End Sub
' The custom 404 page receives the originally requested URL in the query string
Dim qstring
qstring = Request.ServerVariables("QUERY_STRING")
If InStr(qstring, "robots.txt") > 0 Then
    ' Serve the response as plain text rather than the default text/html
    Response.ContentType = "text/plain"
    If InStr(qstring, "www") = 0 Then
        ' Not the www host: tell all spiders to stay out
        Response.Write "User-agent: *" & vbCrLf
        Response.Write "Disallow: /" & vbCrLf
    Else
        ' www host: serve the real robots file
        StreamText "main-robots.txt"
    End If
    Response.End
End If
Rename your robots.txt to "main-robots.txt".
The code checks whether "www" appears in the requested robots.txt URL that is passed to the 404 page, then streams the disallow-all text if it does not, or the text from main-robots.txt if it does.
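The decision the 404 handler makes can also be sketched outside ASP, which makes it easier to try against sample requests. A minimal sketch in Python, assuming IIS hands the failed request to the custom error page as a query string of the form `404;http://host:80/robots.txt`; that format and the function name `robots_body` are assumptions for illustration, not something stated in the thread.

```python
DISALLOW_ALL = "User-agent: *\r\nDisallow: /\r\n"


def robots_body(query_string, main_robots):
    """Return the robots.txt text to send, or None if the 404 was for another URL."""
    if "robots.txt" not in query_string:
        return None  # not a robots.txt request: show the normal 404 page
    if "www" not in query_string:
        return DISALLOW_ALL  # ww0, ww3, or any other non-www host
    return main_robots  # contents of main-robots.txt


print(robots_body("404;http://ww0.example.co.uk:80/robots.txt", "User-agent: *\r\nDisallow:\r\n"))
```

Like the ASP version, this is a plain substring test, so a request whose URL contains "www" anywhere, not just in the hostname, is treated as the main site.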
#6
Posted 23 May 2007 - 08:02 AM
The code will check for "www" existing in the requested robots.txt URL that is passed to the 404 page, then stream the all disallowed text if it is not there, or the text from main-robots.txt if it is.
Many thanks for the code!!
However, I am still not clear on whether it is actually necessary.
#7
Posted 23 May 2007 - 09:51 AM
The only other option is to make your load balancing completely transparent to both users and spiders. There are certainly ways to do this, however it is a bit more expensive generally speaking.
#8
Posted 23 May 2007 - 10:14 AM
I think the theory was that if it is split into multiple subdomains then supposedly the page ranking would be lower, as Google would see less content in each subdomain and therefore consider it "less important".
Thanks!
#9
Posted 23 May 2007 - 11:28 PM
That's not how it works.
If each page was still linked to each other page that they would normally have been linked to, then it doesn't matter whether they're on one domain or spread across multiple subdomains or anything else.
#10
Posted 24 May 2007 - 11:10 AM
If each page was still linked to each other page that they would normally have been linked to, then it doesn't matter whether they're on one domain or spread across multiple subdomains or anything else.
Thanks Jill. All our links are relative, so once a crawler gets into one of the subdomains the links will stay within that subdomain, which means it could reach the same page under up to three different subdomains depending on the starting route.
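A short illustration of why relative links keep a crawler on whichever host it entered through. This is a sketch in Python using the standard library's `urljoin`; the path `products/widget.html` and the helper name `resolve_on_each_host` are made-up examples.

```python
from urllib.parse import urljoin

HOSTS = ["www.example.co.uk", "ww0.example.co.uk", "ww3.example.co.uk"]


def resolve_on_each_host(relative_link):
    """Resolve one relative link against the home page of each host."""
    return [urljoin("http://" + host + "/index.html", relative_link)
            for host in HOSTS]


urls = resolve_on_each_host("products/widget.html")
for url in urls:
    print(url)
```

The same relative link ends up as three distinct absolute URLs, one per subdomain.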
#11
Posted 27 May 2007 - 09:50 AM
No, you definitely don't want to do it that way. Any content from any of your sites should always only be reachable via one URL.
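One way to express the "one URL per page" rule is a function that maps every host variant onto a single preferred host, which could then drive a permanent redirect. A minimal sketch in Python, assuming `www.example.co.uk` is the preferred host and `canonical_url` is a hypothetical helper; it does not address the areas of the site where users need to stay pinned to one farm.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.co.uk"  # assumption: www is the preferred host
ALIAS_HOSTS = {"ww0.example.co.uk", "ww3.example.co.uk"}


def canonical_url(url):
    """Rewrite a URL on an alias host to the same path on the canonical host."""
    parts = urlsplit(url)
    if parts.hostname in ALIAS_HOSTS:
        parts = parts._replace(netloc=CANONICAL_HOST)
    return urlunsplit(parts)


print(canonical_url("http://ww0.example.co.uk/products/widget.html?id=7"))
```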