Multiple Web Farm Subdomains


10 replies to this topic

#1 Carlos Baez

Posted 17 May 2007 - 06:14 AM

Hi,

We have a fairly large website that we have split into two different physical locations for redundancy. Both are accessible through a www subdomain. We spread the load across the physical locations through a DNS round-robin system.

However, for certain areas of the site we need to make sure users will go back to the same farm, so we redirect them to either ww0.example.co.uk or ww3.example.co.uk.

This has resulted in pages being indexed on www, ww0 and ww3, and we're being told that Google may not be giving us the same page ranking because the content is spread across subdomains and duplicated (although from reading other posts here I understand duplication shouldn't be a problem).

Any idea if this is likely to be a real issue we should look at?

Many thanks!

Carlos Baez


#2 Randy

Posted 17 May 2007 - 09:20 PM

Welcome, Carlos!

Are you on a Unix/Linux system or IIS?

If you're on *nix and mod_rewrite is available to you, I would simply use a bit of .htaccess magic to serve a different robots.txt file for ww0 and ww3 requests. It won't affect visitors, since it's just the robots.txt, but it will tell the spiders to leave those two subdomains alone. I've used it before with similar round-robin issues and it seems to work quite well.

On the off chance you have a *nix system with mod_rewrite available, the .htaccess would look something like:
CODE
# Serve deny_robots.txt in place of robots.txt on the ww0 and ww3 hosts only
Options +FollowSymlinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^(ww0|ww3)\.yourdomain\.co\.uk [NC]
RewriteRule ^robots\.txt$ deny_robots.txt [L]


Leave your normal robots.txt file exactly like it is.

Create a new file called deny_robots.txt and upload it to the root level of the site(s) with the following:
CODE
User-agent: *
Disallow: /


What the above does in something resembling English:

Since the engines treat each subdomain as a separate site, they'll request a robots.txt file from each of the three subdomains. For www.yourdomain.co.uk they'll get your normal file that lets them crawl everything. However, when they request robots.txt from ww0.yourdomain.co.uk or ww3.yourdomain.co.uk, they'll get the contents of deny_robots.txt instead, including the instruction to leave all pages on those subdomains alone.
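To make the host check concrete, here is a minimal Python sketch (not part of the original setup) that mirrors what the RewriteCond is meant to match and reports which file a spider would receive for /robots.txt on a given host. The helper name robots_file_for is hypothetical, and yourdomain.co.uk is the thread's placeholder domain.

```python
import re

# Assumed equivalent of: RewriteCond %{HTTP_HOST} ^(ww0|ww3)\.yourdomain\.co\.uk [NC]
# re.IGNORECASE plays the role of [NC]; re.match anchors at the start like ^.
FARM_HOSTS = re.compile(r"^(ww0|ww3)\.yourdomain\.co\.uk", re.IGNORECASE)


def robots_file_for(host: str) -> str:
    """Return the file a spider would get for /robots.txt on `host` (hypothetical helper)."""
    if FARM_HOSTS.match(host):
        return "deny_robots.txt"  # ww0/ww3: the rewrite fires, everything is disallowed
    return "robots.txt"           # any other host: the normal file, untouched


if __name__ == "__main__":
    for h in ("www.yourdomain.co.uk", "ww0.yourdomain.co.uk", "WW3.yourdomain.co.uk"):
        print(h, "->", robots_file_for(h))
```

Running it prints robots.txt for the www host and deny_robots.txt for both farm hosts, regardless of letter case.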

If you need to account for more subdomains, or for more being added down the line, simply change the rewrite condition to a negative match so that it only fires when the requested host is neither the www nor the non-www version. E.g.:
RewriteCond %{HTTP_HOST} !^(www\.)?yourdomain\.co\.uk [NC]
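The effect of the negative match can be sketched the same way. This is a hedged Python illustration only: the helper name is hypothetical, and ww7 stands in for a subdomain that might be added later.

```python
import re

# Assumed equivalent of: RewriteCond %{HTTP_HOST} !^(www\.)?yourdomain\.co\.uk [NC]
MAIN_HOSTS = re.compile(r"^(www\.)?yourdomain\.co\.uk", re.IGNORECASE)


def robots_file_for(host: str) -> str:
    """Negative match: every host that is NOT www or bare gets the deny file."""
    if not MAIN_HOSTS.match(host):
        return "deny_robots.txt"  # ww0, ww3, or any subdomain added later
    return "robots.txt"           # www.yourdomain.co.uk or yourdomain.co.uk


if __name__ == "__main__":
    for h in ("yourdomain.co.uk", "www.yourdomain.co.uk",
              "ww0.yourdomain.co.uk", "ww7.yourdomain.co.uk"):
        print(h, "->", robots_file_for(h))
```

The design advantage is that the rule lists the hosts you want crawled rather than the ones you don't, so new farm subdomains are covered without touching the .htaccess again.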

Make sense?

#3 Carlos Baez

Posted 21 May 2007 - 05:46 AM

QUOTE(Randy @ May 18 2007, 03:20 AM)
Welcome, Carlos!

Are you on a Unix/Linux system or IIS?
[...]
Make sense?


Thanks Randy! We're actually on IIS rather than Unix/Linux. I'll try to find out whether there's something similar we can use on IIS, but it doesn't sound familiar, at least not anything as simple...

But will leaving it as it is actually cause us any harm at all?

#4 chrishirst

Posted 21 May 2007 - 10:52 AM

You can do a bit of scripting on a custom 404 page to produce a "dynamic" robots.txt quite easily.

"I'll be back."

#5 chrishirst

Posted 23 May 2007 - 06:31 AM

Code for a "dynamic" robots.txt, for use in a custom scripted 404 page (classic ASP/VBScript):

CODE
<%
Sub StreamText(TextFileName)
    ' Read in a text file from the site and stream it out to the client
    Dim objFSO, objTextFile
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objTextFile = objFSO.OpenTextFile(Server.MapPath(TextFileName))
    Do While Not objTextFile.AtEndOfStream
        Response.Write objTextFile.ReadLine & vbCrLf
    Loop
    objTextFile.Close
    Set objTextFile = Nothing
    Set objFSO = Nothing
End Sub

Dim qstring
' IIS passes the originally requested URL to the custom 404 page in the query string
qstring = Request.ServerVariables("QUERY_STRING")

If InStr(1, qstring, "robots.txt", vbTextCompare) > 0 Then
    ' Answer with a normal plain-text 200 response rather than a 404
    Response.Status = "200 OK"
    Response.ContentType = "text/plain"
    If InStr(1, qstring, "www", vbTextCompare) = 0 Then
        ' Not the www host (e.g. ww0 or ww3): disallow everything
        Response.Write "User-agent: *" & vbCrLf
        Response.Write "Disallow: /" & vbCrLf
    Else
        ' The www host: serve the real rules
        StreamText "main-robots.txt"
    End If
    Response.End
End If
%>


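Rename your robots.txt to "main-robots.txt", so that a request for /robots.txt no longer finds a file and is handed to the 404 page.

The code checks whether "www" appears in the requested robots.txt URL, which IIS passes to the custom 404 page in the query string. If "www" is not there, it streams the disallow-everything rules; if it is, it streams the text of main-robots.txt.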
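The handler's decision can be modelled in a few lines of Python as a sketch for reasoning about it, not a replacement for the ASP page. The query-string format used here (404;http://host/robots.txt) is an assumption about what IIS hands the 404 page, the comparison is case-insensitive by choice (VBScript's InStr is case-sensitive unless vbTextCompare is passed), and robots_response is a hypothetical name. One side effect it makes visible: the bare domain (no "www") also gets the block-all rules, unlike Randy's negative match, which allows the non-www host.

```python
# Block-all rules, the same text the ASP page writes out
DENY_ALL = "User-agent: *\r\nDisallow: /\r\n"


def robots_response(query_string: str, main_robots: str):
    """Sketch of the 404 handler's logic; returns None for non-robots.txt requests."""
    qs = query_string.lower()  # case-insensitive comparison (an assumption, see lead-in)
    if "robots.txt" not in qs:
        return None            # not robots.txt: let the normal 404 page render
    if "www" not in qs:
        return DENY_ALL        # ww0, ww3, and also the bare domain
    return main_robots         # www host: contents of main-robots.txt


if __name__ == "__main__":
    main = "User-agent: *\r\nDisallow:\r\n"  # hypothetical main-robots.txt contents
    for qs in ("404;http://www.example.co.uk/robots.txt",
               "404;http://ww0.example.co.uk/robots.txt",
               "404;http://example.co.uk/robots.txt"):
        print(qs, "->", repr(robots_response(qs, main)))
```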

#6 Carlos Baez

Posted 23 May 2007 - 08:02 AM

QUOTE(chrishirst @ May 23 2007, 12:31 PM)
code for a "dynamic" robots.txt. For use in a custom scripted 404 page

The code will check for "www" existing in the requested robots.txt URL that is passed to the 404 page, then stream the all disallowed text if it is not there, or the text from main-robots.txt if it is.


Many thanks for the code!!

However, I am still not clear on whether it is actually necessary.

#7 Randy

Posted 23 May 2007 - 09:51 AM

It's necessary if you don't want to end up having all three "sites" (www, ww0 and ww3) indexed by the spiders. They're going to see all three as completely separate sites at completely separate addresses otherwise.

The only other option is to make your load balancing completely transparent to both users and spiders. There are certainly ways to do that, but generally speaking they're a bit more expensive.

#8 Carlos Baez

Posted 23 May 2007 - 10:14 AM

Well, it's not really a problem if it appears in Google under the different subdomains, as long as it doesn't affect how high or low it comes up in searches.

I think the theory was that if the site is split across multiple subdomains, the page ranking would supposedly be lower, because Google would see less content on each subdomain and therefore consider it "less important".

Thanks!

#9 Jill

Posted 23 May 2007 - 11:28 PM

QUOTE
I think the theory was that if it is split into multiple domains then supposedly the page ranking would be lower as google would see less content in each subdomain and therefor consider it "less important".


That's not how it works.

If each page is still linked to all the pages it would normally link to, then it doesn't matter whether they're on one domain, spread across multiple subdomains, or anything else.




#10 Carlos Baez

Posted 24 May 2007 - 11:10 AM

QUOTE(Jill @ May 24 2007, 05:28 AM)
That's not how it works.

If each page was still linked to each other page that they would normally have been linked to, then it doesn't matter whether they're on one domain or spread across multiple subdomains or anything else.


Thanks Jill. All our links are relative, so once a crawler gets into one of the subdomains it stays within that subdomain, which means it could reach the same page under up to three different subdomains depending on where it started.
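Why relative links keep a crawler on whichever subdomain it entered can be shown with Python's standard urljoin, which resolves a link against the URL of the page it appears on (per RFC 3986), the same way browsers and crawlers do. The page paths below are hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def resolve(page_url: str, href: str) -> str:
    """Resolve a link found on page_url the way a browser or crawler would."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    # Hypothetical page that exists identically on all three hosts
    for host in ("www", "ww0", "ww3"):
        page = f"http://{host}.example.co.uk/products/index.asp"
        target = resolve(page, "../contact.asp")
        print(target, "host:", urlsplit(target).hostname)
```

Each relative link resolves to the host of the page it was found on, so a crawler that enters on ww0 keeps finding ww0 URLs; only an absolute link fixes the host.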

#11 Jill

Posted 27 May 2007 - 09:50 AM

QUOTE
which means they could get to the same page with up to three different subdomains depending on the starting route.


No, you definitely don't want to do it that way. Any content from any of your sites should always only be reachable via one URL.
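One way to check whether content is reachable via more than one URL is to group a list of known URLs by path and flag any path that shows up under several hosts. This is a minimal Python sketch under stated assumptions: duplicate_paths is a hypothetical helper, and the URL list would have to come from somewhere like your server logs or indexed pages.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def duplicate_paths(urls):
    """Group URLs by path+query and keep only those reachable under more than one host."""
    hosts_by_path = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        key = parts.path or "/"
        if parts.query:
            key += "?" + parts.query
        hosts_by_path[key].add(parts.hostname)  # .hostname is lower-cased by urlsplit
    return {path: sorted(hosts) for path, hosts in hosts_by_path.items() if len(hosts) > 1}


if __name__ == "__main__":
    crawled = [  # hypothetical URLs
        "http://www.example.co.uk/products/index.asp",
        "http://ww0.example.co.uk/products/index.asp",
        "http://WW3.example.co.uk/products/index.asp",
        "http://www.example.co.uk/contact.asp",
    ]
    print(duplicate_paths(crawled))
```

Any path this reports is, in Jill's terms, content reachable via more than one URL.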



