Which Comes First?


7 replies to this topic

#1 franco81

franco81

    HR 4

  • Active Members
  • 207 posts
  • Location:London

Posted 07 August 2006 - 08:43 PM

Hi,

If we set up a robots.txt file to disallow indexing of everything and also set up mod_rewrite to redirect everything, which one is the robot going to act on?

not index the content?
or
redirect to the other page?

#2 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 07 August 2006 - 09:08 PM

In theory, spiders that obey robots.txt exclusions should never see the redirect, because they're supposed to check robots.txt before spidering the site.

But I've seen stranger things happen, especially if the spiders already know the pages exist. So I wouldn't necessarily count on them never seeing the redirect.
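
To make the "in theory" case concrete, here is a minimal sketch of the order a well-behaved spider follows, using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, and the page URL are assumptions made up for illustration. Real spiders may behave differently, particularly for URLs they already know about.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks everything, as in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def crawl_decision(robots_txt: str, user_agent: str, url: str) -> str:
    """Return what a robots.txt-obeying spider would do with `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        # The page is never requested, so any mod_rewrite
        # redirect configured for it is never seen.
        return "skip"
    return "fetch"


# "ExampleBot" and the page name are placeholders, not real values.
decision = crawl_decision(ROBOTS_TXT, "ExampleBot", "http://somesite.com/any-page.html")
print(decision)
```

With everything disallowed, the check fails before any request for the page is made, which is why the redirect should stay invisible to a compliant spider.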

#3 franco81

Posted 07 August 2006 - 09:37 PM

Thanks mate. What is the best reference for finding out about search engine spiders: which ones follow robots.txt, which recognise the NOODP tag, etc.?

I found this page:
CODE
http://www.tamingthebeast.net/articles2/search-engine-spiders.htm


#4 Randy

Posted 07 August 2006 - 10:03 PM

All of the good spiders obey robots.txt, so any major search engine spider is going to fall into this category. I believe all of them also have sections on robot exclusion in their webmaster guidelines that you may find helpful.

As for the NOODP meta tag, we have a thread about that around here somewhere. It's probably in Industry News. If memory serves, MSN introduced it and Google jumped on board. I don't recall reading that Yahoo is honoring it as yet.

<edit to add>

Here's the thread I was thinking about. Still no mention of Yahoo! implementing such a tag, but they would be sort of shooting themselves in the foot if they did since their sometimes wacky descriptions come from their own directory.

Edited by Randy, 07 August 2006 - 10:11 PM.


#5 franco81

Posted 07 August 2006 - 10:48 PM

cheers mate, fountain of knowledge as usual.

#6 franco81

Posted 08 August 2006 - 09:43 PM

Actually, I have another question relating to this post. If the illusion of files is created using mod_rewrite, is it possible to add entries to a robots.txt file for these virtual folders/files?

e.g:

somesite.com/folder/

The folder does not exist; in fact, the above is being redirected to something like...

content-management-system.com/somesite.com/...?folder=folder


So am I able to put an entry in to stop robots from indexing /folder/ when in fact /folder/ does not physically exist?

#7 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,317 posts

Posted 09 August 2006 - 01:45 AM

QUOTE
So am I able to put an entry in to stop robots from indexing /folder/ when in fact /folder/ does not physically exist?


Sure, but that won't stop them from indexing the ultimate URL destination.
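
A rough sketch of why that is, again with Python's standard `urllib.robotparser`: a robots.txt rule is matched against the URL path as text, so it does not matter whether `/folder/` exists on disk, and each host is checked against its own robots.txt. The robots.txt contents and the `ExampleBot` user agent below are assumptions for illustration, and the hostnames are the placeholders used in this thread.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, one per host.
# Only somesite.com carries a rule for the virtual folder.
ROBOTS_BY_HOST = {
    "somesite.com": "User-agent: *\nDisallow: /folder/\n",
    "content-management-system.com": "User-agent: *\nDisallow:\n",
}


def allowed(url: str, user_agent: str = "ExampleBot") -> bool:
    """Check `url` against the robots.txt of the host it lives on."""
    host = urlparse(url).netloc
    parser = RobotFileParser()
    parser.parse(ROBOTS_BY_HOST[host].splitlines())
    return parser.can_fetch(user_agent, url)


virtual_url = "http://somesite.com/folder/"
destination_url = "http://content-management-system.com/somesite.com/"

print(allowed(virtual_url))      # blocked by somesite.com's rule
print(allowed(destination_url))  # not covered by that rule at all
```

The rule on somesite.com blocks the virtual URL, but it says nothing about the other host, so the destination stays crawlable unless that host's own robots.txt excludes it.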

#8 franco81

Posted 10 August 2006 - 06:09 PM

Okay, thanks. How about if I made only one entry, to stop robots from indexing...

content-management-system.com/somesite.com/

All the bots will see is the URL...

somesite.com/folder/

So I would guess they would still index the page. In which case I would actually have to block both URLs in the robots.txt to make sure?

Also, can you put in entries with GET variables? Like...

content-management-system.com/somesite.com/index.php?folder=folder
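
On the GET variable question, here is a minimal sketch of the plain prefix matching described in the original robots exclusion standard, where a Disallow value is compared against the start of the requested path. Treating the query string as part of that text is an assumption here. Individual spiders may handle it differently or add wildcard extensions, so check each engine's guidelines. The function name `is_disallowed` is made up for this example.

```python
def is_disallowed(path_and_query: str, disallow_rules: list[str]) -> bool:
    """Plain prefix match; an empty Disallow value blocks nothing."""
    return any(rule and path_and_query.startswith(rule) for rule in disallow_rules)


# A single Disallow entry that includes the GET variable from the post above.
rules = ["/somesite.com/index.php?folder=folder"]

# Same URL, and the same URL with extra parameters appended: both match the prefix.
print(is_disallowed("/somesite.com/index.php?folder=folder", rules))
print(is_disallowed("/somesite.com/index.php?folder=folder&page=2", rules))

# Parameters in a different order, or a different value: no prefix match.
print(is_disallowed("/somesite.com/index.php?page=2&folder=folder", rules))
print(is_disallowed("/somesite.com/index.php?folder=other", rules))
```

Under this reading, an entry with GET variables only catches URLs that begin with exactly that string, so a reordered query string slips past it. A prefix match would also catch a value such as `folder=folder2`.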