
What Will SEs Do With Similar Robots.txt Entries?


11 replies to this topic

#1 skr325

skr325

    HR 2

  • Active Members
  • 24 posts
  • Location:Chicago

Posted 24 April 2008 - 10:29 AM

Working on a robots.txt file and have noticed that many of the entries are listed multiple times in various cases as in the following example:

Disallow: /exampleone/
Disallow: /ExampleOne/
Disallow: /exampleOne/

It looks like Google is indexing these pages (with content not just URLs) regardless of these exclusions. So my own "duh-rometer" tells me something is amiss.
  1. Can someone please explain what impact these 3 near-identical listings would have on the bot?
  2. Should I have only one entry, and if so, which one?
  3. Are files and directories in the robots.txt case sensitive?

Thanks,
Steve

#2 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,376 posts

Posted 24 April 2008 - 11:44 AM

Can you post the entire robots.txt here? How long has it been up? Something is definitely amiss if Google is indexing that directory.

#3 skr325

skr325

    HR 2

  • Active Members
  • 24 posts
  • Location:Chicago

Posted 24 April 2008 - 01:26 PM

Thanks Jill-

Here is the full text. It's been up at least 6 months.

User-agent: *
Disallow: /AccessAccount/
Disallow: /accessaccount/
Disallow: /accessaccountsearch/
Disallow: /Accessaccountsearch/
Disallow: /AccessAccountsearch/
Disallow: /AccessAccountSearch/
Disallow: /AccessaccountSearch/
Disallow: /accessAccountSearch/
Disallow: /accessAccountSearch/
Disallow: /accessaccountSearch/
Disallow: /Advantage/
Disallow: /advantage/
Disallow: /bin/
Disallow: /classes/
Disallow: /Demo/
Disallow: /demo/
Disallow: /Downloads/
Disallow: /downloads/
Disallow: /eBinder/
Disallow: /ebinder/
Disallow: /Binder/
Disallow: /binder/
Disallow: /BinderView/
Disallow: /binderview/
Disallow: /binderView/
Disallow: /eBV/
Disallow: /ebv/
Disallow: /FaxBack/
Disallow: /faxBack/
Disallow: /faxback/
Disallow: /Faxback/
Disallow: /Home/
Disallow: /home/
Disallow: /Sign-In/
Disallow: /Sign-in/
Disallow: /sign-In/
Disallow: /sign-in/
Disallow: /SignIn/
Disallow: /Signin/
Disallow: /signIn/
Disallow: /signin/
Disallow: /UnauthorizedAccess/
Disallow: /unauthorizedAccess/
Disallow: /Unauthorizedaccess/
Disallow: /unauthorizedaccess/
Disallow: /Utilities/
Disallow: /utilities/
Disallow: /Xsl/
Disallow: /XSL/
Disallow: /xsl/
Disallow: /SignUp/
Disallow: /signUp/
Disallow: /Signup/
Disallow: /signup/
Disallow: /CustomerSupport/YourAccount.aspx
Disallow: /customersupport/YourAccount.aspx
Disallow: /customersupport/YourAccount.aspx
Sitemap: http://www.example.com/sitemap.xml.gz

#4 TriExpert

TriExpert

    HR 2

  • Active Members
  • 24 posts
  • Location:Vermont, USA

Posted 24 April 2008 - 01:37 PM

It's important to have a blank line between the Disallow directives and the Sitemap directives. But that's advice out of a book; I've never tested to see exactly what FUBARs if you fail to do so.

Edited by TriExpert, 24 April 2008 - 02:13 PM.


#5 skr325

skr325

    HR 2

  • Active Members
  • 24 posts
  • Location:Chicago

Posted 01 May 2008 - 01:51 AM

Still looking for some love here. Any thoughts from Jill or others?

-Steve

#6 projectphp

projectphp

    Lost in Translation

  • Moderator
  • 2,203 posts
  • Location:Sydney Australia

Posted 01 May 2008 - 03:12 AM

QUOTE
1. Can someone please explain what impact these 3 identical listings would have on the bot?

They'll block different things.

QUOTE
2. Should I have only one entry, and if so, which one?

No, all of them as appropriate, i.e. if you link to various CaSings, you need all of 'em.

QUOTE
3. Are files and directories in the robots.txt case sensitive?

Yes, because the WEB IS.

You might wanna use a robots noindex metatag, JIC, for all these pages.
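
To illustrate the case-sensitivity point, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is only an illustration under assumptions: this parser is not Googlebot, and the host name and `page.html` are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# A one-rule robots.txt, using a single casing from the original question.
rules = """\
User-agent: *
Disallow: /exampleone/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def allowed(path):
    # www.example.com is a placeholder host, not the poster's real site.
    return parser.can_fetch("*", "http://www.example.com" + path)


# The lowercase path matches the rule and is blocked.
lower_blocked = not allowed("/exampleone/page.html")
# The mixed-case path does not match the rule, so this parser allows it.
mixed_blocked = not allowed("/ExampleOne/page.html")

print("lowercase blocked:", lower_blocked)
print("mixed case blocked:", mixed_blocked)
```

With this parser, only the exact casing listed in the Disallow line is blocked, which is why each casing that is actually linked needs its own entry.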

#7 skr325

skr325

    HR 2

  • Active Members
  • 24 posts
  • Location:Chicago

Posted 14 May 2008 - 09:53 AM

I've started back through the pages to block at that level, but my original question still stands - why are these pages still in the index if the robots.txt is correct?

-steve



#8 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 14 May 2008 - 10:59 AM

For one, because the search engines apparently already knew those pages existed.

robots.txt doesn't tell the search engines "Never include these pages in your index." Instead it says "Do not spider/crawl these pages. Even if you've spidered them before, don't spider them anymore."

This points to one of those really horrid things about using an IIS server and not being careful in how you use Case in your URLs. On a Unix/Linux box each of those addresses would have been seen as being unique, so if the path was really all lowercase on the server every other attempted case would be returned as a 404 Not Found.

When you're on an IIS server you need to be extremely careful with how you link to your files. Choose one way and do it that way every time.
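
A rough sketch of the difference, in Python. This is a toy model, not real server code: the single path in `SERVER_PATHS` is hypothetical, and the two functions only mimic case-sensitive versus case-insensitive path lookup.

```python
# Hypothetical set of paths that really exist on the server.
SERVER_PATHS = {"/signin/default.aspx"}


def status_case_sensitive(path):
    # Unix/Linux-style lookup: the path must match exactly.
    return 200 if path in SERVER_PATHS else 404


def status_case_insensitive(path):
    # IIS-style lookup: any casing of an existing path resolves.
    lowered = {p.lower() for p in SERVER_PATHS}
    return 200 if path.lower() in lowered else 404


requested = [
    "/signin/default.aspx",
    "/SignIn/default.aspx",
    "/Signin/Default.aspx",
]

unix_like = [status_case_sensitive(p) for p in requested]
iis_like = [status_case_insensitive(p) for p in requested]

print("case-sensitive server:  ", unix_like)
print("case-insensitive server:", iis_like)
```

In the case-insensitive model every casing returns a page, so each variant looks like a separate, crawlable URL to a spider.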

#9 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,376 posts

Posted 14 May 2008 - 11:47 AM

Are they still indexed with content and not just the URLs?

#10 incrediblehelp

incrediblehelp

    HR 6

  • Active Members
  • 591 posts
  • Location:Kentucky

Posted 14 May 2008 - 03:33 PM

Are you internally linking to different versions on your own website or other websites you control?

#11 skr325

skr325

    HR 2

  • Active Members
  • 24 posts
  • Location:Chicago

Posted 22 May 2008 - 01:38 PM

QUOTE(Jill @ May 14 2008, 11:47 AM)
Are they still indexed with content and not just the URLs?


Yup!

QUOTE(incrediblehelp @ May 14 2008, 03:33 PM)
Are you internally linking to different versions on your own website or other websites you control?


I am not sure what you mean? Am I linking to these pages?


#12 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 22 May 2008 - 06:30 PM

QUOTE
I am not sure what you mean? Am I linking to these pages?


I would imagine Jaan is asking if you've checked all of your own links to make sure you're being 100% consistent with the case being used. After all, the search engine spiders had to find links that contained the different case in the first place to even visit the different variations of the page URLs.
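
One way to run that kind of check is sketched below in Python. It assumes you already have a list of the links found on your own pages. The `find_case_variants` helper and the sample URLs are hypothetical, for illustration only.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def find_case_variants(links):
    """Group link paths that differ only by letter case."""
    groups = defaultdict(set)
    for link in links:
        path = urlsplit(link).path
        groups[path.lower()].add(path)
    # Keep only paths that were linked with more than one casing.
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}


# Sample links; in practice these would come from a crawl of your own site.
links = [
    "http://www.example.com/SignIn/",
    "http://www.example.com/signin/",
    "http://www.example.com/downloads/",
]

variants = find_case_variants(links)
for lowered, casings in variants.items():
    print(lowered, "is linked as:", ", ".join(casings))
```

Any path that shows up in the result is being linked with inconsistent case somewhere and is worth fixing at the source.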



