Anybody Else Ever See Google Index Their Robots.txt File?


24 replies to this topic

#16 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,317 posts

Posted 04 November 2008 - 04:30 PM

QUOTE
Surely anyone could access the robots.txt file (of any site) and get a complete road-map into the stuff you asked the bots NOT to show!


That's why you should never actually have it show any link that someone could get to directly, that you wouldn't want a person to see.

I use directories for that purpose. Just put your secret file in the blocked directory, and nobody can get to it, assuming your server is set up correctly so that it doesn't allow people to browse directories.
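To illustrate the idea, here is a minimal Python sketch, assuming a hypothetical robots.txt that disallows a made-up `/private/` directory. The file name `x7q-report.html` and the bot name `ExampleBot` are also invented for the example. It uses the standard library's `urllib.robotparser` to show that a well-behaved crawler is kept out of everything under the directory, while the robots.txt itself never mentions the file name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only the directory is listed, never the file inside it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(path, robots_txt=ROBOTS_TXT, agent="ExampleBot"):
    """Return True if a compliant crawler may fetch `path` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The obscurely named file is covered by the directory rule...
blocked = is_crawlable("/private/x7q-report.html")
# ...while ordinary pages stay crawlable.
allowed = is_crawlable("/index.html")
```

Note that this only describes what polite crawlers will do. Anyone who learns the full URL can still request the file directly, so it is not access control.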

#17 Alan Perkins

Alan Perkins

    Token male admin

  • Admin
  • 1,559 posts
  • Location:UK

Posted 04 November 2008 - 05:11 PM

QUOTE(Jill)
Just put your secret file in the blocked directory, and nobody can get to it

That's not actually true. "Security by obscurity" is quite a low form of security: once the URL is discovered, the security is broken. It's fine if you just want to put up minimal barriers and keep things simple without worrying about passwords, secure servers, etc. :)

#18 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,317 posts

Posted 04 November 2008 - 11:10 PM

Right, but they can't find stuff just by using your robots.txt file, correct? Or am I wrong in thinking that?

#19 Alan Perkins

Alan Perkins

    Token male admin

  • Admin
  • 1,559 posts
  • Location:UK

Posted 05 November 2008 - 04:36 AM

QUOTE
Right, but they can't find stuff just by using your robots.txt file, correct?


Correct, not if you use obscure file names and hide directory listings.

#20 1dmf

1dmf

    Keep Asking, Keep Questioning, Keep Learning

  • Active Members
  • 2,154 posts
  • Location:Worthing - England

Posted 05 November 2008 - 04:51 AM

QUOTE
Surely anyone could access the robots.txt file (of any site) and get a complete road-map into the stuff you asked the bots NOT to show!

Sounds like hiding the front door key under a flower pot and then sticking a note on the front door with instructions where to find it


Love ya thinking!

#21 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,317 posts

Posted 05 November 2008 - 12:28 PM

QUOTE(Alan Perkins @ Nov 5 2008, 04:36 AM)
Correct, not if you use obscure file names and hide directory listings.


Good because that's what I meant in my previous post!

#22 Affan Laghari

Affan Laghari

    HR 2

  • Members
  • 17 posts

Posted 05 November 2008 - 05:22 PM

Seems I'm a bit late, but I just read the High Rankings Advisor feeds (http://www.highrankings.com/newsletter).

Anyway, talking about indexing, I was quite amazed to see Brett Tabke run a blog in WebmasterWorld's robots.txt (www.webmasterworld.com/robots.txt). Google would surely miss it unless it starts indexing these files. The blog seemed to have regular posts around a year ago, though it's dead now.

#23 JohnMu

JohnMu

    HR 1

  • Members
  • 2 posts

Posted 06 November 2008 - 10:41 AM

Hi everyone
This is always an interesting topic, so I thought I'd jump on over and leave my 2 cents :)

- The quickest way to get it out of the index is to use the "noindex" x-robots-tag (as Alan mentioned a bit back). These HTTP headers are understood by all of the major search engines (as far as I know).

- Another way to get it removed is to block it with the robots.txt using a "disallow" directive. In a situation like that, we'll still read your robots.txt (because we have to check it anyway), but it'll generally take a bit of time for it to drop out of the index (because "disallow" blocks crawling, but not indexing). A way to speed that up is to use the urgent URL removal tool within your Google Webmaster Tools account.

- That said, it's not like your robots.txt file is going to rank for anything really interesting, so I wouldn't worry about it being indexed. :)
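As a rough illustration of the first option, here is a minimal Python sketch that serves robots.txt with an `X-Robots-Tag: noindex` HTTP header, using only the standard library's `http.server`. The file contents, the helper name `robots_response_headers`, and the port are assumptions made up for the example. On a real site you would more likely set the header in your web server configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_BODY = b"User-agent: *\nDisallow: /private/\n"


def robots_response_headers(body):
    """Headers for the robots.txt response, including the noindex hint."""
    return [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Asks search engines not to show this URL in their index.
        ("X-Robots-Tag", "noindex"),
    ]


class RobotsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /robots.txt is served in this sketch; everything else is a 404.
        if self.path != "/robots.txt":
            self.send_error(404)
            return
        self.send_response(200)
        for name, value in robots_response_headers(ROBOTS_BODY):
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(ROBOTS_BODY)


def serve(port=8000):
    """Run the demo server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), RobotsHandler).serve_forever()
```

Calling `serve()` and then requesting `http://127.0.0.1:8000/robots.txt` should show the header in the response. The file has to stay crawlable for this to work, since a crawler can only see the header if it is allowed to fetch the URL.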

For what it's worth, the WMW robots.txt is using cloaking. You can see that by looking at the cached file:
http://www.webmaster....com/robots.txt
vs
http://www.google.co....com/robots.txt

I would certainly NOT recommend doing that. Not only is it against our Webmaster Guidelines, but if you have the slightest hiccup in the cloaking code you'll suddenly be serving the wrong robots.txt file. Don't play with your robots.txt file :)

John

#24 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 06 November 2008 - 01:50 PM

Welcome back, John!

Is this your way of reminding me that I haven't gotten by your blog in a month or so?

#25 Robert W. Franson

Robert W. Franson

    HR 1

  • Members
  • 1 posts

Posted 09 November 2008 - 05:41 AM

That WebMasterWorld blog certainly is an example of applying creative thinking to a corner of the Web that we ordinarily don't think of as anything but minimally utilitarian! ... No doubt for good reason.

As for security: It's also an example of one of the oldest tricks in the cryptology books for hiding secrets, including "secret writing": hide it in plain sight. Of course that's only a make-do substitute for real hiding or real encryption or both.



