Is Robots.txt Necessary For Unlinked File


9 replies to this topic

#1 bobmeetin

bobmeetin

    HR 5

  • Active Members
  • 418 posts
  • Location:Colorado

Posted 08 July 2008 - 05:06 PM

Say I have a file which is not generally confidential, but just the same, I don't want it indexed by search engines. I place it in the home folder of a site, but nothing links to it. Will it be found and indexed, or do I need a robots.txt file to ensure its anonymity? With no links I would not expect it to be indexed, but better to be sure... -Bob

# robots.txt for www.mysite.com

User-agent: *
Disallow: /private-foo.html

#2 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,311 posts

Posted 08 July 2008 - 07:13 PM

Yes, you should exclude it via robots.txt. The engines are finding and indexing all sorts of things you don't want them to find and index these days.

Just remember that any person can look at your robots.txt file and then go view the page in question, however. So be sure there's nothing there you don't actually want people to see.

#3 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 08 July 2008 - 07:52 PM

Password protect the file if it's something you don't want others to see. You can even do it with a simple scripted solution right in the file if you'd like to.

robots.txt would keep all of the nice bots away from it. But having the file referenced in your robots.txt is going to ensure it gets picked up by the bad bots or any other scurrilous characters out there who mine robots.txt to find stuff people don't want others to see.
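To show how little effort that mining takes, here is a minimal Python sketch. It assumes the robots.txt body from the first post in this thread; the function name disallowed_paths is made up for illustration.

```python
# robots.txt body copied from the first post in this thread.
ROBOTS_TXT = """\
# robots.txt for www.mysite.com

User-agent: *
Disallow: /private-foo.html
"""


def disallowed_paths(robots_txt):
    """Return every non-empty Disallow path, as any curious visitor could."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


print(disallowed_paths(ROBOTS_TXT))  # prints ['/private-foo.html']
```

Anything listed after Disallow is effectively published, so this approach only suits files you would not mind a person reading.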

#4 bobmeetin

bobmeetin

    HR 5

  • Active Members
  • 418 posts
  • Location:Colorado

Posted 09 July 2008 - 08:23 AM

Hmmm I guess that confirms that robots.txt is not an honored standard. Harumph. Password protection is easy, just prefer to avoid it if avoidable.

So let me rephrase this (you know both good and bad attorneys rephrase questions to make the answer fit...) as:

In the root of the website is a standard index.php with a navigation system which does not include my_private_file.html, some_private_folder, or even .invisible.html.

Either with or without robots.txt do bad bots and/or unscrupulous programs have the ability to find these private-ish files or folders even though they are not identified anywhere in navigation and there is an index file in place?

#5 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 09 July 2008 - 09:23 AM

They'll sometimes look for certain file names, bobmeetin, even if there are no links pointing at the files. You can see lots of these rogue bots out there if you review your error_log files for a few days or weeks in a row.
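A rough Python sketch of that kind of log review follows. The sample lines are made up and only imitate the style of an Apache error_log; the "File does not exist:" wording, the paths, and the function name probed_paths are all assumptions, so adjust the pattern to whatever your own server writes.

```python
import re
from collections import Counter

# Made-up sample lines in the style of an Apache error_log; real formats vary.
SAMPLE_LOG = """\
[Wed Jul 09 03:12:44 2008] [error] [client 192.0.2.10] File does not exist: /var/www/html/admin.php
[Wed Jul 09 03:12:45 2008] [error] [client 192.0.2.10] File does not exist: /var/www/html/setup.php
[Wed Jul 09 04:40:02 2008] [error] [client 192.0.2.77] File does not exist: /var/www/html/admin.php
"""

# Assumed message format; capture the path that was requested but not found.
MISSING = re.compile(r"File does not exist: (\S+)")


def probed_paths(log_text):
    """Count how often each missing path was requested."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MISSING.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


for path, hits in probed_paths(SAMPLE_LOG).most_common():
    print(hits, path)
```

Names that show up repeatedly without ever having existed on the site are the ones the rogue bots are guessing at.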

I wouldn't name it anything obvious, e.g. hidden.html, invisible.html, setup.php, admin.php.

How about another obfuscation idea? It's not perfect and not secure, but you might like it if security isn't a huge issue.

Set up your file and name it something a little wonky. Maybe even put a number in it so the rogue bots wouldn't stumble across it easily.

Then in the HTML code itself, include a meta robots tag with a value of noindex, nofollow, noarchive. In theory, if the bad bots don't find your page because of the odd filename, that's good. And if a good bot does happen to stumble across it, the meta robots tag will tell it not to index the page. Much like robots.txt would, except you're not putting it out there for the bad bots to find because there is no robots.txt entry.
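As a sketch of what that tag looks like and how to confirm it is really in the page, the Python below embeds a hypothetical page and reads the directives back with the standard library's html.parser. The page content and the class name RobotsMetaParser are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page; the title and body text are made up for illustration.
PAGE = """\
<html>
<head>
<title>Notes</title>
<meta name="robots" content="noindex, nofollow, noarchive">
</head>
<body>Private-ish notes.</body>
</html>
"""


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for item in attr_map.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


parser = RobotsMetaParser()
parser.feed(PAGE)
print(sorted(parser.directives))  # prints ['noarchive', 'nofollow', 'noindex']
```

Keep in mind that a bot has to fetch the page to see the tag, so this only works on crawlers that choose to honor it.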

#6 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,311 posts

Posted 09 July 2008 - 04:43 PM

What I do for those types of files is put it in a directory, then exclude the directory.

People can't browse the directory so they can't find the file, and the directory and everything in it gets excluded via robots.txt.
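A quick way to sanity-check a directory rule like that is Python's urllib.robotparser, sketched below. The directory name /private-stuff/ and the file report.pdf are made up; the result only says what a crawler that obeys robots.txt should do, and the point about browsing assumes directory listings are switched off on the server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "/private-stuff/" is a made-up directory name.
RULES = """\
User-agent: *
Disallow: /private-stuff/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# can_fetch() answers: may this user agent request this URL under the rules?
pdf_ok = parser.can_fetch("*", "http://www.mysite.com/private-stuff/report.pdf")
index_ok = parser.can_fetch("*", "http://www.mysite.com/index.php")
print(pdf_ok, index_ok)  # prints False True
```

The robots.txt entry now reveals only the directory name, not the file names inside it.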

#7 maleman

maleman

    HR 6

  • Active Members
  • 677 posts

Posted 09 July 2008 - 07:29 PM

QUOTE
What I do for those types of files is put it in a directory, then exclude the directory.

Me too. 5 months ago, I put 4 PDF files in a directory and excluded this dir via robots.txt. A couple weeks ago I did a "site" search for .pdf files and the 4 PDF files came up in the return.

So, is this a hint that excluding a dir with robots.txt doesn't mean much nowadays?

#8 zephyr

zephyr

    HR 3

  • Active Members
  • 56 posts
  • Location:Connecticut

Posted 09 July 2008 - 08:14 PM

"Rather than use a robots.txt file to block crawler access to pages, you can add a <META> tag to an HTML page to tell robots not to index the page."

Big G does honor this tag, I believe.

#9 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,311 posts

Posted 09 July 2008 - 10:57 PM

QUOTE
A couple weeks ago I did a "site" search for .pdf files and the 4 PDF files came up in the return.


Really? I'd have to see that with my own eyes to believe it. If you're talking about Google, they're generally good at obeying robots.txt but I haven't checked all that closely lately.

#10 bobmeetin

bobmeetin

    HR 5

  • Active Members
  • 418 posts
  • Location:Colorado

Posted 09 July 2008 - 11:07 PM

Just to be sure, by site search I assume you're referring to a Google search and not an internal website search? Gosh how fun is sparring!



