Non Public Pages And Disallows
Posted 22 February 2007 - 10:14 AM
I researched this and could not find any posts that would answer my question. On our company website we have a link to the salesmen page. If you click this link, you must enter a username and password to access the pages. The content on these pages is for our salesmen only, and we do not want the general public or our competition getting a look at them. The problem is: when a search engine spiders my site, will the robots be able to access the password-protected files? If so, what are your suggestions to prevent this from happening? I am familiar with robots.txt files because I have another site that we do not allow the search engines to spider at all, but I am unfamiliar with how to stop them from accessing certain pages.
P.S. There are a ton of files in this directory, so if I have to use a robots.txt file, can I just put it on the index file of the directory, or do I have to put it on each individual page?
Sincerely and with Great Respect for your expertise,
Posted 22 February 2007 - 11:29 AM
Of course, if someone else were to put up a link to the protected pages, then simply protecting them with a login form isn't going to accomplish anything, so you should specifically tell the search engines via your robots.txt file that you don't want any of those pages indexed. And you don't have to specify each document in the robots.txt. If they're all in the same directory, just disallow the directory itself.
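For illustration, a minimal robots.txt along those lines, assuming the protected pages live in a directory named /salesmen/ (a hypothetical path; substitute your actual directory):

```
User-agent: *
Disallow: /salesmen/
```

This one rule covers every file under that directory, so there is no need to list each page. Note that the file must be named robots.txt and sit in the site root, not inside the directory being disallowed.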
Posted 23 February 2007 - 09:41 AM
Edit: I just thought of checking the error log; one robot did unsuccessfully try to access the password-protected page and got an Unauthorized message.
Please note that the password protection was done at the OS level, not, as Randy mentioned, via a form querying for a username/password that redirects to a file.
*Excluding bad bots used by crackers
Posted 11 May 2007 - 02:43 AM
I think we need to be clear on the difference between a robot-excluded page, and a password-protected page. They are not the same.
If a page or directory is disallowed in robots.txt, it means that the crawler will avoid it. But since anyone can read the robots.txt file, I can still visit that directory through my browser. In short, robots.txt instructs crawlers only, not people.
But a password-protected page uses a different mechanism, usually via the .htaccess file. That stops both crawlers and people from accessing it without a valid password.
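As a sketch of the .htaccess approach mentioned above, assuming an Apache server and a password file already created with the htpasswd utility (the AuthUserFile path below is just an example):

```
AuthType Basic
AuthName "Sales Staff Only"
AuthUserFile /home/example/.htpasswd
Require valid-user
```

Placed in the protected directory, this makes the server itself demand credentials before serving any file there, which is why it blocks crawlers and browsers alike. Keep the .htpasswd file outside the web root so it cannot be downloaded.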