Hello Everyone,
I researched this and could not find any posts that answer my question. Our company website has a link to the salesmen page. If you click this link, you must enter a user name and password to access the pages. The content on these pages is for our salesmen only, and we do not want the general public or our competition getting a look at it. My question is this: when a search engine spiders my site, will the robots be able to access the password-protected files? If so, what are your suggestions to prevent this from happening? I am familiar with robots.txt files because I have another site that we do not allow the search engines to spider at all, but I am unfamiliar with how to stop them from accessing only certain pages.
P.S. There are a ton of files in this directory, so if I have to use a robots.txt file, can I just put it on the index file of the directory, or do I have to put it on each individual page?
Sincerely and with Great Respect for your expertise,
Melinda
Non Public Pages And Disallows
Started by melcat, Feb 22 2007 10:14 AM
3 replies to this topic
#1
Posted 22 February 2007 - 10:14 AM
#2
Posted 22 February 2007 - 11:29 AM
If the only way to get to those pages is by correctly filling out and submitting a form, then search engines won't go to the pages. Even if the password was filled in for them, spiders don't click "submit" buttons. They find links and follow them, and it doesn't sound like you're providing any links.
Of course, if someone else were to put up a link to the protected pages, then simply protecting them with a login form isn't going to accomplish anything, so you should specifically tell the search engines via your robots.txt file that you don't want any of those pages indexed. And you don't have to specify each document in the robots.txt. If they're all in the same directory, just disallow the directory itself.
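A minimal sketch of the directory-level disallow, assuming the protected pages all live under a hypothetical /salesmen/ directory (that name is not from the thread). It uses Python's standard urllib.robotparser to check that a single Disallow line covers every file beneath the directory, so nothing needs to be listed page by page. The robots.txt file itself goes in the root of the site, not in the protected directory.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; "/salesmen/" is an assumed directory name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /salesmen/
"""

def is_allowed(path, agent="ExampleBot"):
    """Return True if a crawler that obeys robots.txt may fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)

# One Disallow line covers the whole directory tree beneath it.
print(is_allowed("/salesmen/index.html"))
print(is_allowed("/salesmen/reports/pricing.html"))
# Pages outside the directory are unaffected.
print(is_allowed("/index.html"))
```

This only models crawlers that choose to obey robots.txt; it does not restrict anyone's access.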
#3
Posted 23 February 2007 - 09:41 AM
I've been running a test since December 06 on one of my sites. I created a password-protected folder on it and linked to the page with username:password@www.mysite.com, which basically tells the whole world how to access the password-protected folder. No robots/spiders have successfully accessed the page so far. I assume that if robots/spiders don't access a protected folder when they have the username/pw combination, they are certainly* not going to access a page where they have to guess it.
Edit: Just thought of checking the error log. There has been one robot that unsuccessfully tried to access the pw-protected page and got an Unauthorized message.
Please note that the password protection was done at the OS level and not, as Randy mentioned, with a form querying for a username/pw that redirects to a file.
*Excluding bad bots used by crackers
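To make the test link above concrete, here is a rough sketch of what a username:password@host URL carries. It assumes the folder is protected with HTTP Basic authentication, which the post does not state outright, and the /protected/ path is hypothetical. A client that chose to honor the embedded credentials would send them in an Authorization header like the one built here; the test results above suggest the crawlers did not.

```python
import base64
from urllib.parse import urlsplit

# Placeholder URL modeled on the link described above; the path is hypothetical.
LINK = "http://username:password@www.mysite.com/protected/"

def basic_auth_header(url):
    """Build the Basic Authorization header value implied by credentials in a URL.

    Returns None when the URL carries no embedded credentials.
    """
    parts = urlsplit(url)
    if parts.username is None or parts.password is None:
        return None
    credentials = f"{parts.username}:{parts.password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")

# The host is separate from the credentials that precede the "@".
print(urlsplit(LINK).hostname)
print(basic_auth_header(LINK))
```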
#4
Posted 11 May 2007 - 02:43 AM
Hello,
I think we need to be clear on the difference between a robot-excluded page and a password-protected page. They are not the same.
If a page or directory is disallowed in robots.txt, it means that a well-behaved crawler will avoid it. But robots.txt does nothing to stop people: anyone can read the robots.txt file, and I can still visit that directory through my browser. In short, robots.txt instructs crawlers only, not people.
A password-protected page uses a different mechanism, usually via the .htaccess file. That stops both crawlers and people from accessing it without a valid password.
Alex
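The difference can be sketched in a few lines, assuming HTTP Basic authentication and made-up credentials (both are illustrative assumptions, not details from the thread). Unlike a robots.txt rule, the check below is enforced by the server on every request, so a crawler and a person without the password get the same 401 Unauthorized response.

```python
import base64
import hmac

# Hypothetical credentials, for illustration only.
USERNAME = "salesman"
PASSWORD = "example-password"

def expected_header():
    """Authorization header value a client with valid credentials would send."""
    token = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

def status_for(authorization=None):
    """Status code a Basic-auth-protected page would return for a request."""
    if authorization is not None and hmac.compare_digest(
        authorization.encode("utf-8"), expected_header().encode("utf-8")
    ):
        return 200
    # Missing or wrong credentials: refused, whoever is asking.
    return 401

print(status_for())                   # e.g. a crawler following a plain link
print(status_for("Basic bm9wZQ=="))   # wrong credentials
print(status_for(expected_header()))  # valid credentials
```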