Will Not Having A Robots.txt Affect My Site?


7 replies to this topic

#1 DJKay

DJKay

    HR 5

  • Active Members
  • 355 posts

Posted 27 July 2009 - 04:58 PM

Hi,

Google's Webmaster Guidelines recommend having a robots.txt file. Everything I have been taught over the last 10 years is that it benefits your site to have a robots.txt: it helps Google crawl your site, tells them what pages not to crawl, etc.

My CEO/CTO came to me today to ask about it. Apparently, there is a credit card exchange standard called PCI. We had a scan of our technology, and our use of a robots.txt file was deemed a low-priority threat because, to them, it's like a site map that someone can use, along with a packet sniffer or other techniques, to exploit our security protocols.

He asked me about the possibility of using in-page tags, but I know that bots get around those, and there can be some issues with those if they are not executed properly. (For that matter, you can have problems if your robots.txt is not executed properly either.)

I am concerned that it will impact the ranking of our site. It's pretty well optimized and it's our major lead generation engine for the company. I don't have a great feeling about removing the robots.txt file. He told me it's not really a requirement, but he wanted me to get more information.

Any help or info appreciated. DJKay

#2 BBCoach

BBCoach

    HR 5

  • Moderator
  • 402 posts

Posted 27 July 2009 - 05:08 PM

It has no effect on your site's rankings one way or the other unless you block your site from being crawled.

If you disallow certain directories in the robots file, then yes, anyone can view that file and, if they so choose, attempt to view those directories/files (or hack at them if there's some type of security). For PCI compliance, I never have sensitive data available on the web server or accessible from the web server. It's bad practice if you do.

#3 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 27 July 2009 - 07:00 PM

What BBCoach said.

And I'll add on that you really shouldn't be placing anything in your robots.txt that you actually want to keep bots and people from seeing. Those need to be password protected. And that no sensitive customer information should be housed anywhere that is potentially web accessible. That's best practice. If you're required to store stuff like payment info on your server make sure it's being encrypted with a secure key.

Now all of that said, being an early adopter of PCI standards I have to say it's nice to see someone else getting on the PCI bandwagon. Too bad the credit card companies haven't pushed that harder and started closing or threatening to close merchant accounts for those who have completely ignored PCI. There are still loads and loads of very questionable practices out there in the wild, most of which could be cured by simply forcing people to adhere to PCI.

#4 projectphp

projectphp

    Lost in Translation

  • Moderator
  • 2,203 posts
  • Location:Sydney Australia

Posted 28 July 2009 - 12:01 AM

Always have one, even if it is just plain text. Too much can go wrong otherwise.

#5 DJKay

DJKay

    HR 5

  • Active Members
  • 355 posts

Posted 28 July 2009 - 11:03 AM

Okay guys, I am using the robots.txt to block spiders from crawling pages that may be considered duplicate content, such as multiple versions of register pages. I am also blocking a folder that holds all our article PDFs, because we have HTML and PDF versions of all our articles, again to guard against content duplication. The robots.txt also disallows the cgi-win folder and a few other folders that are actually SaaS installations of customer sites.

So, from reading the PCI report, it seems that the problem is the folders that are the SaaS installations of our product, because those are the ones that could probably be hacked and thus used to get to the e-commerce/credit card information. But I don't even understand how they could get to that, because all of the credit card info is under a secure setup. But those hackers are nasty folks, so hey, what do I know. It must just be the fact that the robots.txt is like a site map that they can follow.

Anyway folks, what about guarding against duplicate content? Am I going to get into trouble there? DJKay
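For illustration only, here is a minimal sketch of a robots.txt along the lines described above, checked with Python's standard `urllib.robotparser`. The directory names `/register/` and `/articles/pdf/` and the helper `is_crawlable` are assumptions made up for the example; only the cgi-win folder is named in the post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt mirroring the usage described in the post.
# Only /cgi-win/ comes from the thread; the other paths are assumed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /register/
Disallow: /articles/pdf/
Disallow: /cgi-win/
"""


def is_crawlable(path, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if a compliant crawler may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/articles/some-article.html",
                 "/articles/pdf/some-article.pdf",
                 "/register/step2"):
        print(path, is_crawlable(path))
```

Note that these rules only ask well-behaved crawlers to stay away. They do not restrict access for anyone who requests the URLs directly.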

#6 DJKay

DJKay

    HR 5

  • Active Members
  • 355 posts

Posted 28 July 2009 - 11:05 AM

ProjectPHP, so you are saying to always have one? I don't understand. Every robots file I have ever done is in text. Do you mean just have something up there with nothing in it?

So, if you are saying I should have something, it's 2 against, 1 for. DJKay

#7 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 28 July 2009 - 11:32 AM

Here's the gist of it, which will hopefully clear up some things for you, DJKay.

Having a robots.txt is just fine. Using one to avoid duplicates or to keep spiders away from certain areas you do not want indexed is fine. The caveat is that you don't want to use robots.txt to try to hide something that might be sensitive customer data from anything or anybody, hackers included. robots.txt is the first place hackers look for possible exploits.

That's the reason having a robots.txt throws up the PCI warning: too many people do in fact use robots.txt in an attempt to hide sensitive data, instead of making sure the areas that contain sensitive data require a secure login.

With the way you've described your use of robots.txt, you're completely okay. That's 100% legitimate usage, and what robots.txt was designed to be used for. That you're using it correctly, however, won't suppress the PCI warning, because too many people use it incorrectly and haven't a clue that they're exposing their customer data to anybody and everybody who bothers to look.
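To make the "anybody who bothers to look" point concrete, here is a small sketch of how trivially the Disallow entries can be read out of a robots.txt file. The function name `disallowed_paths` and the sample paths are hypothetical; this is a simplified reader for illustration, not a full parser of the format.

```python
def disallowed_paths(robots_txt):
    """List every non-empty Disallow value in a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("disallow:"):
            value = line.split(":", 1)[1].strip()
            if value:  # an empty Disallow blocks nothing
                paths.append(value)
    return paths


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /cgi-win/\nDisallow: /customer-a/\n"
    print(disallowed_paths(sample))
```

Anything this list reveals should already be behind a login or off the web server entirely, which is why listing a path in robots.txt is harmless only when the path is not sensitive in the first place.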

#8 DJKay

DJKay

    HR 5

  • Active Members
  • 355 posts

Posted 28 July 2009 - 12:30 PM

Okay Randy, I understand that part, so it's still 2 for removing it, 1 for having something there, and no answer on the duplicate content issues that may arise as a result of removing the robots.txt.

I could use in-page tags, but from what I understand those are not as effective against spiders.

Any thoughts or suggestions, please weigh in at any time. DJKay
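Assuming "in-page tags" here means the robots meta tag (for example `<meta name="robots" content="noindex">`), the sketch below shows one way to check which directives a page carries, using Python's standard `html.parser`. The class and function names are made up for the example. One thing to keep in mind: a crawler can only see this tag if it is allowed to fetch the page, so a page that is blocked in robots.txt never gets its meta tag read.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for item in attr.get("content", "").split(","):
                item = item.strip().lower()
                if item:
                    self.directives.add(item)


def robots_directives(html):
    """Return the set of robots meta directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(robots_directives(page))
```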



