
Google Spider Yes! Competitors Not!


4 replies to this topic

#1 RiYo

RiYo

    HR 5

  • Active Members
  • 300 posts
  • Location:The Netherlands

Posted 14 April 2008 - 03:48 AM

Good morning, I hope you can help me with the following.

I am busy putting together the layout for a new website. It will be an informational website, and gathering the info will take a LOT of time. This information would be very useful to competitors and other companies as well.

What I want is the following: (1) Google must be able to spider the information, but (2) competitor bots must NOT.

Is there an easy way to arrange this?

Thanks a lot in advance

Richard

#2 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 14 April 2008 - 07:29 AM

Well, there are a couple of issues you'd need to account for.

First, to allow in Googlebot but keep everyone else away you'd have to perform some type of cloaking, where you deliver the actual content to Googlebot and perhaps other legitimate spiders, but not to other user agents. This is fraught with difficulties on a variety of fronts, first and foremost being that the engines can construe it as a bad thing, which attracts closer scrutiny or potentially a penalty for trying to "trick" the search engines. Another issue is that if you rely solely on the user-agent string to cloak things, it's a relatively simple procedure for others to mask their browser or bot to appear to be Googlebot, thus getting themselves into your restricted area.

The other obvious issue is that if you allow Googlebot et al. to cache/archive the content, it'll still be available to anybody who goes to the search engines and reviews the cached version of your pages.

In other words, it's hard to do, on a couple of fronts.
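To illustrate the user-agent point: the string is just a request header, so any client can send whatever it likes. One commonly described alternative is a reverse DNS lookup on the visiting IP followed by a forward lookup on the resulting hostname. Here is a minimal Python sketch of that idea. The function name `is_verified_googlebot`, the injectable lookup parameters, and the accepted domain suffixes are assumptions for illustration, so check them against Google's current guidance before relying on them. It does nothing about the cloaking-penalty risk mentioned above.

```python
import socket

# Assumed hostname suffixes for genuine Google crawlers (verify before use).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True only if the IP reverse-resolves to a Google hostname
    and that hostname resolves back to the same IP."""
    # The lookups are injectable so the logic can be tried without a network.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # The forward lookup must include the original IP, otherwise
        # the reverse DNS record could simply be forged.
        return ip in forward_lookup(host)
    except OSError:
        # Covers failed DNS lookups (socket.herror, socket.gaierror).
        return False
```

A bot that only fakes the Googlebot user-agent string would fail this check, because its IP would not resolve to one of the expected hostnames.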

What usually works best, and may in your case too, is to feed the spiders and visitors the same thing, but make this freely available content a synopsis of the actual info you're offering. In other words, a shorter version that contains your keyword phrases often enough to rank well, but doesn't give it all up. Keep the really valuable stuff hidden behind some sort of password protection to give you better control over who gets to see the real meat.

#3 RiYo

RiYo

    HR 5

  • Active Members
  • 300 posts
  • Location:The Netherlands

Posted 15 April 2008 - 02:15 AM

Thanks Randy!

Are there other people that deal with the same kind of problem?

As I mentioned in the first message, it will take a lot of time to gather all the info that we want to publish on our site. We are certain it is info that people are looking for, but they do not want to pay for it. We have other revenue streams, so that is not a problem. But we need to attract people to our site, so I want to have as much info on the site as possible while preventing competitors from simply copying and pasting the info from our site onto their own.

Putting a password on the site is a suggestion we are thinking of ourselves as well, but it would only be to prevent competitors' spiders from scraping our site. But I assume that when a competitor has HIS username and password he can put that into his spider and let it scrape our site ... or am I wrong here? I have no knowledge about spiders, as you can tell :).

If anybody has another solution … I would appreciate it if you could share it with me.

cheers Richard

#4 Randy

Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 15 April 2008 - 06:44 AM

Well, can you identify the bad bots in your log files? Either by the IP number they originate from or possibly by what they put in their user-agent string?

If someone is using an automated process to scrape your site content and you can identify the bad bots that way, it's a fairly simple process to send them off into null space and/or block them outright. As you say, even if you set up password protection the bad bot owner might have the ability to feed it a valid user/password and still get to your content.

The other way of handling such things is waiting until after they've lifted some of your content, then complaining directly to their hosting company. There's always someone upstream from these types of folks, and most hosts have a copyright infringement/DMCA policy in place these days.
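As a rough illustration of spotting heavy automated clients in a log file, here is a small Python sketch. It assumes the server writes the common "combined" access log format; the function name `heavy_clients` and the threshold are made up for the example. It only counts requests per IP and user-agent pair. Deciding which of those are actually bad bots, and blocking them, is still up to you.

```python
import re
from collections import Counter

# Matches one line of the assumed "combined" access log format and
# captures the client IP and the user-agent string.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def heavy_clients(lines, threshold):
    """Return ((ip, user_agent), hits) pairs with at least `threshold` hits,
    busiest first. Lines that do not match the format are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[(match.group("ip"), match.group("agent"))] += 1
    return [(key, hits) for key, hits in counts.most_common() if hits >= threshold]
```

You would feed it the lines of your access log, for example `heavy_clients(open("access.log"), 500)` with a hypothetical file name and threshold, and then review the resulting IPs and user agents by hand before blocking anything, since legitimate spiders will show up in the list too.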

#5 RiYo

RiYo

    HR 5

  • Active Members
  • 300 posts
  • Location:The Netherlands

Posted 15 April 2008 - 07:29 AM

Thanks, Randy!



