
Content Thieves?


21 replies to this topic

#1 iangriff

iangriff

    HR 1

  • Members
  • Pip
  • 6 posts

Posted 06 October 2004 - 12:43 PM

We have an e-commerce system for retailers that includes a significant amount of industry specific content.

I am trying to balance making product detail pages available for the spiders and making sure that competitors don't steal all the content on my web sites.

We have spent considerable effort collecting product content (images, descriptions, ratings, etc.) and naturally want the spiders to index all this and return results for our sites on these products. However, if I provide a page with links to the product detail, surely a competitor could write a routine to collect all the content on my web site.

I am looking for a way to show the spiders everything but limit everyone else. We are toying with using the spider's IP addresses to show them a page with links. Then we would kill the links for non-verified IPs, but I worry that this may be seen as spamming the search engines.

Any thoughts?

Thanks in advance,

iangriff

#2 SearchRank

SearchRank

    HR 7

  • Active Members
  • PipPipPipPipPipPipPip
  • 2,333 posts
  • Location:Phoenix, AZ

Posted 06 October 2004 - 12:56 PM

Sounds like cloaking to me.

Can it get you into trouble? It could. Is it worth the risk? You'll have to work that one out with your own conscience.

#3 iangriff

iangriff

    HR 1

  • Members
  • Pip
  • 6 posts

Posted 06 October 2004 - 01:13 PM

Thanks searchrank, you have confirmed my fears.

My conscience tells me not to do it, but I fear I may give away the business, or at least the content side of the business.

Is this a circle that can't be squared or does someone know of another approach?

Thanks again, iangriff

#4 Nick W

Nick W

    HR 3

  • Active Members
  • PipPipPip
  • 67 posts

Posted 06 October 2004 - 01:22 PM

If you don't want to cloak, you're just going to have to do what the rest of us do: monitor your competitors' sites...

Nick

#5 Googlewhacked

Googlewhacked

    Got geek?

  • Active Members
  • PipPipPipPipPip
  • 348 posts
  • Location:Florida: The Plywood State

Posted 06 October 2004 - 01:24 PM

Ian,

Ditto what SearchRank said. Any kind of content that is only available to the spiders can be construed as cloaking (even if it is done with the best of intentions) & can come back to bite you.

As an alternative to "agent filtering", why not try putting a transparent layer over the area where the text is on the page(s)? This will prevent basic copy-n-paste style copying.

Also, while not a real "solution", there are relatively easy ways to check for content being duplicated, many of which have been discussed in these forums. You can use these methods to find offending sites & take the appropriate steps.

Ultimately, no content is *truly* safe, as there are free/cheap & easy to use utilities which can bypass virtually any copy-protection. The only real defense against plagiarism is your own vigilance in checking for it.

Phil

#6 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,316 posts

Posted 06 October 2004 - 01:31 PM

Welcome iangriff!

I'm a little confused.

You've collected all this great content, but won't put it on your site and only want the engines to see it? It just doesn't make sense to me.

If it's relevant content about your products and stuff, not showing it to your users is a crime in and of itself (not legally of course!).

I think there's more to your question than you're letting on. Care to share?

#7 lcobb

lcobb

    HR 3

  • Active Members
  • PipPipPip
  • 86 posts

Posted 06 October 2004 - 01:43 PM

I have seen pdf documents that were locked down pretty tight. You could not print or copy the contents. I do not know if, when you lock down the documents, the search engines would be able to index the documents.

Larry

#8 iangriff

iangriff

    HR 1

  • Members
  • Pip
  • 6 posts

Posted 06 October 2004 - 01:56 PM

Hi Jill,

Thanks for delving deeper. Let me try to be clearer.

I am supporting a series of sites that have dynamically driven product pages and that share access to the same content. In some cases this can include more than 20,000 products and a lot of content.

I have a link at the bottom of each page that leads the spiders to an index of the products on the site, so they can index the product detail pages. But doing this puts at risk the work we've put into collecting the content on 20,000+ products.

Sure I want the site users to find the product pages and see the content, but hopefully they'll use the site's navigation and search rather than the index page.

So the question seems to be how to show the spiders dynamic content without making it easily accessible to someone who wants to steal it all. I am less concerned about someone manually copying and pasting, but writing a quick routine to download it all doesn't seem fair.

Thanks, iangriff

#9 chrishirst

chrishirst

    A not so moderate moderator.

  • Moderator
  • 5,881 posts
  • Location:Blackpool UK

Posted 06 October 2004 - 01:57 PM

There's one aspect that appears to have been overlooked in the scenario of allowing the crawlers in freely, and blocking unregistered visitors.

That is, what's to stop your competitors grabbing the content from the SE cache?

I would consider a solution where you could extract snippets from the articles/documents and have these crawlable and visible to unregistered visitors but the full documents are behind a password/cookie protected area.
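A minimal sketch of that snippet idea, assuming a hypothetical `render` helper and an arbitrary 40-word cutoff (neither comes from this thread). The point is that crawlers and anonymous visitors receive exactly the same shortened text, so nothing is shown only to the spiders:

```python
def snippet(text, max_words=40):
    """Return the first max_words words of text, with "..." if it was cut."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + "..."


def render(text, logged_in, max_words=40):
    """Full text for logged-in users; the same snippet for everyone else.

    "Everyone else" includes search engine crawlers, which is the
    assumption that keeps this from being cloaking.
    """
    return text if logged_in else snippet(text, max_words)
```

The trade-off is that only the snippet gets indexed, so the engines will rank the page on less text than the full document contains.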

#10 jason_and_kelly

jason_and_kelly

    HR 2

  • Members
  • PipPip
  • 14 posts

Posted 06 October 2004 - 02:47 PM

On a philosophical note, the SEs are looking at your site to see what is available for the public at-large to search. If your precious content is too valuable to offer unrestricted to the World Wide Web, then why should it be indexed?

Either your content is available for the public or it is not.

That said I empathize with your situation.

#11 Papadoc

Papadoc

    HR 2

  • Active Members
  • PipPip
  • 42 posts
  • Location:Charlotte NC

Posted 06 October 2004 - 03:21 PM

Bottom line, if you make it available to an SE, you make it available to anyone else unless you are cloaking. Tracking it isn't necessarily all that easy either. You can take an unusual string and pop it into the Google search bar to come up with exact matches, but if someone modifies the text some through a minor rewrite, they could still have your info but have swapped out one phrase for another.
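For the "minor rewrite" case, one possible approach (not something described in this thread) is to compare overlapping word sequences instead of looking for one exact string. The sketch below uses word shingles and Jaccard similarity; the function names and the shingle size are illustrative assumptions:

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of the two shingle sets: 1.0 means identical wording."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = ("This rugged hiking boot features a waterproof leather upper "
            "and a cushioned midsole for all day comfort")
rewritten = original.replace("cushioned", "padded")
score = similarity(original, rewritten)  # high, but below 1.0
```

A swapped phrase only disturbs the few shingles that overlap it, so a lightly edited copy still scores well above unrelated text. Where to set the alarm threshold is a judgment call that would need testing against your own content.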

A couple of tricks though to make it a bit harder:

Search for "disableselect" to come up with many different JS that will stop a simple copy/paste. It's a bit archaic and won't stop someone with any real knowledge, but it has also been known to just annoy someone enough that they don't bother. Neither will it help if someone downloads your whole site. Be sure to look for one that includes script for more than one browser. Most are only compatible with IE.

A JS that disables right click will help to prevent someone from swiping images. What you risk here is annoying anyone that might want to right click to open a link in a new window. For images that you create, consider adding a file comment. Most people don't go looking for comments on images. Right click on the file, go to properties and then go to summary. You can place a lot of information here but be sure that you also use your full URL along with the name of your company.

One cool little trick for proving text is yours is to bury comment code containing your URL in the middle of words, at several places in the text. I don't think that I would do that with keywords, but putting it in the middle of stop words won't disrupt the viewer and should not have any effect on SE indexing. That way, if someone does download the content and then merely copies it, the comment code will go with it unless they copy the text to a text editor first. When you find your content on someone else's site, contact their host. They can deny it or claim that you stole THEIR stuff, but they are going to have a hard time explaining why comment code with your URL is on their site.

It's a nice little trick that I picked up from studying spam that gets through. I saw one recently that was quite brutal: "If you are seeing this code on any other domain other than whateverdomain.com, the owner of the site is a thief. Don't do business with them. They are a dishonorable cheat and they will steal from you." OUCH! Imagine finding this on your own site, and since you published it, you cannot even go after them for damaging your reputation.
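A rough sketch of the comment-code trick, assuming the "comment code" is an ordinary HTML comment and that the input is plain text with no markup of its own. The stop-word list, the `every` parameter and the function names are all made up for illustration:

```python
import re

# Illustrative stop-word list; a real one would be longer.
STOP_WORDS = {"the", "and", "that", "with"}


def watermark(text, url, every=2):
    """Insert an HTML comment carrying url into the middle of every Nth stop word."""
    marker = f"<!-- {url} -->"
    seen = 0

    def mark(match):
        nonlocal seen
        word = match.group(0)
        if word.lower() not in STOP_WORDS:
            return word
        seen += 1
        if seen % every:
            return word  # skip this stop word, mark only every Nth one
        mid = len(word) // 2
        return word[:mid] + marker + word[mid:]

    return re.sub(r"[A-Za-z]+", mark, text)


def strip_comments(html):
    """Remove HTML comments, which is roughly what a browser does when rendering."""
    return re.sub(r"<!--.*?-->", "", html, flags=re.S)
```

For example, `watermark("The boot and the sole fit with ease", "https://www.example.com")` leaves "The" and "the" alone and splits "and" and "with" around the comment. `strip_comments` on the result gives back the original sentence, which is why a visitor sees nothing unusual while a copy of the page source carries the URL along.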

I cannot say how it's done, but I have seen sites that impose session limits, so that a given IP address can only view a maximum of 50 or 100 pages in any 24-hour period. Then, if you look in your stats and see that over a couple of days a single IP is responsible for 300 (or whatever) page hits, you can permanently block that IP from accessing any more pages. You would have to have some very valuable content and a very determined competitor to try to get around that kind of security. The important thing would be to exclude the IP addresses of the known SE bots from that rule, or they wouldn't be able to go any deeper either. It would take some study, but I've seen it done. Also be sure to do a reverse DNS lookup on every IP before you ban it, as you don't want to accidentally ban Google for indexing your whole site.
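One way such a limit might be put together is sketched below. It is only an assumption about how those sites work: the `PageLimiter` class, the 100-page cap and the list of crawler domains are illustrative, and the counts live in memory rather than in a database. The bot check does a reverse lookup and then a forward lookup to confirm the hostname really maps back to the same IP:

```python
import socket
import time
from collections import defaultdict, deque

# Assumed crawler hostname suffixes; check each engine's own documentation.
BOT_DOMAINS = (".googlebot.com", ".google.com")


def is_verified_bot(ip, reverse=socket.gethostbyaddr,
                    forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the domain, then forward-confirm the hostname."""
    try:
        host = reverse(ip)[0]
        if not host.endswith(BOT_DOMAINS):
            return False
        return ip in forward(host)[2]
    except OSError:  # covers socket.herror and socket.gaierror
        return False


class PageLimiter:
    """Allow each IP at most max_pages page views per sliding window."""

    def __init__(self, max_pages=100, window=24 * 60 * 60,
                 is_bot=is_verified_bot):
        self.max_pages = max_pages
        self.window = window
        self.is_bot = is_bot
        self.hits = defaultdict(deque)  # ip -> timestamps of recent views

    def allow(self, ip, now=None):
        now = time.time() if now is None else now
        if self.is_bot(ip):
            return True  # verified crawlers are exempt from the cap
        recent = self.hits[ip]
        while recent and now - recent[0] >= self.window:
            recent.popleft()  # forget views older than the window
        if len(recent) >= self.max_pages:
            return False
        recent.append(now)
        return True
```

A request handler would call `limiter.allow(client_ip)` before serving a product page and return an error page when it comes back `False`. DNS lookups are slow, so a real deployment would presumably cache the verification result per IP instead of resolving on every request.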

#12 Hyperformance

Hyperformance

    HR 6

  • Active Members
  • PipPipPipPipPipPip
  • 634 posts
  • Location:Chicago, Illinois

Posted 06 October 2004 - 03:37 PM

You could also try something that is less risky but may turn some of them away. It's called copyscape.com, and their little logos state that "this page should not be copied and is protected by Copyscape..."

Just a thought. I have been trying this myself to see if it will cut back on people taking copyrighted material without thinking about it. It could be a deterrent?

- Scott

#13 andromeda

andromeda

    HR 2

  • Active Members
  • PipPip
  • 26 posts
  • Location:Oh Canada

Posted 06 October 2004 - 03:56 PM

QUOTE
A JS that disables right click will help to prevent someone from swiping images. What you risk here is annoying anyone that might want to right click to open a link in a new window.


FWIW, I instantly click away from those sites that do not permit me to open a link in a new window. I like to have dozens of windows open to compare products, when I'm shopping - and it's just not worth the hassle. Even worse, though, is when I right click on the page and get a little pop up that tells me not to steal text / images / whatever.

QUOTE
Just a thought, I have been trying this myself to see if it will cut back on people taking copyrighted material without thinking about it. It could be a deterrent?


You've had people regularly plagiarizing text before? Have you had it happen often enough so that you can track to see if it is actually helping?

#14 Hyperformance

Hyperformance

    HR 6

  • Active Members
  • PipPipPipPipPipPip
  • 634 posts
  • Location:Chicago, Illinois

Posted 06 October 2004 - 04:08 PM

Yes I have,

I hope to be able to watch it more closely now. It is surprising, but it has been happening monthly. Every month I have been able to find sites (more than one) that have outright stolen my content, my paragraphs, right down to my commas.

I am still fighting a NY site that has taken 4 pages of my content on Hosting, Domains, Fees etc. plus my entire Glossary of Internet Terms. I have also gone through Google's paperwork; so far no action there...

You will be surprised at how many people, others in this industry, just take your content. It can be maddening; it's not flattering. The articles I make available for reproduction are clearly marked as such, but some think this makes all my content free for their use.

- SS

#15 Jill

Jill

    High Rankings Advisor

  • Admin
  • 32,316 posts

Posted 06 October 2004 - 04:30 PM

Yes, copyright theft is at a huge high right now.

Check nearly any member's site here at the forum in CopyScape and you'll find that some of it may be duplicated elsewhere.

Check your OWN site...you may be surprised.



