High Rankings Search Engine Optimization Forum | High Rankings Advisor Search Marketing Newsletter

Content Thieves? Protect content while allowing spiders
iangriff
post Oct 6 2004, 12:43 PM
Post #1


We have an e-commerce system for retailers that includes a significant amount of industry specific content.

I am trying to balance making product detail pages available for the spiders and making sure that competitors don't steal all the content on my web sites.

We have spent considerable effort collecting product content (images, descriptions, ratings, etc.) and naturally want the spiders to index all this and return results for our sites on these products. However, if I provide a page with links to the product detail pages, surely a competitor could write a routine to collect all the content on my web site.

I am looking for a way to show the spiders everything but limit everyone else. We are toying with using the spiders' IP addresses to show them a page with links, and then removing those links for non-verified IPs, but I worry that this may be seen as spamming the search engines.

Any thoughts?

Thanks in advance,

iangriff
 
SearchRank
post Oct 6 2004, 12:56 PM
Post #2


Sounds like cloaking to me.

Can it get you into trouble? It could. Is it worth the risk? You'll have to work that one out with your own conscience.
 
iangriff
post Oct 6 2004, 01:13 PM
Post #3


Thanks, SearchRank; you have confirmed my fears.

My conscience tells me not to do it, but I fear I may give away the business, or at least the content side of the business.

Is this a circle that can't be squared or does someone know of another approach?

Thanks again, iangriff
 
Nick W
post Oct 6 2004, 01:22 PM
Post #4


If you don't want to cloak, you're just going to have to do what the rest of us do: monitor your competitors' sites...

Nick
 
Googlewhacked
post Oct 6 2004, 01:24 PM
Post #5


Ian,

Ditto what SearchRank said. Any kind of content that is only available to the spiders can be construed as cloaking (even if it is done with the best of intentions) & can come back to bite you.

As an alternative to "agent filtering", why not try putting a transparent layer over the area where the text is on the page(s)? This will prevent basic copy-n-paste style copying.

Also, while not a real "solution", there are relatively easy ways to check for content being duplicated, many of which have been discussed in these forums. You can use these methods to find offending sites & take the appropriate steps.
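
One low-tech way to act on this is to pull a few distinctive word sequences from each page and run them as exact-match (quoted) searches. The Python sketch below is illustrative only: the function name `distinctive_phrases`, the tiny stop-word list, and the scoring rule (favor 8-word windows with many long, non-stop words) are assumptions, not a method described in this thread.

```python
import re

# Illustrative stop-word list (an assumption; extend it for real use).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
              "is", "are", "it", "this", "that", "with", "as", "be", "by"}


def distinctive_phrases(text: str, length: int = 8, count: int = 3) -> list:
    """Return up to `count` non-overlapping, quoted `length`-word phrases,
    preferring windows that contain many long non-stop words."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    scored = []
    for start in range(len(words) - length + 1):
        window = words[start:start + length]
        # Score = number of words with 6+ characters that are not stop words.
        score = sum(1 for w in window
                    if len(w) >= 6 and w.lower() not in STOP_WORDS)
        scored.append((score, start))
    # Best score first; ties go to the earlier window.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    phrases, used = [], set()
    for _, start in scored:
        span = set(range(start, start + length))
        if span & used:  # skip windows that overlap an already chosen phrase
            continue
        used |= span
        phrases.append('"' + " ".join(words[start:start + length]) + '"')
        if len(phrases) == count:
            break
    return phrases
```

Each returned string is already wrapped in double quotes, so it can be pasted straight into a search box as an exact-match query.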

Ultimately, no content is *truly* safe, as there are free/cheap & easy-to-use utilities which can bypass virtually any copy protection. The only real defense against plagiarism is your own vigilance in checking for it.

Phil
 
Jill
post Oct 6 2004, 01:31 PM
Post #6


Welcome iangriff!

I'm a little confused.

You've collected all this great content, but won't put it on your site and only want the engines to see it? It just doesn't make sense to me.

If it's relevant content about your products and stuff, not showing it to your users is a crime in and of itself (not legally of course!).

I think there's more to your question than you're letting on. Care to share?
 
lcobb
post Oct 6 2004, 01:43 PM
Post #7


I have seen PDF documents that were locked down pretty tight: you could not print or copy the contents. I do not know whether the search engines would still be able to index documents locked down that way.

Larry
 
iangriff
post Oct 6 2004, 01:56 PM
Post #8


Hi Jill,

Thanks for delving deeper. Let me try to be clearer.

I am supporting a series of sites that have dynamically driven product pages and that share access to the same content. In some cases this can include more than 20,000 products and a lot of content.

I have a link at the bottom of each page for the spiders to show them the products on the site, so they can index the product detail pages. But doing this puts at risk the work we've put into collecting the content on 20,000+ products.

Sure I want the site users to find the product pages and see the content, but hopefully they'll use the site's navigation and search rather than the index page.

So the question seems to be how to show the spiders dynamic content without making it easily accessible to someone who wants to steal it all. I am less concerned about someone manually copying and pasting, but writing a quick routine to download it all doesn't seem fair.

Thanks, iangriff
 
chrishirst
post Oct 6 2004, 01:57 PM
Post #9


There's one aspect that appears to have been overlooked in the scenario of allowing the crawlers in freely, and blocking unregistered visitors.

That is, what's to stop your competitors grabbing the content from the SE cache?

I would consider a solution where you could extract snippets from the articles/documents and have these crawlable and visible to unregistered visitors but the full documents are behind a password/cookie protected area.
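
A minimal sketch of the snippet-plus-login idea, assuming a cookie named `session_id` and an in-memory set of valid session IDs (both hypothetical; a real site would check sessions against its own store). Everyone without a valid session, search engine crawlers included, gets the same excerpt, so the crawler sees exactly what an unregistered visitor sees.

```python
from http.cookies import SimpleCookie

SESSION_COOKIE = "session_id"  # hypothetical cookie name


def make_snippet(text: str, max_words: int = 40) -> str:
    """Return the first `max_words` words, followed by ' ...' if text was cut."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " ..."


def page_body(full_text: str, cookie_header, valid_sessions: set) -> str:
    """Full text for visitors with a valid session cookie; the same public
    snippet for everyone else, search engine crawlers included."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")  # tolerate a missing Cookie header
    morsel = cookie.get(SESSION_COOKIE)
    if morsel is not None and morsel.value in valid_sessions:
        return full_text
    return make_snippet(full_text)
```

Because the public snippet is identical for every anonymous request, this avoids the cloaking concern raised earlier in the thread, at the cost of only the snippet being indexed.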
 
jason_and_kelly
post Oct 6 2004, 02:47 PM
Post #10


On a philosophical note, the SEs are looking at your site to see what is available for the public at-large to search. If your precious content is too valuable to offer unrestricted to the World Wide Web, then why should it be indexed?

Either your content is available for the public or it is not.

That said I empathize with your situation.
 
Papadoc
post Oct 6 2004, 03:21 PM
Post #11


Bottom line, if you make it available to an SE, you make it available to anyone else unless you are cloaking. Tracking it isn't necessarily all that easy either. You can take an unusual string and pop it into the Google search bar to come up with exact matches, but if someone modifies the text some through a minor rewrite, they could still have your info but have swapped out one phrase for another.
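
To catch the "minor rewrite" case, one common technique (swapped in here, not mentioned in the thread) is to break both texts into overlapping word sequences, or shingles, and compare them with Jaccard similarity. A sketch, assuming 4-word shingles; the function names are hypothetical, and any threshold for calling two pages copies would need tuning against your own content.

```python
import re


def shingles(text: str, k: int = 4) -> set:
    """Set of overlapping k-word sequences (lower-cased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0  # at least one text is too short to compare
    return len(sa & sb) / len(sa | sb)
```

For example, swapping one three-word phrase in an 18-word sentence still leaves half of the combined shingles shared (a score of 0.5), while unrelated text scores 0.0, so a reworded copy stands out even when no exact phrase search would find it.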

A couple of tricks though to make it a bit harder:

Search for "disableselect" to find many different JavaScript snippets that will stop a simple copy/paste. It's a bit archaic and won't stop someone with any real knowledge, but it has also been known to annoy people enough that they don't bother. Nor will it help if someone downloads your whole site. Be sure to look for one that includes code for more than one browser; most are only compatible with IE.

A JS that disables right-click will help to prevent someone from swiping images. What you risk here is annoying anyone who might want to right-click to open a link in a new window. For images that you create, consider adding a file comment; most people don't go looking for comments on images. Right-click on the file, go to Properties and then to Summary. You can place a lot of information there, but be sure that you also include your full URL along with the name of your company.

One cool little trick for proving text is yours is to bury comment code (an HTML comment) containing your URL in the middle of words in the text, multiple times. I don't think I would do that with keywords, but putting it in the middle of stop words won't disrupt the viewer and should not have any effect on SE indexing. That way, if someone downloads the content and then merely copies it, the comment code goes with it unless they copy the text to a text editor first. When you find your content on someone else's site, contact their host. They can deny it or claim that you stole THEIR stuff, but they are going to have a hard time explaining why comment code with your URL is on their site. It's a nice little trick that I picked up from studying spam that gets through. I saw one recently that was quite brutal: "If you are seeing this code on any other domain other than whateverdomain.com, the owner of the site is a thief. Don't do business with them. They are a dishonorable cheat and they will steal from you." OUCH! Imagine finding this on your own site: since you published it, you cannot even go after them for damaging your reputation.
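
A sketch of the buried-comment trick, assuming the input is plain paragraph text (not markup where these words could appear inside tags or attributes). `watermark`, `contains_watermark`, the stop-word tuple and the `every` spacing parameter are all hypothetical. The claim that split words don't affect indexing comes from the post above; this sketch does not verify it.

```python
import re

STOP_WORDS = ("the", "and", "with", "for")  # words to split (an assumption)


def watermark(text: str, marker: str, every: int = 25) -> str:
    """Hide an HTML comment containing `marker` inside every `every`-th
    stop word, e.g. 'the' -> 't<!-- marker -->he'. Plain text only."""
    comment = f"<!-- {marker} -->"
    pattern = re.compile(r"\b(?:" + "|".join(STOP_WORDS) + r")\b", re.IGNORECASE)
    seen = 0

    def insert(match):
        nonlocal seen
        seen += 1
        word = match.group(0)
        if (seen - 1) % every != 0:
            return word  # leave this occurrence untouched
        return word[0] + comment + word[1:]

    return pattern.sub(insert, text)


def contains_watermark(page_html: str, marker: str) -> bool:
    """True if a fetched page still carries the hidden comment."""
    return f"<!-- {marker} -->" in page_html
```

Browsers do not render the comments, so the visitor still reads "the" and "and"; a copy that keeps the page source keeps the marker, while one retyped or pasted through a text editor loses it, as noted above.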

I cannot say how it's done, but I have seen sites that enforce session limits, so a given IP address may only be able to view a maximum of 50 or 100 pages in any given 24-hour period. Then, if you are looking in your stats and see that over a couple of days one single IP is responsible for 300 (or whatever) page hits, permanently block that IP from accessing any more pages. It would take some very valuable content and a very determined competitor to bother getting around that kind of security. The important thing would be to exclude the IP addresses of the known SE bots from that rule, or they wouldn't be able to go any deeper either. It would take some study, but I've seen it done. Also be sure that you do a reverse DNS lookup on every IP before you ban it, as you don't want to accidentally ban Google for indexing your whole site.
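
A sketch of the per-IP page limit plus the bot exemption, assuming an in-memory store (several servers would need shared storage). `PageRateLimiter`, `is_verified_crawler` and `should_serve` are hypothetical names. The crawler check is a reverse DNS lookup followed by a forward lookup that must return the original IP, which is how Google describes verifying Googlebot; the host-name suffixes are assumptions to confirm against each engine's current documentation. The DNS functions are injectable so the logic can be tested offline.

```python
import socket
import time
from collections import defaultdict, deque


class PageRateLimiter:
    """Sliding-window limit on page views per IP address."""

    def __init__(self, max_pages=100, window_seconds=24 * 3600, clock=time.time):
        self.max_pages = max_pages
        self.window = window_seconds
        self.clock = clock
        self.hits = defaultdict(deque)  # ip -> timestamps of allowed views

    def allow(self, ip: str) -> bool:
        now = self.clock()
        recent = self.hits[ip]
        # Forget views that have fallen out of the window.
        while recent and recent[0] <= now - self.window:
            recent.popleft()
        if len(recent) >= self.max_pages:
            return False  # over the limit; refused views are not counted
        recent.append(now)
        return True


def is_verified_crawler(ip, allowed_suffixes=(".googlebot.com", ".google.com"),
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the host name's domain, then
    forward-resolve that host and confirm it maps back to the same IP."""
    try:
        host = reverse(ip)[0]
    except OSError:  # includes socket.herror (no PTR record)
        return False
    if not host.endswith(allowed_suffixes):
        return False
    try:
        addresses = forward(host)[2]
    except OSError:  # includes socket.gaierror
        return False
    return ip in addresses


def should_serve(ip, limiter, verify=is_verified_crawler):
    # Verified crawlers skip the page limit; they still get the same pages.
    return verify(ip) or limiter.allow(ip)
```

In practice the verification result should be cached per IP, since two DNS lookups on every request would be slow. Note that the exemption only changes throttling, not what content is served.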
 
Hyperformance
post Oct 6 2004, 03:37 PM
Post #12


You could also try something that is less risky but may turn some of them away. It's called Copyscape (copyscape.com), and their little logos state that "this page should not be copied and is protected by Copyscape..."

Just a thought, I have been trying this myself to see if it will cut back on people taking copyrighted material without thinking about it. It could be a deterrent?

- Scott
 
andromeda
post Oct 6 2004, 03:56 PM
Post #13


QUOTE
A JS that disables right click will help to prevent someone from swiping images. What you risk here is annoying anyone that might want to right click to open a link in a new window.


FWIW, I instantly click away from those sites that do not permit me to open a link in a new window. I like to have dozens of windows open to compare products, when I'm shopping - and it's just not worth the hassle. Even worse, though, is when I right click on the page and get a little pop up that tells me not to steal text / images / whatever.

QUOTE
Just a thought, I have been trying this myself to see if it will cut back on people taking copyrighted material without thinking about it. It could be a deterrent?


You've had people regularly plagiarizing text before? Have you had it happen often enough so that you can track to see if it is actually helping?
 
Hyperformance
post Oct 6 2004, 04:08 PM
Post #14


Yes I have,

I hope to be able to watch it more closely now. It is surprising, but it has been happening monthly. Every month I have been able to find sites (more than one) that have outright stolen my content: my paragraphs, right down to my commas.

I am still fighting a NY site that has taken 4 pages of my content on Hosting, Domains, Fees, etc., plus my entire Glossary of Internet Terms. I have also gone through Google's paperwork, but so far there has been no action there...

You will be surprised at how many people, including others in this industry, just take your content. It can be maddening; it's not flattering. The articles I make available for reproduction are clearly marked as such, yet some people think this makes all my content free for their use.

- SS
 
Jill
post Oct 6 2004, 04:30 PM
Post #15


Yes, copyright theft is at a huge high right now.

Check nearly any member's site here at the forum in Copyscape and you'll find that some of it may be duplicated elsewhere.

Check your OWN site...you may be surprised.
 

This forum is sponsored by High Rankings, a Boston SEO Agency