Posted 06 October 2004 - 12:43 PM
I am trying to balance making product detail pages available for the spiders and making sure that competitors don't steal all the content on my web sites.
We have spent considerable effort collecting product content (images, descriptions, ratings, etc.) and naturally want the spiders to index all this and return results for our sites on these products. However, if I provide a page with links to the product detail, surely a competitor could write a routine to collect all the content on my web site.
I am looking for a way to show the spiders everything but limit everyone else. We are toying with using the spiders' IP addresses to show them a page with links. Then we would kill the links for non-verified IPs, but I worry that this may be seen as spamming the search engines.
Thanks in advance,
Posted 06 October 2004 - 12:56 PM
Can it get you into trouble? It could. Is it worth the risk? You'll have to work that one out with your own conscience.
Posted 06 October 2004 - 01:13 PM
My conscience tells me not to do it, but I fear I may give away the business, or at least the content side of the business.
Is this a circle that can't be squared or does someone know of another approach?
Thanks again, iangriff
Posted 06 October 2004 - 01:22 PM
Posted 06 October 2004 - 01:24 PM
Ditto what SearchRank said. Any kind of content that is only available to the spiders can be construed as cloaking (even if it is done with the best of intentions) & can come back to bite you.
As an alternative to "agent filtering", why not try putting a transparent layer over the area where the text is on the page(s)? This will prevent basic copy-n-paste style copying.
Also, while not a real "solution", there are relatively easy ways to check for content being duplicated, many of which have been discussed in these forums. You can use these methods to find offending sites & take the appropriate steps.
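For what it's worth, here is one generic way such a check can work, sketched in Python. It is not any particular method from these forums or any tool's actual algorithm; the function names `shingles` and `overlap` and the five-word window are assumptions made for illustration. It compares overlapping word sequences from your text against text fetched from a suspect page.

```python
import re

def shingles(text, size=5):
    """Return the set of overlapping word sequences ('shingles') in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + size])
            for i in range(max(len(words) - size + 1, 0))}

def overlap(mine, theirs, size=5):
    """Fraction of my shingles that also appear in their text (0.0 to 1.0)."""
    a, b = shingles(mine, size), shingles(theirs, size)
    return len(a & b) / len(a) if a else 0.0
```

A score near 1.0 suggests the other page carries most of your wording; what counts as "too close" is a judgment call you would have to tune by hand.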
Ultimately, no content is *truly* safe, as there are free/cheap & easy to use utilities which can bypass virtually any copy-protection. The only real defense against plagiarism is your own vigilance in checking for it.
Posted 06 October 2004 - 01:31 PM
I'm a little confused.
You've collected all this great content, but won't put it on your site and only want the engines to see it? It just doesn't make sense to me.
If it's relevant content about your products and stuff, not showing it to your users is a crime in and of itself (not legally of course!).
I think there's more to your question than you're letting on. Care to share?
Posted 06 October 2004 - 01:43 PM
Posted 06 October 2004 - 01:56 PM
Thanks for delving deeper. Let me try to be clearer.
I am supporting a series of sites that have dynamically driven product pages and that share access to the same content. In some cases this can include more than 20,000 products and a lot of content.
I have a link on the bottom of each page for the spiders to show them the products on the site, so they can index the product detail pages. But doing this risks the work we've put in to collecting the content on 20,000+ products.
Sure I want the site users to find the product pages and see the content, but hopefully they'll use the site's navigation and search rather than the index page.
So the question seems to be how to show the spiders dynamic content without making it easily accessible to someone who wants to steal it all. I am less concerned about someone manually copying and pasting, but writing a quick routine to download it all doesn't seem fair.
Posted 06 October 2004 - 01:57 PM
That is, what's to stop your competitors grabbing the content from the SE cache?
I would consider a solution where you could extract snippets from the articles/documents and have these crawlable and visible to unregistered visitors but the full documents are behind a password/cookie protected area.
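A minimal Python sketch of that snippet idea, assuming the full description is stored as plain text. The names `make_snippet` and `render_product` and the 40-word cutoff are hypothetical. Note that spiders and unregistered visitors get exactly the same output, so nothing here is cloaking.

```python
def make_snippet(text, max_words=40):
    """Return the first max_words words, marking the cut with ' ...'."""
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + " ..."

def render_product(description, is_registered):
    """Full text for logged-in users, a snippet for everyone else (spiders included)."""
    return description if is_registered else make_snippet(description)
```

The trade-off is that the engines can only index what the snippet contains, so the cutoff decides how much search visibility you give up.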
Posted 06 October 2004 - 02:47 PM
Either your content is available for the public or it is not.
That said, I empathize with your situation.
Posted 06 October 2004 - 03:21 PM
A couple of tricks though to make it a bit harder:
Search for "disableselect" to come up with many different JS snippets that will stop a simple copy/paste. It's a bit archaic and won't stop someone with any real knowledge, but it has also been known to just annoy someone enough that they don't bother. Neither will it help if someone downloads your whole site. Be sure to look for one that includes script for more than one browser. Most are only compatible with IE.
A JS that disables right click will help to prevent someone from swiping images. What you risk here is annoying anyone that might want to right click to open a link in a new window. For images that you create, consider adding a file comment. Most people don't go looking for comments on images. Right click on the file, go to properties and then go to summary. You can place a lot of information here but be sure that you also use your full URL along with the name of your company.
One cool little trick for proving text is yours is to bury comment code containing your URL in the middle of words, several times throughout the text. I don't think that I would do that with keywords, but putting it in the middle of stop words won't disrupt the viewer and should not have any effect on SE indexing. That way, if someone does download the content and then merely copies it, the comment code will go with it unless they copy the text to a text editor first. When you find your content on someone else's site, contact their host. They can deny it or claim that you stole THEIR stuff, but they are going to have a hard time explaining why comment code with your URL is on their site. It's a nice little trick that I picked up from studying spam that gets through. I saw one recently that was quite brutal: "If you are seeing this code on any other domain other than whateverdomain.com, the owner of the site is a thief. Don't do business with them. They are a dishonorable cheat and they will steal from you." OUCH! Imagine finding this on your own site; since you published it, you cannot even go after them for damaging your reputation.
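Here is a rough Python sketch of the comment-code trick, just to show the mechanics. The function name `watermark`, the stop-word list, and the `every` parameter are all made up for illustration, and it assumes the input is plain text rather than HTML (run on markup, it could land a comment inside a tag).

```python
import re

# Illustrative stop-word list; an assumption, pick your own.
STOP_WORDS = {"the", "and", "that", "with", "this", "from"}

def watermark(text, url, every=3):
    """Insert an HTML comment holding url into the middle of every Nth stop word."""
    comment = f"<!-- {url} -->"
    seen = 0

    def mark(match):
        nonlocal seen
        word = match.group(0)
        if word.lower() not in STOP_WORDS:
            return word          # keywords and other words stay untouched
        seen += 1
        if seen % every != 0:
            return word          # only tag every Nth stop word
        mid = len(word) // 2
        return word[:mid] + comment + word[mid:]

    return re.sub(r"[A-Za-z]+", mark, text)
```

For example, `watermark("the cat and the dog", "http://example.com", every=1)` turns each "the" into `t<!-- http://example.com -->he`, which a browser still renders as "the".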
I cannot say how it's done, but I have seen sites that write session limitations. So a given IP address may only be able to view a max of 50 or 100 pages in any given 24-hour period. Then if you are looking in your stats and see that over a couple of days, one single IP is responsible for 300 (or whatever) page hits, disable their IP from accessing any more pages permanently. You would have to have some very valuable content and a very determined competitor to try and get around that kind of security. The important thing would be to exclude the IP addresses of the known SE bots from that rule or they wouldn't be able to go any deeper either. It would take some study, but I've seen it done. Also be sure that you do a reverse lookup on every IP before you ban it as you don't want to accidentally ban Google for indexing your whole site.
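One way the page limit and the reverse lookup could fit together, as a Python sketch under stated assumptions rather than how any of those sites actually do it. `PageLimiter`, `is_verified_crawler`, the in-memory hit store, and the `CRAWLER_DOMAINS` list are all hypothetical; check each engine's own guidance for the hostnames its spiders use. The lookup goes IP to hostname and back to IP, so a faked reverse record alone does not pass.

```python
import socket
import time
from collections import defaultdict, deque

# Assumed hostname suffixes for trusted spiders; verify before relying on them.
CRAWLER_DOMAINS = (".googlebot.com", ".google.com")

def is_verified_crawler(ip, reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the domain, then confirm the name maps back to ip."""
    try:
        host = reverse(ip)[0]
        if not host.endswith(CRAWLER_DOMAINS):
            return False
        return ip in forward(host)[2]
    except OSError:              # lookup failed: treat as not a crawler
        return False

class PageLimiter:
    """Allow at most max_pages page views per IP within a sliding window."""

    def __init__(self, max_pages=100, window=24 * 60 * 60,
                 is_crawler=is_verified_crawler):
        self.max_pages = max_pages
        self.window = window
        self.is_crawler = is_crawler
        self.hits = defaultdict(deque)   # ip -> timestamps of recent views

    def allow(self, ip, now=None):
        now = time.time() if now is None else now
        recent = self.hits[ip]
        while recent and now - recent[0] >= self.window:
            recent.popleft()             # forget views older than the window
        if len(recent) >= self.max_pages:
            # Over the limit: only verified spiders may keep going.
            return self.is_crawler(ip)
        recent.append(now)
        return True
```

The DNS check only runs once an IP hits the limit, which keeps lookups rare; on a busy site you would probably still want to cache the result and keep the counters somewhere more durable than process memory.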
Posted 06 October 2004 - 03:37 PM
You could also try something that is less risky but may turn some of them away. It's called copyscape.com and their little logos state that "this page should not be copied and is protected by Copyscape..."
Just a thought, I have been trying this myself to see if it will cut back on people taking copyrighted material without thinking about it. It could be a deterrent?
Posted 06 October 2004 - 03:56 PM
FWIW, I instantly click away from those sites that do not permit me to open a link in a new window. I like to have dozens of windows open to compare products, when I'm shopping - and it's just not worth the hassle. Even worse, though, is when I right click on the page and get a little pop up that tells me not to steal text / images / whatever.
You've had people regularly plagiarizing text before? Have you had it happen often enough so that you can track to see if it is actually helping?
Posted 06 October 2004 - 04:08 PM
I hope to be able to watch it closer now. It is surprising but has been happening monthly. Every month I have been able to find sites (more than one) that have outright stolen my content, my paragraphs, right down to my commas.
I am still fighting a NY site that has taken 4 pages of my content on Hosting, Domains, Fees etc. plus my entire Glossary of Internet Terms - I have also gone through Google's paperwork - so far no action there...
You will be surprised at how many people - others in this Industry - just take your content. It can be maddening - it's not flattering. The articles I make available for reproduction are clearly stated - some think this makes all my content free for their use.
Posted 06 October 2004 - 04:30 PM
Check nearly any member's site here at the forum in CopyScape and you'll find that some of it may be duplicated elsewhere.
Check your OWN site...you may be surprised.