One person's spam is another person's syndication. With all the sites that I own or manage, I've seen no problems with duplicate content, perhaps because I don't abuse the practice. Duplicate sites are a problem, yes, but the amount of content you can duplicate across sites is quite high. I made about 12 copies of this site,
[link removed per [url=http://www.highrankings.com/forum/index.php?act=boardrules]Forum Rules[/url]] (mine) and added it to a number of sites that I own. I did change the template to match each site, but the content, including the meta tags, was the same. It worked quite well, and the only reason I stopped was that I could not make enough time to keep all the copies updated.
This scraping and stealing stuff does work, and without human reviewers, I don't see it being caught and removed.
The other question, once you find that it's happened to you, is what to do about it. I have a site that has about 9 million expired domain names in HTML (really!) and is updated daily. About a year ago I did a search for some domain and discovered a few other sites with the same content. One of them seemed to be very much like the format I use. So I "seeded" my content with a few bogus domains for a while. Sure enough, some of those bogus domains turned up on this guy's site. I tried contacting him... no response. I found his other sites and tried contacting him through those, still no response.
I'm too stubborn to give up. I mean, I wouldn't have minded the copying, but the jerk would not talk to me. All I wanted was a link to my site, ok? So I figured out his IP address from my logs and watched his pattern for visiting my site. Once I knew his "habits", I started updating my site every day with "poisoned" content. The first 50 or so domains were good records, but the bulk of the pages had extra random characters inserted. I took domains like this:
site-computers.net
and injected some characters to make them look like this:
siteh-computers.net
[Removed examples]
When you look at a list of a few thousand domain names, a small change like this is hard to spot unless you are checking each name. So many real ones are misspelled or foreign that it's really easy to miss.
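For illustration only, here is a rough Python sketch of this kind of poisoning. It is not the script that was actually used; the function names, the 50-record cutoff as a parameter, and the choice to insert exactly one lowercase letter into the first label are all assumptions made for the example.

```python
import random
import string


def poison_domain(domain, rng):
    """Insert one random lowercase letter into the first label of a domain.

    Hypothetical helper: the original post only says "extra random
    characters" were inserted into the first part of the name.
    """
    label, dot, rest = domain.partition(".")
    if not label:
        return domain
    # Never insert at position 0, so the name still starts with the same
    # letter and stays roughly where it was in a sorted list.
    pos = rng.randint(1, len(label))
    return label[:pos] + rng.choice(string.ascii_lowercase) + label[pos:] + dot + rest


def poison_list(domains, clean_head=50, seed=None):
    """Leave the first `clean_head` records intact and poison the rest."""
    rng = random.Random(seed)
    return [
        d if i < clean_head else poison_domain(d, rng)
        for i, d in enumerate(domains)
    ]
```

Calling `poison_list(domains)` returns a new list, so the clean list stays available to swap back in once the poisoned copy has been fetched.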
So after the lists were up for a day or two, I would check the logs and see that they had been downloaded by him. Then I would replace the lists with the correct ones. This went on for a couple of months. He never caught on, and he ended up with nearly worthless lists because I had ruined most of the English-language words in the first part of the domains.
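The log check can be sketched in a few lines of Python. This assumes the server writes Common Log Format lines and that the lists live under a `/lists/` path; both are assumptions for the example, as are the function name and the sample IP address.

```python
import re

# Common Log Format: ip ident user [timestamp] "METHOD path PROTO" status size
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) ')


def downloads_by(ip, log_lines, path_prefix="/lists/"):
    """Return (timestamp, path) for each successful GET of a list by `ip`."""
    hits = []
    for line in log_lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        client, when, method, path, status = m.groups()
        if (client == ip and method == "GET" and status == "200"
                and path.startswith(path_prefix)):
            hits.append((when, path))
    return hits


sample = [
    '203.0.113.7 - - [10/Mar/2006:00:11:00 +0000] "GET /lists/expired-a.html HTTP/1.0" 200 51234',
    '198.51.100.2 - - [10/Mar/2006:00:12:00 +0000] "GET /lists/expired-a.html HTTP/1.0" 200 51234',
]
hits = downloads_by("203.0.113.7", sample)
```

Once a hit for the poisoned file shows up, the clean list can go back up.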
After a while I got tired of this and just blocked his IP from my site.
Now I have coded markers in my data. If I find one of them on another site, I can quickly pinpoint when the file was downloaded and get their IP address, but this has not happened again so far.
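The post does not say how the markers are built, so the following is only one possible approach, sketched in Python: derive a fake but plausible-looking domain from the publication date and a private key, plant it in that day's list, and later recompute markers over a date range to see which day's file a copied marker came from. The key, the `.net` suffix, and both function names are hypothetical.

```python
import hashlib
import hmac
from datetime import date, timedelta

SECRET = b"replace-with-a-private-key"  # hypothetical; keep it off the site


def marker_for(day, secret=SECRET):
    """Build a fake domain that encodes the publication day."""
    digest = hmac.new(secret, day.isoformat().encode(), hashlib.sha256).hexdigest()
    # Map the first 10 hex digits to the letters a-p so it reads like a name.
    letters = "".join("abcdefghijklmnop"[int(c, 16)] for c in digest[:10])
    return letters + ".net"


def find_publication_day(marker, start, end, secret=SECRET):
    """Return the day in [start, end] whose marker matches, or None."""
    day = start
    while day <= end:
        if marker_for(day, secret) == marker:
            return day
        day += timedelta(days=1)
    return None
```

With the day known, the access logs for that day narrow down which IP address fetched the file.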
In my opinion, finding ways to code your data effectively, so that you have proof it is yours and that a specific person took it directly, is crucial if you really want to protect yourself and have documentation to confront someone with or to send to their ISP.
Thanks!
Edited by Jill, 10 March 2006 - 12:11 AM.