
SEO and Duplicate Content Issues That Hurt Google Traffic

December 5, 2012

By Jill Whalen
As part of my SEO for 2013 and beyond series, I promised to provide more in-depth information about the "SEO killers" I mentioned last time.

Image Credit: Franco Folini

Today I'm delving into duplicate content as it relates to SEO. My SEO audits of sites that lost traffic over the past year and a half showed that duplicate content was present on most of them. While it generally goes hand in hand with other SEO problems, duplicate content comes in so many forms that I found it to be the single most prevalent problem affecting a website's success with Google. Before 2011, duplicate content was simply filtered out of the search results and that was that. Post-Panda/Penguin, however, dupe content on websites can have major repercussions.

How to Check for Duplicate Content

While there are numerous duplicate content checkers available, the simplest method is to copy a random snippet of content from a page, wrap it in quotation marks, and do a Google search, repeating the process with snippets from various pages of the site. If a lot of dupes show up in the search results, you may have a Google duplicate content issue.
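For instance, if a page on your site contains the sentence below, you'd search Google for the whole thing with the quotation marks included (the snippet here is purely a made-up example):

"our hand-carved birdhouses are built from reclaimed barn wood"

If that exact sentence turns up on dozens of other URLs, you have some digging to do.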

Another good check is your Google Webmaster Tools account. Under "Optimization" you'll see "HTML Improvements." Click "Duplicate Title Tags" and you may learn about duplicate content you had no idea existed on your website.

Causes of Duplicate Content

There are many reasons why a website can end up with dupe content. Sometimes it's just laziness on the part of the site owner. Other times it's an attempt to gain more keyword traffic. In many cases, however, duplicate content is simply a mistake caused by technical issues.

For instance, one website I reviewed had their entire glossary and FAQ sections duplicated because they existed in both the root directory and a directory specifically for English-language pages. This caused lots of the same content to be indexed by Google on different (but similar) URLs, such as these:

http://www.example.com/medical-glossary.html
http://www.example.com/en/medical-glossary.html

The fix for this, of course, is to choose only one place to house the content.
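On an Apache server, a rough .htaccess sketch of the accompanying 301 redirect might look like this (assuming the root-level URL is the one being kept):

# Permanently redirect the /en/ duplicate to the root-level page
Redirect 301 /en/medical-glossary.html http://www.example.com/medical-glossary.html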

Another site I reviewed had inadvertent duplicate content because they had too many similar but slightly different categories for their products, such as these:

http://www.example.com/bear-cave-photos/2428/dept
http://www.example.com/bear-cave-photo-gifts/1290/dept

They had more than 20 different URLs that all listed pretty much the same products. Interestingly enough, many of them were bringing in direct Google traffic. While that sounds like a good thing on the surface, I believe that if all those URLs had been consolidated into one, the remaining URL would have acquired even more weight and overall Google PageRank. That in turn would have given the one main URL an even better chance of showing up for even more targeted keyword phrases.

Another fix for this might be to use the canonical link element -- i.e., rel=canonical -- pointing to one main page. But my first choice would always be to fix the URLs by consolidating them. (Don't forget to 301-redirect the others.)
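If you do go the canonical route, the element belongs in the <head> of every duplicate page, pointing at whichever URL you pick as the main one -- something like this sketch (using the first example URL above as the keeper):

<!-- In the <head> of each duplicate category page -->
<link rel="canonical" href="http://www.example.com/bear-cave-photos/2428/dept" />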

I also found websites that had duplicate content issues simply because their URLs appeared with both initial capital letters and all lowercase, like these:

/people/JaneAusten
/people/janeausten
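One possible Apache fix is a mod_rewrite rule that 301-redirects any URL containing capital letters to its lowercase equivalent. This is just a sketch, and note that the RewriteMap line must live in the main server or virtual-host config, not in .htaccess:

# Define a lowercasing map (server/vhost config only)
RewriteMap lc int:tolower
# 301 any path containing an uppercase letter to its lowercase twin
RewriteEngine On
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*) ${lc:$1} [R=301,L]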

And then there was the old dupe-content standby: the site that appears under both HTTP and HTTPS.
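The usual cure is to pick one protocol and 301 the other to it. A minimal .htaccess sketch, assuming HTTPS is the chosen version (flip the logic if you standardize on HTTP instead):

RewriteEngine On
# Send any plain-HTTP request to the HTTPS equivalent
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]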

The most common cause of inadvertent duplicate content like the examples above is a bad or misconfigured content management system (CMS).

I've seen CMS's that output both parameter-laden URLs and clean URLs for the same products, such as these:

http://www.example.com/index.php?manufacturers_id=5555
http://www.example.com/brand-m-5555.html
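If the CMS can't be stopped from generating both, one patch is to 301 the parameter-laden version to its clean twin. A hypothetical .htaccess sketch for the pair above (the trailing ? strips the query string from the target):

RewriteEngine On
# Match the old parameter URL and redirect to the clean one
RewriteCond %{QUERY_STRING} ^manufacturers_id=5555$
RewriteRule ^index\.php$ http://www.example.com/brand-m-5555.html? [R=301,L]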

None of this would be a problem if Google (and the other search engines) did a better job of understanding that all of the above are simply technical issues. And while they can -- or at least should -- be able to, they often don't. In fact, they seem to allow this sort of thing to hurt a website's visibility in the search results more often today than they did years ago. I've always said that there was no such thing as a duplicate content penalty, but today there is in fact one -- or more. My guess is that Google wants to encourage webmasters to do a better job of cleaning up the messes their CMS's leave behind, because that makes its job of crawling and indexing much easier.

Categorization Gone Crazy

Beyond technical issues, another common reason for duplicate content problems is that some products fit into multiple categories. For instance, an "Outdoor Gear" type of site may have multiple target audiences such as hikers, runners, cyclists, snowmobile riders, motorcyclists, ATV riders, etc. And some of their accessories -- backpacks, jackets, gloves, etc. -- may be of interest to several audiences. If the website is categorized by target market rather than by product type, and the products are found in each of those categories (under different URLs), that can lead to major duplicate content issues.

To fix category problems, either re-categorize the site by product type (which may or may not be ideal) or ensure that no matter which category users enter from, they always end up at the same product page URL. (The page itself can explain exactly who might need the product shown.)
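In other words, aim for a URL structure where the product lives in one place regardless of the path the shopper took to it -- something like this (the paths are made up for illustration):

Instead of category-specific product URLs:
/hiking-gear/alpine-backpack
/cycling-gear/alpine-backpack

Link every category listing to one product URL:
/products/alpine-backpack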

A Rose by Any Other Color

A similar duplicate content problem can occur when products come in various colors or sizes. If each size or color variation has its own page with the same basic description, that's certainly a duplicate content issue. But that's the way many CMS's seem to work. It would be much better for both usability and search engines if those products had just one page, with the option to choose sizes, colors, etc. right there on that page.
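As a sketch of what that one page might contain (all the names here are made up), the variants become form options rather than separate URLs:

<!-- One product page; color and size are options, not URLs -->
<form action="/cart/add" method="post">
  <select name="color">
    <option>Black</option>
    <option>Tan</option>
  </select>
  <select name="size">
    <option>Small</option>
    <option>Large</option>
  </select>
  <button type="submit">Add to Cart</button>
</form>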

Tags That Go to Infinity and Beyond

Using WordPress as your CMS isn't always the answer to duplicate content issues, either. Many sites that use it go crazy with their tagging of blog posts. New tags are made up for every new post, and each post gets tagged with a handful or more of them. What happens next is that Google indexes the zillions of tag landing pages, each of which either has just one post tagged or shows the same posts as a bunch of other tag landing pages. It's especially bad when the tag pages display the complete blog posts rather than just the first paragraph or so.

My recommendation for sorting out that kind of mess is to create only a limited number of tags that the bloggers can use -- perhaps 20 at most. If for whatever reason that's not possible and you want to use your tags for keyword stuffing (a la The Huffington Post), then be sure to add nofollow to the tag links and noindex to the tag landing pages to avoid massive duplicate content problems with Google.
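For the hand-rolled version of that advice, the tag landing page template would get a robots meta tag in its <head>, and the tag links themselves would get nofollow -- a sketch (the tag name is invented):

<!-- In the <head> of every tag landing page -->
<meta name="robots" content="noindex, follow" />

<!-- On each tag link within posts -->
<a href="/tag/bear-caves/" rel="nofollow">bear caves</a>

In WordPress specifically, many SEO plugins can apply the noindex to tag archives with a checkbox, so look there before editing templates by hand.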

Once, Twice, Three Times You're Out!

Of course, some duplicate content issues exist out of plain old laziness. I've run across many sites that put some content on the home page and then repeat the exact same thing on nearly every other page. What's even worse is when other sites are using that same content as well!

And then there are the duplicate content issues that some companies create for themselves by developing additional mini-sites (aka doorway domains) to try to gain even more search engine listings. In other words, if you are Cruella de Vil Inc. and sell Dalmatian blankets on your main website at CruellaD.com/dalmatian-blankets, it's no longer a good idea to also sell the same ones at DalmatianBlankets.com -- especially if you're using the same basic content.
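If you've already built such a mini-site, the cleanest exit is usually to fold it into the main site with a domain-wide 301. A rough .htaccess sketch on the mini-site's server, sending everything to the relevant section of the main site (using the made-up domains above):

# Send every DalmatianBlankets.com URL to the matching
# section of the main site
RedirectMatch 301 ^/.*$ http://www.cruellad.com/dalmatian-blankets/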

Don't Let Greed and Laziness Bring You Down

Another form of lazy duplicate content is what's been known for years as "Madlib Spam." One common example is the site that offers the same service in multiple cities and creates an individual landing page for each city with the same basic content, simply switching out the city name. I've even seen auto-generated madlib spam on product sites trying to capture all sorts of long-tail keyword traffic related to their products. While some of that content may make sense to the reader, more often than not it is gibberish. The sad thing is that even really good sites sometimes use this technique. But in 2013 and beyond, this is exactly the kind of thing that could bring a good site to a grinding Google halt. Google no longer stops at punishing the auto-generated areas -- even the good parts of a site can take a hit.

As you can see, duplicate content comes in many forms. While every website is likely to have a bit of it here and there, if your site has any of the issues mentioned here, you should block out some time and money to clear it up -- especially if you've noticed a significant loss in Google traffic at some point.

Jill

Jill Whalen has been an SEO consultant and the CEO of High Rankings, a Boston-area SEO company, since 1995. Follow her on Twitter @JillWhalen

If you learned from this article, be sure to sign up for the High Rankings Advisor SEO Newsletter so you can be the first to receive similar articles in the future!

 Pete said:
And is it correct that the "duplicate content" issue is related to having duplicate content on your own site, not to having content on your site that also appears on other sites, i.e., syndicated content?

The classic syndicated content example is news syndication. An Associated Press story might appear on many local and national news sites, but these sites are not penalized for having "duplicate content". Or the Declaration of Independence. It appears on many sites without penalty to all but the original online publisher.

Syndication can be a great way to spread your content, but I publish on my own site first to get credit as the original source. And it's my understanding that creating your own original content is ideal, but it's also OK to have some content on your site that's been published elsewhere too.

I'd love to get your take on this.
 Jill Whalen said:
@Pete, duplicate content issues can be both within your own site and from your content being on other sites.

While syndicating your content is generally fine, don't be surprised if the other sites show up for searches rather than yours if they're the higher authority.

The main thing is to make sure that if your site does syndicate content (or if you have others' content on your site), there is also a substantial amount of unique content on the site as well. Or at least that the content is curated and cataloged in some way that makes it more useful to users than every other site containing the same information.
 Lynn said:
Thanks Jill, this is very helpful. My client has a multi-country site, built in WordPress, for the US and UK. Users can toggle between them. The content is very similar -- some spellings, images, and local events differ, but for the most part the two versions are the same. How would you solve this problem?
 Pete said:
Great, thank you so much for pointing out that higher authority sites will likely outrank the site with the original content. When I syndicate content I always include backlinks to my site for the SEO value and hopefully to drive actual traffic.

Adding value and creating original content seem to be the big takeaways. There really are no shortcuts.
 Joseph Cassia said:
I receive press releases and calendar items every day for posting on my community site. Most of them are also distributed to multiple media outlets. They are legitimate, so how do I protect the integrity of my site?
 Jill Whalen said:
Joseph, see my above comments.
 Naveen said:
Thanks, Jill. I have benefitted from your newsletter for years. I write articles for multiple customers and sometimes find myself using the same expressions unintentionally. This gets detected by Copyscape and I have to change the parts that match copy I have written earlier. Is a match of about 4-6% with another article treated as duplicate content by Google?
 Jill Whalen said:
Naveen, I don't know. Have you found it to be problematic?
 James Hobson said:
Very nice article. I would like to contribute a reminder that the Digital Millennium Copyright Act can be your friend for egregious third-party situations. We have a client whose site content and code had been scraped. We discovered this by using Copyscape to detect duplicate content issues. Direct communications with the offending party, including letters from an attorney, failed to get action. After months of frustration we escalated the duplicate content/copyright infringement matter by filing a DMCA report with Google (at http://www.google.com/dmca.html). Subsequently the offending site was dropped from every SERP.

The DMCA report should be used only as a last resort for dealing with duplicate content problems caused by scrapers or plagiarists, but it's there if you need it.
 Pete said:
Tynt is a tool that will automatically add an attribution link to content copied off your site (like newspapers use) and they'll give you stats on the backlinks and traffic that result. However, the attribution link is easily deleted by those who don't want to give credit.

There are also WordPress plugins that will prevent copying of your content entirely, including the ability to disable right-click, so they can't take code or content from the back end either. The downside is that legitimate sharers can't easily curate your good content.

To share or not to share is a tough call. But these tools can help if you decide to go one way or the other.
 Jill Whalen said:
For those with eCommerce sites that are basically the same as everyone else's, you should watch @MattCutts' latest video that covers this issue.

 Kevin said:
Thank you for the post!
But I would suggest using duplicate content checkers instead of Google searches with quotation marks, because with the checkers you can set an automatic check once a day/week/month rather than doing it manually. That's crucial when you have an extremely big site.
 Paul Schlegel said:
I am now hearing rumors that Google has started penalizing sites with duplicate images, such as the one you've used in this post.
 Jill Whalen said:
@Paul, that sounds kinda silly to me, so I wouldn't believe it.
 Paul Schlegel said:
OK. It was from a post by Roy Reyer - I don't know anything about him other than he did some trainings with Jerry West. I could be misinterpreting the post, too - or like you said, it could be nonsense. Let me know if you want a link to his article.
 Jill Whalen said:
Found the article. He seems to be talking about product images on eCommerce sites. But I still think it's total BS.
 Paul Schlegel said:
Thanks for checking it out.
 Dani said:
Hi Jill. Really good post, I must say.
I'd like to ask you about "Tags That Go to Infinity and Beyond", and please notice that I'm not looking for "the" correct answer, just what you would do.
In case all those "tag pages" have been already created, what would you suggest a webmaster should do with them? Do you think it would be proper to add a "noindex" meta tag to almost all of them and start using just a smaller group of definitive tags?
 Jill Whalen said:
Dani, yes, that should work.
 Olivia Brown said:
My site is much older and much SEO has been done for it already. It has good content and is well optimized for my primary keywords. Still it doesn't rank for those keywords. I have been doing article submission, forum posting, social bookmarking and blog commenting for my site on high-ranked sites. Still some of the keywords don't even come within the top 100.

Furthermore, some keywords do rank well. But then, in the meantime, they either slowly slip out of the top 100 or, to my surprise, suddenly disappear, literally invisible anywhere between 1 and 100. If this sudden disappearance is what's called the Google dance, then let me tell you, they don't regain their position the next day. Please help me! This is really frustrating.

Does this have anything to do with duplicate content? In a forum, people said it's the Google dance. But had it been the Google dance, they would have regained their original positions within a few days, and they don't. Is there anyone here to help me?
 Jill Whalen said:
Olivia, it's impossible to say without a full site audit. Could be any or many of the SEO Killers I mentioned in the previous article.
 Andy said:
Hi Jill
What is your suggestion for offering the same service in multiple cities? We were thinking about a landing page for "landscaping" for each surrounding city that we service, but it sounds like that would lead to duplicate content issues. We would really like to rank on the first page for "landscaping ????" and also "landscaping ????"
Thanks in advance!
 Jill Whalen said:
@Andy, yes that would be considered "Madlib Spam."

You should simply list all the cities that you work in on a specific locations page within the site.
 Steve Cummins said:
Hi Jill,
Excellent article.
May I ask, is it duplication if anchors pointing to the same link are duplicated -- three or four different ways on a page to get to the one article?
 Jill Whalen said:
@Steve, it's not duplication, but Google may see it as trying to gain additional anchor text.
 Johnny B said:
I want to add syndicated research and charts to my site because I believe it will benefit my local clients. Is there a Panda safe way to do it? Or do I have to choose between SERP strength and serving my customers?
 Jill Whalen said:
@johnny just add them. It shouldn't cause any problems.
 matt said:
What about duplicate content on subdomains? For instance, with Drupal you can access the same article from however many different subdomains there are.
 Jill Whalen said:
Yes, Matt, that is dupe content, which is bad.
 Matt said:
Hey,

A really great article. I have always hated the "same service in multiple locations" one myself; I come across those all too often, as with duplicated content in general.

I am working on a project at the moment where the business owner has basically the same content for different areas. I'll have to explain why this is not a good idea, though as always their say is what I will go with and I just offer the advice!
 Dave said:
Hey Jill, a super useful post! Thank you!
 Dave said:
Hi Jill. I have my main site and I would like to have several different domain names pointing to it (the same directory in my public_html). All of the proposed domains relate to my business; for instance, my main domain is peddlersstore, where I sell recumbent trikes. I also have peddlerstrikes pointed at the same directory with no problem. There are a couple of other domain names that I would like to do the same thing with, but I don't want to be penalized for doing it. Several domain names, all pointing to the same place... duplication problem?
 Jill Whalen said:
Yes, it is.
 Dave said:
That is very interesting! Especially since gogle.com, gooogle.com both direct to google.com! But they would penalize me for doing the same thing?
 Jill Whalen said:
Redirecting and having two sites are very different things. If you redirect yours it would be fine.
 Allison said:
Hey Jill,

Thanks for a great article. I've been running a site for a few years, and now that it's growing, I'm just starting to get into SEO.

One of the projects I'm working on is creating a directory of university programs. Each program has a separate entry/URL. The first half of each page is a description of the program, which is unique, and the second half covers the services provided by the program and the services provided by my company. As a result, about a third of the content on each and every program's page is duplicated, across probably a few hundred entries.

How should I deal with this? Wrap only the duplicate content part in a "noindex, follow" meta tag? I obviously want the program summary to be indexed, as the programs themselves are unique.

Thanks for your help!
 Jill Whalen said:
Allison, the dupe info should be on a separate page that you link to from all the others.
 Gemma Tubbrit said:
We sell on our own website and also on eBay and 2 other sites. However, the content of the product pages is simply transferred across to the other sites, so this will cause duplicate content. We aren't trying to sneakily cheat Google, but simply to make the process simple for the staff who upload our items to the other sites. Do you think Google will penalise us for this?
gemma
 Jill Whalen said:
@Gemma, most likely only one of your sites will do okay in Google because of this. And it likely won't perform as well as it could if there wasn't all that other duplicate content out there.
 Gemma Tubbrit said:
As we would rather our own site did better than the other sites we sell on (we don't own those; we use them to help sell our products and pay a monthly fee), how will Google know who the original author of the content is? One of the sites we sell on is massive in terms of visits, number of pages, great search results, etc., and I'm worried it will be assumed they are the original author! I am trying to sell different products on the 2 sites (eBay has all our items), so that may reduce the duplicate content a little :( Should I be editing the content on the other sites so it doesn't mirror ours too much?
 Jill Whalen said:
Changing the content might help. But I can't really provide you with good and complete advice via comments on a blog post. It's something that would need to be carefully looked into to assess the nature of the duplication.
 Gemma Tubbrit said:
Thanks Jill
 Gamemunition said:
Hi Jill,

I studied every word of yours. Thanks for the info. But I have a question. I have a blog and I want to run a "Press Release" section and a "Technology News" section. When I put news on my site, hundreds of other sites will also be publishing the same news.
So this could create a duplicate content issue. How will Google treat it? Will it penalize me? If so, then I just can't post news, because news will always be the same.
Before you reply, I want to tell you that I won't be copy-pasting things. I will write in my own words and give my own opinion, but still a lot of the text will be the same.
How does Google treat this scenario?
Regards,
 Jill Whalen said:
I don't know, try it and see.
 Dane said:
Hey Jill,

Great information -- a great article on dupe content for sure!

I have a website and it is a regular HTML-style website (I know a CMS is probably better). However, I used to rank amazingly well for many years... then suddenly my traffic dropped to a fraction of what it used to be and I don't rank in Google for my keywords anymore.

There is a penalty and it's not manual, because I checked with Google.

I have been going through with a fine tooth comb but was wondering:

I have many pages which are named things like:

www.example.com/kidsactivities1.html
www.example.com/kidsactivities2.html
www.example.com/kidsactivities3.html
www.example.com/kidsactivities4.html
www.example.com/kidsactivities5.html
etc..etc..

These pages have different title tags and different content, descriptions, etc.
Could this cause a penalty in Google, do you think?

I think I am missing an opportunity to add some keywords to my URLs for some extra ranking, since URL keywords are one of the ranking signals...

What are your thoughts?

I am stumped...


Thanks so much for any help..
 Jill Whalen said:
The URLs wouldn't matter as long as the content isn't the same.
 Lyn said:
I have a variation on the same-product-different-cities theme. My client has a service business in a small town. Over the past few years, the client has purchased similar owner-operated businesses in other small nearby towns. They wish to maintain their local flavour by retaining their individual business names and domain names, with separate home pages and "about" pages. But they want to share quite a bit of content related to service descriptions and customer education. Is there any way to build the sites so the top-level "local" pages don't get clobbered because of the content they share with their associates?
 Jill Whalen said:
@Lyn, the only way to be sure would be to create unique content.
 Brian Pierce said:
Great! This answers a question I had about why my site was ranking well and is now almost missing from the SERPs! Thanks so much. I am a total novice at this. Although I have had excellent success with other sites for businesses where I worked, I went and changed how I did things on my own personal business site. That was of course pretty stupid. I did the Madlib spam on my site thinking I could grab more spots on the SERP for my local service area. In fact I killed the good rankings I did have, and few if any of the pages I Madlibbed even show up! Ha!
So thanks again for the information!
 Chris said:
Jill,

I have a question regarding "Madlib Spam": if a site genuinely has multiple service cities, and you genuinely want to target each city for your services, is it possible to still create the city pages, assuming each page has unique content -- not auto-generated content but uniquely written content?

Thanks in advance,
Chris
 Jill Whalen said:
@Chris, yes, but do you really have something different to say regarding each city's services?