If you have never struggled with duplicate content problems before, you’re probably just blissfully unaware of them. According to Google, “Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar.”
If you have a duplicate content problem, Google will pick the URL it finds most relevant and either hide the duplicates from view or show them in the serp* seemingly at random.
Did you know your duplicate pages will actually be competing with each other? This can also happen across domains if you have the same content on different domains.
Duplicate content also makes Googlebot crawl unnecessarily many pages. For a large site, that could mean hundreds of thousands of extra pages to crawl, and extra pages mean extra time before your site is fully crawled.
What causes duplicate content issues?
1. Misconfiguration of the web server
Example: www.example.com and example.com will give you the same page. Therefore every page on your site will have at least two versions.
How to fix: Make a 301 redirect to either the www or the non-www version. If your site runs on Apache, you can add a few lines to the .htaccess file. To see how it’s done, search Google for “301 redirect www to non-www” or “301 redirect non-www to www”.
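The .htaccess rule is Apache configuration, but the decision it makes is simple. Here is a minimal Python sketch of that decision, assuming your application (rather than the web server) handles the redirect; the function name `www_redirect_target` is hypothetical. The server would then answer with status 301 and a `Location` header set to the returned URL.

```python
from urllib.parse import urlsplit, urlunsplit


def www_redirect_target(url, prefer_www=True):
    """Hypothetical helper: return the URL to 301-redirect to,
    or None if the request already uses the preferred host."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    has_www = host.startswith("www.")
    if has_www == prefer_www:
        return None  # already on the preferred host: serve the page normally
    new_host = "www." + host if prefer_www else host[len("www."):]
    if parts.port:
        new_host += f":{parts.port}"  # keep a non-default port if one was given
    # Keep path and query so every page redirects to its own counterpart.
    return urlunsplit((parts.scheme, new_host, parts.path, parts.query, parts.fragment))
```

For example, with the default `prefer_www=True`, a request for `http://example.com/shoes?id=10` would be redirected to `http://www.example.com/shoes?id=10`, while a request that already uses `www.` gets `None` and is served as is.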
2. Product variant pages
If you have an e-commerce site, you probably have products available in different sizes, colors, etc., but the product title and description are still the same. This is very common, and I bet there are a zillion product pages out there that could be left out of Google’s index.
How to fix: Add the canonical link element to tell Google which page to index. To learn how to implement the canonical element, see Google’s information about canonicals.
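As a rough illustration, this Python sketch builds the canonical link element for a variant page. The `VARIANT_TO_PRODUCT` mapping and its URLs are hypothetical stand-ins for wherever your shop keeps its variant data; the tag it returns belongs in the `<head>` of the variant page.

```python
from html import escape

# Hypothetical mapping: each color/size variant URL points at one main product page.
VARIANT_TO_PRODUCT = {
    "https://www.example.com/t-shirt-red-m": "https://www.example.com/t-shirt",
    "https://www.example.com/t-shirt-blue-l": "https://www.example.com/t-shirt",
}


def canonical_link_tag(page_url):
    """Return the <link rel="canonical"> element for the given page."""
    # Pages that are not variants are treated as their own canonical version.
    canonical = VARIANT_TO_PRODUCT.get(page_url, page_url)
    # Escape the URL so characters such as & are valid inside the HTML attribute.
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'
```

With this mapping, both the red and the blue variant pages would carry `<link rel="canonical" href="https://www.example.com/t-shirt">`, so they point Google at the single main product page.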
3. Sharing content between your sites
Example: you have a couple of recipe sites, and they share some of the recipes.
How to fix: Make sure you use the canonical link element to tell Google which page is canonical. Even better, rewrite your content for each site and make it unique to that site. Tip: use a tool such as “similar page checker” to compare your pages.
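Page-comparison tools use their own methods. One common way to measure how similar two texts are is word shingling with Jaccard similarity, sketched here in Python. This is an assumption for illustration, not what “similar page checker” does internally, and the 3-word shingle size is an arbitrary choice.

```python
import re


def shingles(text, k=3):
    """Lowercase the text, split it into words, and return the set of
    overlapping k-word sequences ("shingles")."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(text_a, text_b, k=3):
    """Jaccard similarity of the two shingle sets:
    1.0 means identical wording, 0.0 means no shared k-word sequence."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    if not a and not b:
        return 1.0  # two empty texts are trivially identical
    return len(a & b) / len(a | b)
```

For example, `similarity("Mix the flour and sugar", "Mix the flour and salt")` gives 0.5: two of the four distinct 3-word sequences are shared. A score close to 1.0 between two of your recipe pages suggests the text needs rewriting.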
4. Copied content
If you copy content, you will most likely end up with poor rankings. Since the Panda update, Google has shown no mercy to sites with poor (or copied) content. E-commerce sites should be careful about using their suppliers’ product descriptions: if anyone else is selling the same product, they might have the same product texts as you do.
How to fix: Do not use copied content.
5. URL variables
Variables in the URL string are very common. A URL can have variables such as: www.example.com/index.php?id=10&lang=en&sort=price
The parameter “sort” in this example just sorts the products on the page by price. This is basically the same page as the one without the “sort” parameter, so Google considers the two URLs duplicates.
How to fix: Use the canonical link element pointing to the page without the “sort” variable.
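A minimal Python sketch of working out the URL the canonical element should point to, assuming `sort` is the only parameter that doesn’t change the content. The `IGNORED_PARAMS` set and the `canonical_url` name are hypothetical; list whatever presentation-only parameters your site uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that only change presentation, not content.
IGNORED_PARAMS = {"sort"}


def canonical_url(url):
    """Return the URL with presentation-only parameters removed."""
    parts = urlsplit(url)
    # Keep every other parameter in its original order.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    # Drop the #fragment too: it is never sent to the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

With the example above, `canonical_url("http://www.example.com/index.php?id=10&lang=en&sort=price")` returns `http://www.example.com/index.php?id=10&lang=en`, which is the address to put in the canonical link element.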
How to spot duplicate content
I prefer using software such as Seomoz Pro or Website Auditor to report which pages are similar. I like Seomoz because they send out weekly reports, making it hard to miss any new issues when they arrive.
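If you want a quick check of your own, a minimal Python sketch can flag pages whose text is identical once case and punctuation are ignored. This is an assumption about a reasonable first pass, not how Seomoz or Website Auditor work. It only catches exact duplicates, so near-duplicates still need a similarity measure. The `pages` input (URL mapped to extracted page text) is hypothetical; you would fill it from your own crawl.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash the page text after lowercasing it and keeping only the words."""
    normalized = " ".join(re.findall(r"\w+", text.lower()))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def duplicate_groups(pages):
    """pages: dict of URL -> page text.
    Return sorted lists of URLs whose normalized text is identical."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    # Only groups with more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```

For example, the www and non-www copies of a page with the text “Red T-shirt. 100% cotton.” would end up in the same group, which points straight back to the web server misconfiguration described earlier.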
Content rules! Write good content and make sure you don’t have duplicate versions. Remember, less is more!
* serp = search engine results page