Handling Duplicate Content

It’s almost inevitable that at some point you’ll end up dealing with duplicate content. This can lead to issues such as keyword cannibalization and uncertainty about which URL to preserve. We’ll walk you through the steps needed to solve this problem.

Duplicate Content Causes 

Duplicate content has a variety of causes. These are the most common:

  • Uppercase/lowercase URL variations
  • WWW and non-WWW variations
  • Trailing slash and non-trailing slash variations
  • Parameter variations (?utm tracking tags, etc.)

While these are the most common cases, there are a few less common variations you may encounter as well. These include:

  • Content that is copied and pasted to a new URL
    • For example, page.html and page-1.html
  • Product pages living in multiple directories
  • HTTP and HTTPS URLs

You don’t often see the less common cases because they are usually already handled by proper redirects and canonical URLs. They may still come up, but they have become rarer over time. Copying and pasting content takes a deliberate act, so it’s rare to see it at scale unless your content is syndicated.
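
To see which of these variations exist on your own site, you can group a list of crawled URLs by a normalized key. The following is a minimal Python sketch, not a definitive tool. The function names and example URLs are ours, and it assumes that case, www, trailing slash, protocol, and parameters never change the content, which you should confirm for your site.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def duplicate_key(url: str) -> str:
    """Collapse the common variants of a URL into one comparison key."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Ignore case, the trailing slash, and the query string.
    # The scheme (HTTP vs. HTTPS) is left out of the key entirely.
    path = parts.path.lower().rstrip("/") or "/"
    return host + path


def group_duplicates(urls):
    """Return only the groups that contain more than one URL variant."""
    groups = defaultdict(list)
    for url in urls:
        groups[duplicate_key(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


# Hypothetical crawl output, for illustration only.
crawled = [
    "https://example.com/Shoes",
    "http://www.example.com/shoes/",
    "https://example.com/shoes?utm_source=newsletter",
    "https://example.com/hats",
]
for key, found in group_duplicates(crawled).items():
    print(key, found)
```

Here the three shoes URLs end up in one group, while the hats URL has no duplicates and is left out.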

How Do I Fix Duplicate Content (Basic)?

At the most basic level, you can solve these problems in one of two ways. You can either:

  • Set up a 301 redirect
  • Set up canonical tags

Redirect Method – Servers

For most issues, redirecting is the best way to take care of duplicate content. Before we get to the fix, note the main exception: URL parameters. In those cases it’s better to canonicalize the URLs unless the parameter is dead.

Redirects are a fairly straightforward way to solve duplicate content. You can set up a simple redirect rule on Apache and NGINX servers to handle lowercase/uppercase, www/non-www, trailing slash, and HTTP/HTTPS redirects. On a different server? Google “SERVERTYPE trailing slash redirect”, “SERVERTYPE uppercase to lowercase redirect”, and so on.

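Note: You can also set up a WWW/non-WWW redirect at the DNS level if your DNS provider offers URL forwarding (DNS records alone cannot issue an HTTP redirect).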

  • Apache: https://httpd.apache.org/docs/2.4/mod/mod_dir.html
  • NGINX: https://www.nginx.com/blog/creating-nginx-rewrite-rules/
  • Microsoft IIS: https://docs.microsoft.com/en-us/iis/configuration/system.webserver/httpredirect/
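
As a rough illustration of what these redirect rules have to do, here is a minimal Python sketch of the logic. It is not Apache or NGINX configuration. PRIMARY_HOST, the choice of no trailing slash, and the function name are assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit

PRIMARY_HOST = "example.com"  # assumption: non-WWW is the primary host


def redirect_target(url: str):
    """Return the primary URL to 301 to, or None if `url` is already primary."""
    parts = urlsplit(url)
    path = parts.path.lower()        # lowercase is the standard
    if len(path) > 1:
        path = path.rstrip("/")      # assumption: no trailing slash
    path = path or "/"
    # Force HTTPS and the primary host. The query string is kept as-is,
    # since parameters are better handled with canonical tags.
    target = urlunsplit(("https", PRIMARY_HOST, path, parts.query, ""))
    return None if target == url else target


print(redirect_target("http://www.example.com/Blog/Post/?utm_source=x"))
print(redirect_target("https://example.com/blog/post"))
```

Whenever the result is not None, the server would answer with a 301 and put the returned URL in the Location header. Handling every variation in one step like this also avoids chains of several redirects.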

Redirect Method – CMS

If you’re going the CMS route (potentially because you don’t have server access), you may find plugins in the wild. We’ve listed a number of helpful links below, but feel free to look around. As with servers, we recommend googling “CMS redirect plugin”, “CMS redirect trailing slash”, etc. to find redirect solutions for your specific CMS.

Canonical Method

First, you need to know what a canonical tag is. A canonical tag tells Google which page is the primary version. For instance, if https://example.com/non-canonical and https://example.com/canonical both have the same content, you can tell Google that the main version is https://example.com/canonical. You do this by setting the canonical tag on the duplicate page to the /canonical URL. The code on the /non-canonical page would look like this: <link rel="canonical" href="https://example.com/canonical" />.

Note: Canonical tags should always be set to absolute URLs (https://example.com/canonical), not relative URLs (/canonical).
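
If you want to spot-check a page, a short script can pull out the canonical tag and confirm that it is an absolute URL. This is a minimal Python sketch using only the standard library. The class and function names are ours, and the list of checks is an assumption about what is worth flagging, not a complete audit.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel:
            self.hrefs.append(attributes.get("href") or "")


def canonical_problems(html: str):
    """Return a list of human-readable problems with the canonical tag."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.hrefs:
        return ["no canonical tag found"]
    problems = []
    if len(finder.hrefs) > 1:
        problems.append("more than one canonical tag")
    for href in finder.hrefs:
        parts = urlsplit(href)
        # An absolute URL has both a scheme and a host.
        if not (parts.scheme and parts.netloc):
            problems.append(f"canonical is not an absolute URL: {href!r}")
    return problems


page = '<head><link rel="canonical" href="/canonical"></head>'
print(canonical_problems(page))
```

An empty list means none of these checks found a problem with the page.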

Keep in mind that the canonical tag is a hint to Google, not a directive. Google may still choose to disregard it, but we usually see canonical tags honored when everything else is set up consistently (e.g. sitemaps list the canonical URLs, internal links point to the canonical URLs, etc.).

Note: Google not honoring your canonical tags? Check out our article on how to diagnose the issue.

You’ll primarily want to use canonicals for parameters and for duplicate pages living in multiple sub-directories (most common with e-commerce platforms and products). You can use this method for the other duplicate content issues, but it isn’t our preferred approach. Usually the choice comes down to what your developers are willing to do.

With canonicals, choose one version of a URL (either with a trailing slash or without) and have the page’s code always output that version in the canonical tag, even when the requested URL doesn’t match. URL parameters are simpler: everything from the ? onward is removed from the canonical tag, which sets it to the base version of the URL.
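
The logic described above can be sketched in a few lines of Python. Treat this as an illustration to hand to your developers rather than a finished implementation. It assumes the site chose the non-trailing-slash version and that no parameter changes the content; the TRAILING_SLASH flag and the function names are ours.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

TRAILING_SLASH = False  # assumption: the non-trailing-slash version is primary


def canonical_url(request_url: str) -> str:
    """Drop everything from the ? onward and apply the trailing-slash choice."""
    parts = urlsplit(request_url)
    path = parts.path.rstrip("/")
    if TRAILING_SLASH or not path:
        path += "/"  # the homepage always keeps its single slash
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def canonical_tag(request_url: str) -> str:
    """Render the <link> element for the page's <head>."""
    href = escape(canonical_url(request_url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


print(canonical_tag("https://example.com/shoes/?utm_source=newsletter&color=red"))
```

For the example URL this prints <link rel="canonical" href="https://example.com/shoes" />. If you use filter parameters that change the content enough to no longer be duplicate, those pages would need to be excluded from this rule.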

Talk with your developers and tell them what you want done. They should be able to handle it as long as you tell them exactly which page should always be the canonical version.

Note: Having issues with communicating with your developers? Check out our guide on building strong developer relationships.

How Do I Determine Which Content Should be Primary (Advanced)?

Knowing which action to take (redirects or canonicals) is important, but the most important thing is to determine which URL should be made the primary URL. In some cases this is fairly easy to determine. For example:

  • HTTP URLs should always redirect to HTTPS URLs
  • For trailing slash/non-trailing slash, www/non-www and upper/lowercase URLs, select the primary based on the initial setup or what is most often linked to internally (or even via backlinks).
    • For upper and lower case, lower case is the standard.
  • Parameters should almost always canonicalize to the base URL
    • The exception is when you use filters as pseudo-categories and the content changes enough that it is no longer duplicate.

That said, you should also check a few metrics to guide your choice. We recommend that you look at:

  • Which URLs are ranking (check Google Search Console, SEMRush, Moz, Ahrefs, etc.)
  • Which URLs are getting organic traffic (check Analytics data and Google Search Console)
  • Which URL Google is selecting as the canonical (use Google Search Console’s URL Inspection Tool to see which URL Google has chosen)
  • Which URLs are primarily getting backlinks (use Majestic or Ahrefs; Moz also works)
  • Which URLs you are primarily linking to (use a crawler or visual inspection)

You’ll want to choose the URL that performs best on the metrics above (organic traffic, backlinks, rankings, etc.) as your primary URL. You don’t want to mess with a good thing: even with the proper tags and redirects in place, pointing everything to the weaker URL may cost you rankings or traffic.
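
If you have many duplicate groups to work through, you could collect these metrics in a spreadsheet export and let a small script suggest the primary URL. The sketch below is only one possible way to weigh the data. The field names, the order of priority (clicks first, then backlinks, then Google’s selection, then internal links), and the numbers are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class UrlMetrics:
    url: str
    organic_clicks: int    # e.g. from Google Search Console
    backlinks: int         # e.g. from a backlink tool
    internal_links: int    # e.g. from a site crawl
    google_selected: bool  # Google's chosen canonical per URL Inspection


def pick_primary(candidates):
    """Compare by clicks first, then backlinks, Google's choice, internal links."""
    return max(
        candidates,
        key=lambda m: (m.organic_clicks, m.backlinks, m.google_selected, m.internal_links),
    )


# Hypothetical numbers for two variants of the same page.
candidates = [
    UrlMetrics("https://example.com/shoes", 120, 14, 30, True),
    UrlMetrics("https://example.com/shoes/", 15, 2, 55, False),
]
print(pick_primary(candidates).url)
```

In this made-up case the non-trailing-slash URL wins on traffic and backlinks even though the other version has more internal links, which is exactly the situation where the internal links need updating afterwards.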

Keep in mind that in most cases, such as upper and lower case, www/non-www, and trailing slash/non-trailing slash, you’re typically safe to make a decision without all of that data. The data becomes more important when you have an even mixture of URLs linked internally, or when you have duplicate content on a few pages due to copying and pasting.

Conclusion

Once you’ve decided which URL structure to stick with and set up your redirects or canonicals, update your internal links to point to the primary version. Tackle the links on your primary pages and in your navigation first. After that you can work through the other pages as needed.
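
To find the internal links that still point at a non-primary version, you can check each page’s links against your list of primary URLs. This is a minimal Python sketch under a few assumptions: the primary_for mapping (non-primary URL to primary URL) is something you build yourself, and the names and sample HTML are ours.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.hrefs.append(href)


def links_to_update(page_url, html, primary_for):
    """Return (found_href, primary_url) pairs for links that need updating."""
    collector = LinkCollector()
    collector.feed(html)
    outdated = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        primary = primary_for.get(absolute)
        if primary and primary != absolute:
            outdated.append((href, primary))
    return outdated


# Hypothetical mapping and navigation markup.
primary_for = {"https://example.com/Shoes/": "https://example.com/shoes"}
nav = '<nav><a href="/Shoes/">Shoes</a> <a href="/hats">Hats</a></nav>'
print(links_to_update("https://example.com/", nav, primary_for))
```

Running this against your navigation and primary pages first matches the order of work suggested above.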
