Rel=Canonical: Everything You Need To Know About URLs Canonicalization


Duplicate content is an everyday issue that most webmasters and SEOs have to deal with.

For web admins, the question is how to deal with duplicate content. For SEO practitioners, the question isn’t how to deal with it, but when.

In 2009, the major search engines jointly introduced an HTML link attribute, rel="canonical", to handle duplicate content issues across the web.

That solution, to date, remains effective.

But, as effective and easy to implement as it is, Google says:

“Use the rel=canonical incorrectly, and chances are Googlebot will just ignore your signals.”

Now, the question is, how do you handle duplicate content correctly, and when should you use rel=canonical?

The answers…

Duplicate content and canonicalization go hand in hand. If you’re having duplicate issues, you’re in the right place to solve that.

In this guide, I’ll show you everything you need to know, including the proper usage and implementation of the rel=canonical tag, to take care of duplicate content on your website.

Let’s get started…

What is a Canonical tag?

A canonical tag (rel="canonical") is a piece of HTML code that tells search engines which URL represents the master copy among duplicate, near-duplicate, or similar pages.

It marks the master copy as the main version and tells search engines to index and rank it, while consolidating ranking signals to the URL specified as the canonical.


Canonicalization is an easy way to tell search engines that you’re aware of duplicates and to prevent them from appearing in the SERPs. By presenting only one version of similar content, you stop Google from picking whichever version it likes and indexing it at random.

How rel=canonical fits into a typical URL in HTML

Canonical tags are placed within the HTML <head> section of a web page.

For example, 

If you have two web pages with the same or similar content – with URLs:

  1. https://yourdomain.com/sample-page
  2. https://yourdomain.com/main/sample-page

And you want to specify that Google should only index and rank the second one.

Then, you include a canonical link tag in the <head> section of the first web page.

So that it looks like this:

<link rel="canonical" href="https://yourdomain.com/main/sample-page/" />


Let me explain this…

link rel="canonical": This tells search engines that the URL given in the tag, and not the current page containing the snippet (in this case, page 1), is the one that should be indexed.

href="https://yourdomain.com/main/sample-page/": This is the main version. It informs search engines that the URL in the href (page 2) is the main version and the one that should be indexed.

If you’re using a CMS such as WordPress, this code snippet is usually placed in the head section of your web page automatically. Otherwise, you’ll have to add it to your page’s source code manually.
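To check what a page actually declares, you can read the tag back out of its HTML. Here is a minimal sketch using Python’s standard html.parser module; the class name CanonicalFinder, the function find_canonical, and the sample markup are my own illustration, not part of any SEO tool.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attr = dict(attrs)
            rel = (attr.get("rel") or "").lower().split()
            if "canonical" in rel and attr.get("href"):
                self.canonicals.append(attr["href"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonical(html):
    """Return the first canonical URL declared in <head>, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals[0] if finder.canonicals else None


# Hypothetical markup for page 1 from the example above.
page_one = """<html><head>
<title>Sample page</title>
<link rel="canonical" href="https://yourdomain.com/main/sample-page/" />
</head><body>Duplicate content</body></html>"""

print(find_canonical(page_one))
# https://yourdomain.com/main/sample-page/
```

The sketch deliberately ignores link tags outside <head>, since that is the only place the tag belongs.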

In the case of self-referencing, each page’s tag points to that page’s own URL:


Page 1: <link rel="canonical" href="https://yourdomain.com/sample-page/" />

Page 2: <link rel="canonical" href="https://yourdomain.com/main/sample-page/" />

This will become clearer as we move on…

What the Canonical Tag is Intended For

Search engines usually find it difficult to guess the web page version they should index and rank for a given query, which can later hurt a site’s SEO. The canonical tag solves these issues by:

1. Getting rid of duplicate content within your site

A canonical tag is a proactive solution to prevent duplicate content. 

For instance, 

If your homepage is indexed and accessible via both yourdomain.com and www.yourdomain.com, search engines will find it hard to guess which one you want indexed, because the two URLs are different.

You can tell search engines by adding a canonical tag to one URL that points to your preferred URL.

2. Funneling all your link juice to one place

More than one version of a URL can hurt your link building – splitting your link juice across the URLs.

When a 301 redirect is not an option, the canonical tag helps consolidate your ranking signals (PageRank) and direct all the link juice (link equity) from the duplicate URLs to the canonical (preferred) URL.

3. Dealing with the protocol issues

A canonical tag helps take care of issues that may arise from serving both the HTTP and the secure HTTPS version of a page.

How Does Duplicate Content Hurt My SEO?

Duplicate content is something you should avoid if possible. When you allow it to exist across your site, you:

  • Risk splitting your link equity between the multiple versions of the page.
  • Waste crawl budget: search engines crawl multiple versions of the same content and may neglect the important content that needs to be prioritized on your website.

Here’s what Google says:

“Overly complex URLs, especially those containing multiple parameters (which lead to duplicate content), can cause problems for crawlers by creating unnecessarily high numbers of URLs that point to identical or similar content on your site. As a result, Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all the content on your site.”

What Causes Duplicate Content?

You might wonder how you have duplicate content when you don’t publish the same type of content multiple times on your site.

The truth is, if you understand how search engines work, you’ll see that spiders follow and crawl URLs, not pages, so every distinct URL looks like a separate page to them.

When you have similar pages (containing the same or similar content) and multiple URLs to access the pages, you run into duplicate content issues. 

And you’ll need to implement a canonical tag to keep search engines from:

  • returning the wrong version,
  • diluting ranking signals,
  • wasting time crawling multiple URLs with the same information.

Here are the common causes of duplicate content:

1. Parameterized URLs

These are query strings, or URL variables, that follow a question mark. They usually come in key-value pairs joined by an equals sign, and an ampersand is used to chain multiple parameters.

The most used cases are in:

  • Product re-ordering
  • Filtering
  • Tracking – such as UTM parameters
  • Identifying – for example, differentiating between a blue shirt and a red shirt in a store
  • Paginating
  • Searching – e.g., site search.
  • Translating

Often, URL parameters make no significant change to the content of a page. A re-ordered version of a page is often not so different from the original, and a page URL with tracking tags (such as UTM parameters) or a session ID is identical to the original URL without them.

Let’s take a look at an eCommerce site.

Filtering: Assume this is the URL of a shoes product category:

https://www.myshoes.com/en-gb/wears/shoes.html

If you apply a size filter to shoes, a parameter is added to the URL:

https://www.myshoes.com/en-gb/wears/shoes.html?size=medium

If you filter down again, this time by color, and you choose white:

https://www.myshoes.com/en-gb/wears/shoes.html?size=medium&color=white

Even though these URLs differ by the added parameters, the content remains largely the same. But search engines see and treat them as separate pages.

Tracking: Adding tracking parameters to track campaigns.

https://www.myshoes.com/en-gb/wears/shoes.html?utm_medium=social

Re-ordering: Adding parameters to sort the listing.

https://www.myshoes.com/en-gb/wears/shoes.html?sort=lowest-price&order=highest-rated
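To see how all of these parameterized URLs collapse into one canonical URL, here is a rough sketch in Python using the standard urllib.parse module. The list of parameters to strip (NON_CONTENT_PARAMS) and the function name are assumptions for this example store; on a real site you would decide case by case which parameters actually change the content.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: on this hypothetical store, these parameters only filter,
# sort, or track, and never change which page is being shown.
NON_CONTENT_PARAMS = {"size", "color", "sort", "order", "sessionid"}


def strip_non_content_params(url):
    """Return the URL with filter, sort, and tracking parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NON_CONTENT_PARAMS and not key.startswith("utm_")
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


variants = [
    "https://www.myshoes.com/en-gb/wears/shoes.html?size=medium",
    "https://www.myshoes.com/en-gb/wears/shoes.html?size=medium&color=white",
    "https://www.myshoes.com/en-gb/wears/shoes.html?utm_medium=social",
    "https://www.myshoes.com/en-gb/wears/shoes.html?sort=lowest-price&order=highest-rated",
]

for url in variants:
    # Every variant prints https://www.myshoes.com/en-gb/wears/shoes.html
    print(strip_non_content_params(url))
```

The result is the URL you would put in the href of the canonical tag on each variant.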

2. Mobile vs. Desktop version – Having pages for different device types (e.g., example.com and m.example.com)

A single page that can be accessed from two different URLs, a mobile version (m.yourdomain.com) and a desktop version (yourdomain.com), will appear to search engines as duplicate content unless you canonicalize one of them.

3. AMP and Non-AMP Version – For example,

https://www.yourdomain.com/my-first-post and https://www.amp.example/my-first-post

will be treated as duplicate content.

4. www and non-www version – Having the same content at non-www and www variants, e.g., http://yourdomain.com and http://www.yourdomain.com,

also results in duplicate content issues.

5. Trailing slash (/) – Serving the same content with and without a trailing slash can result in duplicate content.

Example:

https://yourdomain.com/my-first-post/

and

https://yourdomain.com/my-first-post

6. HTTP and HTTPS variants – For example,

http://www.yourdomain.com and https://www.yourdomain.com

When both are served, Google sees them as different URLs with the same content.

7. Serving the same content with and without capital letters – For example,

https://yourdomain.com/my-first-post

and

https://yourdomain.com/My-First-Post

Again, Google sees these as different URLs with the same content.

8. Printable versions of pages – A printable variant of a web page can also lead to duplicate content,

e.g., https://yourdomain.com/my-first-post

and

https://yourdomain.com/print/my-first-post

9. Referring to the same post under different categories – e.g., https://yourdomain.com/articles/my-first-post

and

https://yourdomain.com/guide/my-first-post

As we can see, the post “my-first-post” appears in two different categories, “articles” and “guide,” and this can result in duplicate content.
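Several of the causes above (protocol, www, trailing slash, and capital letters) are just different spellings of one URL. As a sketch, the Python function below folds them into a single form. The policy it applies (HTTPS, non-www, lowercase path, no trailing slash) is an assumption I picked for the example; lowercasing paths in particular is only safe if your server treats paths as case-insensitive.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed site policy for this sketch: HTTPS, non-www, lowercase, no trailing slash.
PREFERRED_SCHEME = "https"


def normalize(url):
    """Fold protocol, www, letter-case, and trailing-slash variants into one URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Keep "/" for the homepage, drop the trailing slash everywhere else.
    path = parts.path.lower().rstrip("/") or "/"
    return urlunsplit((PREFERRED_SCHEME, host, path, parts.query, ""))


variants = [
    "https://yourdomain.com/my-first-post/",
    "http://www.yourdomain.com/my-first-post",
    "https://yourdomain.com/My-First-Post",
]

for url in variants:
    # Every variant prints https://yourdomain.com/my-first-post
    print(normalize(url))
```

Print versions and category paths are different: they need a per-page decision about which URL is the canonical, so the sketch does not try to guess them.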

7 Guidelines and Best Practices on the Use of Canonical Tags

Implementing rel=canonical in web pages is not rocket science. Once you understand where and when to place it, you save yourself from running into duplicate content.

In a moment, I’ll discuss five ways you can implement canonical tags. But before we get to that, there are principles to follow, which should guide whatever implementation method you use.

#1. Always place the rel=canonical in the <head> section of your HTML – Not just anywhere within the <head> tag but as early as possible to avoid HTML parsing issues


Google disregards rel="canonical" designated in the <body> tag.

#2. Use only absolute URLs – Like other HTML tags, the <link> tag accepts both relative and absolute URLs

While a relative URL specifies a path relative to the current page, an absolute URL specifies the full address, including the protocol and domain.

<link rel="canonical" href="yourdomain.com/sample-page.html" />

This is a relative URL since there’s no “http://”, and it implies that the desired canonical URL is http://yourdomain.com/yourdomain.com/sample-page.html

Google said in its official webmasters’ blog:

“Though that is almost certainly not what was intended. In these cases, our algorithms may ignore the specified rel=canonical. Ultimately this means that whatever you had hoped to accomplish with this rel=canonical will not come to fruition”.

So, the best practice is to use:

<link rel="canonical" href="https://yourdomain.com/sample-page/" />

As opposed to this one:

<link rel="canonical" href="/sample-page/" />

Or,

<link rel="canonical" href="yourdomain.com/sample-page/" />
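You can reproduce the relative-URL problem yourself with standard URL resolution. The short Python sketch below uses urljoin from the standard library; the helper name resolve_canonical and the page URL are assumptions for illustration, and I’m not claiming this is exactly how Googlebot resolves links internally.

```python
from urllib.parse import urljoin


def resolve_canonical(page_url, href):
    """Resolve a canonical href against the URL of the page that contains it."""
    return urljoin(page_url, href)


# Hypothetical page that carries the canonical tag.
page_url = "http://yourdomain.com/"

print(resolve_canonical(page_url, "yourdomain.com/sample-page.html"))
# http://yourdomain.com/yourdomain.com/sample-page.html

print(resolve_canonical(page_url, "/sample-page/"))
# http://yourdomain.com/sample-page/

print(resolve_canonical(page_url, "https://yourdomain.com/sample-page/"))
# https://yourdomain.com/sample-page/
```

Only the absolute URL gives the same result no matter which protocol or host the page was loaded from, which is why it is the safe choice.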

#3. Use the correct HTTP/HTTPS and www/non-www variant – Remember, Google treats the HTTP/HTTPS and www/non-www versions of a URL as separate pages. So, be consistent when implementing rel=canonical.

If your site is served over HTTPS (with an SSL/TLS certificate), use the https:// version in the absolute URL.

#4. Use only one canonical tag per page

Multiple declarations of the rel=canonical tag on a page (irrespective of their position in the HTML) will likely make Google ignore the hint, and whatever benefit you expected from rel=canonical will be lost.

#5. Link to the canonical URL rather than a duplicate URL

All links within your site should point to the canonical to help Google understand your preference. Also, your inbound links should point to the canonical URL to boost its ranking.

#6. Always specify a canonical URL when using hreflang tags – hreflang is an HTML <link> tag attribute used to tell search engines the relationship between pages in different languages. When using hreflang, Google states that you should:

“specify a canonical page in the same language, or the best possible substitute language if a canonical doesn’t exist for the same language.”

#7. Use self-referential canonical tags

Self-referential canonical is a rel=canonical link on a page that points to itself.

For example, if a page URL is:

https://yourdomain.com/my-first-post

then, a self-referential canonical tag will be:

<link rel="canonical" href="https://yourdomain.com/my-first-post" />

However, this is a recommendation, not a requirement.

Here’s what Google’s John Mueller has to say:

“I recommend [using a] self-referential canonical because it makes it clear to us which page you want to have indexed, or what the URL should be when it is indexed.

Even if you have one page, sometimes different URL variations can pull that page up. For example, with parameters in the end, perhaps with upper lower case or www and non-www. All of these things can be kind of cleaned up with a rel=canonical tag”.

Most modern CMSs add a rel=canonical tag to pages automatically, so you don’t have to worry about self-referencing. But you’ll have to hardcode this if you’re using a custom CMS.

Google Webmaster Trends Analyst John Mueller also covers the signals used to determine canonical URLs in an #AskGoogleWebmasters video.

Five different methods you can use to implement canonical tags for your website

Choose one of the following methods to specify a canonical URL for duplicate URLs or duplicate/similar pages:

  1. HTML tag
  2. HTTP header
  3. Sitemap
  4. 301 redirect
  5. Internal links

Whichever method you choose, make sure you follow the guidelines discussed above.

1. Setting rel=canonical in the <head> tag of an HTML page.

This is the method I have been using in my examples since the beginning of this piece.

It is the easiest way to indicate that there’s a duplicate and to tell search engines which canonical URL you would like them to index and rank.

It works by simply adding the following code to the <head> section of your HTML:

<link rel="canonical" href="https://yourdomain.com/canonical-page/" />

For example:

You have a page on your site with this URL:

https://yourdomain.com/clothing/shirts-for-men

But this page can also be accessed via two or three other URLs, and when Google crawls those URLs, it will find the same content on them.

Now, suppose you want

https://yourdomain.com/clothing/shirts-for-men

to be the canonical URL. Simply add a rel="canonical" link element to the duplicate pages and point them to the canonical page, like this:

<link rel="canonical" href="https://yourdomain.com/clothing/shirts-for-men" />

If there’s a mobile variant of the canonical page, then add a rel="alternate" link to the canonical page, pointing to the mobile version, like this:

<link rel="alternate" media="only screen and (max-width: 640px)" href="https://m.yourdomain.com/clothing/shirts-for-men" />

Most modern CMSs, such as WordPress and Shopify, let you specify this out of the box, so you don’t have to worry about messing with the source code.

Note:

This method only works for HTML pages. For files such as PDFs, the canonical has to be set using an HTTP header. And that takes us to method 2.

2. Rel=canonical HTTP Header Method

Suppose you create a PDF version of your blog posts, but you want the blog post (the HTML page) to be the preferred URL to index. Because the PDF is a non-HTML file, there is no way to add a canonical tag to it except in the HTTP header of the PDF response:

HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <https://yourdomain.com/blog/how-to-make-money-online/>; rel="canonical"

If you have multiple URLs for the same PDF file, you can use a rel=canonical HTTP header to hint to Google that it should index only the canonical version and consolidate ranking signals there.

HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <http://www.yourdomain.com/downloads/white-paper.pdf>; rel="canonical"

Remember to use an absolute path rather than a relative path in the Link header. That is, use

http://www.yourdomain.com/downloads/white-paper.pdf and not

/downloads/white-paper.pdf

This method is simple once you know how to configure your server. HTTP headers are also sent with HTML pages, so you can use this method for them too instead of a canonical tag.
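How you attach the header depends on your server, so I won’t guess at a specific configuration. As a server-agnostic sketch, the Python below only builds the header value shown above and refuses relative paths; the function names canonical_link_header and pdf_response_headers are hypothetical.

```python
def canonical_link_header(canonical_url):
    """Build the value of an HTTP Link header that declares a canonical URL."""
    if not canonical_url.startswith(("http://", "https://")):
        raise ValueError("canonical URL must be absolute: " + canonical_url)
    return '<{}>; rel="canonical"'.format(canonical_url)


def pdf_response_headers(canonical_url):
    """Return the header pairs a server would send along with the PDF file."""
    return [
        ("Content-Type", "application/pdf"),
        ("Link", canonical_link_header(canonical_url)),
    ]


for name, value in pdf_response_headers(
    "https://yourdomain.com/blog/how-to-make-money-online/"
):
    print("{}: {}".format(name, value))
```

Most web frameworks accept a list of (name, value) pairs like this one, so the same helper can be reused wherever the PDF response is produced.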

3. Setting canonical in XML Sitemaps

Google states that if you’re using a sitemap, you should pick only the canonical URLs and submit them. Listing only the canonical URLs and excluding the non-canonical pages is a best practice, since all pages in a sitemap are suggested as canonicals.

However, Google went on to say…

“We don’t guarantee that we’ll consider the sitemap URLs to be canonical, but it is a simple way of defining canonicals for a large site, and sitemaps are a useful way to tell Google which pages you consider most important on your site.”
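If you generate your sitemap yourself, the idea is simply to feed it the canonical URLs and nothing else. Here is a minimal sketch with Python’s built-in xml.etree.ElementTree; the function name build_sitemap and the URL list are assumptions, and a real sitemap may carry extra fields such as lastmod.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(canonical_urls):
    """Return sitemap XML that lists only the given canonical URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in canonical_urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical canonical URLs; duplicates and parameterized variants are left out.
canonical_urls = [
    "https://yourdomain.com/clothing/shirts-for-men",
    "https://yourdomain.com/my-first-post",
]

print(build_sitemap(canonical_urls))
```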

4. Setting canonicals with 301 redirects

Use 301 redirects when you want to send traffic from duplicate pages directly to the canonical (your preferred URL, the one you want indexed).

For example:

Suppose you have three different URLs with which you can access your homepage:

https://yourdomain.com/home

https://yourdomain.com/index.php and

https://www.yourdomain.com

Pick one of these as the preferred URL, and 301 redirect the other two to it. This way, when search engines and users visit the duplicate URLs, the 301 redirect simply sends them to your preferred (canonical) URL.

You can also use this method to prevent any duplicates that may result from HTTP/HTTPS and www/non-www versions of your site.

The 301 status code instructs search engines that a particular page has been permanently moved and can now be accessed at a new location.

To ensure that this method works perfectly for both users and search engines, use a server-side redirect method.
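The decision a server-side redirect makes for the homepage example can be sketched as a small Python function. I’m assuming https://www.yourdomain.com/ was picked as the canonical; the function name homepage_redirect is hypothetical, and the sketch only covers the homepage variants, not the rest of the site.

```python
from urllib.parse import urlsplit

# Assumption: the www HTTPS homepage is the preferred (canonical) URL.
CANONICAL_ORIGIN = ("https", "www.yourdomain.com")
CANONICAL_HOME = "https://www.yourdomain.com/"
DUPLICATE_HOME_PATHS = {"/home", "/index.php"}


def homepage_redirect(url):
    """Return (301, target) if url is a duplicate of the homepage, else None."""
    parts = urlsplit(url)
    path = parts.path or "/"
    origin = (parts.scheme.lower(), parts.netloc.lower())
    if path in DUPLICATE_HOME_PATHS:
        return 301, CANONICAL_HOME
    if path == "/" and origin != CANONICAL_ORIGIN:
        return 301, CANONICAL_HOME
    return None


print(homepage_redirect("https://yourdomain.com/home"))
# (301, 'https://www.yourdomain.com/')
print(homepage_redirect("https://yourdomain.com/index.php"))
# (301, 'https://www.yourdomain.com/')
print(homepage_redirect("https://www.yourdomain.com"))
# None
```

Returning None for the canonical URL itself matters: redirecting it too would create a redirect loop.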

For more on 301 redirects, read my full guide. 

5. Internal link to canonical URLs

How you link from one page to another across your site is a signal to Google. And the more consistent you are, the easier it’ll be for Google to determine your preferred (canonical) URL.

For instance, if Google sees that you consistently link to the www version of your website and nothing links to the non-www version, Google will probably pick the www version because it receives stronger signals than the non-www one.

However, to be on the safe side, you can edit the .htaccess file in your root folder to enforce your preferred version between www and non-www.

# Send requests for the non-www host to the www host with a permanent (301) redirect
RewriteEngine On
RewriteCond %{HTTP_HOST} ^yourdomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.yourdomain.com/$1 [R=301,L]

5 Common mistakes to avoid when handling URL canonicalization

Implementing rel=canonical can sometimes be complex because errors are not very obvious, and it is easy to go in the wrong direction.

In this section, I’ve put together the five most common mistakes to avoid.

Mistake #1: Using robots.txt to block canonicalized URLs

A robots.txt file instructs search engines on what to crawl and what not to crawl.

Using it to block a non-canonical URL will prevent crawlers from crawling the page and seeing the canonical tag you’ve placed on it. As a result, any link equity on that page will not be transferred to the preferred (canonical) page.

Here’s what John Mueller said about robots.txt…

“A robots.txt disallow even trickier, we don’t even know if the page matches anything else on your site, so we couldn’t even use it for canonicalization if we wanted to.”

This is not a good way to handle duplicates and, as such, should be avoided.
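A quick way to catch this mistake is to test your duplicate URLs against your robots.txt rules. The sketch below uses Python’s standard urllib.robotparser; the robots.txt content and the helper name is_crawlable are assumptions for the example.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the robots.txt rules allow user_agent to crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks the printable duplicates.
robots_txt = """User-agent: *
Disallow: /print/
"""

# The duplicate carries a canonical tag, but crawlers can never read it.
print(is_crawlable(robots_txt, "https://yourdomain.com/print/my-first-post"))
# False
print(is_crawlable(robots_txt, "https://yourdomain.com/my-first-post"))
# True
```

Any canonicalized URL that comes back False here is one whose canonical tag search engines cannot see.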

Mistake #2: Adding “noindex” to the canonicalized URLs

When you add a rel=canonical tag and a noindex tag together on a page, you’re confusing the search engines, and the best bet is that Google will ignore the noindex tag and prioritize the rel=canonical.

Google’s John Mueller stated in a Reddit Q&A:

“This is also where the guide that you shouldn’t mix noindex & rel=canonical comes from: they’re very contradictory pieces of information for us. We’ll generally pick the rel=canonical and use that over the noindex. Still, any time you rely on interpretation by a computer script, you reduce the weight of your input 🙂 (and SEO is to a large part all about telling computer scripts your preferences)”.

Rather than combining the two, use a 301 redirect instead, or stick to rel="canonical" alone.
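If you want to scan your pages for this mix, here is a rough Python sketch built on the standard html.parser module. It flags a page only when it has a robots noindex meta tag and a canonical pointing to a different URL; that narrower rule, and the names SignalScanner and has_conflicting_signals, are my own assumptions for the example.

```python
from html.parser import HTMLParser


class SignalScanner(HTMLParser):
    """Records the canonical href and whether a robots noindex meta tag exists."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = {key: (value or "") for key, value in attrs}
        if tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href")
        elif tag == "meta" and attr.get("name", "").lower() == "robots":
            if "noindex" in attr.get("content", "").lower():
                self.noindex = True


def has_conflicting_signals(html, page_url):
    """True if the page is noindexed and also canonicalized to another URL."""
    scanner = SignalScanner()
    scanner.feed(html)
    points_elsewhere = scanner.canonical is not None and scanner.canonical != page_url
    return scanner.noindex and points_elsewhere


# Hypothetical duplicate page that sends both signals at once.
duplicate = """<head>
<meta name="robots" content="noindex, follow">
<link rel="canonical" href="https://yourdomain.com/main/sample-page/">
</head>"""

print(has_conflicting_signals(duplicate, "https://yourdomain.com/sample-page/"))
# True
```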

Mistake #3: Returning a 4XX HTTP status code for the canonicalized URL

If a canonicalized URL returns anything other than a 200 status code and is not 301 redirected, chances are Google will not see the canonical tag on it and will not transfer link equity to the preferred (canonical) URL.

Mistake #4: Multiple declarations of rel=canonical tags

Having multiple canonical tags on a single page will likely make Google ignore them all.

Multiple declarations of canonical tags are not uncommon. Some SEO plugins insert a rel=canonical tag by default, without the webmaster knowing.


Another case is when a web admin copies a page template in which the author has specified a rel=canonical and forgets to change the target URL or remove the link entirely.

This can also occur when you declare a rel=canonical both in JavaScript and in the HTML of a page.

In all of these cases, you’re sending mixed signals, and Google will ignore them.

To avoid this, double-check the page’s source code – especially the <head> section.
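Checking the source by eye gets tedious across many pages, so here is a small Python sketch that counts the canonical declarations in a page’s HTML. The names CanonicalCounter and check_single_canonical are hypothetical, and the check only sees static HTML: a tag injected later by JavaScript would not show up.

```python
from html.parser import HTMLParser


class CanonicalCounter(HTMLParser):
    """Collects the href of every rel=canonical link anywhere in the document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and "canonical" in (attr.get("rel") or "").lower().split():
            self.hrefs.append(attr.get("href") or "")


def check_single_canonical(html):
    """Return 'ok', 'no canonical tag', or a message listing the duplicates."""
    counter = CanonicalCounter()
    counter.feed(html)
    if len(counter.hrefs) > 1:
        return "multiple canonical tags: " + ", ".join(counter.hrefs)
    if not counter.hrefs:
        return "no canonical tag"
    return "ok"


# Hypothetical page where a plugin and a copied template both added a tag.
page = """<head>
<link rel="canonical" href="https://yourdomain.com/my-first-post" />
<link rel="canonical" href="https://yourdomain.com/old-template-page" />
</head>"""

print(check_single_canonical(page))
```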

Mistake #5: Pointing rel=canonical on all paginated pages to the root page

Canonicalizing all other pages in a pagination series to the first page (root page) is a bad practice that can hurt your website.

Instead of doing this, the best practice is to use a self-referencing canonical tag on every page in the series.
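As a sketch of what that looks like in a template, the Python function below produces the tag for each page of a series. The ?page=N URL scheme and the function name paginated_canonicals are assumptions; use whatever pagination URLs your site actually serves.

```python
def paginated_canonicals(base_url, total_pages):
    """Return {page_number: canonical tag}, with each page pointing to itself."""
    tags = {}
    for page in range(1, total_pages + 1):
        # Assumed URL scheme: page 1 is the base URL, later pages add ?page=N.
        url = base_url if page == 1 else "{}?page={}".format(base_url, page)
        tags[page] = '<link rel="canonical" href="{}" />'.format(url)
    return tags


for page, tag in paginated_canonicals("https://yourdomain.com/blog/", 3).items():
    print(page, tag)
# 1 <link rel="canonical" href="https://yourdomain.com/blog/" />
# 2 <link rel="canonical" href="https://yourdomain.com/blog/?page=2" />
# 3 <link rel="canonical" href="https://yourdomain.com/blog/?page=3" />
```

Notice that pages 2 and 3 never point back to page 1.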

For more on pagination, read my pagination – the dos and don’ts.

Update: Google will usually crawl and index your canonical URLs but may sometimes crawl and show the canonicalized version.

For example, this can occur if m.example.com is your preferred URL for a given page and a user is searching from a desktop. If that page matches the user’s query, Google may decide to show the desktop version (example.com), even though you’ve hinted to Google to show only the mobile version.

This is one of the reasons why you shouldn’t block or noindex your canonicalized pages.
