Robots Meta Tag, X‑Robots-Tag & Data-nosnippet: Everything You Need to Know in 2022

Most SEOs don’t need to go beyond the noindex and nofollow directives, but it’s good to know that there are other options as well. 

Keep in mind that all directives listed here are supported by Google.

We all know the robots.txt file.

But you can’t control how search engines index your content with robots.txt.

A robots.txt only manages the accessibility of your content to crawlers. It doesn’t go beyond that.

Whether the content should or shouldn’t be indexed is what a robots meta tag or X-Robots-Tag HTTP header takes care of.

Setting noindex in a robots.txt file was once a fairly common practice, but it’s something Google never officially supported, and it was finally deprecated in July 2019.

  • So, how do you set directives for crawlers at the page level, telling them not to index a particular page or document?
  • And even at the text level, where you don’t want a particular passage to appear in the SERP?

This guide introduces you to the uses of the robots meta tag, the data-nosnippet HTML attribute and the X-Robots-Tag HTTP response header.

You’ll learn what they are, when to use them and how – so you can avoid some common mistakes that can hurt your SEO.

Let’s get started…


What is a Robots Meta Tag?

A robots meta tag is an HTML snippet that goes into the head section of a page and tells search engines how the page should be indexed and served to users in SERP.

Unlike the robots.txt file, which takes a site-wide approach, the robots meta tag allows you a page-specific approach…

To control how each page on your website should be indexed and served to users on the search engine results page.

Here’s what a meta robots tag looks like:

<meta name="robots" content="noindex" />

And… here’s how it looks in the HTML code:

<!DOCTYPE html>

<html>

<head>

<meta name="robots" content="noindex" />

(…)

</head>

<body>(…)</body>

</html>

Meta Robots values and attributes

In the snippet above…

The value of the name attribute, “robots”, specifies that the directive applies to all search engine crawlers.

You can also replace it with a specific crawler’s user-agent (UA) name, such as googlebot, to target that crawler alone.

And the content attribute value (noindex) specifies that the page should not be indexed and thus should not appear in the SERP.

If no robots meta tag is set, search engines will crawl and index the page as normal.

Note: Both the “name” and the “content” attributes are case-insensitive.

Why is the robots meta tag important in SEO?

The meta robots tag, together with the X-Robots-Tag (described below), is used to control how search engines index your content and serve snippets of it.

Their main function is to prevent pages from showing up in search results.

And there are times you’ll want to prevent certain pages on your site from appearing in SERP. 

Examples are:

  • Pages with thin content – adding no value to your visitors;
  • Thank you pages;
  • Landing pages;
  • Internal search results page;
  • Pages in the development stage;
  • Pages for an upcoming event (product launch, promotion or contest), where you don’t want the information to leak before the launch;
  • Duplicate content. 

The basic thing in SEO is getting search engines to index your pages. 

For smaller websites, managing crawlability and indexation shouldn’t be a problem. 

In fact, a robots.txt file may be all you’ll need to get your SEO running. 

But as your website grows, you may have to combine page-level directives (meta robots tag) with robots.txt (site-wide directive) and sitemaps.

Valid indexing & serving directives by Google

The following directives are supported by Google. You can use them with the robots meta tag and the X-Robots-Tag to control how Google indexes your content and serves snippets of it.

Note that each value represents a specific directive, and that you can combine multiple directives in a comma-separated list.

The directives are case-insensitive.
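As a concrete illustration of those two rules, here is a minimal Python sketch (the function name is my own, not part of any real crawler) of how a crawler might normalize a content attribute value: split on commas, trim whitespace and lowercase each directive:

```python
def parse_robots_content(content: str) -> set[str]:
    """Normalize a comma-separated robots directive list.

    Directives are case-insensitive, so everything is lowercased;
    surrounding whitespace is ignored.
    """
    return {d.strip().lower() for d in content.split(",") if d.strip()}

print(sorted(parse_robots_content("NoIndex, NOFOLLOW")))
# ['nofollow', 'noindex']
```

An empty or missing content value yields an empty set, which corresponds to the default: no restrictions on indexing or serving.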

How to Set Robots Meta Tags in HTML

all

This is the default directive, equivalent to “index, follow”, and it has no effect if explicitly listed.

This means there are no restrictions for indexing or serving of content to users.

<meta name="robots" content="all" />

noindex

Tells Google not to index the page, preventing it from showing up in the SERP.

<meta name="robots" content="noindex" />

nofollow

“nofollow” instructs robots not to follow (crawl) any of the links on the page.

Unless there are other crawl paths to a linked page, that page may never be discovered and indexed.

<meta name="robots" content="nofollow" />

none

The opposite of “all”.

“none” combines the “noindex” and “nofollow” directives. Avoid using “none”, because search engines like Bing don’t support it.

<meta name="robots" content="none" />

noarchive

Use “noarchive” when you want to block Google from showing a cached copy of your web page in the SERP.

<meta name="robots" content="noarchive" />

notranslate

Use this to block Google from offering a translation of the page in the SERP.

<meta name="robots" content="notranslate" />

noimageindex

Use this to stop Google from indexing the images embedded on your page.

<meta name="robots" content="noimageindex" />

unavailable_after:

This instructs Google not to show a page in search results after a specified date/time. 

Basically, unavailable_after implies a noindex directive with a timer. The date/time must be specified in a widely adopted format such as RFC 822, RFC 850 or ISO 8601. If no valid date/time is specified, the directive is ignored.

<meta name="robots" content="unavailable_after: Sunday, 01-Sep-19 12:34:56 GMT" />
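Since getting the date format right matters here, a small Python sketch (the deadline itself is made up) shows how the standard library can generate a valid RFC 822 timestamp for the directive:

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Build an unavailable_after value in the RFC 822 date style,
# one of the accepted formats listed above.
deadline = datetime(2019, 9, 1, 12, 34, 56, tzinfo=timezone.utc)
directive = f"unavailable_after: {format_datetime(deadline, usegmt=True)}"
print(directive)  # unavailable_after: Sun, 01 Sep 2019 12:34:56 GMT
```

Generating the string programmatically avoids the silent failure mode where a hand-typed, unparseable date makes Google ignore the directive entirely.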

nosnippet

Tells search engines not to show text snippets or video previews in the SERP.  

<meta name="robots" content="nosnippet" />

However, Google says:

“A static image thumbnail (if available) may still show up in the SERP when it results in a better user experience, whether in web search, Google Images or Discover.”

max-snippet:[number]

You can specify the maximum [number] of characters you want Google to display in SERP with the “max-snippet” value. 

Note that this only applies to text snippets, not to images or videos.

Google ignores the directive if you don’t specify a parseable [number].

Special values are: 

0: No text snippet will be shown in search results; equivalent to nosnippet.

-1: No limit on the text preview; Google will choose the length it thinks most effectively helps users discover your content.

Example:

<meta name="robots" content="max-snippet:80">

This tag limits the snippet to 80 characters.

max-image-preview:[setting]

Allows you to set a maximum size for an image preview on your page in search results.

Acceptable values are:

none: No image preview will be shown.

standard: The default; an image preview may be shown when relevant.

large: A larger image preview, up to the width of the viewport, may be shown.

<meta name="robots" content="max-image-preview:standard">

max-video-preview:[number]

Instructs Google to use at most [number] seconds of a video as the snippet for videos on the page in search results.

Other supported values:

0: Means no video preview; a static image may still be used, in accordance with the max-image-preview setting.

-1: No limit.

Note: if no parseable [number] is specified, Google ignores the directive.

Example:

<meta name="robots" content="max-video-preview:20">

This tag would allow Google to show a maximum of 20 seconds of any video on the page.

How to Combine Indexing and Serving Directives

You can combine multiple directives in one robots meta tag by separating them with commas.

Here’s an example of a robots meta tag instructing crawlers not to index the page and not to crawl any of the links on the page:

<meta name="robots" content="noindex, nofollow">

Here’s another example that limits the text snippet of a given page to 80 characters, and at the same time allows a large image preview:

<meta name="robots" content="max-snippet:80, max-image-preview:large">

What if I specify multiple crawlers (user-agents) along with different directives?

When you specify multiple crawlers with different directives, search engines apply the sum of all the negative directives that target them.

For example:

<meta name="robots" content="nofollow">
<meta name="googlebot" content="noindex">

In this situation, Googlebot will interpret the instructions on the page as noindex, nofollow.
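The merging behavior can be sketched in Python (the helper name and data structure are my own illustration, not Google’s actual algorithm): a crawler honors tags named "robots" plus tags matching its own user-agent, and applies the union of the directives it finds.

```python
def effective_directives(meta_tags, ua):
    """meta_tags: (name, content) pairs as they appear in the page head.

    A "robots" tag applies to every crawler; a named tag only applies
    to the crawler whose user-agent matches that name.
    """
    found = set()
    for name, content in meta_tags:
        if name.lower() in ("robots", ua.lower()):
            found |= {d.strip().lower() for d in content.split(",") if d.strip()}
    return found

tags = [("robots", "nofollow"), ("googlebot", "noindex")]
print(sorted(effective_directives(tags, "googlebot")))  # ['noindex', 'nofollow']
print(sorted(effective_directives(tags, "bingbot")))    # ['nofollow']
```

Note how bingbot, which the "googlebot" tag does not address, only picks up the generic nofollow.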

What is the “data-nosnippet” HTML Attribute?

While the robots.txt file tells search engines how you want your site crawled, and the meta robots tag sets instructions at the page level, the data-nosnippet HTML attribute lets you tell search engines not to show a particular piece of text (within <span>, <div> and <section> elements) in the SERP.

How to use the “data-nosnippet” HTML Attribute

The data-nosnippet attribute is a boolean attribute and is valid with or without a value.

However, to ensure machine-readability, the HTML must be valid and all relevant tags closed properly.

data-nosnippet in <span> tag within <p> element:

<p>This text can be shown in a snippet

 <span data-nosnippet>but this part would not be shown</span>.</p>

data-nosnippet on <div> elements, with and without an explicit value:

<div data-nosnippet>not in snippet</div>

<div data-nosnippet="true">also not in snippet</div>

Common mistake with the use of data-nosnippet:

<div data-nosnippet>some text
<!-- unclosed "div" will include all content afterwards -->

Invalid use case of data-nosnippet:

<mytag data-nosnippet>some text</mytag>

<!-- NOT VALID: not a span, div, or section -->

How to Set Meta Robots Tags on Your WordPress Website

I know by now you’ll be asking:

how can I implement this on my WordPress site?

So far we’ve been using examples which only illustrate how you can place the code in your page HTML. 

And while you can easily edit and place this code using any HTML editor, such as the popular Notepad++ or Sublime Text, and then upload the file to your server via FTP… the implementation is a bit different on WordPress and other CMSs.

But the truth is… 

Whether you’re using a CMS or not, the meta robots tag goes into the head section of the HTML and nowhere else.

And it’s even simpler to do in WordPress, thanks to SEO plugins like Yoast and RankMath.

Setting Meta Robots Tag Using RankMath SEO

If you have the RankMath SEO plugin installed, go to the Titles & Meta section to define a global robots meta tag for your WordPress posts and pages. You will still be able to override this at the page level.


So, if you want to set it at the page level, go to the “Advanced” tab in the RankMath meta box on each post and page, and set the robots meta tag to how you want search engines to index the page.

The following settings would implement “noindex, nofollow” directives.


What is an X‑Robots-Tag?

The X-Robots-Tag is an HTTP response header sent by the web server, which controls the indexing of a page.

The X-Robots-Tag differs from the robots.txt file and the meta robots tag in that it is part of the HTTP header.

And Google has clearly stated that:

“Any directive that can be used in a robots meta tag can also be specified as an X-Robots-Tag.”

While you can set meta robots directives in the head section of an HTML page, that approach doesn’t work for non-HTML files such as PDF, image and video files.

The X-Robots-Tag is the ideal way to prevent search engines from indexing these types of files.

Here’s an example of an HTTP response header with an X-Robots-Tag telling crawlers not to index a page:

HTTP/1.1 200 OK

Date: Tue, 13 June 2012 20:04:50 GMT

(…)

X-Robots-Tag: noindex

(…)

Applications and Uses of the X-Robots-Tag

The ideal place to set the X-Robots-Tag in your site’s HTTP response headers is the server configuration: the .htaccess or httpd.conf file on Apache, or the site’s .conf file on NGINX.

For example, 

On Apache…

If you don’t want search engines to index any .pdf files on your website, configure your Apache server like this:

<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, noarchive, nosnippet"
</FilesMatch>

And on NGINX… the settings would look like this:

location ~* \.pdf$ {
  add_header X-Robots-Tag "noindex, nofollow";
}

In the same way, if you want to use the X-Robots-Tag to block search engines from indexing image files such as .jpg, .gif and .png on your site, it would look like this:

On Apache

<Files ~ "\.(png|jpe?g|gif)$">
  Header set X-Robots-Tag "noindex"
</Files>

 On NGINX:

location ~* \.(png|jpe?g|gif)$ {
  add_header X-Robots-Tag "noindex";
}

Note: if the URL you apply a meta robots tag or X-Robots-Tag to is blocked by the robots.txt file, crawlers will never discover the directives on the page, and thus won’t follow them.

There are two main benefits of using the X-Robots-Tag:

  1. With the X-Robots-Tag, you can set site-wide directives for crawlers.
  2. It gives you a high level of flexibility, thanks to its support for regular expressions.

Combining Multiple X-Robots-Tag Directives

You can send multiple X-Robots-Tag headers in one HTTP response, or specify a comma-separated list of directives in a single header.

Here’s an example of an HTTP response combining two X-Robots-Tag headers:

HTTP/1.1 200 OK

Date: Tue, 13 Jun 2012 20:04:50 GMT

(…)

X-Robots-Tag: noarchive

X-Robots-Tag: unavailable_after: 25 Jun 2012 14:30:00 PST

(…)

Specifying User Agents and Using Comma-Separated Directives

For instance, the following X-Robots-Tag headers set instructions for two different crawlers, one of them with multiple comma-separated directives:

HTTP/1.1 200 OK

Date: Tue, 13 Jun 2012 20:04:50 GMT

(…)

X-Robots-Tag: googlebot: nofollow

X-Robots-Tag: bingbot: noindex, nofollow

(…)

As with the meta robots tag, any directive you specify without a user-agent (UA) applies to all crawlers.

And if you mistakenly set conflicting robots directives, Google will apply the most restrictive one.

For instance:

If you set both max-snippet:150 and nosnippet on the same page, Google will apply nosnippet, the more restrictive directive.
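The “most restrictive wins” rule can be illustrated with a small Python sketch (my own simplification, not Google’s actual logic), treating nosnippet as max-snippet:0 and -1 as “no limit”:

```python
def effective_snippet_limit(directives):
    """Return the effective snippet limit in characters (-1 = unlimited)."""
    limit = -1
    for d in (x.strip().lower() for x in directives):
        if d == "nosnippet":
            return 0  # nosnippet always wins: it's equivalent to max-snippet:0
        if d.startswith("max-snippet:"):
            n = int(d.split(":", 1)[1])
            # With several limits, the smallest (most restrictive) applies.
            limit = n if limit == -1 else min(limit, n)
    return limit

print(effective_snippet_limit({"max-snippet:150", "nosnippet"}))  # 0
print(effective_snippet_limit({"max-snippet:150"}))               # 150
```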

Note: The HTTP header name, the user-agent name and the specified values are all case-insensitive.

How to Check for an X-Robots-Tag on Your Website

Let me show you two methods that you can easily use to check for an X-Robots-Tag on your website.

Method #1: Screaming Frog.

Download and install Screaming Frog, enter the domain you want to check and run a crawl of the site.

When the crawl finishes, head over to the “Directives” tab and look for the “X-Robots-Tag” column.

Here you’ll see which URLs carry meta robots or X-Robots-Tag directives such as noindex or nofollow.

Method #2: Web Developer Plugin

My second method is to use a browser extension such as the Web Developer plugin, a versatile tool that lets you check whether a site uses the X-Robots-Tag.

To check this…

After installing the plugin on your browser, click on it and navigate to “View Response Headers“. 

This will show you various HTTP headers that are being used on the site.
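If you’d rather check programmatically, here is a small Python sketch (the function name is my own) that does what the plugin shows interactively: pull X-Robots-Tag values out of a raw response header block like the ones shown earlier.

```python
def extract_x_robots(raw_headers: str) -> list[str]:
    """Collect every X-Robots-Tag value from a raw HTTP header block."""
    values = []
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        # Header names are case-insensitive; skip lines without a colon.
        if sep and name.strip().lower() == "x-robots-tag":
            values.append(value.strip())
    return values

response = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Tue, 13 Jun 2012 20:04:50 GMT\r\n"
    "X-Robots-Tag: noarchive\r\n"
    "X-Robots-Tag: unavailable_after: 25 Jun 2012 14:30:00 PST\r\n"
)
print(extract_x_robots(response))
# ['noarchive', 'unavailable_after: 25 Jun 2012 14:30:00 PST']
```

In practice you would feed this the headers from a real request (e.g. fetched with curl -I or your HTTP client of choice).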

Mistakes to Avoid When Setting up Robots.txt File, Meta Robots Tag and X-Robots-Tag

If you manage a huge website, dealing with duplicate content issues and running a successful technical SEO program can be overwhelming.

You don’t want your crawl budget wasted.

You want search engines to focus on specific pages on your website, and so you want to keep some pages out of the index.

And while doing this, there are some common mistakes people make that end up hurting their SEO.

So let’s take a look at some of these common mistakes in regard to robots directives:

Mistake #1: Adding noindex directives to pages already disallowed in the robots.txt file

Avoid disallowing, in robots.txt, the crawling of pages you’re trying to get deindexed.

If you don’t want Google to index a certain page, you add a “noindex” meta robots tag. But disallowing the same page in robots.txt prevents Googlebot from recrawling it.

And the result is that Google will never discover the noindex directive.

Noindexed pages shouldn’t receive any organic traffic. If a page still does, it means it is still indexed.

If you added the “noindex” tag recently, give Google a little while to recrawl and deindex the page.

But if it wasn’t recent and the page still receives organic traffic, chances are you’ve blocked the crawl path with the robots.txt file.

If this is the case, go ahead and check for the issues and fix them appropriately.

Mistake #2: Poor Sitemaps Management

If you’re trying to deindex a page using a meta robots tag or X-Robots-Tag, leave it in your sitemap until it has actually been deindexed.

A sitemap is one of the fastest ways to get search engines to crawl and recrawl your site after you’ve made changes (simply set the lastmod date to the date you added the noindex tag).

If you remove the page from the sitemap right away, Google may be slow to recrawl it.

Pro tip: don’t keep deindexed pages in your sitemap for long. Once you’re sure a page has been deindexed, remove it from your sitemap to keep the sitemap clean and save crawl budget.
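To make the lastmod tip concrete, here is what a sitemap entry might look like while you wait for the page to drop out of the index (the URL and date are hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Keep the noindexed page listed, with lastmod set to the day the
       noindex tag was added, until search engines recrawl and drop it. -->
  <url>
    <loc>https://example.com/old-promo-page/</loc>
    <lastmod>2022-03-15</lastmod>
  </url>
</urlset>
```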

Mistake #3: Hiding URLs in robots.txt instead of noindexing them

Some developers try to hide the URLs of pages for upcoming events, such as promotions, product launches or discounts, by disallowing access to them in the site’s robots.txt file.

This is bad practice, because the robots.txt file is publicly viewable, so the “hidden” pages are easily leaked.

The proper way to handle this is to add noindex via the meta robots tag or the X-Robots-Tag.

Mistake #4: Leaving noindex directives in the production environment

Preventing robots from crawling and indexing anything in the staging environment is good practice.

But those directives sometimes get forgotten and pushed into production, resulting in a plunge in organic traffic.

A pre-launch checklist should be part of every developer’s toolkit: before pushing work to production, remove any disallow rules in the robots.txt file and any noindex directives in the meta robots tag that were only meant for staging.

Summing it up…

Technical SEO can get quite complicated if not managed properly.

There are multiple ways to instruct search engines not to crawl or index certain sections of your website, or certain resources on a page, including keeping some text out of search result snippets.

Used carelessly, these directives can conflict with one another and waste your crawl budget.

I hope this guide helps you avoid the common mistakes, and that you can now apply the meta robots tag, data-nosnippet and X-Robots-Tag in line with robots.txt best practices for a long-term solution on your website.

Drop your comments below if you have any questions, or follow me on Twitter or LinkedIn and AMA.

Oladoyin Falana
https://semoladigital.com