There's nothing more basic to SEO than getting Google to index your content.
...and JavaScript SEO is still about that one basic. Don't be scared - we're not learning JavaScript as a language here. The goal is simply to make your JavaScript content:
- Crawlable
- Indexable and
- Google-friendly
JavaScript is a ubiquitous and powerful part of web applications and doesn't seem to be leaving the web anytime soon.
Its robustness has turned the web into a powerful platform, providing and adding:
- Intuitive features
- Interactivity
- Better user experience.
And lots more…
As AJAX-based applications replace static HTML pages, users enjoy a better, richer experience faster than they used to.
But this has come at a cost for the web: crawlers are unable to access and see content that's created dynamically.
To make your JavaScript-powered websites Google-friendly - and to index your content, there are certain things you must know about JavaScript SEO.
In this guide, you'll learn how Google Search processes JavaScript on your website and other web applications, Google's best practices for JavaScript SEO, and how to handle common issues you may encounter on your website.
What JavaScript SEO is
JavaScript SEO is part of technical SEO that focuses on making JavaScript web applications Google-friendly.
Many websites are built with JavaScript frameworks such as React, Vue and Angular (Google's own MVW framework) to add functionality.
As more and more JavaScript is added to a website, page load time increases, which can hurt overall site performance.
And unlike HTML and CSS, JavaScript can be difficult to parse, and not all search engines render JavaScript content as easily as Google does.
Processing AJAX applications has long been difficult for most search engines: while browsers can produce your content dynamically with ease, that content remains invisible to crawlers.
However, Google has a way of crawling and indexing your content:
"If you're running an AJAX application with content that you'd like to appear in search results, we have a new process that, when implemented, can help Google (and potentially other search engines) crawl and index your content."
There are methods you can use to deal with JavaScript applications and websites.
But they require regular manual maintenance to keep your content up to date.
And you'll see this shortly in the troubleshooting section of this article.
How GoogleBot Processes JavaScript Websites
Google has a rendering system (the Web Rendering Service, or WRS), which is based on an evergreen version of Chromium.
There are many use cases of JavaScript on the web.
Websites use JavaScript to add functionality - such as rendering dynamic content.
This means the content you want search engines to index requires the JavaScript on the page to be rendered into the DOM first.
And this process is not as easy as it is with HTML-based websites.
So you need a working knowledge of JavaScript SEO to make your JavaScript website indexable.
The three main phases of processing a JavaScript web application are:
- Crawling
- Rendering
- Indexing
1. Crawling Phase
Many sites rely on JavaScript to dynamically manipulate content on a web page.
This content doesn't appear in the initial HTML file; it only shows up after the JavaScript loads and populates the web page.
This means that until Googlebot executes the JavaScript, it won't be able to see the content.
Our browsers can execute JavaScript and provide content on the fly - but the crawlers can't.
So, to make the crawler see the same content a user sees, the server can provide the crawler with an HTML snapshot - the result of executing the JavaScript on your page.
Before crawling begins, Googlebot first checks whether you have allowed it to crawl your web pages by reading the robots.txt file.
And then proceed (if allowed) to make a GET request to the URLs waiting in the crawl queue.
But if you've disallowed it from crawling, the crawler will skip the URL(s) and not make any HTTP request to it.
The crawler then parses the response for other URLs, and all links wrapped in the href attribute of an HTML <a> tag are added to the crawl queue.
I'll explain this in the processing phase below.
Pro tip: Google uses either mobile or desktop crawlers to crawl web pages. Each crawler identifies itself to your site with a user agent for that device type.
Google's Crawlers
If you want to know which type of crawler visits your site, you can use the URL Inspection tool in Search Console.
Here's an example - notice the "Crawl as" field...
This also tells you whether you're on mobile-first indexing or Desktop indexing.
If your site is new, you'll probably be on the primary crawler which is the mobile crawler.
But sometimes, Google may send a secondary crawler (desktop) to your website.
There are two main problems you may encounter that can impact your SEO:
1. Blocking a specific country or handling particular IPs differently - the majority of the requests Google makes come from Mountain View, CA, but Google sometimes makes requests from locations outside the USA. If you block a specific country or treat certain IPs differently, Googlebot may not be able to see your content.
2. Using user-agent detection to show content to a specific crawler - this usually results in Google seeing content that differs from what users see.
There are tools like the URL Inspection tool, the Mobile-Friendly Test and the Rich Results Test you can use to troubleshoot JavaScript SEO issues.
These tools tell you if Google can see the content on your page - maybe you're blocking the bots.
If you notice spammers accessing your site under the name of Googlebot, you can verify the user agent by running a DNS lookup.
Note: Google doesn't post a public list of IP addresses that you can easily whitelist, and these addresses change often.
So you may want to verify by following these three steps:
- From your logs, run a reverse DNS lookup on the accessing IP address using the host command.
- Check that the domain name is either googlebot.com or google.com.
- Run a forward DNS lookup on the domain name retrieved in step 1 using the host command, and verify that it resolves to the same IP address that accessed your site in your logs.
For more details, see https://support.google.com/webmasters/answer/80553
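If you'd rather script this check, here's a minimal Node.js sketch using the built-in dns module (the IP address below is just a placeholder you'd pull from your own logs):
// verify-googlebot.js - minimal sketch using Node's built-in dns module
const dns = require('dns').promises;

async function verifyGooglebot(ip) {
  // Step 1: reverse DNS lookup on the accessing IP address
  const hostnames = await dns.reverse(ip);
  const hostname = hostnames[0] || '';

  // Step 2: the host name must belong to googlebot.com or google.com
  if (!/\.(googlebot|google)\.com$/.test(hostname)) return false;

  // Step 3: a forward lookup on that host name must return the original IP
  const { address } = await dns.lookup(hostname);
  return address === ip;
}

// Replace the IP with one taken from your own server logs
verifyGooglebot('66.249.66.1').then(ok => console.log(ok ? 'Verified Googlebot' : 'Not Googlebot'));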
When crawling ends, the page's HTML - together with its JavaScript files, CSS files and XHR requests - is stored and ready for processing.
A lot of systems are hidden behind the term "Processing" in the image above. I'm going to cover a few of them that are relevant to JavaScript.
2. Processing Phase
A lot of things are involved in the processing stage but I've simplified it below:
Link processing
If you allow Googlebot to crawl your web page and follow its URLs, it discovers the links in the HTML that point to other pages (external or internal).
It then adds all those links to the crawl queue, which is used to schedule crawling.
This is where you can instruct Google to either follow or nofollow a certain link.
"For certain links on your site, you might want to tell Google your relationship with the linked page. In order to do that, you can use one of the rel attribute values in the <a> tag.
For regular links that you expect Google to follow without any qualifications, you don't need to add a rel attribute."
If you don't want Google to follow a link on your page, simply add a rel="nofollow" attribute and Google will skip it - and not add it to the crawl queue. In other words, you're cutting the crawl path.
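For example, a nofollow link looks like this (the URL here is just a placeholder):
<a href="/promo-page" rel="nofollow">Upcoming promo</a>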
Note: when it comes to internal links added with JavaScript (rather than plain HTML attributes), the links to other pages won't be visible until the script is rendered.
This is usually done relatively fast and shouldn't be a problem.
It becomes a concern, though, when too many unnecessary scripts have to load before rendering happens.
Resources processing
Along with links, Google also pulls out the resources the page needs, such as CSS and JavaScript files referenced in <link> and <script> tags.
Note that external and internal links are pulled from <a> tags with an href attribute, which is why you must specify the correct attributes for your links.
You can deal with non-search-friendly JavaScript links by using something like this:
<a href="/page" onclick="goTo('page')">still okay</a>
Or simply:
<a href="/page">simple is good</a>
Resource Caching
Every file that Google downloads, including HTML pages, JavaScript files, CSS files, etc., is going to be aggressively cached.
Google will ignore your cache timings and fetch a new copy when they want to.
I’ll talk a bit more about this and why it’s important in the Renderer.
When Google crawls web pages, it makes raw HTML copies of the files from your server.
And before pages are sent to the renderer, all the files involved in the build-up of the page are downloaded and cached.
Caching simply lets Google fetch pages quickly, so it doesn't need to re-download every resource for every page request that matches a user's query.
The cached version of your web pages is what gets indexed.
To get Google to download and use the updated version of your web pages for rendering, use file versioning or content fingerprinting to generate new file names whenever there's a significant change.
Pro tip: by "HTML" I mean all the resources that make up the page, including the HTML, JavaScript and CSS code.
Google caches them all, ignores your cache timings and fetches fresh copies from your server whenever it likes.
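If you bundle your JavaScript with a tool like webpack (an assumption - other bundlers have equivalent options), a minimal sketch of content fingerprinting looks like this:
// webpack.config.js - [contenthash] changes whenever the file's content changes,
// so a significant update produces a new file name and Google fetches a fresh copy
const path = require('path');

module.exports = {
  entry: './src/index.js',
  output: {
    filename: 'main.[contenthash].js',
    path: path.resolve(__dirname, 'dist'),
  },
};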
Processing Duplicates
Google's aim is to index and show pages with distinct information.
Right after crawling, before a page is rendered and sent to the index, Google tries to eliminate duplicate content from the downloaded HTML.
However, some sites built with JavaScript frameworks and using the App Shell model can cause duplicate issues.
App shell model is used on sites with relatively unchanging navigation, but dynamic content.
And Google may not be able to see the actual content on the page until the JavaScript is rendered.
How this causes duplicate content
Because Google can't yet see the actual content that makes each page distinct, what it gets may be a small amount of content and boilerplate code that looks similar across many pages of the site.
In such a case, the pages appear to be duplicates and may not be sent to rendering immediately.
This usually resolves within a few seconds.
Still, to avoid it, server-side rendering or pre-rendering is a great option - it also makes your website load faster for users and easier for Googlebot to crawl.
Restrictive Directives
Another thing that happens during the processing stage is how Google responds to a given directive.
For instance, if there is a conflict between a directive in the plain HTML and one in the rendered version of a page, Google will obey whichever is more restrictive.
noindex overrides index, and a noindex in the initial HTML means Google will skip rendering the page altogether.
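As a hedged illustration of that rule (not a reproduction of Google's internals), consider a page whose initial HTML carries a noindex that the JavaScript later tries to relax:
<!-- Initial HTML: this alone makes Google skip rendering and keep the page out of the index -->
<meta name="robots" content="noindex">

<script>
  // Relaxing the directive with JavaScript won't help - the more
  // restrictive directive in the initial HTML is the one Google obeys
  document.querySelector('meta[name="robots"]').setAttribute('content', 'index, follow');
</script>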
3. Rendering Phase
Now, when pages are waiting in the queue, Google uses an evergreen, headless Chromium browser to render them and execute the JavaScript.
It'll then parse the rendered HTML for links again and queue up the URLs it finds for crawling.
This is the stage where Google will be able to see the content on the webpage.
Once the JavaScript is executed, any changes it makes are reflected in the Document Object Model (DOM).
Google's Web Rendering Service (WRS) does things like denying permissions, being stateless, and flattening light DOM and shadow DOM, among others.
The rendered HTML is also used to index the page.
Note: Google adds pages to queues in both the crawling and the rendering process, and it can be difficult to know whether a given page is waiting for crawling or for rendering.
How Google Sees Content In The Rendered DOM
Does Googlebot See Content The Same Way Users Do?
The Document Object Model (DOM) is an API for HTML and XML documents.
It defines the logical structure of documents and the way a document can be accessed and manipulated.
When you browse through a web page you don't get to see what has happened in the background.
Your browser can easily execute JavaScript on a page and create content on the fly, but crawlers can't.
So as long as your content is loaded in the DOM, Google will see it.
But if the content can't be loaded into the DOM, your content will not be seen.
You might be wondering how Googlebot sees all the content on a long page...
Well, unlike humans, Googlebot doesn't need to scroll through a long piece of content to know what's on it. It has its own way of accessing everything on a page.
Google resizes the loaded page to a much greater height, so it doesn't need to scroll.
For example, a mobile device with a screen size of 411x731 pixels is resized to 411x12140 pixels.
For desktop, the height is increased in the same way.
E.g., 1024x768 pixels is resized to 1024x9307 pixels.
Also, when Google renders the page, it doesn't paint the pixels.
All it needs is the structure and the layout, and it gets those without actually painting the pixels.
Here's what Martin Splitt from Google said:
"In Google search, we don’t really care about the pixels because we don’t really want to show it to someone. We want to process the information and the semantic information so we need something in the intermediate state. We don’t have to actually paint the pixels."
Testing And Troubleshooting For JavaScript SEO
1. Page Cache
When Google crawls the web, it takes a "snapshot" of the webpage as a backup.
Cached pages are extremely useful, especially when a site is down or times out.
But Google's cache is not always a reliable way to know what Googlebot sees, and it's not that useful when it comes to debugging JavaScript sites.
What the cache shows is usually the initial HTML - and only sometimes the rendered HTML.
2. View Source vs. Inspect Element
If you've always thought these two are the same, now you know there's a difference.
When you right-click and view the source, you see the same thing a GET request would return: the raw HTML of the page.
Inspect, on the other hand, shows you the content after the DOM has been processed and changed.
When working with JavaScript, you should use inspect element rather than view source, because it's closer to what Googlebot ultimately sees.
The page source won't include all the content you want the crawler to see.
In other words, "View Page Source" is what the crawler initially gets, while inspect reflects what it can see after rendering.
3. Google Search
Copy some text from your content and search for it on Google (in quotes). If the page containing that content is returned on the SERP, the content was seen. If not, check whether the content is hidden by default, because hidden content may not show up on the SERP.
4. Google Testing Tools
Google has some testing tools which you can use to debug JavaScript on your site.
Tools like the Mobile-Friendly Test, the Rich Results Test and the URL Inspection tool in Google Search Console let you see DOM-loaded content, blocked resources and error messages, all of which help while debugging.
While these tools often show you the HTML rendered in the DOM, you can always search Google for a snippet of text to confirm it was actually loaded by default.
5. Ahrefs Tools
Where most SEO tools fall short is the ability to render web pages at scale - with all the JavaScript. Ahrefs gives you that advantage: it checks for JavaScript redirects and reveals internal links inserted with JavaScript.
To include JavaScript data in your site audit, simply enable JavaScript in the crawl settings (see the image above). The Ahrefs Toolbar also lets you compare the raw HTML to the rendered versions of tags.
Rendering Options For JavaScript Websites
Rendering takes the content on your web pages and displays it to the user. There are two main types of rendering:
- Server-side rendering
- Client-side rendering
But there are other rendering options in between you can choose from to make your website search-friendly.
Google has a reference chart for rendering JavaScript (image below)
Any type of server-side rendering (SSR) or pre-rendering configuration is good for handling JavaScript SEO as the rendering process happens before it gets to the browser.
Client-side rendering is not a problem for Google, but you must also consider other search engines.
For instance, Bing supports JavaScript rendering, but not as well as Google. In its recent guidance, Bing advises:
"Limit usage of dynamic loading of resources – i.e. AJAX to limit the number HTTP requests and limit the usage of JavaScript on large web sites."
But, what about other search engines with little to no support for JavaScript?
When you think of making your JavaScript site SEO-friendly - not just Google-friendly, consider server-side rendering or other pre-rendering options.
How To Make Your JavaScript Websites SEO-Friendly (SEO for Client-side Rendering)
Let's walk through how you can optimize your JavaScript site and make it search-friendly.
There are some slight differences from "normal" SEO, which I'm going to show you.
Allow Crawlers
Allow Googlebot to access and download the resources on your site so that it can properly render and index your content. Your robots.txt file should look like this:
User-Agent: Googlebot
Allow: .js
Allow: .css
Use ‘History’ Mode Instead Of The Traditional ‘Hash (Fragments)’ Mode For URLs
To allow Googlebot to find links on your pages, use the History API. Googlebot looks for links on your pages and only considers URLs in the href attribute of HTML <a> tags.
Anything after #/ is ignored and Googlebot won't crawl it, so avoid using fragments to load different page content (especially in single-page applications) and use the History API instead - see the sketch below.
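Here's a minimal sketch in plain JavaScript (the paths and the renderView() helper are hypothetical placeholders):
<!-- Crawlable: real paths in href, no #/ fragments -->
<a href="/products">Products</a>
<a href="/about">About</a>

<script>
  // Intercept clicks, update the URL with the History API and render the new view
  document.querySelectorAll('a').forEach(link => {
    link.addEventListener('click', event => {
      event.preventDefault();
      const path = link.getAttribute('href');
      history.pushState({}, '', path);
      renderView(path); // hypothetical function that swaps in the page content
    });
  });

  // Handle the browser's back/forward buttons
  window.addEventListener('popstate', () => renderView(location.pathname));
</script>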
Use meaningful HTTP status codes
One thing Googlebot uses to detect if something went wrong during crawling is the HTTP status code.
Use HTTP status codes to tell Googlebot which pages it should crawl and index and which ones it shouldn't.
For example, you can use a 401 (Unauthorized) status code for pages behind a login, and you can tell Googlebot when a page has moved to a new URL so that the index is updated on the next crawl.
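Status codes are set on the server, not in client-side JavaScript. If you serve your app from Node with Express (an assumption - adapt this to whatever server you use; the route and the renderAccountPage() helper are hypothetical), a minimal sketch looks like this:
// server.js - minimal sketch, assuming an Express-based setup
const express = require('express');
const app = express();

app.get('/account', (req, res) => {
  if (!req.headers.authorization) {
    // Page behind a login: answer with a real 401 instead of a 200
    return res.status(401).send('Login required');
  }
  res.send(renderAccountPage()); // hypothetical server-side render helper
});

// Anything that doesn't match a real route returns a real 404
app.use((req, res) => res.status(404).send('Not found'));

app.listen(3000);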
Most SEOs use 301/302 redirects, but those are server-side redirects and redirects work differently in client-side JavaScript. When adding a redirect with JavaScript, the recommended function is window.location.replace().
An example of what a JavaScript redirect to my homepage looks like:
<html>
<head>
<script>
window.location.replace("https://www.semoladigital.com/");
</script>
</head>
</html>
When set up this way, it'd send visitors to https://www.semoladigital.com/ upon page load.
Although many would advise you to use window.location.href, the problem with that implementation is that the current URL is added to the visitor's navigation history, which can leave visitors stuck in back-button loops.
window.location.replace() does not do that.
So avoid window.location.href when you want to redirect visitors immediately to another URL.
This redirect method works well for JavaScript sites - it's supported by Google and passes PageRank.
Webmaster Trends Analyst John Mueller was asked in an #AskGoogleWebmasters episode whether Googlebot can detect client-side JavaScript redirects. Mueller said:
“We support JavaScript redirects of different types and follow them similar to how we’d follow server-side redirects"
Client-side JS Redirects: Can Googlebot Detect Them? #AskGoogleWebmasters
Use meta robots tags correctly
Sometimes you might have some pages (or content) that you want to prevent search engines from indexing.
For pages like thin content or upcoming promotions, you can use the robots meta tag to prevent Googlebot from indexing them or following the links on them.
For example, the code below on a page will block Googlebot from indexing the page:
<!-- Googlebot won't index this page or follow links on this page -->
<meta name="robots" content="noindex, nofollow">
On-Page SEO
Do on-page optimization as you would for a non-JavaScript site: follow the usual guidelines for content, titles, meta descriptions, image alt attributes, and so on.
For more on on-page SEO optimization, follow this link.
Duplicate content
Handle duplicate content with canonical tags as you would for every other type of website.
Canonical tags let you choose a single version of a page and hint to search engines to index that version.
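A minimal sketch (the URL is a placeholder) - ideally the canonical is shipped in the initial HTML, or injected with JavaScript if your framework builds the head client-side:
<!-- Preferred: ship the canonical in the initial HTML -->
<link rel="canonical" href="https://www.example.com/products/blue-widget">

<script>
  // Or inject it with JavaScript if the <head> is built client-side
  const canonical = document.createElement('link');
  canonical.rel = 'canonical';
  canonical.href = 'https://www.example.com/products/blue-widget';
  document.head.appendChild(canonical);
</script>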
SEO “plugin” type options
In JavaScript apps and websites, plugins are referred to as modules. Modules are used by JavaScript frameworks and perform the same kinds of functions as the plugins you're used to in WordPress.
React Helmet, for example, lets you set the common tags you need for JavaScript SEO.
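Here's a minimal sketch of how React Helmet is typically used (the component name, title and description are placeholders):
import React from 'react';
import { Helmet } from 'react-helmet';

function ProductPage() {
  return (
    <div>
      <Helmet>
        <title>Blue Widget | Example Store</title>
        <meta name="description" content="A short, keyword-relevant description of the page." />
        <link rel="canonical" href="https://www.example.com/products/blue-widget" />
      </Helmet>
      <h1>Blue Widget</h1>
    </div>
  );
}

export default ProductPage;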
Use structured data
Implementing structured data on your JavaScript site is as easy as on any other type of site. You can generate the required JSON-LD and then inject it into the page using JavaScript. Just be sure to run tests with Google's structured data testing tools to prevent unnecessary issues.
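A minimal sketch of injecting JSON-LD with plain JavaScript (the organization details are placeholders):
<script>
  // Build the structured data and inject it into the <head> as JSON-LD
  const data = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Company",
    "url": "https://www.example.com"
  };

  const tag = document.createElement('script');
  tag.type = 'application/ld+json';
  tag.textContent = JSON.stringify(data);
  document.head.appendChild(tag);
</script>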
Sitemap Generation
JavaScript frameworks each have their own modules that can easily be used to generate a sitemap.
For example, the following sketch - based on the react-router-sitemap module for React Router apps (an assumption; use whatever sitemap module your framework provides) - will generate and save your sitemap:
sitemap-builder.js
require('babel-register'); // lets Node understand the JSX/ES modules in your router file
const router = require('./router').default; // your app's React Router route definitions
const Sitemap = require('react-router-sitemap').default; // the sitemap module

(
  new Sitemap(router)
    .build('http://my-site.ru') // your site's hostname
    .save('./sitemap.xml') // where to write the generated sitemap
);
Whatever framework you use, searching Google for the framework name plus "sitemap" will return links showing where and how you can implement a sitemap on your site.
Error Pages
JavaScript sites can't return a 404 status code on their own, because routing happens on the client side rather than the server. To handle error pages, use a JavaScript redirect to a URL that the server responds to with a 404 status code.
Alternatively, you can add a noindex tag to the failing page along with a message like "404 - Page Not Found"; such a page will still return a 200 status code and be treated as a soft 404, as shown in the sketch below.
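Here's a minimal sketch of both options in a React component (assuming React and react-helmet; the component and paths are hypothetical):
// NotFound.js - minimal sketch, assuming React and react-helmet
import React from 'react';
import { Helmet } from 'react-helmet';

function NotFound() {
  // Option 1: redirect to a URL your server actually answers with a 404 status
  // window.location.replace('/404');

  // Option 2: keep the client-side page but mark it noindex and show an error message
  return (
    <div>
      <Helmet>
        <meta name="robots" content="noindex" />
      </Helmet>
      <h1>404 - Page Not Found</h1>
    </div>
  );
}

export default NotFound;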
Lazy Loading
Working with JavaScript SEO isn't so different from working with WordPress SEO: just as WordPress has plugins for virtually everything you want to achieve, JavaScript has modules that add different functionality to your site. React's lazy and Suspense are two of the most popular tools for this kind of issue. A good practice to improve your site's overall SEO is to lazy load images so that they only load when users are about to view them.
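Here's a minimal sketch of both ideas - React's lazy/Suspense for code splitting and the browser's native loading="lazy" for images (component and file names are placeholders):
import React, { lazy, Suspense } from 'react';

// Code splitting: the Reviews component is only downloaded when it's rendered
const Reviews = lazy(() => import('./Reviews'));

function ProductPage() {
  return (
    <div>
      {/* Native lazy loading: the browser defers the image until it's near the viewport */}
      <img src="/images/blue-widget.jpg" alt="Blue widget" loading="lazy" />

      <Suspense fallback={<p>Loading reviews…</p>}>
        <Reviews />
      </Suspense>
    </div>
  );
}

export default ProductPage;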
Conclusion
JavaScript is a ubiquitous and powerful part of web applications and doesn't seem to be leaving the web anytime soon. Therefore, understanding how Googlebot and other crawlers read, crawl and render JavaScript applications is essential to optimizing your JavaScript web app to rank at the top of search engines.