Search engine optimization, in its most basic sense, relies on one thing above all else: search engine spiders crawling and indexing your website.
But nearly every website has pages that you don’t want to include in this exploration.
For example, do you really want your privacy policy or internal search pages to appear in Google results?
At best, these pages do nothing to actively drive traffic to your site, and at worst, they could divert traffic away from more important pages.
Fortunately, Google allows webmasters to tell search engine bots which pages and content to crawl and which to ignore. There are several ways to do this, the most common being a robots.txt file or the robots meta tag.
We have an excellent and detailed explanation of the ins and outs of robots.txt, which you should definitely read.
But in general terms, it’s a plain text file that lives in the root of your website and follows the Robots Exclusion Protocol (REP).
Robots.txt gives crawlers instructions about the site as a whole, while robots meta tags include directions for specific pages.
Some robots meta tags you might use include index, which tells search engines to add the page to their index; noindex, which tells them not to add a page to the index or include it in search results; follow, which instructs a search engine to follow the links on a page; nofollow, which tells it not to follow links; and a whole host of others.
Both robots.txt and robots meta tags are useful tools to keep in your toolbox, but there is also another way to instruct search engine bots to noindex or nofollow: the X-Robots-Tag.
What is the X-Robots-Tag?
The X-Robots-Tag is another way to control how your webpages are crawled and indexed by spiders. As part of the HTTP header response to a URL, it controls indexing for an entire page, as well as specific elements on that page.
And while using robots meta tags is fairly straightforward, the X-Robots-Tag is a bit more complicated.
But that, of course, raises the question:
When should you use the X-Robots-Tag?
According to Google, “Any directive that can be used in a robots meta tag can also be specified as an X-Robots-Tag.”
While you can set these directives with either the robots meta tag in a page’s HTML or the X-Robots-Tag in the HTTP response headers, there are some situations where you would want to use the X-Robots-Tag, the two most common being when:
- You want to control how your non-HTML files are crawled and indexed.
- You want to serve directives site-wide instead of at the page level.
For example, if you want to prevent a specific image or video from being indexed, the HTTP response method makes this easier.
The X-Robots-Tag header is also useful because it lets you combine multiple directives in a single HTTP response, either as separate headers or as a comma-separated list.
Maybe you don’t want a certain page to be cached and you want it to be unavailable after a certain date. You can use a combination of the “noarchive” and “unavailable_after” directives to instruct search engine bots accordingly.
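As a rough illustration of combining directives, the sketch below builds such a header value in Python. The helper name `build_x_robots_tag` and the cut-off date are hypothetical, and the ISO 8601 date is just one of the widely used formats; it is worth confirming against Google’s current documentation before relying on it.

```python
from datetime import date


def build_x_robots_tag(*directives, unavailable_after=None):
    """Join directives into one comma-separated X-Robots-Tag value."""
    parts = list(directives)
    if unavailable_after is not None:
        # ISO 8601 date, e.g. 2030-01-31 (assumed format for this sketch).
        parts.append(f"unavailable_after: {unavailable_after.isoformat()}")
    return ", ".join(parts)


# Hypothetical cut-off date, chosen only for illustration.
value = build_x_robots_tag("noarchive", unavailable_after=date(2030, 1, 31))
print(f"X-Robots-Tag: {value}")
# prints: X-Robots-Tag: noarchive, unavailable_after: 2030-01-31
```

The resulting string is what would go into the header value set by the web server or application.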
In essence, the power of the X-Robots-Tag is that it is much more flexible than the robots meta tag.
The advantage of using an X-Robots-Tag with HTTP responses is that it lets you use regular expressions to apply directives to non-HTML files, as well as apply parameters at a larger, global level.
To help you understand the difference between these directives, it’s helpful to categorize them by type. That is, are they crawler directives or indexer directives?
Here’s a handy cheat sheet to explain:
|Crawler directives|Indexer directives|
|---|---|
|Robots.txt: uses the user-agent, allow, disallow, and sitemap directives to specify where search engine bots are and are not allowed to crawl on the site.|Robots meta tag: lets you specify and prevent search engines from showing particular pages of a site in search results.|
||Nofollow: lets you specify links that should not pass on authority or PageRank.|
||X-Robots-Tag: lets you control how specified file types are indexed.|
Where do you put the X-Robots-Tag?
Let’s say you want to block specific file types. An ideal approach would be to add the X-Robots-Tag to an Apache configuration or .htaccess file.
The X-Robots-Tag can be added to a site’s HTTP responses in an Apache server configuration via the .htaccess file. This relies on the Header directive, which is provided by Apache’s mod_headers module.
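Outside Apache, the same idea can be applied wherever responses are generated. The following is a minimal sketch, not a production setup, using Python’s standard `http.server`; the pattern, the function name `x_robots_tag_for`, and the handler class are assumptions made for illustration.

```python
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed rule for illustration: keep PDF files out of the index.
NOINDEX_PATTERN = re.compile(r"\.pdf$", re.IGNORECASE)


def x_robots_tag_for(path):
    """Return the X-Robots-Tag value for a request path, or None."""
    path_only = path.split("?", 1)[0]  # ignore any query string
    if NOINDEX_PATTERN.search(path_only):
        return "noindex, nofollow"
    return None


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"placeholder response body\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        tag = x_robots_tag_for(self.path)
        if tag is not None:
            # Only matching paths receive the indexing directive.
            self.send_header("X-Robots-Tag", tag)
        self.end_headers()
        self.wfile.write(body)


# To try it locally (runs until interrupted):
# HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

Keeping the rule in a small function separate from the handler makes it easy to check which paths receive the header before any traffic is served.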
Real-world examples and uses of the X-Robots-Tag
So that sounds great in theory, but what does it look like in the real world? Let’s take a look.
Let’s say you don’t want search engines to index .pdf file types. This configuration on Apache servers would look something like this:
    <Files ~ "\.pdf$">
      Header set X-Robots-Tag "noindex, nofollow"
    </Files>
In Nginx, it would look like this:
    location ~* \.pdf$ {
      add_header X-Robots-Tag "noindex, nofollow";
    }
Now, let’s look at a different scenario. Suppose we want to use the X-Robots-Tag to block indexing of image files, such as .jpg, .gif, .png, and so on. You can do this with an X-Robots-Tag that would look like this:
    <Files ~ "\.(png|jpe?g|gif)$">
      Header set X-Robots-Tag "noindex"
    </Files>
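Because a single misplaced character in a pattern can change which files are matched, it may be worth sanity-checking the expression before deploying it. The sketch below uses Python’s `re` module as a stand-in for Apache’s regex engine, an assumption that should hold for a simple pattern like this one; the file names are made up.

```python
import re

# The image pattern, written with the dot escaped.
pattern = re.compile(r"\.(png|jpe?g|gif)$")

# Hypothetical file names to test against.
samples = ["logo.png", "photo.jpeg", "photo.jpg", "anim.gif", "notes.txt", "mypng"]
matched = [name for name in samples if pattern.search(name)]
print(matched)
# prints ['logo.png', 'photo.jpeg', 'photo.jpg', 'anim.gif']

# An unescaped dot matches any character, so "mypng" would also match.
loose = re.compile(r".(png|jpe?g|gif)$")
print(bool(loose.search("mypng")))  # prints True
```

Note that this pattern is case-sensitive, so a file named `LOGO.PNG` would not match unless the pattern is adjusted.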
Please note that understanding how these directives work and how they affect one another is crucial.
For example, what happens if both the X-Robots-Tag and a robots meta tag are found when crawler bots discover a URL?
If that URL is blocked by robots.txt, certain indexing and serving directives cannot be discovered and will not be followed.
If directives are to be followed, the URLs that contain them cannot be blocked from crawling.
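The dependency between crawling and indexing directives can be sketched with Python’s standard `urllib.robotparser`. The robots.txt rules, the URLs, and the crawler name `ExampleBot` are hypothetical; the point is only that a crawler that is not allowed to fetch a URL never sees the headers on its response.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_see_x_robots_tag(robots_txt, url, user_agent="ExampleBot"):
    """A compliant crawler can only read X-Robots-Tag if it may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Blocked from crawling, so a noindex header here would go unseen.
print(can_see_x_robots_tag(ROBOTS_TXT, "https://example.com/private/report.pdf"))
# prints False

# Crawlable, so the header can be read and followed.
print(can_see_x_robots_tag(ROBOTS_TXT, "https://example.com/public/report.pdf"))
# prints True
```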
Check for an X-Robots-Tag
There are a few different methods you can use to check for an X-Robots-Tag on a site.
The easiest way to check is to install a browser extension that will tell you X-Robots-Tag information about the URL.
Another plugin you can use to determine whether an X-Robots-Tag is being used is, for example, the Web Developer plugin.
By clicking on the plugin in your browser and navigating to “View Response Headers”, you can see the various HTTP headers being used.
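For a quick check without a browser plugin, the response headers can also be inspected from a script. This sketch uses only Python’s standard library; the function names and the example URL are placeholders, and some servers answer HEAD requests differently from GET requests, so results may need confirming.

```python
import urllib.request


def extract_x_robots_tag(header_items):
    """Collect all X-Robots-Tag values from (name, value) header pairs."""
    return [value.strip() for name, value in header_items
            if name.lower() == "x-robots-tag"]


def fetch_x_robots_tag(url, timeout=10):
    """Send a HEAD request and return any X-Robots-Tag header values."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_x_robots_tag(response.headers.items())


# Offline demonstration with made-up headers:
sample = [("Content-Type", "application/pdf"),
          ("X-Robots-Tag", "noindex, nofollow")]
print(extract_x_robots_tag(sample))  # prints ['noindex, nofollow']

# Against a live site (placeholder URL):
# print(fetch_x_robots_tag("https://example.com/file.pdf"))
```

An empty list means no X-Robots-Tag header was present on that response.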
Another method that scales well, for pinpointing issues on websites with a million pages, is Screaming Frog.
After running a site through Screaming Frog, you can navigate to the “X-Robots-Tag” column.
This will show you which sections of the site are using the tag, along with which specific directives.
Using X-Robots-Tags on your website
Understanding and controlling how search engines interact with your website is the cornerstone of search engine optimization. And the X-Robots-Tag is a powerful tool you can use to do just that.
Just be aware: it’s not without its dangers. It’s very easy to make a mistake and deindex your entire site.
That said, if you’re reading this piece, you’re probably not an SEO beginner. As long as you use it wisely, take your time, and check your work, you’ll find the X-Robots-Tag to be a useful addition to your arsenal.
Featured image: Song_about_summer/Shutterstock