Magento 2 Robots.txt File: Everything You Need to Know

As the title indicates, this article covers everything you need to know about the Magento 2 robots.txt file.

What is the Magento Robots.txt File?

Robots.txt is a critical file from the search engine optimisation (SEO) perspective. It contains instructions that tell web crawlers, i.e. search engine bots, which pages to crawl and which to skip. If configured incorrectly, this one tiny file can keep your entire store, or specific pages, out of the search index.

There will be sections or pages in your store that you won’t want crawled, and the robots.txt file is the place to communicate this to search engines. Almost every website needs this file, and Magento 2 stores are no exception. Of course, telling crawlers which webpages to visit is not its only purpose.

Why is the Robots.txt File Important?

There are several reasons why you need to know how to configure the robots.txt file in Magento 2. Let’s go through a few of them.

Protect Sensitive Information

If you want to keep sensitive webpages, such as administrative directories, out of search engines, you need to tell the crawlers not to access them. Similarly, you may want to restrict login or authentication pages to relevant users rather than have them surface for everyone. Keep in mind, though, that robots.txt is publicly readable, so it should never be your only line of defence for truly sensitive content.

Avoid Penalties Due to Duplicate Content

If you have duplicate content on your website for whatever reason, the last thing you want is the search engine crawling all those pages. If it does, your website may be treated as low quality for duplicate content, slip down in search rankings, or have pages dropped from the index altogether. So, the better option is to prevent the search engine from crawling duplicate content pages.

Hide Internal Search Results

There is no point in having internal search result pages indexed. They serve no purpose at all from an SEO perspective. Thus, you should block them from being crawled using the robots.txt file.

Hide Specific Pages

In certain cases, hiding some webpages is a good decision from both an SEO and a user experience perspective. For instance, there’s no point in a thank-you page appearing in search results.

Optimise Crawl Budget

Every search engine allocates a limited amount of crawling resources, the crawl budget, to your website. If you expose too many webpages for crawling, the important ones may not be crawled at all. To avoid this scenario, spend the crawl budget optimally by blocking irrelevant pages and leaving only the relevant ones open for crawling.

Protect Against Server Overload

When crawlers request only relevant webpages, the server is unlikely to be overloaded. However, when bots repeatedly hit irrelevant pages, they consume valuable server resources. This can slow down your website or prevent it from opening altogether.

Improve User Experience

By hiding irrelevant pages with the Magento 2 robots.txt file, you influence which of your webpages users see in search results. This helps them land on the relevant page quickly, thereby enhancing their overall experience.

Magento 2 Robots.txt Location

This is the most common question when talking about the Magento 2 robots.txt file. An easy way to locate your website’s robots.txt file is to open it at the following path (replace domain.com with your store’s domain):

https://domain.com/robots.txt


If you haven’t configured it yet, you’ll either see an empty page or land on a 404 error page. Now that you know the Magento 2 robots.txt file location, let’s move on to creating and configuring the file.
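If the file is configured, you’ll see plain-text crawler directives instead. As a rough illustration, a configured Magento 2 robots.txt might look like this (the directives shown are examples drawn from later in this article, not your store’s actual file):

User-agent: *
Disallow: /checkout/
Disallow: /customer/
Disallow: /catalogsearch/
Disallow: /*SID=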


How to Create Robots.txt in Magento 2?

To create the Magento 2 robots.txt file, follow these steps:

  • Log in to the Admin Panel.
  • Navigate to Content -> Design -> Configuration and edit the design configuration for your store view.

Expand the Search Engine Robots section.


Set Default Robots to one of the following:

  • INDEX, FOLLOW: Tells crawlers to index the store and to check back later for changes.
  • NOINDEX, FOLLOW: Tells crawlers not to index the store but to check back later for changes.
  • INDEX, NOFOLLOW: Tells crawlers to index the store but not to check back later for changes.
  • NOINDEX, NOFOLLOW: Tells crawlers not to index the store and not to check back later for changes.

Note that this setting controls the default robots meta tag rendered on your pages; the robots.txt file itself is configured in the next field.

In Edit custom instruction of robots.txt File, enter any custom instructions you need. For example, you might want to disallow access to all folders while you are still developing your store. Refer to the sections below for commonly used custom robots.txt instructions.

The Reset To Defaults button will reset the robots.txt file to the default, removing all the custom instructions.

Once you’re done, click the Save Configuration button to apply the changes.

You can also use the No-Index No-Follow Magento 2 extension to automatically set meta robots tags for product, category, and CMS pages.

Configuring Magento 2 Robots.txt with Custom Instructions

You can enter custom instructions in the robots.txt file to control which pages and directories crawlers may access.

Magento 2 Robots.txt Example:

Below are commonly used examples.

Allow full access to all directories and pages

User-agent: *
Disallow:

To exclude all robots from the entire server (all directories & pages)

User-agent: *
Disallow: /

Disallow Bingbot from accessing a specific folder

User-agent: Bingbot
Disallow: /foldername/

Disallow Bingbot from accessing a webpage

User-agent: Bingbot
Disallow: /foldername/restricted-webpage.html
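
Major crawlers such as Googlebot and Bingbot also honour the Allow directive, which lets you re-open a specific page inside an otherwise blocked folder. A brief sketch (the folder and page names are placeholders):

User-agent: Bingbot
Disallow: /foldername/
Allow: /foldername/public-webpage.html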

Default Instructions

Disallow: /lib/
Disallow: /*.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /var/
Disallow: /catalog/
Disallow: /customer/
Disallow: /sendfriend/
Disallow: /review/
Disallow: /*SID=
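
Note that Disallow rules only take effect as part of a group that begins with a User-agent line. If you paste instructions manually, make sure the group starts with one, for example (shown with the first two default rules):

User-agent: *
Disallow: /lib/
Disallow: /*.php$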

Restrict User Accounts & Checkout Pages

Disallow: /checkout/
Disallow: /onestepcheckout/
Disallow: /customer/
Disallow: /customer/account/
Disallow: /customer/account/login/
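
Because robots.txt matches URLs by prefix, Disallow: /customer/ already covers the more specific /customer/account/ paths, so a trimmed, equivalent group could be:

User-agent: *
Disallow: /checkout/
Disallow: /onestepcheckout/
Disallow: /customer/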

To Disallow Duplicate Content

Disallow: /tag/
Disallow: /review/

To Disallow CMS Directories

Disallow: /app/
Disallow: /bin/
Disallow: /dev/
Disallow: /lib/
Disallow: /phpserver/
Disallow: /pub/

To Disallow Catalog & Search Pages

Disallow: /catalogsearch/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
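
These paths match Magento’s non-rewritten URLs. For example (the product id 42 is illustrative):

Disallow: /catalog/product/view/   # matches e.g. /catalog/product/view/id/42

Products reached through SEO-friendly rewritten URLs such as /my-product.html are not affected by this rule.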

To Disallow URL Filter Searches

Disallow: /*?dir*
Disallow: /*?dir=desc
Disallow: /*?dir=asc
Disallow: /*?limit=all
Disallow: /*?mode*
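
The * wildcard and the $ end-of-URL anchor used in these patterns are extensions supported by major crawlers such as Googlebot and Bingbot. For example (the category URL is illustrative):

Disallow: /*?dir*   # blocks e.g. /shoes.html?dir=asc&order=name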

Robots.txt Tester & Validator

Now that you have created the robots.txt file, you can check and validate it with Google’s robots.txt testing tool. The tool helps you identify and fix errors in the file.

Most Popular Web Crawlers and User-Agents

Below is a list of the most common search engine bots:

User-agent: Googlebot
User-agent: Googlebot-Image/1.0
User-agent: Googlebot-Video/1.0
User-agent: Bingbot
User-agent: DuckDuckBot
User-agent: YandexBot
User-agent: Baiduspider
User-agent: ia_archiver   # Alexa
User-agent: Slurp   # Yahoo
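
You can give different crawlers different rules by combining several groups in one file, each starting with its own User-agent line. A short sketch (the paths are placeholders):

User-agent: Googlebot-Image
Disallow: /media/

User-agent: *
Disallow: /checkout/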

Final Thoughts on the Magento 2 Robots.txt File

As stated above, the robots.txt file tells web crawlers which parts of your website they may crawl. It is your way of communicating with search engines. To make sure your Magento 2 store’s pages are indexed and displayed the way you want, it is important to configure your Magento 2 robots.txt file correctly.

You don’t have to create the file manually, as Magento generates it automatically. You can simply add custom instructions, and the file will be configured accordingly. If you have any issues configuring the robots.txt file for your Magento 2 store, contact our support team for an instant solution.
