As the title indicates, this article covers what you need to know about the Magento 2 robots.txt file.
Robots.txt is an important file from a search engine optimisation (SEO) perspective. It contains instructions that tell web crawlers, such as search engine bots, which pages to crawl and which to skip. If configured incorrectly, this one small file can remove specific pages, or your entire store, from the search index.
There will be sections or pages in your store that you won’t want indexed, and the robots.txt file is the place to communicate this to search engines. Almost every website needs this file, and Magento 2 stores are no exception. Of course, telling crawlers which webpages to crawl is not its only purpose.
There are several reasons why you need to know how to configure the robots.txt file in Magento 2. Let’s go through a few of them.
If you wish to keep sensitive webpages such as administrative directories out of search results, you need to tell search engines not to crawl them. Similarly, you may want to keep login or authentication pages out of search results. Keep in mind that robots.txt only instructs crawlers; it does not restrict access, so these pages still need proper authentication.
If you have duplicate content on your website for whatever reason, the last thing you want is search engines crawling all of those pages. Duplicate content can dilute your search rankings, and in the worst case pages may drop out of the index. The better option is to prevent search engines from crawling duplicate-content pages.
There is no point in having internal search result pages indexed. They serve no purpose from an SEO perspective, so you should keep crawlers away from them using the robots.txt file.
In certain cases, hiding some webpages is a good decision from an SEO and user experience perspective. For instance, there’s no point in the thank-you page appearing in search results.
Every search engine allocates a limited amount of crawling resources (a crawl budget) to your website. If you have too many webpages to crawl, the important ones may not be crawled at all. To avoid this scenario, spend the crawl budget wisely by letting crawlers reach only relevant pages.
When only relevant webpages are crawled, crawlers put less load on the server. When irrelevant pages are crawled and indexed, the extra traffic can take up valuable server resources, slow down your website, or even prevent it from loading.
By hiding irrelevant pages using the Magento 2 robots.txt file, you influence which webpages users find through search engines. This helps them reach the relevant page quickly, thereby enhancing their overall experience.
A common question about the Magento 2 robots.txt file is where it is located. An easy way to find your website’s robots.txt file is to open the following URL:
https://domain.com/robots.txt
If you haven’t configured it yet, you’ll get either an empty page or a 404 error page. Now that you know where the Magento 2 robots.txt file is located, let’s move on to creating and configuring it.
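If you would rather check the location from a script, the following Python sketch derives the robots.txt URL from any store URL and reports the HTTP status. The function names are illustrative only, and domain.com is the same placeholder domain used above.

```python
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import urlopen


def robots_url(store_url: str) -> str:
    """Return the root-level robots.txt URL for any URL on the store."""
    # The leading slash makes urljoin drop the path and query of store_url.
    return urljoin(store_url, "/robots.txt")


def robots_status(store_url: str) -> int:
    """Fetch robots.txt and return the HTTP status code (for example 200 or 404)."""
    try:
        with urlopen(robots_url(store_url), timeout=10) as response:
            return response.status
    except HTTPError as error:
        # urlopen raises on 4xx/5xx responses; the status code is still useful.
        return error.code


print(robots_url("https://domain.com/women/tops.html?p=2"))
# https://domain.com/robots.txt
```

Calling robots_status with your real store URL tells you whether the file is being served (200) or is missing (404).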
To create the Magento 2 robots.txt file, follow these steps:
Expand the Search Engine Robots section.
Set Default Robots to one of the following: INDEX, FOLLOW; NOINDEX, FOLLOW; INDEX, NOFOLLOW; or NOINDEX, NOFOLLOW.
In Edit Custom instruction of robots.txt File, enter any custom instructions you need. For example, you might want to disallow access to all folders while you are still developing your store. Refer to the custom robots.txt instructions later in this article for examples.
The Reset To Defaults button will reset the robots.txt file to the default, removing all the custom instructions.
Once you’re done, click the Save Configuration button to apply the changes.
You can also use the No-Index No-Follow Magento 2 extension to automatically set meta robots tags for product, category and CMS pages.
You can enter custom instructions in the robots.txt file to allow or disallow crawlers from accessing certain pages or directories.
Below are commonly used examples.
Allow full access to all directories and pages
User-agent: *
Disallow:
To exclude all robots from the entire server (all directories & pages)
User-agent: *
Disallow: /
Disallow Bing bot from accessing a specific folder
User-agent: Bingbot
Disallow: /foldername/
Disallow Bing bot from accessing a webpage
User-agent: Bingbot
Disallow: /foldername/restricted-webpage.html
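To see how a crawler might interpret these basic rules, here is a minimal sketch using Python’s standard urllib.robotparser module. The can_crawl helper and the constant names are made up for this illustration, and real search engines may differ from this parser in the details.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The example rule sets from this section, written one directive per line.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_BING_FOLDER = "User-agent: Bingbot\nDisallow: /foldername/\n"

page = "https://domain.com/foldername/restricted-webpage.html"

print(can_crawl(ALLOW_ALL, "Googlebot", page))          # True
print(can_crawl(BLOCK_ALL, "Googlebot", page))          # False
print(can_crawl(BLOCK_BING_FOLDER, "Bingbot", page))    # False
print(can_crawl(BLOCK_BING_FOLDER, "Googlebot", page))  # True
```

The last two lines show that a rule group only applies to the crawler named in its User-agent line; other bots are unaffected.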
Default Instructions
Disallow: /lib/
Disallow: /*.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /var/
Disallow: /catalog/
Disallow: /customer/
Disallow: /sendfriend/
Disallow: /review/
Disallow: /*SID=
Restrict User Accounts & Checkout Pages
Disallow: /checkout/
Disallow: /onestepcheckout/
Disallow: /customer/
Disallow: /customer/account/
Disallow: /customer/account/login/
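Disallow rules match by path prefix, so a rule for /customer/ already covers everything beneath it. The sketch below illustrates this with Python’s standard urllib.robotparser, assuming the rules sit under a User-agent: * group (the original snippet does not show one). The blocked helper is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the rules are placed in a group that applies to all crawlers.
RULES = """\
User-agent: *
Disallow: /checkout/
Disallow: /customer/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def blocked(path: str) -> bool:
    """Return True if the given store path is disallowed for Googlebot."""
    return not parser.can_fetch("Googlebot", "https://domain.com" + path)


print(blocked("/checkout/cart/"))           # True
print(blocked("/customer/account/login/"))  # True, covered by /customer/
print(blocked("/customer-service"))         # False, different path prefix
print(blocked("/women/tops.html"))          # False
```

Listing /customer/account/ and /customer/account/login/ separately does no harm, but under prefix matching the single /customer/ rule is enough.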
To Disallow Duplicate Content
Disallow: /tag/
Disallow: /review/
To Disallow CMS Directories
Disallow: /app/
Disallow: /bin/
Disallow: /dev/
Disallow: /lib/
Disallow: /phpserver/
Disallow: /pub/
To Disallow Catalog & Search Pages
Disallow: /catalogsearch/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
To Disallow URL Filter Searches
Disallow: /*?dir*
Disallow: /*?dir=desc
Disallow: /*?dir=asc
Disallow: /*?limit=all
Disallow: /*?mode*
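Rules such as /*?dir* rely on the * wildcard, and rules such as /*.php$ on the $ end anchor, both of which major crawlers like Googlebot and Bingbot support. As a rough sketch of how that matching behaves, the Python helper below translates a rule into a regular expression. The rule_to_regex and rule_matches functions are hypothetical helpers written for this article, not part of Magento or any crawler.

```python
import re


def rule_to_regex(rule_path: str) -> "re.Pattern[str]":
    """Translate a robots.txt path rule using * and $ into a compiled regex."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape literal pieces and let each * stand for any run of characters.
    regex = ".*".join(re.escape(piece) for piece in rule_path.split("*"))
    if anchored:
        regex += "$"
    return re.compile(regex)


def rule_matches(rule_path: str, url_path: str) -> bool:
    """Return True if url_path (path plus query string) matches the rule."""
    # re.match anchors at the start, mirroring prefix matching in robots.txt.
    return rule_to_regex(rule_path).match(url_path) is not None


print(rule_matches("/*?dir*", "/shoes.html?dir=asc"))      # True
print(rule_matches("/*?dir*", "/shoes.html?p=2&dir=asc"))  # False
print(rule_matches("/*?limit=all", "/shoes.html?limit=all"))  # True
print(rule_matches("/*.php$", "/index.php"))               # True
print(rule_matches("/*.php$", "/index.php?x=1"))           # False
```

Note that /*?dir* only matches when dir is the first query parameter. A URL such as /shoes.html?p=2&dir=asc is not covered, so you may need an additional rule like /*&dir= if your filters can appear in any order.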
Now that you have created the robots.txt file, you can check and validate it with Google’s robots.txt Testing tool, which helps you identify errors in the file.
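Before pasting custom instructions into the admin panel, you can also catch simple mistakes locally. The following Python sketch is a very small, hypothetical linter and not a replacement for a search engine’s own validator. It only checks for unknown directives, rules that appear before any User-agent line, and paths that do not start with / or *.

```python
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots(text: str) -> list:
    """Return a list of human-readable problems found in robots.txt text."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: missing ':' separator")
            continue
        directive, value = (part.strip() for part in line.split(":", 1))
        name = directive.lower()
        if name not in KNOWN_DIRECTIVES:
            problems.append(f"line {number}: unknown directive '{directive}'")
        elif name == "user-agent":
            seen_user_agent = True
        elif name in {"allow", "disallow"}:
            if not seen_user_agent:
                problems.append(f"line {number}: rule before any User-agent line")
            if value and not value.startswith(("/", "*")):
                problems.append(f"line {number}: path should start with '/'")
    return problems


sample = "Disallow: /checkout/\nUser-agent: *\nDisalow: /tag/\nDisallow: review/\n"
for problem in lint_robots(sample):
    print(problem)
```

For the sample above, the sketch reports the misplaced rule on line 1, the misspelled Disalow directive on line 3, and the path without a leading slash on line 4.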
Below is a list of the most common search engine bots:
User-agent: Googlebot
User-agent: Googlebot-Image/1.0
User-agent: Googlebot-Video/1.0
User-agent: Bingbot
User-agent: DuckDuckBot
User-agent: YandexBot
User-agent: Baiduspider
User-agent: ia_archiver # Alexa
User-agent: Slurp # Yahoo
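If you want the same rules to apply to several of these bots, you can list multiple User-agent lines at the top of one group. The small Python sketch below generates such a group; build_group is a hypothetical helper, and the bot names and paths are only examples taken from this article.

```python
def build_group(user_agents, disallow_paths):
    """Build one robots.txt group that applies the same rules to several bots."""
    lines = [f"User-agent: {agent}" for agent in user_agents]
    lines += [f"Disallow: {path}" for path in disallow_paths]
    return "\n".join(lines) + "\n"


print(build_group(["Googlebot", "Bingbot"], ["/checkout/", "/customer/"]))
# User-agent: Googlebot
# User-agent: Bingbot
# Disallow: /checkout/
# Disallow: /customer/
```

You can paste the generated text into the custom instructions field, adjusting the bots and paths to suit your store.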
As stated above, the robots.txt file instructs web crawlers how to crawl your website. It is your way of communicating with search engines. To make sure your Magento 2 store pages are indexed and displayed the way you want, it is important to configure your Magento 2 robots.txt file correctly.
You don’t have to create the file manually, as Magento generates it automatically. You can simply add custom instructions, and the file will be configured accordingly. If you have any issues configuring the robots.txt file for your Magento 2 store, contact our support team for a solution.