9 Robots.txt Mistakes You Should Avoid

Modifying and adapting the robots.txt file to fit the needs of your domain is an essential aspect of proper SEO optimization and website management. Robots.txt mistakes can happen all too often, so it’s important to avoid them!

If you are getting robots.txt errors, you need to learn how to fix them, and this article will show you how. Optimizing an SEO profile takes time, so it's crucial to focus on the things that matter most. Fixing every error in robots.txt keeps the inner workings of your site in order, which is essential for moving forward.

What Are Common Robots.txt Mistakes?

Tracking down every error in robots.txt takes time, but there are some common areas to focus on. Once you know which mistakes occur most often, they become much easier to avoid. A broken robots.txt file can cause serious problems and drastically affect your website, and correcting it can be a laborious process, but with our help you'll be able to identify and rectify anything that arises. Keeping robots.txt working in your favor, rather than breaking your domain, is critical to success. Below are the common issues you may face.

1. Not Placing the Robots.txt File in the Root Directory

To start the list, it is essential to understand the correct location of the robots.txt file. The file should always sit in the root directory of your site, meaning its address immediately follows the domain name. If you place it anywhere else, web crawlers will not look for it there, so they will be unable to locate it and it will not perform its function. An example of proper placement:
placeholder.com/files/robots.txt – INCORRECT
placeholder.com/robots.txt – CORRECT
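
If you want to double-check the placement programmatically, a minimal Python sketch like the one below (using this article's placeholder domain, which you would swap for your own) simply requests the file from the root and reports whether it is reachable:

import urllib.error
import urllib.request

# The file must be served directly from the domain root.
url = "https://placeholder.com/robots.txt"  # replace with your own domain

try:
    with urllib.request.urlopen(url, timeout=10) as response:
        # A 200 response here means crawlers can find the file.
        print("Found robots.txt, status:", response.status)
except urllib.error.URLError as exc:
    print("robots.txt is not reachable at the root:", exc)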

2. Wrong Use of Wildcards

Wildcards are special characters used in robots.txt directives to match patterns of URLs. There are two wildcards to pay attention to: the * and the $ symbols. The * character stands for "zero or more valid characters," while the $ character marks the end of a URL. Using these two characters properly in your robots.txt file is essential. Examples of correct implementation include: To represent every type of user agent:
User-Agent: *
To disallow any URL whose path begins with "/assets":
Disallow: /assets*
To disallow any URL ending with a .pdf extension:
Disallow: *.pdf$
Wildcards should be reserved for specific cases rather than used everywhere. Be careful with them, since they can have wide-reaching consequences you may not anticipate.
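
To make the matching behaviour easier to reason about, here is a simplified Python sketch – not Google's actual implementation – that translates a Disallow pattern's wildcards into a regular expression and tests a few example paths:

import re

def rule_to_regex(rule):
    # Escape the pattern, then restore the two robots.txt wildcards:
    # '*' stands for zero or more characters, '$' anchors the end of the path.
    pattern = re.escape(rule).replace(r"\*", ".*").replace(r"\$", "$")
    return re.compile(pattern)

# The example rules from this section.
print(bool(rule_to_regex("/assets*").match("/assets/img/logo.png")))     # True – blocked
print(bool(rule_to_regex("*.pdf$").match("/downloads/report.pdf")))      # True – blocked
print(bool(rule_to_regex("*.pdf$").match("/downloads/report.pdf?v=2")))  # False – not blocked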

3. Putting ‘NoIndex’ in Robots.txt

Putting a "noindex" directive in your robots.txt file is an outdated strategy that no longer works – Google stopped supporting it in 2019. At best, this means you have useless lines in your robots.txt file; at worst, pages you wanted kept out of the search results may still be indexed. The proper practice today is to use the robots meta tag instead. Place the following tag in the HTML of any page you want to block Google from indexing:
<meta name="robots" content="noindex" />
This avoids errors in the robots.txt file and keeps the indexing rule in the one place it applies: the page itself.
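
If you want to verify that the tag actually made it onto a page, a small sketch using only Python's standard library (with a hypothetical URL) can check for it:

import urllib.request
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Records whether a robots meta tag containing 'noindex' is present."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            if "noindex" in attrs.get("content", "").lower():
                self.noindex = True

url = "https://placeholder.com/private-page"  # hypothetical page to check
html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
finder = RobotsMetaFinder()
finder.feed(html)
print("noindex meta tag found:", finder.noindex)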

4. Blocking Scripts and Style Sheets

The web runs on scripts and style sheets, so blocking them is a bad idea. For Google's crawlers to render your pages and evaluate how well they perform, they need to be able to fetch and run these files. For this reason, it is imperative not to block any scripts or style sheets in your robots.txt file. If they are blocked, the crawlers cannot render your pages properly, which can drastically hurt your domain's rankings.
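
A quick way to check whether one of your existing rules accidentally blocks a script or style sheet is Python's built-in robots.txt parser; the rule below is a hypothetical example of exactly what to avoid:

import urllib.robotparser

parser = urllib.robotparser.RobotFileParser()
# Hypothetical rules that accidentally block rendering resources.
parser.parse([
    "User-Agent: *",
    "Disallow: /assets/",
])

# Googlebot needs these files to render and evaluate the page.
for url in ("https://placeholder.com/assets/app.js",
            "https://placeholder.com/assets/style.css"):
    print(url, "allowed:", parser.can_fetch("Googlebot", url))  # False – a problem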

5. Not Including the Sitemap URL

Listing the sitemap location in robots.txt lets the crawler discover your sitemap easily, which helps your pages get found and indexed efficiently. Anything that makes life easier for the algorithms that rank your domain is a bonus for optimization purposes, so putting the location in the robots.txt file is a very useful thing to do. Here is an example of how to declare your sitemap's URL:
Sitemap: https://www.placeholder.com/sitemap.xml
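
As a sanity check, Python's standard library can read your live file and report the sitemaps it declares (site_maps() is available from Python 3.8 onward; the domain below is the article's placeholder):

import urllib.robotparser

parser = urllib.robotparser.RobotFileParser()
parser.set_url("https://www.placeholder.com/robots.txt")  # replace with your own domain
parser.read()

# Returns the URLs listed on Sitemap: lines, or None if there are none.
print(parser.site_maps())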

6. Unnecessary Use of Trailing Slash

Trailing slashes (slashes at the end of a path, as in /example/) can give the bots scanning your site different instructions than you intend. Giving Google the proper information in the correct format is essential for proper crawling and ranking, so any URL you block in robots.txt needs to be formatted correctly. For example, suppose you wanted to block placeholder.com/category but wrote the following command:
User-Agent: *
Disallow: /category/
it would tell the Google crawler not to crawl any URLs inside the /category/ folder, but it would not block the desired URL itself. Instead, the command must be formatted like this:
User-Agent: *
Disallow: /category
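
The difference is easy to demonstrate with Python's built-in parser – a minimal sketch using the placeholder domain:

import urllib.robotparser

def blocked(disallow_rule, path):
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(["User-Agent: *", "Disallow: " + disallow_rule])
    return not parser.can_fetch("*", "https://placeholder.com" + path)

# With a trailing slash, only the contents of the folder are blocked.
print(blocked("/category/", "/category"))  # False – the URL itself is still crawlable
print(blocked("/category", "/category"))   # True – the URL is blocked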

7. Ignoring Case Sensitivity

A simple yet important fact that is easily overlooked is that URL paths are case-sensitive to crawlers. placeholder.com/Test and placeholder.com/test are two different URLs as far as a crawler is concerned, and your robots.txt file needs to reflect this. If you use robots.txt directives that reference URLs, their capitalization must match exactly. For example, if you wanted to block placeholder.com/test, this would be INCORRECT:
User-Agent: *
Disallow: /Test
and this would be CORRECT:
User-Agent: *
Disallow: /test
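
Under the hood, this is a plain, case-sensitive prefix comparison between the rule and the URL path, as this tiny Python sketch illustrates:

# Rule paths are matched against URL paths as case-sensitive prefixes.
print("/Test".startswith("/test"))  # False – "Disallow: /test" would not apply
print("/test".startswith("/test"))  # True  – "Disallow: /test" applies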

8. Using One Robots.txt File for Different Subdomains

To give Google the most precise instructions, each sub-domain of your website, including staging sites, should have its own robots.txt file, because crawlers request the file separately for every host. If a sub-domain does not have one, the Google crawler may index a domain you do not want indexed (such as a new, still-under-construction site). Google can only index your content the way you want when every host states its own rules, so taking the time to set up each of your sub-domains carefully will pay off in the long run.
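
Because crawlers request robots.txt separately for every host, a quick loop can confirm that each of your sub-domains actually serves its own file (the host names below are hypothetical):

import urllib.error
import urllib.request

# Hypothetical sub-domains – each host needs its own robots.txt.
hosts = ["www.placeholder.com", "blog.placeholder.com", "staging.placeholder.com"]

for host in hosts:
    url = "https://" + host + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            print(url, "->", response.status)
    except urllib.error.URLError as exc:
        print(url, "-> missing or unreachable:", exc)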

9. Not Blocking Access to Sites Under Construction

Staging sites, or sites that are under construction, are a crucial part of web development, and you want to keep as much control over them as possible. Every fully functional website was staged before it was deployed, and those staging versions should never end up in Google's index. Getting an under-construction page indexed can be very detrimental to the growth of your domain – traffic that lands on an unfinished page instead of the finished one won't help you. Block crawlers from your staging pages to make sure they aren't ranked by adding the following directives to the staging site's robots.txt file:
User-Agent: *
Disallow: /

How Can I Recover from a Robots.txt Mistake?

While every error in robots.txt can have far-reaching consequences, the good news is that they can be rectified fairly easily. Once you fix the mistakes and have the site re-crawled, your pages can be ranked properly – in some cases, fixing an error is what allows them to be ranked at all. A good way to check whether your robots.txt file is broken is to use a site checker or a dedicated robots.txt testing tool. Such tools let you test your domain for any errors related to your robots.txt file and then fix and validate them, and they come in especially handy when you are repairing the robots.txt files of many different sub-domains.
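
If you prefer to script the check yourself, a minimal sketch along these lines reads the live file and verifies that a list of URLs you care about (hypothetical examples below) is still crawlable after your fix:

import urllib.robotparser

# Hypothetical URLs that must stay crawlable after the fix.
important_urls = [
    "https://www.placeholder.com/",
    "https://www.placeholder.com/category",
    "https://www.placeholder.com/assets/style.css",
]

parser = urllib.robotparser.RobotFileParser()
parser.set_url("https://www.placeholder.com/robots.txt")
parser.read()

for url in important_urls:
    status = "OK" if parser.can_fetch("Googlebot", url) else "BLOCKED"
    print(status, url)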

An Optimized Site Relies on Proper Robots.txt Files

Checking for and repairing robots.txt errors is one of the most important aspects of modern website management. Using the robots.txt file properly is a surefire way to let Google crawl your site and rank it accordingly, so you want it to be as organized and functional as possible at all times. Common robots.txt errors are easy to fix, but they often take time to track down. SEO practices and optimization techniques usually require a considerable investment of time, and fixing robots.txt mistakes is no different. At the end of it all, your website will perform better and rank higher than it otherwise would. If you need help finding and fixing robots.txt mistakes, don't hesitate to contact us! Our specialists can do it for you in no time.
Author
Przemek Jaskierski

Senior SEO Specialist

He translates his experience in e-commerce into SEO. In 2014, he began his adventure in internet marketing, which continues to this day. He spends his free time at the gym, playing board games and watching TV series.
