Importance of Robots.txt File in SEO
One of the most important files in SEO is the ‘robots.txt’ file. It tells web crawlers, also known as web robots, which pages or files of a domain are not to be crawled. Crawlers visit your website and index its pages and files before listing them in the search results.
In the robots.txt file, you can use the ‘Disallow’ directive to tell search engines which pages of your website are not to be crawled. A ‘Disallow’ rule blocks the search engine from visiting the page or directory it names.
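For example, a minimal robots.txt that blocks all crawlers from one directory could look like this (the path ‘/private/’ is a placeholder for whichever directory you want to keep crawlers out of):

```
User-agent: *
Disallow: /private/
```

With this rule in place, compliant crawlers will not visit any URL whose path begins with /private/.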
The ‘User-agent’ line specifies which robots a rule applies to. A rule whose user agent names only Google’s crawler will block Google’s robots, but other robots will still have access to the page.
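For instance, a rule aimed only at Google’s main crawler, Googlebot, could be written as follows (the path here is again a placeholder):

```
User-agent: Googlebot
Disallow: /example-page/
```

Crawlers other than Googlebot ignore this group of rules and can still fetch the page.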
If you do not want certain pages or files to be listed by Google or other search engines, you can use the robots.txt file. You can easily check whether a website has a robots.txt file by entering its URL in the format ‘domainname.com/robots.txt’ or ‘subdomainname.com/robots.txt’ in your browser.
Why do you need to block some pages?
You can include directives in the robots.txt file that tell search engines not to crawl a page, which keeps them from indexing it and sending visitors to it. There are various reasons why you may want to block a page using the robots.txt file:
- You have a page on your website which is a duplicate of another page, and you do not want it indexed because it would result in duplicate content.
- You have a page on your website that you do not want the users to access until they take a specific action.
- You want to protect private areas of your website, like the ‘cgi-bin’ directory, and keep your bandwidth from being used up by robots crawling image files.
- You do not want to index broken pages, internal search result pages, login pages, and certain areas of your website such as staging sites for developers, the XML sitemap and more.
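The reasons above might translate into a robots.txt like the following sketch; all of the paths are hypothetical and depend on how your own site is organized:

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /search/
Disallow: /login/
Disallow: /staging/
```

Each ‘Disallow’ line blocks one area, and the rules apply to every compliant crawler because the user agent is ‘*’.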
How to create a Robots.txt file?
You can create the robots.txt file as follows:
- Create a new text file using Notepad or TextEdit, name it ‘robots.txt’, and use ‘Save as’ to save it with the ‘.txt’ extension.
- Upload this file to the root directory of your website. This is a root-level folder, often called ‘htdocs’ or ‘www’, which makes the file appear directly after the domain name.
- If you use sub-domains then create the robots.txt file for each sub-domain.
- You can check the file by entering yourdomain.com/robots.txt in the browser address bar.
- Set up a Google Webmaster Tools account (now Google Search Console) and select the ‘crawler access’ option under ‘site configuration’ in the menu bar. Select ‘generate robots.txt’ to set up the file. You can specify which robots to block under ‘User-agent’, as well as the directories and files you want to block.
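Putting the steps together, a finished robots.txt might look like this sketch; the paths and the sitemap URL are placeholders for your own:

```
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/

Sitemap: https://www.example.com/sitemap.xml
```

The first group applies to all crawlers, the second only to Googlebot, and the ‘Sitemap’ line points crawlers at your XML sitemap.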
Installing the Robots.txt file
Once you have created the robots.txt file, you have to upload it to the root directory of your website. For this, you can use an FTP program like FileZilla.
If you add more pages to your website that you do not want indexed by search engines, you will have to update your robots.txt file. If you do not use a robots.txt file, search engines get a free run to index anything on the website. You can test your robots.txt file in Google Search Console to check that it is working as expected.
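Besides Google Search Console, you can also sanity-check your rules locally. This sketch uses Python’s standard-library robots.txt parser; the rules and URLs are purely illustrative:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A disallowed path is reported as not fetchable, an open path as fetchable
print(parser.can_fetch("MyBot", "https://example.com/private/page.html"))  # False
print(parser.can_fetch("MyBot", "https://example.com/index.html"))         # True
```

Running a quick check like this before uploading helps catch a rule that accidentally blocks pages you want crawled.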
You can also consult us if you have any questions about the robots.txt file or need SEO services.