Google has updated its robots.txt documentation in Search Central, clarifying the four robots.txt fields it supports and confirming that it ignores directives not on that list.
Robots.txt recap
A robots.txt file contains directives you add to the root of your website (or configure on your server) that tell crawlers, including Google's, which parts of the site they should not access.
The primary purpose of robots.txt is to manage crawler traffic to specific areas of your website, mainly to keep crawlers from overloading your server with requests. It is not a reliable way to keep a page out of Google's index, since a blocked URL can still be indexed if other pages link to it.
Robots.txt instructions can include:
- Your web pages: Robots.txt can stop Google from crawling specific URLs on your website.
- Your media: Stop video, image, and audio files from appearing in Google Search.
- Your resource files: Block unimportant resource files, such as style sheets or scripts, if your pages work fine without them.
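For example, a short robots.txt sketch along these lines (the paths here are hypothetical placeholders) would keep crawlers away from a media folder and an individual script:

```
# Hypothetical paths: block a video folder and a single script for all crawlers
User-agent: *
Disallow: /media/videos/
Disallow: /assets/tracking.js
```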
The update
SEJ writer Matt G. Southern’s keen eye noticed the update in Google’s Search Central Documentation, which clarifies that Google supports only four specific fields in robots.txt files and ignores any field not on that list.
Those four supported fields are:
- user-agent: Identifies the crawler to which the rules apply.
- allow: A URL path Google can crawl.
- disallow: A URL path Google cannot crawl.
- sitemap: The full URL of a sitemap.
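Putting those fields together, a robots.txt file that stays within Google's supported set might look like the sketch below (the paths and sitemap URL are placeholders, not recommendations for your site):

```
# Rules for all crawlers
User-agent: *
Disallow: /private/
Allow: /private/annual-report.html

# Rules for Googlebot specifically
User-agent: Googlebot
Disallow: /staging/

# Full URL of the sitemap
Sitemap: https://www.example.com/sitemap.xml
```

Fields outside this set, the commonly seen crawl-delay being one example, are simply ignored by Google's crawlers.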
Google updated its documentation to remove any confusion about which robots.txt fields it supports and to give clear guidance to developers and site owners.
Google said when releasing the update:
- “We sometimes get questions about fields that aren’t explicitly listed as supported, and we want to make it clear that they aren’t.”
What the update means
The latest documentation update aims to remove any confusion for site owners and developers and ensure they do not rely on unsupported directives, which Google simply ignores, leaving crawling unrestricted where they expected it to be controlled.
The documentation now instructs users to stick to the fields listed above.
To ensure you do, consider auditing your existing robots.txt files to confirm they use only fields Google supports.
Also review Google’s Search Central page to stay current on Google’s guidelines and best practices and to learn how to implement robots.txt correctly.
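If you want to automate that audit, here is a minimal Python sketch; the file path and the strictness of the check are assumptions for illustration, not part of Google's guidance:

```python
# Minimal sketch: flag robots.txt fields that Google's documentation does not list.
# Assumption: the robots.txt file sits in the current working directory.

from pathlib import Path

SUPPORTED_FIELDS = {"user-agent", "allow", "disallow", "sitemap"}

def audit_robots_txt(path: str = "robots.txt") -> list[str]:
    """Return warnings for any field Google will ignore."""
    warnings = []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for line_no, raw in enumerate(lines, start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        field = line.split(":", 1)[0].strip().lower()
        if field not in SUPPORTED_FIELDS:
            warnings.append(f"line {line_no}: '{field}' is not a field Google supports")
    return warnings

if __name__ == "__main__":
    for warning in audit_robots_txt():
        print(warning)
```

Keep in mind that crawlers other than Google may still honor other fields, so treat any warnings as prompts to double-check intent rather than as lines to delete automatically.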