Google’s Search Advocate John Mueller has cleared up confusion about updating a robots.txt file multiple times a day, saying it won’t have the intended effect because the file can be cached for up to 24 hours.
Robots.txt file recap
A robots.txt file contains directives you add to your site or server to tell Google’s crawlers what to do, such as blocking access to particular URLs, keeping media files out of search results, or keeping crawlers away from sensitive files.
Google’s explanation:
- “A robots.txt file tells search engine crawlers which URLs the crawler can access on your site. This is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google.”
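For illustration, a minimal robots.txt might look like the sketch below; the paths and user agents are hypothetical examples, not taken from Google’s documentation:

```
# Keep all crawlers out of a private directory
User-agent: *
Disallow: /private/

# Keep Google's image crawler away from a media folder
User-agent: Googlebot-Image
Disallow: /media/
```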
Blocking Googlebot to prevent server overload
Mueller was asked on Bluesky whether Googlebot could be blocked at certain times of day to avoid overloading a website. He said it was a bad idea because robots.txt can be cached for up to 24 hours, so Google won’t necessarily know that you don’t want a page crawled at 10:00 am but do at 4:00 pm.
The question:
- “One of our technicians asked if they could upload a robots.txt file in the morning to block Googlebot and another one in the afternoon to allow it to crawl, as the website is extensive, and they thought it might overload the server. Do you think this would be a good practice?”
“(Obviously, the crawl rate of Googlebot adapts to how well the server responds, but I found it an interesting question to ask you) Thanks!”
John’s response to the question:
- “It’s a bad idea because robots.txt can be cached up to 24 hours (developers.google.com/search/docs/… ). We don’t recommend dynamically changing your robots.txt file like this over the course of a day. Use 503/429 when crawling is too much instead.”
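To show what Mueller’s suggested alternative could look like in practice, here is a minimal sketch: instead of swapping robots.txt files, the server returns a 503 (or 429) with a Retry-After header when it is under strain. This assumes a Python/Flask app and uses the system load average purely as a stand-in for whatever overload signal a site actually monitors:

```python
import os

from flask import Flask, Response

app = Flask(__name__)

LOAD_THRESHOLD = 8.0  # hypothetical 1-minute load-average limit


@app.before_request
def shed_load_when_busy():
    # When the server is struggling, ask clients (including Googlebot)
    # to come back later instead of editing robots.txt during the day.
    one_minute_load, _, _ = os.getloadavg()
    if one_minute_load > LOAD_THRESHOLD:
        return Response(
            "Temporarily overloaded, please retry later.",
            status=503,
            headers={"Retry-After": "3600"},  # hint: retry in an hour
        )
    # Returning None lets the request proceed as normal.
```

Unlike a robots.txt swap that may already be cached, a 503 or 429 response reaches Googlebot on every request while the server is actually overloaded, which is the signal Mueller points to.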
But this isn’t new advice, as Mueller said something similar in Oct 2015.
Old advice against dynamically generated robots.txt files remains relevant
Barry Schwartz of Search Engine Land wrote a post about John Mueller’s advice on why it’s not a good idea to use a dynamically generated robots.txt file, noting that updating a static file by hand is a better approach.
John wrote in Oct 2015:
- “Making the robots.txt file dynamic (for the same host! Doing this for separate hosts is essentially just a normal robots.txt file for each of them.) would likely cause problems: it’s not crawled every time a URL is crawled from the site, so it can happen that the “wrong” version is cached. For example, if you make your robots.txt file block crawling during business hours, it’s possible that it’s cached then and followed for a day — meaning nothing gets crawled (or, alternately, cached when crawling is allowed). Google crawls the robots.txt file about once a day for most sites, for example.”
The takeaway
The lesson is that dynamically changing your robots.txt file throughout the day confuses Google’s crawlers, which could let Google crawl media files, URLs, and sensitive documents you want to keep private, or prevent it from crawling content you do want indexed. Google’s Developer Advocate Martin Splitt recently shared excellent advice on the best ways to block Googlebot from crawling your site and a common mistake every site owner should avoid.