Google Provides New Insights Into Googlebot Crawling Process

Google has launched a new series on Search Central called “Crawling December” that offers valuable insights for making sure your website’s resources work well with Googlebot’s crawling process.


Google releases Crawling December 

Google Search Central’s new “Crawling December” series will publish an article each week throughout the month explaining how Googlebot crawls and indexes your website’s pages. As Google says, the series covers “details that aren’t often talked about” that could have a significant impact on how it crawls your site.

Crawling December’s first article covers crawling basics and explains how Googlebot handles your site’s crawl budget, page resources, and other lesser-known but essential details.

Some questions Google answers in its first post include:

  • “What resources make up a page?”
  • “How does crawling these resources affect the crawl budget?” 
  • “Are these resources cacheable on Google’s side?” 
  • “And is there a difference between URLs that have not been crawled before and those that are already indexed?”

Google explains how Googlebot works 

Google begins its series by explaining that modern websites are no longer just HTML. They also rely on JavaScript and CSS to provide essential functionality and a vibrant user experience, which makes them harder for Googlebot to crawl.

Google says Googlebot works much like a browser, with a few differences.

For example, like a browser, Googlebot visits your webpage and downloads the HTML from the “parent” URL, which may contain references to CSS, JavaScript, videos, and images. Google’s Web Rendering Service (WRS) then uses Googlebot to download the resources referenced in that HTML and constructs the page from them.

Search Central provides the four steps it uses: 

  • “Googlebot downloads the initial data from the parent URL — the HTML of the page.”
  • “Googlebot passes on the fetched data to the Web Rendering Service (WRS).”
  • “Using Googlebot, WRS downloads the resources referenced in the original data.”
  • “WRS constructs the page using all the downloaded resources as a user’s browser would.”
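
To make the first and third steps more concrete, here is a minimal Python sketch that takes the HTML of a parent URL and lists the sub-resources it references. It is an illustration only, not Google's implementation: the class and function names, the sample HTML, and the example.com URLs are all hypothetical, and it only looks at script, stylesheet, image, and video tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collects URLs of sub-resources referenced by a page's HTML (hypothetical helper)."""

    def __init__(self, parent_url):
        super().__init__()
        self.parent_url = parent_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag in ("script", "img", "video") and attrs.get("src"):
            ref = attrs["src"]
        elif tag == "link" and "stylesheet" in rel and attrs.get("href"):
            ref = attrs["href"]
        else:
            return
        # Resolve relative references against the parent URL.
        self.resources.append(urljoin(self.parent_url, ref))


def referenced_resources(parent_url, html):
    """Return absolute URLs of the resources referenced in the given HTML."""
    collector = ResourceCollector(parent_url)
    collector.feed(html)
    return collector.resources


# Hypothetical page used only for illustration.
sample_html = """
<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="https://cdn.example.com/app.js"></script>
</head><body><img src="img/logo.png"></body></html>
"""

for url in referenced_resources("https://www.example.com/blog/post", sample_html):
    print(url)
```

Each URL in that list is an extra fetch on top of the parent HTML, which is why page resources matter for crawl budget.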

Where Googlebot differs from a browser is the time taken to complete each step because of “scheduling constraints such as the perceived load of the server hosting the resources needed for rendering a page.”

Google says that is where crawl budgets come into play.

Tips for managing your site’s crawl budget

Crawling the extra resources required for rendering a page can use up your website’s crawl budget. To reduce that cost, Google’s Web Rendering Service tries to cache the JavaScript and CSS referenced in the pages it renders.

Google says the Web Rendering Service caches these resources for up to 30 days, regardless of any HTTP caching directives, which helps preserve your site’s crawl budget.
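
As a rough mental model of that behavior, the Python sketch below treats a cached resource as reusable purely on the basis of its age. It is a simplified illustration, not Google's code: the function name and the `http_cache_control` parameter are hypothetical, and the parameter is accepted only to show that it plays no part in the decision.

```python
from datetime import datetime, timedelta, timezone

# The 30-day window quoted in the article, treated here as an upper bound.
WRS_CACHE_WINDOW = timedelta(days=30)


def may_be_served_from_cache(cached_at, now, http_cache_control=None):
    """Return True if a cached copy is still inside the 30-day window.

    http_cache_control is deliberately ignored: in this model, headers such
    as "no-store" or "max-age=60" do not shorten or extend the window.
    """
    return now - cached_at <= WRS_CACHE_WINDOW


now = datetime(2024, 12, 31, tzinfo=timezone.utc)
recent = now - timedelta(days=10)
stale = now - timedelta(days=31)

print(may_be_served_from_cache(recent, now, http_cache_control="no-store"))
print(may_be_served_from_cache(stale, now, http_cache_control="max-age=31536000"))
```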

Here’s a screenshot of Google’s tips on optimizing your site’s crawl budget: 


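Google also advises against using robots.txt to block crawling of the resources a page needs for rendering, since that can stop Google from rendering the page properly. It also suggests hosting resources on a CDN or subdomain to preserve your site’s crawl budget.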
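
One way to spot this kind of problem is to test your rendering resources against your robots.txt rules. The sketch below uses Python's standard `urllib.robotparser` module. The function name, the rules, and the URLs are hypothetical examples, and the standard-library parser only approximates how Google interprets robots.txt (for instance, it does not handle wildcards the same way), so treat the result as a first check rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def blocked_resources(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules that block a JavaScript directory.
robots_txt = """\
User-agent: *
Disallow: /assets/js/
"""

# Hypothetical resources referenced by a page.
resource_urls = [
    "https://www.example.com/assets/js/app.js",
    "https://www.example.com/assets/css/site.css",
]

for url in blocked_resources(robots_txt, resource_urls):
    print("Blocked:", url)
```

If a script or stylesheet that the page depends on shows up as blocked, that is worth reviewing before it affects rendering.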

How to identify and monitor Googlebot’s resources 

Google says the best way to identify the resources Googlebot is crawling is to review your site’s raw access logs, filtering requests by the Googlebot IP ranges published in its developer documentation. The Search Console Crawl Stats report is another option.
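
The log review could look something like the Python sketch below, which keeps only the requests whose client IP falls inside a list of crawler ranges. Several things here are assumptions: the log lines follow the common "combined" access log layout, the prefix list uses `ipv4Prefix` and `ipv6Prefix` keys, and the ranges shown are reserved documentation addresses, not Google's real ones. Replace them with the current ranges from Google's documentation.

```python
import ipaddress
import re

# Placeholder ranges from reserved documentation blocks, NOT Google's real ranges.
SAMPLE_PREFIXES = [
    {"ipv4Prefix": "192.0.2.0/24"},
    {"ipv6Prefix": "2001:db8::/32"},
]

# Start of a combined-format log line: client IP, then the requested path.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (?P<path>\S+)'
)


def load_networks(prefixes):
    """Turn prefix entries into ipaddress network objects."""
    return [
        ipaddress.ip_network(p.get("ipv4Prefix") or p.get("ipv6Prefix"))
        for p in prefixes
    ]


def crawler_requests(log_lines, networks):
    """Return the paths requested from IPs inside any of the given networks."""
    paths = []
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # Skip lines that do not look like access log entries.
        try:
            ip = ipaddress.ip_address(match.group("ip"))
        except ValueError:
            continue  # Skip entries where the first field is not an IP address.
        if any(ip in network for network in networks):
            paths.append(match.group("path"))
    return paths


# Hypothetical log lines for illustration.
sample_log = [
    '192.0.2.10 - - [03/Dec/2024:10:00:00 +0000] "GET /assets/app.js HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '198.51.100.7 - - [03/Dec/2024:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '2001:db8::1 - - [03/Dec/2024:10:00:09 +0000] "GET /assets/site.css HTTP/1.1" 200 2048 "-" "Googlebot/2.1"',
]

for path in crawler_requests(sample_log, load_networks(SAMPLE_PREFIXES)):
    print(path)
```

Matching on IP range rather than on the user-agent string matters because the user-agent header is easy to spoof.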

The takeaway

Google’s first “Crawling December” post explains how it searches for, identifies, and processes your website’s content. That is invaluable for SEO professionals and site owners who want to make the most of their site’s crawl budget and ensure Google indexes every crucial page. Check out Google’s Search Central Crawling December page for this and next week’s posts.

Terry O'Toole

Terry is a seasoned content marketing specialist with over six years of experience writing content that helps small businesses navigate marketing, including SEO, social media marketing, and more. Terry has a proven track record of creating top-performing content in search results. When he is not writing content, Terry can be found on his boat in Italy or chilling in his villa in Spain.
