Demystifying Google's Common Crawlers: A Comprehensive Guide

Published On Mon Sep 16 2024

Introduction to Google's Common Crawlers

Google's common crawlers are essential tools used to discover information for constructing Google's search indexes, executing specific product crawls, and conducting analysis. These crawlers adhere to the robots.txt rules when they crawl automatically. The general technical properties of Google's crawlers also pertain to the common crawlers.

IP Ranges and Hostnames

The common crawlers typically operate from the IP ranges listed in the googlebot.json object. A reverse DNS lookup on a crawler's IP address should return a hostname that matches one of two masks: crawl-***-***-***-***.googlebot.com or geo-crawl-***-***-***-***.geo.googlebot.com.
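Both checks described above can be sketched in a few lines of Python. This is a minimal illustration, not Google's verification procedure. It assumes that each *** in the mask stands for one to three digits, and that googlebot.json holds a "prefixes" list whose entries carry an "ipv4Prefix" or "ipv6Prefix" key. The function names and the sample ranges are hypothetical; the ranges are reserved documentation addresses, not Google's real ones.

```python
import ipaddress
import re

# Assumption: each "***" in the mask is an IPv4 octet (1-3 digits),
# written with dashes instead of dots.
CRAWLER_HOST_RE = re.compile(
    r"(?:crawl(?:-\d{1,3}){4}\.googlebot\.com"
    r"|geo-crawl(?:-\d{1,3}){4}\.geo\.googlebot\.com)"
)


def hostname_matches_mask(hostname: str) -> bool:
    """Return True if a reverse DNS hostname fits one of the two masks."""
    return CRAWLER_HOST_RE.fullmatch(hostname.lower()) is not None


def ip_in_published_ranges(ip: str, ranges: dict) -> bool:
    """Return True if ip falls inside any prefix of a parsed googlebot.json.

    The key names ("prefixes", "ipv4Prefix", "ipv6Prefix") are an assumed
    layout; adjust them if the published file differs.
    """
    addr = ipaddress.ip_address(ip)
    for entry in ranges.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        # Membership is simply False when address families differ.
        if cidr and addr in ipaddress.ip_network(cidr):
            return True
    return False


# Placeholder data using reserved documentation ranges, NOT real Google ranges.
SAMPLE_RANGES = {
    "prefixes": [
        {"ipv4Prefix": "192.0.2.0/24"},
        {"ipv6Prefix": "2001:db8::/32"},
    ]
}

if __name__ == "__main__":
    print(hostname_matches_mask("crawl-192-0-2-10.googlebot.com"))
    print(ip_in_published_ranges("192.0.2.10", SAMPLE_RANGES))
```

In practice the hostname would come from a reverse DNS lookup on the requesting IP address (for example with socket.gethostbyaddr), and the ranges from a freshly downloaded copy of googlebot.json.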

User Agents and Crawl Preferences

Each common crawler is described by three things: its user agent string as it appears in HTTP requests, its user agent token for the User-agent: line in robots.txt, and the products affected by its crawl preferences. Some crawlers have more than one user agent token, and a rule applies as soon as any one of those tokens matches. The list of crawlers is not exhaustive, but it covers the requestors that are most likely to appear in log files and that have generated inquiries.
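The "any one token matches" rule can be illustrated with a small Python sketch. It is a simplification under stated assumptions: it only decides whether a robots.txt group addresses a crawler, and it ignores how a real parser chooses between several matching groups. The function name is hypothetical, and the token pair used for the image crawler (Googlebot-Image and Googlebot) is an illustrative assumption rather than something taken from this article.

```python
def group_applies(group_user_agents, crawler_tokens):
    """Return True if a robots.txt group addresses the crawler.

    group_user_agents: values of the User-agent: lines in one group.
    crawler_tokens: every user agent token the crawler answers to.
    One matching token (or the "*" wildcard) is enough.
    """
    tokens = {token.lower() for token in crawler_tokens}
    for agent in group_user_agents:
        agent = agent.strip().lower()
        if agent == "*" or agent in tokens:
            return True
    return False


# Illustrative assumption: a crawler that answers to two tokens.
IMAGE_CRAWLER_TOKENS = ["Googlebot-Image", "Googlebot"]

if __name__ == "__main__":
    # A group written for "Googlebot" alone still addresses this crawler.
    print(group_applies(["Googlebot"], IMAGE_CRAWLER_TOKENS))
    # A group written for an unrelated token does not.
    print(group_applies(["ExampleBot"], IMAGE_CRAWLER_TOKENS))
```

Matching here is case-insensitive and exact per token. That is a design choice of this sketch; actual parsers may match differently.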


Chrome User Agent Strings

In the user agent strings, Chrome/W.X.Y.Z is a placeholder: W.X.Y.Z stands for the version of the Chrome browser used by that user agent (for example, 41.0.2272.96). The version number increases over time to match the latest Chromium release that Googlebot uses.


Note on User Agents

If you search your logs or filter requests on your server for a user agent of this form, use a wildcard for the version number rather than an exact version.
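One way to apply this advice when scanning logs is a regular expression that accepts any four-part Chrome version. The sketch below is illustrative only: the function names are hypothetical, and the sample user agent string is assumed to follow the Chrome/W.X.Y.Z pattern with an arbitrary version filled in. A user agent string alone does not prove a request came from Google, because it can be spoofed.

```python
import re

# Matches "Chrome/" followed by any four dot-separated numbers,
# so the pattern keeps working as the Chromium version changes.
CHROME_VERSION_RE = re.compile(r"Chrome/\d+(?:\.\d+){3}")


def looks_like_chromium_googlebot(user_agent: str) -> bool:
    """Return True if the string names Googlebot and carries a Chrome version."""
    return (
        "Googlebot" in user_agent
        and CHROME_VERSION_RE.search(user_agent) is not None
    )


def normalize_version(user_agent: str) -> str:
    """Replace the concrete Chrome version with the W.X.Y.Z placeholder.

    Useful for grouping log entries that differ only by browser version.
    """
    return CHROME_VERSION_RE.sub("Chrome/W.X.Y.Z", user_agent)


# Hypothetical log entry; the version number is arbitrary.
SAMPLE_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "Googlebot/2.1; +http://www.google.com/bot.html) "
    "Chrome/41.0.2272.96 Safari/537.36"
)

if __name__ == "__main__":
    print(looks_like_chromium_googlebot(SAMPLE_UA))
    print(normalize_version(SAMPLE_UA))
```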

Additional Information

Unless stated otherwise, the content on this page is licensed under the Creative Commons Attribution 4.0 License, while the code samples are licensed under the Apache 2.0 License. For further information, please refer to the Google Developers Site Policies.

Last updated on 2024-09-16 UTC.