If you're in travel marketing, you've likely heard about the Google API documentation leak that surfaced on May 27, 2024, so I'll keep the introduction short. An SEO practitioner named Erfan Azimi revealed over 2,500 pages of Google’s internal API documentation, shedding light on how its search algorithm functions.
And real quick: this entire situation infuriates me. In short, it’s because Google has clearly been lying to us, which isn’t a surprise. I’ve known this to be true for quite some time. The silver lining is that the “white hats” who listen to everything Google has to say are finally hopping on my boat: as SEOs, we are not on the same team as Google. We are in constant competition. No, “writing high-quality content” is not enough.
OK, stepping down from the soap box. Here's what we know:
How the Leak Occurred:
- Discovery: The documents were initially uploaded to GitHub on March 27, 2024, and remained publicly accessible until May 7, 2024. Importantly, the code in that repository was published under the Apache 2.0 license, which allows anyone who found it to use, modify, and distribute it freely (probably a mistake, but we won’t make assumptions here and will take the license at face value). For that reason, I'm linking to a cached version here: Google API Content Warehouse.
- Content: The leak includes more than 2,500 pages of API documentation with 14,014 attributes detailing data Google collects.
Google’s (very Google-y) Response:
- Official Statement: Google warned against making inaccurate assumptions based on “out-of-context” information.
- My Take: Let’s be real – Google has a history of BS-ing search engine marketers. My advice? Take their cautions with a grain of salt and let’s dig into what we’ve learned. That's enough gaslighting from Google for today.
Key Points from Google's Caution:
- Consistency in Misinformation: Over the years, Google has repeatedly denied certain ranking factors that the leak has now confirmed. Thanks… Google.
- Trust Issues (of which I have many): Given this history, we need to critically analyze Google's public statements and cross-check with practical SEO experiences - something that most SEO agencies and travel marketers don't do (#Lazyyyy).
Key Findings from the Google API Docs Leak
I also considered titling this section “All of the things Google’s lied to us about,” but I thought that’d be a bit too on the nose.
Caution: Before jumping into the findings, it's crucial to note that while the leaked documents reveal what factors Google tracks, they do not specify the weight these factors have in the final search algorithm.
My insights are based on observed patterns and professional experience, but the exact influence of each factor remains speculative. That’s why we use statistical analysis software to assess every keyword we target: it helps us surface the factors at play for each specific term.
Based on my research, here are the takeaways from the leaked documents that every travel marketer should know - I'll attribute sources at the bottom of this article:
Clicks & CTR (NavBoost)
Let’s talk about NavBoost: something Google denied existed, right up until a Google executive had to testify about it under oath in the US Department of Justice’s antitrust trial against Google (thanks, DOJ). The truth is that Google has repeatedly denied that click-through rate, time on page, and other click-related factors drive its algorithm. Funny, then, to find incredibly consistent evidence that click data is heavily measured by Google’s NavBoost system. Google uses clickstream data, and there’s no doubt about it.
- Clicks & CTR Matter: Despite Google’s denials, clicks and click-through rates (CTR) do seem to have an impact on rankings. I’ve always told people that clicks and CTR matter, and this leak supports that.
- Clickstream Data: This is the trail of clicks users leave behind as they navigate the web, particularly through the Chrome browser. Google gets most of its clickstream data from Chrome, since 65.31% of internet users worldwide use Chrome as their web browser, according to Oberlo.
- Definition: Clickstream data tracks the sequence of clicks made by a user while browsing the internet, revealing browsing behavior and preferences.
- Usage: Google leverages this data to understand user interactions and adjust search rankings accordingly.
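You can’t see Google’s NavBoost data, but you can track the closest proxy you control: click-through rate per page from your own search performance data. Below is a minimal sketch. It assumes you’ve exported rows of page URL, impressions, and clicks (for example, from Google Search Console). The field names and the 2% threshold are hypothetical choices for illustration and do not come from the leak.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """One row of search performance data for a page (hypothetical export format)."""
    url: str
    impressions: int
    clicks: int


def ctr(stats: PageStats) -> float:
    """Click-through rate: clicks divided by impressions (0.0 if there were no impressions)."""
    if stats.impressions == 0:
        return 0.0
    return stats.clicks / stats.impressions


def low_ctr_pages(rows: list[PageStats], min_impressions: int = 100, threshold: float = 0.02) -> list[str]:
    """URLs with enough impressions to judge, but a CTR below the threshold.

    Both defaults are arbitrary assumptions; tune them against your own data.
    """
    return [r.url for r in rows if r.impressions >= min_impressions and ctr(r) < threshold]


rows = [
    PageStats("/destinations/lisbon", impressions=5000, clicks=250),  # 5% CTR
    PageStats("/destinations/porto", impressions=4000, clicks=40),  # 1% CTR
    PageStats("/blog/new-post", impressions=50, clicks=0),  # too few impressions to judge
]
print(low_ctr_pages(rows))  # ['/destinations/porto']
```

The pages this flags are candidates for better titles and meta descriptions. It is a prioritization aid, not a measure of what NavBoost actually scores.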
Domain Age
Here's another Google “lie.”
The API documentation indicates that domain age is a metric Google assesses.
- Potential Influence: Older domains may have an advantage in rankings, although the specific weight is unknown. Domain age has always been a point of contention, but the leak confirms it is a factor that Google tracks.
Website Whitelists (and Trusted Travel Websites)
Google has been whitelisting certain sites, potentially giving them a ranking advantage. This could include major travel sites or government websites.
I specifically believe this could relate to US Department of State travel regulations and classifications, since those fall into the same category as trusted political and health sites. It’s unlikely that sites like Tripadvisor are the targets of these trusted-site designations.
Subdomains as Separate Entities
Google has long held the position that subdomains are considered part of the core website.
Based on the documentation, however, subdomains are treated as separate entities, meaning they don’t inherit the main site’s authority.
If your travel blog is hosted on a subdomain of your main site, it will need to build its authority separately. This is why we always recommend either using the same CMS for your main site and blog, or a reverse proxy if they’re running on two different servers.
Too Much Fun in the Sandbox?
There’s anecdotal evidence that pages on new websites are much harder to rank than pages on established websites. The industry has long referred to this as the “Google Sandbox.” Google has explicitly denied the existence of the sandbox through its search liaison team. Guess what? Another lie.
New websites often start in a sandbox where their rankings are artificially held back until they’ve proven their worth. It makes sense: this allows Google to monitor new sites for quality and trustworthiness before they can compete in the rankings.
Specific Travel Industry Findings from the API Leak
The leaked Google documentation provides (very) limited insights specific to the travel industry. Here are the slim pickings of what I’ve found relevant for travel marketers:
Criteria for Qualifying Good Travel Sites
Google uses several criteria to determine the quality of travel sites. These include:
- Language of the Travel Site: The content must be in the appropriate language for the target audience.
- Aggregation: Whether the site aggregates travel information from various sources.
- Official Attraction/Entity Status: Sites representing official attractions or recognized entities are favored.
- Official Hotel Sites: Sites that are the official websites for hotels.
You can find more details on how Google qualifies good travel sites in their QualityTravelGoodSitesData documentation.
Details About Airline Data
Google collects extensive data about airlines, which likely powers tools like Google Flights. This data includes:
- Airline Contact Info: Grouped by language locale.
- Carry-on Baggage Limitations: Details on what passengers can bring onboard.
- Baggage Fee URLs: Links to where baggage fees are discussed.
- Country Code: The airline's country code.
- Fare Family: Information on fare classes and what they offer.
- Mileage Programs: Details on which mileage programs are applicable for specific bookings.
- Eco-friendliness: Information about environmentally friendly options (Green Fares).
- IATA Code and Airline Name: Identification codes and names.
- Passenger Assistance URLs: Links for passenger support.
- Flight Count: The number of flights an airline runs over the next 180 days. It’s unclear whether this is the number of routes or the number of actual flights.
- Sustainability Programs: URLs for sustainability initiatives.
- Home Page URL: The main website for the airline.
For more in-depth information, you can refer to the TravelFlightsAirlineConfig documentation.
Classification of Hotel Types
Google also categorizes hotel types and tracks various attributes related to hotels and rentals:
- Hotel Types: Includes motels, hotels, youth hostels, and guest houses.
- Star Ratings: Ratings that indicate the quality of the hotel.
- Occupancy Constraints: Information on room configurations, including the number of bathrooms and bedrooms.
Google’s classifications can be explored further in their NlpSemanticParsingLocalHotelType documentation.
What the Leak Doesn’t Tell Us About Travel
Were you hoping for more? Sorry - that’s all. Next, I’ll show you how the general learnings can be applied to travel websites.
Note: There is no content related to tours, activities, or Google TTD (Things To Do) in the leaked documents.
Detailed Implications for Travel Industry SEO
Site-Wide Quality Factors
For a company that insists its algorithm is page-driven, there sure are a lot of site-wide factors in this leak. Google’s leaked documents emphasize the importance of site-wide quality, which can significantly impact individual page rankings.
The overall quality of your website affects the rankings of individual pages. Low-quality content on your site can drag down high-quality pages.
Maintaining a consistently high quality across your entire site is essential. Remove or improve low-quality pages to boost your site’s overall performance.
We did that with our client, Thrifty Traveler. You can learn more about that by reading the Thrifty Traveler Case Study.
Content and Backlinks
Google: “Backlinks don’t matter.”
Google API Doc Leak: “Here’s a looooonnnng list of factors that measure links, their authority, and topical relevance.”
Me: “Thanks again, Google.”
We have always held that content quality and backlink strategies are fundamental to successful SEO. Here’s what the leak reveals about these elements:
Dwell Time and Click Classification
Google tracks how long users stay on a website before returning to the search results, designating some clicks as “good” and others as “bad.” This metric, which I call dwell time, is a significant indicator of content quality. Interestingly, Google has specifically denied the importance of dwell time, yet similar measures of “Good Clicks” exist in their API documentation.
Example: If visitors spend a lot of time on your travel site, it signals to Google that your content is valuable and relevant.
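To make the “good click” versus “bad click” idea concrete, here’s a rough illustration of how dwell time could be bucketed. This is a hypothetical sketch. The leak references good and bad clicks but doesn’t say how they’re labeled, so the 30-second and 120-second cutoffs below are invented for illustration only.

```python
from collections import Counter

# Hypothetical cutoffs in seconds, chosen for illustration; Google's real thresholds are unknown.
BAD_CLICK_MAX_SECONDS = 30
GOOD_CLICK_MIN_SECONDS = 120


def classify_click(dwell_seconds: float, returned_to_results: bool) -> str:
    """Label one search click by how long the visitor stayed before returning to the results."""
    if not returned_to_results:
        # The visitor never came back to the results page: treat the search as satisfied.
        return "good"
    if dwell_seconds < BAD_CLICK_MAX_SECONDS:
        return "bad"  # quick bounce back to the results ("pogo-sticking")
    if dwell_seconds >= GOOD_CLICK_MIN_SECONDS:
        return "good"
    return "neutral"


def summarize(clicks: list[tuple[float, bool]]) -> Counter:
    """Count good, bad, and neutral clicks for (dwell_seconds, returned_to_results) pairs."""
    return Counter(classify_click(dwell, returned) for dwell, returned in clicks)


sample = [(12, True), (45, True), (300, True), (20, False)]
print(summarize(sample))
```

In practice, your own analytics (engaged time, return visits) is the closest proxy you have for this signal.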
Content Update Frequency Impacts Index Storage Location
Google’s index, named “Alexandria,” stores URLs in three tiers, each on a different type of storage: frequently updated content is stored in flash memory, solid-state drives are used for less important or timely content, and hard drives are used for rarely updated, stale content. It’s fair to assume that the hyper-relevant, timely data in flash memory is very likely to come up in search for news-related queries and developing topics.
Recommendation: Regularly update your content to ensure it’s stored in higher-priority tiers, which can positively impact visibility and ranking. I can’t believe I’m saying this, because I’ve been telling people “recently updated content isn’t that important” for a long while - but even I can be wrong.
Topical Authority and Focus
Google evaluates the topical relevance and authority of your content. They measure it with factors called “siteFocusScore,” “siteRadius,” “siteEmbeddings,” and “pageEmbeddings.” Maintaining a focused content strategy helps establish your site as an authority in specific areas. The more you do this, the harder it will be to rank for other competitive topics, and the easier it will be to rank for your authority topic.
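The leak names siteFocusScore and siteRadius but not their formulas. One plausible reading, sketched below purely as an assumption, is this: average your page embeddings into a site “centroid,” then measure how far each page drifts from it using cosine distance. The three-dimensional vectors are toy values. In practice, you’d use embeddings from whatever text-embedding model you have access to.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of the page vectors (a stand-in for a 'site embedding')."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def page_distances(pages: dict[str, list[float]]) -> dict[str, float]:
    """Cosine distance (1 - similarity) of every page from the site centroid."""
    site = centroid(list(pages.values()))
    return {url: 1 - cosine_similarity(vec, site) for url, vec in pages.items()}


# Toy embeddings: two on-topic travel pages and one off-topic page.
pages = {
    "/destinations/lisbon": [0.9, 0.1, 0.0],
    "/destinations/porto": [0.8, 0.2, 0.0],
    "/blog/crypto-tips": [0.0, 0.1, 0.9],
}
distances = page_distances(pages)
radius = max(distances.values())  # how far the most off-topic page strays
outlier = max(distances, key=distances.get)
print(outlier, round(radius, 3))
```

Pages that sit far from the centroid are the ones most likely to dilute a focused topical profile. Whether Google computes anything like this is unknown.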
Backlink Quality and Traffic
Backlinks are categorized based on traffic & clickstream data - surprise, surprise. Links from high-traffic pages are more valuable.
- Focus on acquiring backlinks from reputable sites with high traffic to improve your organic search visibility. Make sure the links you acquire are in the correct niche and that the linking website has established traffic.
- Note: Newly acquired links are weighted more heavily than existing ones, so maintaining a steady link-building strategy is crucial.
Technical SEO Factors
Technical aspects of SEO are crucial for travel websites - especially large websites like OTAs - as they ensure search engines can effectively crawl and understand your content. Here’s what we learned from the leak about technical SEO:
Page Layout and Content Positioning: Important content should be placed at the top of the page, as Google may not always index the entire page.
- Example: For travel websites, include key details like “About the destination” at the top of your pages.
TitleMatchScore: Matching your title with the primary search term can positively impact rankings. Ensure your page titles are relevant and include primary search terms.
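The leak doesn’t explain how TitleMatchScore is calculated. As a rough self-check for your own pages, here’s a hypothetical token-overlap score: the share of the query’s words that appear in the page title. This is an illustrative heuristic of our own, not Google’s formula.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def title_match(title: str, query: str) -> float:
    """Fraction of the query's distinct words found in the title (0.0 to 1.0)."""
    query_words = tokens(query)
    if not query_words:
        return 0.0
    return len(query_words & tokens(title)) / len(query_words)


query = "best time to visit lisbon"
print(title_match("Best Time to Visit Lisbon (2024 Guide)", query))  # 1.0
print(title_match("Lisbon Travel Guide", query))  # 0.2
```

A low score doesn’t mean a title is bad. It just flags pages where the primary search term may be missing from the title.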
Date Factors in SEO: Google considers multiple dates - bylineDate (explicitly stated in the article), syntacticDate (from the URL or title), and semanticDate (derived from page content). Consistency across these dates is important, so make sure the published date is accurately reflected in the title and content. Generally, at Propellic, we caution against using dates in your URL because you can’t change the article date without a redirect.
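You can’t see how Google reconciles these three dates, but you can audit your own pages for obvious conflicts. Below is a minimal sketch. It assumes you’ve already extracted each page’s byline date, URL, title, and body text. The extraction step, the function names, and the “later year in the body” rule are hypothetical heuristics of our own.

```python
import re
from datetime import date


def years_in(text: str) -> set[int]:
    """Four-digit years (1990-2099) mentioned in a URL, title, or body text."""
    return {int(y) for y in re.findall(r"\b(199\d|20\d\d)\b", text)}


def date_conflicts(byline: date, url: str, title: str, body: str) -> list[str]:
    """List simple inconsistencies between the byline date and other date signals."""
    problems = []
    # Rough stand-in for a "syntactic" date: any year appearing in the URL or title.
    syntactic = years_in(url) | years_in(title)
    if syntactic and byline.year not in syntactic:
        problems.append(f"URL/title year(s) {sorted(syntactic)} differ from byline year {byline.year}")
    # Rough stand-in for a "semantic" date: body text referencing years after the byline.
    later = {y for y in years_in(body) if y > byline.year}
    if later:
        problems.append(f"body mentions year(s) {sorted(later)} after the byline date")
    return problems


issues = date_conflicts(
    byline=date(2023, 3, 1),
    url="https://example.com/blog/lisbon-guide-2024",
    title="Lisbon Travel Guide 2024",
    body="Updated for the 2024 season.",
)
for issue in issues:
    print(issue)
```

A flagged page isn’t necessarily wrong (a guide can legitimately mention next year’s events). Treat the output as a review list.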
Practical Advice for Travel Industry SEO
Curious how this applies to travel industry SEO? Here’s what we recommend:
Site Design and Architecture
A well-designed site with intuitive architecture is crucial for optimizing your SEO and enhancing user experience - and remember, NavBoost is a user experience algorithm that likely carries significant weight in Google’s overall algorithm. Here are some key strategies:
The NavBoost algorithm (the one we mentioned earlier, which was initially exposed during the US Department of Justice’s antitrust case over Google’s monopolistic practices) rewards sites with easy navigation. Ensure your site’s architecture is logical and user-friendly, enabling visitors to find information quickly. We don’t want people bouncing off your website because they couldn’t find the right page - that sends a poor user experience signal.
You should implement a clear hierarchy in your navigation menu, use breadcrumbs, and ensure that important pages (ahem… booking pages) are easily accessible from the homepage - and every page, for that matter.
You should also be blocking or removing pages that aren’t topically relevant. Pages that don’t align with your site’s main topics can dilute your topical authority and negatively impact your rankings. We found proof that Google assesses this in the leak.
Regularly audit your site for irrelevant or low-quality pages and either remove them or block them from being indexed by search engines, after they've had every possible chance to perform.
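A simple way to respect the “every possible chance to perform” rule is to flag only pages that are both old enough and still barely getting clicks. Below is a hypothetical sketch. The field names, the 180-day minimum age, and the five-click ceiling are all assumptions to adjust for your own site. The output is a list for human review, not an automatic delete list.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PageRecord:
    """Hypothetical per-page audit row."""
    url: str
    published: date
    clicks_last_90_days: int


def prune_candidates(pages: list[PageRecord], today: date,
                     min_age_days: int = 180, max_clicks: int = 5) -> list[str]:
    """URLs old enough to have had a fair chance but still getting almost no clicks."""
    cutoff = today - timedelta(days=min_age_days)
    return [
        p.url
        for p in pages
        if p.published <= cutoff and p.clicks_last_90_days <= max_clicks
    ]


pages = [
    PageRecord("/blog/old-festival-recap", date(2023, 1, 10), 2),   # old and idle
    PageRecord("/destinations/lisbon", date(2022, 5, 1), 400),      # old but performing
    PageRecord("/blog/summer-packing-list", date(2024, 5, 1), 0),   # too new to judge
]
print(prune_candidates(pages, today=date(2024, 6, 1)))  # ['/blog/old-festival-recap']
```

For each flagged page, decide whether to improve it, consolidate it into a stronger page, or noindex/remove it.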
Content Strategy
And now, the ONLY section that’s fairly consistent with what Google’s been preaching over the past two decades: yes, creating and maintaining high-quality content is essential for ranking well and engaging your audience. Notably, that means content on specific topics within your domain of expertise (remember the “siteRadius” factor we looked at above?). Here’s how to do it:
Regularly Updating Content to Ensure Freshness and Relevance:
From this leak, we can assume that fresh content is prioritized in Google’s indexing process, and regularly updated sites are seen as more active and reliable - and they sit in flash storage, which means they are quick and easy for Google to fetch when delivering search results.
Set a schedule for content updates: add new information, refresh old articles, and remove outdated content. Even better, add videos and interactive tools to increase your “effort” rating. (In this leak, we learned that Google uses an LLM to rate the “effort” an article or document likely required to create - which makes sense, I guess?)
Optimizing Headings and Content Around Specific Queries:
Ensure your headings and content directly address common queries related to your niche. Note that in this article, I’ve broken down the structure in a way that answers questions I assume people are asking like “What does the Google API leak mean for travel websites?”
For less “breaking news” topics, where the tools are able to accumulate enough historical search data, you can use tools like Ahrefs, Answer the Public, and Semrush to find popular questions and incorporate them into your headings and content. You can also engineer prompts for ChatGPT or Gemini to get question ideas, though you won’t be able to layer in search volumes without extensions or cross-checking.
Writing Content That Can Earn Impressions and Clicks Consistently:
You may choose to read this as “write clickbait titles,” which, admittedly, isn’t that far from the truth. Importantly, the content needs to retain the users it attracts to maintain “good click” quality. Be sure to create compelling titles, use multimedia elements (images, videos), and write engaging introductions to capture readers’ interest.
Link Building and Maintenance
And finally, links!
Building and maintaining a robust backlink profile is a cornerstone of effective SEO. Here’s what you need to focus on:
Focusing on Acquiring High-Quality Backlinks from Relevant Sources:
- Quality Over Quantity: High-quality backlinks from reputable sources boost your authority more than numerous low-quality links. Duh.
- Actionable Tip: Strong content keeps acquiring new, authoritative links. Closing link gaps to hit a rank goal is the first (major) step, but if that page isn’t growing links organically, it should remain a target for occasional new links.
- Actionable Tip: Reach out to authoritative travel blogs, news sites, and industry publications for guest posting opportunities or collaborations. Or contact Propellic, and we can help you acquire those links.
Monitoring and Adjusting Link Velocity to Avoid Penalties:
- Link Velocity: Google monitors how quickly you acquire backlinks. A sudden spike can be seen as unnatural.
- Actionable Tip: Aim for a steady acquisition of backlinks over time. I prefer Ahrefs for tracking link acquisition velocity.
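The leak doesn’t reveal Google’s actual velocity thresholds, but you can watch your own acquisition curve for spikes. Below is a minimal sketch. It assumes you’ve exported a “first seen” date for each new referring domain from your backlink tool; the input format is hypothetical. It also assumes that a “spike” means a week with more than three times the median weekly count, which is an arbitrary threshold chosen for illustration.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import median


def weekly_counts(first_seen: list[date]) -> list[int]:
    """New referring domains per 7-day bucket from the earliest date (empty weeks count as 0)."""
    start = min(first_seen)
    buckets = Counter((d - start).days // 7 for d in first_seen)
    return [buckets.get(week, 0) for week in range(max(buckets) + 1)]


def spike_weeks(counts: list[int], factor: float = 3.0) -> list[int]:
    """Indexes of weeks whose count exceeds `factor` times the median week (hypothetical rule)."""
    baseline = max(median(counts), 1)  # floor of 1 so a zero median doesn't flag every week
    return [week for week, count in enumerate(counts) if count > factor * baseline]


# Two new domains a week for five weeks, then a burst of 12 in week 5.
first_seen = [date(2024, 1, 1) + timedelta(days=7 * w) for w in range(5) for _ in range(2)]
first_seen += [date(2024, 2, 5)] * 12
counts = weekly_counts(first_seen)
print(counts)               # [2, 2, 2, 2, 2, 12]
print(spike_weeks(counts))  # [5]
```

A flagged week isn’t automatically a problem (a piece of content going viral looks the same). It’s a prompt to check where the links came from.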
Importance of Link Sources Being from the Same Country:
- Relevance: Links from sites in the same country as your target audience are more valuable, assuming Google is using that portion of the API documentation in document ratings & search rankings.
- Actionable Tip: Focus your link-building efforts on in-country websites, regional directories, and country-specific industry resources. DMOs are a great place to start.
So What Now?
And there you have it - Google’s Content Warehouse API documentation leak and what it means for the travel industry. Again, there’s not much travel-specific content in the leak, but hopefully, my guidance above helped you understand the leak in the context of travel. If there’s anything missing or you have any questions, don’t hesitate to reach out to me on LinkedIn or contact Propellic by emailing us or visiting our contact page.
Sources
- https://sparktoro.com/blog/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them/
- https://ipullrank.com/google-algo-leak
- https://hexdocs.pm/google_api_content_warehouse/0.4.0/search.html?q=travel
- https://searchengineland.com/unpacking-googles-massive-search-documentation-leak-442716
Want To Level Up Your Travel Marketing?
Subscribe to the NavLog, our bi-weekly travel marketing roundup, where you’ll be the first to know about breaking news that impacts travel marketers and access exclusive performance marketing strategies and practical tips you can implement from the marketers at the leading edge of the travel industry.