Google Search document leak reveals inner workings of ranking algorithm

Documents reveal how Google Search is using, or has used, clicks, links, content, entities, Chrome data and more to rank content.

Danny Goodwin on May 28, 2024

A trove of leaked Google documents has given us an unprecedented look inside Google Search and revealed some of the most important elements Google uses to rank content.

What happened. Thousands of documents, which appear to come from Google’s internal Content API Warehouse, were released March 13 on GitHub by an automated bot called yoshi-code-bot. These documents were shared with Rand Fishkin, SparkToro co-founder, earlier this month.

Why we care. We have been given a glimpse into how Google’s ranking algorithm works, which is invaluable for SEOs who can understand what it all means. In 2023, we got an unprecedented look at Yandex Search ranking factors via a leak, which was one of the biggest stories of that year.

This Google document leak? It will likely be one of the biggest stories in the history of SEO and Google Search.

What’s inside. Here’s what we know about the internal documents, thanks to Fishkin and Michael King, iPullRank CEO:

Current: The documentation indicates this information is accurate as of March.
Ranking features: 2,596 modules are represented in the API documentation with 14,014 attributes.
Weighting: The documents did not specify how ranking features are weighted – just that they exist.
Twiddlers: These are re-ranking functions that “can adjust the information retrieval score of a document or change the ranking of a document,” according to King.
Demotions: Content can be demoted for a variety of reasons, such as:
- A link doesn’t match the target site.
- SERP signals indicate user dissatisfaction.
- Product reviews.
- Location.
- Exact match domains.
- Porn.
Change history: Google keeps a copy of every version of every page it has ever indexed, which means Google can “remember” every change ever made to a page. However, Google only uses the last 20 changes of a URL when analyzing links.
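
To make the twiddler idea above concrete, here is a minimal sketch of a re-ranking function that adjusts scores after an initial retrieval pass and then re-sorts. It is purely illustrative: the leaked documents do not show how Google implements twiddlers, and every name here (ScoredDoc, demote_matching, rerank), the example URLs and the 0.5 demotion factor are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ScoredDoc:
    url: str
    score: float  # initial retrieval score; the values used below are made up


# In this sketch, a "twiddler" is any function that maps a list of scored
# documents to a new list of scored documents. This is an assumption, not
# a description of Google's code.
Twiddler = Callable[[List[ScoredDoc]], List[ScoredDoc]]


def demote_matching(predicate: Callable[[ScoredDoc], bool], factor: float) -> Twiddler:
    """Build a twiddler that multiplies the score of matching documents by `factor`."""
    def twiddle(docs: List[ScoredDoc]) -> List[ScoredDoc]:
        return [
            ScoredDoc(d.url, d.score * factor) if predicate(d) else d
            for d in docs
        ]
    return twiddle


def rerank(docs: List[ScoredDoc], twiddlers: List[Twiddler]) -> List[ScoredDoc]:
    """Apply each twiddler in order, then sort by the adjusted score, highest first."""
    for twiddler in twiddlers:
        docs = twiddler(docs)
    return sorted(docs, key=lambda d: d.score, reverse=True)


# Hypothetical input: two documents with invented retrieval scores.
docs = [ScoredDoc("a.example", 0.9), ScoredDoc("b.example", 0.8)]

# Halve the score of one document, as a stand-in for a demotion.
ranked = rerank(docs, [demote_matching(lambda d: d.url == "a.example", 0.5)])
print([d.url for d in ranked])
```

Because the demoted document's score drops below the other's, it falls to second place. Since the documents do not say how features are weighted, the factor here stands in for an unknown.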

Links matter. Shocking, I know. Link diversity and relevance remain key, the documents show. And PageRank is still very much alive in Google’s ranking features. PageRank for a website’s homepage is considered for every document.

This doesn’t prove Google spokespeople have lied about links not being a “top 3 ranking factor” or links mattering less for ranking. Two things can be true at once. Again, we don’t know how any of these features are weighted.

Successful clicks matter. This should not be a shocker, but the documents indicate that if you want to rank well, you must keep creating great content and user experiences. Google uses a variety of measurements, including badClicks, goodClicks, lastLongestClicks and unsquashedClicks.

“[Y]ou need to drive more successful clicks using a broader set of queries and earn more link diversity if you want to continue to rank,” King said. “Conceptually, it makes sense because a very strong piece of content will do that. A focus on driving more qualified traffic to a better user experience will send signals to Google that your page deserves to rank.”

Documents and testimony from the U.S. vs. Google antitrust trial confirmed that Google uses clicks in ranking – especially with its Navboost system, “one of the important signals” Google uses for ranking.

Brand matters. Fishkin’s big takeaway? Brand matters more than anything else: “If there was one universal piece of advice I had for marketers seeking to broadly improve their organic search rankings and traffic, it would be: ‘Build a notable, popular, well-recognized brand in your space, outside of Google search.’”

Entities matter. Google stores author information associated with content and tries to determine whether an entity is the author of the document.

SiteAuthority: Google uses something called “siteAuthority”.

Google told us something like this existed in 2011, after the Panda update launched, stating publicly that “low quality content on part of a site can impact a site’s ranking as a whole.”
However, Google has denied having a website authority score in the years since then.

Chrome data. A module called ChromeInTotal indicates that Google uses data from its Chrome browser for search ranking.

Whitelists. A couple of modules, isElectionAuthority and isCovidLocalAuthority, indicate Google whitelists certain domains related to elections and COVID. That said, we’ve long known Google (and Bing) have “exception lists” when “specific algorithms inadvertently impact websites.”

Quick clarification. There is some dispute as to whether these documents were “leaked” or “discovered.” I’ve been told it’s likely the internal documents were accidentally included in a code review and pushed live from Google’s internal code base, where they were then discovered.

The source. Erfan Azimi, CEO and director of SEO for digital marketing agency EA Eagle Digital, posted this video, claiming responsibility for sharing the documents with Fishkin. Azimi is not employed by Google.

Erfan Azimi: Leaked Google Ranking Factors (Public Statement) - Rand Fishkin, Mike King

The post Google Search document leak reveals inner workings of ranking algorithm appeared first on MarTech.

About the author

Danny Goodwin

Danny Goodwin is Senior Editor of Search Engine Land. In addition to writing daily about SEO, PPC, and more for Search Engine Land, Goodwin also manages Search Engine Land’s roster of subject-matter experts. He also helps program our conference series, SMX – Search Marketing Expo. Prior to joining Search Engine Land, Goodwin was Executive Editor at Search Engine Journal, where he led editorial initiatives for the brand. He also was an editor at Search Engine Watch. He has spoken at many major search conferences and virtual events, and has been sourced for his expertise by a wide range of publications and podcasts.
