Microsoft, Other Search Engines Blocked From Indexing Reddit Content

Microsoft Blocked From Reddit Content, Declined To Play By Rules

by , Staff Writer @lauriesullivan, July 25, 2024

Reddit recognizes that its user-generated content and the ideas behind it have become a valuable source of information for millions — similar to the awareness of what user-generated content could lead to in the early days of Twitter, now X.

Search engines share this awareness, and Google appears to have been more than willing to spend millions for that content and abide by Reddit’s privacy rules to gain information that supports additional traffic.

Reddit reportedly has begun to block much of its content from showing up in search engines such as Microsoft Bing, DuckDuckGo, and Mojeek.

This decision seems to have been made partly to gain a new source of revenue and partly to protect user content on the site. The company said it blocks all crawlers that are unwilling to commit to not using crawl data for AI training, which is in line with enforcing its Public Content Policy and updated robots.txt file.

Reddit has become more protective of its user-generated data. It made its API more expensive for third-party developers, and enforced its scraping policy last month by updating the site’s robots.txt file, which tells web crawlers whether they can access a site.
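
For readers unfamiliar with robots.txt, the following is a minimal sketch of how a crawler might check such a file before fetching a page. The rules shown are hypothetical and are not Reddit's actual robots.txt; the crawler names, the example.com URL, and the `can_crawl` helper are assumptions made for illustration. It uses Python's standard `urllib.robotparser` module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Reddit's actual robots.txt.
# "ExampleLicensedBot" is an invented crawler name.
ROBOTS_TXT = """\
User-agent: ExampleLicensedBot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(user_agent: str, url: str) -> bool:
    """Return True if the sample rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.example.com/r/example/"  # placeholder URL
    for agent in ("ExampleLicensedBot", "SomeOtherBot"):
        print(agent, can_crawl(agent, page))
```

In this sketch, the named crawler is allowed everywhere while every other crawler is disallowed. A robots.txt file is advisory: it states the site's wishes, and it is up to each crawler to honor them.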

And while the company has been in discussions with multiple search engines, some of those talks ended without reaching an agreement, because the negotiators were not able or willing to make guarantees with regard to the use of Reddit content, including its use for AI.

There are sites that have access and have been serving Reddit results. For example, Internet Archive continues to have access. A company spokesperson said it has worked to provide access for legitimate research purposes, such as reddit4research.

A Reddit spokesperson said “we are open to working with partners, big and small, and are doing so today,” so it is not accurate to say it allows only one search engine to crawl Reddit.

Reddit has long supported the open internet and protecting user rights. Its Public Content Policy outlines how these values apply to public content in a new AI era.

“It’s bad for the health of the internet for for-profit companies to scrape our content without constraint and use it for, among other things, to train AI models,” the Reddit spokesperson wrote in an email.

Google has been working to index and surface more data from social platforms for years.

In February, the company announced a data deal with Reddit that allowed it to train its AI models as well as more prominently serve up results in Google Search.

The licensing deal, estimated at about $60 million, changed the search model by allowing Reddit posts to outrank other websites.

In June 2023, Google executives spoke about a resource for search users at Google I/O to personalize content from user-generated forums. The product — named Perspectives — would provide the ability to surface more relevant content and videos from discussion forums such as Quora, Reddit, TikTok, and YouTube.

Purchasing content is not new. Publishers worldwide have been pushing search engines and social-media platforms to pay up or stop surfacing news and content from feeds on their platforms. Reddit has now joined in.

Bing, DuckDuckGo, Mojeek and other search engines are no longer indexing and serving up full Reddit results as the company rethinks its business model.

MediaPost.com: Search & Performance Marketing Daily
