Google Will Use Reddit Content to Train Its AI Models

Several AI companies have faced lawsuits after being accused of using copyrighted data without permission to train their AI models. Google appears intent on avoiding the same mistake: it is licensing content from a social networking site to train its AI.

Google Turns to Reddit

The content used to train an AI model largely shapes how it eventually responds to prompts. Many sources can provide AI developers with the data they need, and Reddit is one of them. With that in mind, Google has struck a deal with the platform.

Google will abide by Reddit's rules regarding its API. To gain access, the search engine giant will have to pay a fee, reportedly around $60 million per year. The arrangement allows Google to "better understand" content from Reddit, as per Engadget.

In the statement about its expanded partnership with Reddit, Google Vice President Rajan Patel says that "Google will now have efficient and structured access to fresher information, as well as enhanced signals," all of which will be used in the most "accurate and relevant ways."

Many people turn to Reddit when they need information on a particular topic or advice from a large community of users. Google recognizes this habit, even pointing it out in the blog post.

"[W]e've seen that people increasingly use Google to search for helpful content on Reddit to find product recommendations, travel advice and much more. We know people find this information useful, so we're developing ways to make it even easier to access across Google products."

Reddit will benefit from the partnership beyond the compensation for its data. The social site will integrate new AI-powered capabilities using Vertex AI, which will enhance its search and other functions on the platform.

Acquiring Training Data the Right Way

Many AI companies don't disclose where they get their training data, which only fuels existing suspicions that they are using copyrighted content to train AI models without permission from its creators.

Both Microsoft and OpenAI have been on the receiving end of lawsuits over this issue. In December 2023, The New York Times sued both companies for using millions of its copyrighted works to train chatbots, particularly ChatGPT.

In the filing, NYT claims that the tech companies should be held responsible for "billions of dollars in statutory and actual damages" for unlawfully copying valuable works from the publication.

OpenAI spokesperson Lindsey Held said that the company was "surprised and disappointed" by the lawsuit, adding that its conversations with the news organization had been "moving forward constructively."

Many other news outlets had already taken measures to block OpenAI's web crawlers, even before the lawsuit. They include CNN, Reuters, and ABC News. These are just a few examples; many individual creators have also claimed that the AI company used their works without consent.