Class-Action Suit Accuses OpenAI of Stealing Data to Train AI Models

It's undeniable that AI is becoming a huge part of the tech world. Several companies have already adopted artificial intelligence into their systems, and some would argue that the innovation was boosted by the release of ChatGPT. However, the chatbot's creator, OpenAI, has been accused of inappropriate data use.

Class-Action Lawsuit Against OpenAI

The lawsuit was filed by plaintiffs who chose to remain anonymous to avoid backlash. It alleges that OpenAI took private and personal information without consent, along with 300 billion words scraped from the internet, in violation of privacy laws.

Filed by Clarkson Law Firm in a San Francisco federal court, the complaint is said to run 157 pages. It claims that the artificial intelligence company tapped content from books, articles, websites, and posts, including individuals' personal information.

The plaintiffs wrote that "despite established protocols for the purchase and use of personal information, defendants took a different approach," which they characterized as theft, as reported by Interesting Engineering.

The lawsuit is extensive: it accuses OpenAI of larceny, unjust enrichment, and violations of the Electronic Communications Privacy Act and the Computer Fraud and Abuse Act. The plaintiffs are seeking $3 billion in potential damages.

The complaint claims that the company not only took data by scraping the internet for training material, but also accessed private information from users' interactions with products that integrate ChatGPT.

The accusations also run counter to OpenAI's stated mission to create a safe AGI that benefits all of humanity. If the allegations are correct, the company is instead stealing from humanity in order to profit from a trained AI model.

This Isn't the First Case

AI tools don't just suddenly have knowledge about the inquiries their users make. They have to learn from existing data. Companies claim that the data they use is free to use and that they steer clear of licensed content. However, there have been signs that this is not the case.

That's where the role of the law gets complicated. In one instance, artists filed a class-action suit against generative AI platforms, accusing them of using original artwork without a license to train AI models that copy the artists' styles.

Stock image giant Getty Images also filed a lawsuit over Stable Diffusion, naming its developer, Stability AI, and claiming that the generative AI tool improperly used its photos. As reported in Harvard Business Review, the suit alleges violations of copyright and trademark rights.

The same goes for text. When an AI tool is asked to generate an essay, it may pull information from copyrighted material, producing a derivative work that could be grounds for a copyright infringement claim.

Chatbots also tend to fabricate content and mix it with real information. For instance, a lawyer used ChatGPT to generate citations for a legal brief. The citations turned out to be made up, but the chatbot had used the names of real judges, and the lawyer was fined.