OpenAI is facing a lawsuit from The New York Times and the Daily News, which allege that it used their copyrighted material to train AI models without permission. The publishers' lawyers claim that OpenAI engineers accidentally deleted crucial data from a virtual machine that OpenAI had provided for searching its training datasets.
While OpenAI managed to recover most of the data, the folder structure and file names were lost, rendering the recovered data unusable for the publishers' investigation. This incident has caused significant delays and forced the publishers to restart their analysis.
OpenAI denies deleting the data, attributing the issue to a system misconfiguration that the plaintiffs themselves requested. It maintains that no files were actually lost. The dispute highlights how difficult it can be to manage and audit the large datasets used in AI development.
OpenAI argues that using publicly available data, including news articles, to train its models falls under fair use. However, it has recently signed licensing agreements with several news publishers, including the Associated Press and Axel Springer, for undisclosed amounts.
This case raises important questions about copyright and fair use in the age of AI, particularly regarding the use of copyrighted material for training large language models.