OpenAI Accused of Deleting Key Data in Copyright Lawsuit

In their ongoing copyright lawsuit against OpenAI, attorneys for The New York Times and Daily News have accused the company of erasing data that may be relevant to the case. The publishers claim that OpenAI illegally used their content to train its AI models. Earlier this fall, OpenAI agreed to provide two virtual machines so that the plaintiffs’ attorneys and hired experts could search its training datasets for their copyrighted works. The purpose of these searches was to determine whether the publishers’ works were included in OpenAI’s training data. The teams have spent more than 150 hours on these searches since November 1.

According to a filing submitted to the U.S. District Court for the Southern District of New York, OpenAI developers inadvertently deleted all of the search data stored on one of the virtual machines on November 14. OpenAI attempted to recover the data, but the recovered files lacked their folder structures and filenames, which made them unusable for identifying how the plaintiffs’ content may have been used in OpenAI’s models. Plaintiffs’ counsel emphasized that they do not believe the deletion was intentional, but argued that the incident underscores the need for OpenAI to proactively identify potentially unlawful material within its datasets.