Українська правда

Reddit restricts Internet Archive access to most of the site


Reddit has noticed that several artificial intelligence companies are pulling data from the site through the Internet Archive’s Wayback Machine. Because Reddit doesn’t allow its content to be used for AI training without specific consent, the company has decided to restrict the Wayback Machine’s access to most of its content, The Verge reports.

The Wayback Machine, which previously archived posts and other data from Reddit, will now be able to access only the platform's homepage. As a result, anyone collecting Reddit data through the archive will see only a list of the day's most popular posts. The Internet Archive will no longer be able to store post detail pages, comments, or user profiles.

"Internet Archive provides a service to the open web, but we’ve been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine... Until they’re able to defend their site and comply with platform policies (e.g., respecting user privacy, re: deleting removed content) we’re limiting some of their access to Reddit data to protect redditors," Reddit spokesperson Tim Rathschmidt told The Verge.

Reddit has long struggled with scrapers taking information from the platform without permission. In May 2024, the company signed a deal with OpenAI that allowed the developer of ChatGPT to use the platform’s content to train AI. Shortly thereafter, the company hid its content from all search engines except Google. Reddit representatives have also stated that AI companies that want to train their models on posts from the platform must pay or face a lawsuit; Reddit has already sued Anthropic.
