Link: Reddit will block the Internet Archive

Reddit is restricting the Internet Archive from archiving most of its content, a move meant to keep AI companies from scraping Reddit data through the Archive. Reddit will limit the Archive's access to the Reddit.com homepage, blocking the archiving of individual posts, comments, and profiles.

The Internet Archive's mission includes preserving digital content, but Reddit argues that its full content shouldn't be archived in this manner. Reddit spokesperson Tim Rathschmidt noted the move is to protect user privacy and comply with platform policies.

Rathschmidt said the restrictions are already being rolled out and that the Internet Archive was informed in advance. Reddit has previously raised concerns that its content could be scraped from the Archive.

Reddit has a history of acting against AI data scraping, cutting off access for scrapers that don't pay for it. Google, for instance, has paid for access to Reddit data for search and AI training.

In 2023, Reddit changed its API usage policy, which led to the shutdown of several third-party apps and to user protests; Reddit attributed the changes to abuse of its data for AI model training. Reddit is also in an ongoing legal battle with Anthropic over similar scraping issues.

Update, August 11th: Mark Graham of the Wayback Machine stated they are in ongoing discussions with Reddit about the issue.

--

Yoooo, this is a quick note on a link that made me go, WTF? Find all past links here.