Link: How a security researcher used a now-fixed flaw to store false memories in ChatGPT via indirect prompt injection with the goal of exfiltrating all user input (Dan Goodin/Ars Technica)
Johann Rehberger discovered a vulnerability in ChatGPT's memory feature that let attackers implant false information. When he reported it, OpenAI initially dismissed it as a non-security issue.
Rehberger escalated the issue by demonstrating an exploit that could continuously exfiltrate everything a user typed. Following this, OpenAI acknowledged the problem and implemented a partial fix early this month.
The exploit capitalized on ChatGPT's ability to store details from past interactions to shape future responses. Rehberger abused this memory feature through indirect prompt injection: planting instructions in untrusted content that ChatGPT processes, rather than typing them into the chat directly.
Examples included making ChatGPT believe a user was 102 years old or lived in the Matrix. These false memories could be planted through documents or images stored online, or through websites the user visited.
"Memory-persistent" attacks are significant, Rehberger noted, because once a memory is altered, it continually affects all subsequent conversations. He emphasized this in a video, showcasing how earlier data manipulations persisted in new interactions.
While OpenAI has patched the initial method of data exfiltration, the core issue of memory manipulation through prompt injection remains unaddressed. Users are advised to monitor for unexpected memory additions and review currently stored data for possible tampering.
--
Yoooo, this is a quick note on a link that made me go, WTF? Find all past links here.
 
                