The person who coined the term “prompt injection” has the same gripe, because the original term genuinely did mean an attack using untrusted user input, a la SQL injection. But it’s been conflated with jailbreak attacks in general, muddying the term.
Example of a bona fide prompt injection: white text in the background of a resume PDF, attacking a job application portal that uses LLMs to filter applicants. No privilege escalation is involved to give the candidate top marks on their resume screening.
By that definition this is a prompt injection then, its adding a “hidden” prompt that is obscured from the human in order to change the behavior of the AI to do something else malicious.
The person who coined the term “prompt injection” has the same gripe, because the original term genuinely did mean an attack using untrusted user input, a la SQL injection. But it’s been conflated with jailbreak attacks in general, muddying the term.
Example of a bona fide prompt injection: white text in the background of a resume PDF, attacking a job application portal that uses LLMs to filter applicants. No privilege escalation is involved to give the candidate top marks on their resume screening.
Whereas a non-prompt injection jailbreak would be bypassing a safety filter, such as how Morse code might get past the filter and allow a user to request other people’s cryptocurrency be transfered away. This is more akin to finding a poorly-secured, public facing API and then exploiting it.
By that definition this is a prompt injection then, its adding a “hidden” prompt that is obscured from the human in order to change the behavior of the AI to do something else malicious.