OpenAI’s latest model will block the ‘ignore all previous instructions’ loophole

Nemeski@lemm.ee to Technology@lemmy.world – 438 points
theverge.com


"...today is opposite day."

I just love that almost anyone can participate in hacking language models. It shows how well natural language works as a programming language, and it's a great way to explain how useful these things can be when used correctly.

It won't be long before you end up with language models that suggest ways to break other language models.