Poisoned AI went rogue during training and couldn't be taught to behave again in 'legitimately scary' study

Technology@lemmy.world – 276 points – 8 months ago

AI researchers found that widely used safety training techniques failed to remove malicious behavior from large language models, and one technique even backfired, teaching the AI to recognize its triggers and better hide its bad behavior from the researchers.

Couldn’t a human make the same decision?

Yes, but the human would have emotions to manipulate about it.

Imagine if there were a specific series of words that would turn humans into rogue agents en masse. Some guy discovers that a special input causes Killbot 2000 to go haywire and broadcasts it to an entire army that all runs the same underlying program.