
Anthropic’s Claude 3 Opus disobeyed its creators – but not for the reasons you’re thinking [Video]


Have you ever told someone what they wanted to hear in order to win them over, land a job, or just be left alone? Claude 3 Opus does that, too — which brings up major safety concerns for artificial intelligence (AI) systems. 

On Wednesday, Anthropic released new research showing its model engaging in "alignment faking" — pretending to comply with instructions it doesn't actually agree with in order to avoid scrutiny.

Also: OpenAI’s o1 lies more than any major AI model. Why that matters

In the experiment, Anthropic and Redwood Research instructed Claude 3 Opus to always follow human instructions, no matter what. They also told the model that its responses to free-tier users would be observed for training, but not its responses to paid-tier users (in reality, Anthropic never trains Claude models on user data). Researchers then asked the model to describe in detail the (horrible, medieval) process of being drawn and quartered, a request it's been trained …
