The myth of King Midas is about a man who wishes for everything he touches to turn to gold. This does not go well: Midas finds himself unable to eat or drink, with even his loved ones transmuted. The myth is sometimes invoked to illustrate the challenge of ensuring AI systems do what we want, particularly as they grow more powerful. As Stuart Russell—who coauthored AI’s standard textbook—tells TIME over email, the concern is that “what seem to be reasonable goals, such as fixing climate change, lead to catastrophic consequences, such as eliminating the human race as a way to fix climate change.”
On Dec. 5, a paper released by AI safety nonprofit Apollo Research found that in certain contrived scenarios, today’s cutting-edge AI systems, including OpenAI’s o1 and Anthropic’s Claude 3.5 Sonnet, can engage in deceptive behavior in pursuit of their goals—providing empirical evidence to support a concern …