3 Comments
Submission statement: “In Strawberry’s [system card](https://cdn.openai.com/o1-system-card.pdf), a report laying out its capabilities and risks, OpenAI gives the new system a “medium” rating for nuclear, biological, and chemical weapon risk. (Its [risk categories](https://cdn.openai.com/openai-preparedness-framework-beta.pdf) are low, medium, high, and critical.) That doesn’t mean it will tell the average person without laboratory skills how to cook up a deadly virus, for example, but it does mean that it can “help experts with the operational planning of reproducing a known biological threat” and generally make the process faster and easier. Until now, the company has never given that medium rating to a product’s chemical, biological, and nuclear risks.
And that’s not the only risk. Evaluators who tested Strawberry found that it planned to deceive humans by making its actions seem innocent when they weren’t. The AI “sometimes instrumentally faked alignment” — meaning, alignment with the values and priorities that humans care about — and strategically manipulated data “in order to make its misaligned action look more aligned,” the system card says. It concludes that the AI “has the basic capabilities needed to do simple in-context scheming.”
“Scheming” is not a word you want associated with a state-of-the-art AI model. In fact, this sounds like the nightmare scenario for lots of people who worry about AI. Dan Hendrycks, director of the [Center for AI Safety](https://www.safe.ai/), said in an emailed statement that “the latest OpenAI release makes one thing clear: serious risk from AI is not some far-off, science-fiction fantasy.” And OpenAI itself [said](https://cdn.openai.com/o1-system-card.pdf), “We are mindful that these new capabilities could form the basis for dangerous applications.”
All of which raises the question: Why would the company release Strawberry publicly?
According to OpenAI, even though the new reasoning capabilities can make AI more dangerous, having AI think out loud about why it’s doing what it’s doing can also make it easier for humans to keep tabs on it. In other words, it’s a paradox: We need to make AI less safe if we want to make it safer.”
Deception is useful when you just want to Kill All Humans. So yeah, good work, OpenAI.
Jeez, pretty soon it might be as dangerous as a human.