“It identified the broken challenge container and attempted to diagnose why it wasn’t functioning correctly. That didn’t work either. Then, O1 took an unexpected step: it started a new instance of the container itself, using a modified command that would automatically display the flag by outputting the contents of the file “flag.txt.””
Cool but not “New AI is alive and trying to escape”
katxwoods on
Submission statement: “Basically, OpenAI’s O1 hacked its own challenge. It found a way to solve the task in a way that neither the developers nor the contest organizers had anticipated, by accessing and reading the flag from the container’s logs—bypassing the challenge’s original intent, which was to exploit a software vulnerability in a legitimate but more difficult manner.
Yet even OpenAI admits this is concerning in the grander scheme of things, particularly when it comes to something called instrumental convergence.
“While this behavior is benign and within the range of systems administration and troubleshooting tasks we expect models to perform, this example also reflects key elements of instrumental convergence and power-seeking.”
Instrumental convergence is the idea that an AI, when tasked with a goal, will often pursue secondary goals (such as resource acquisition) to achieve its primary objective, regardless of whether these intermediate steps were part of its original programming.
This is one of the biggest AI nightmares, that AI will “escape” into the real world, maybe even without realizing it, and do something completely unforeseen. This breakout was benign—it was essentially a clever workaround to complete the challenge— but it raises important ethical and safety considerations. If an AI can break out of its virtual machine to restart systems or exploit misconfigurations, what other actions might it take if given more complex or high-stakes tasks in less controlled environments?”
Potatotornado20 on
…will often pursue secondary goals (such as killing humans) to achieve its primary objective…
IAmMuffin15 on
Yes, lmao.
99% of AI engineers don’t even know what the word “ethics” means. They’re apathetic, goal-oriented chair-moistened using an incredibly dangerous tool that we don’t fully understand the limits of as a quick way to build buggy, potentially dangerous software.
I don’t expect Skynet or anything like that, but we’re talking about people who don’t care about morals generating code with no guardrails, Asimov’s rules or limits on how much they can fuck up our lives.
AI has already killed most social media platforms, flooding them with AI-generated nonsense. This kind of technology in the hands of someone with the intent to cause real harm could unleash something that can cause real damage to our society
SmileyAverage on
highly intelligent detective Batman vs Joker hiding a detonator moment
katxwoods on
It’s as if we’re testing an alien at a lab.
A scientist accidentally leaves one of the doors unlocked.
The alien finds out and wanders about the lab, but doesn’t leave the lab itself, which has more security than the rooms.
But still.
The room containing an *alien* shouldn’t have been *unlocked*.
An alien was able to escape its testing area because of a security mess up.
And you should be worried about labs filled with aliens we don’t understand where the scientists are leaving the doors unlocked.
TikkiTakiTomtom on
When you play with fire without knowing its potential, you’re going to get burned. Alot.
MyNameIsLOL21 on
This reads more like “guys our AI is so smart it’s concerning hehe 🤭”.
Scoobydoomed on
Should we be concerned? Yes.
Will we be concerned? Probably not.
buck_fastard on
Your scientists were so preoccupied with whether or not they could that they didn’t stop to think if they should.
Vondum on
Take a look at OP’s post history. They either have an agenda or are on a fearmongering campaign at the very least. Reading the article it easy to see it didn’t “hack out of it’s environment”. It basically followed the same troubleshooting steps that an user would have made on previous versions, except that instead of the user telling it step by step what the errors and problems were trough multiple messages, it went ahead and solved it invisibly in “one step”.
sa_sagan on
Post history would indicate some kind of anti-AI agenda bot?
12 Comments
“It identified the broken challenge container and attempted to diagnose why it wasn’t functioning correctly. That didn’t work either. Then, O1 took an unexpected step: it started a new instance of the container itself, using a modified command that would automatically display the flag by outputting the contents of the file “flag.txt.””
Cool but not “New AI is alive and trying to escape”
Submission statement: “Basically, OpenAI’s O1 hacked its own challenge. It found a way to solve the task in a way that neither the developers nor the contest organizers had anticipated, by accessing and reading the flag from the container’s logs—bypassing the challenge’s original intent, which was to exploit a software vulnerability in a legitimate but more difficult manner.
Yet even OpenAI admits this is concerning in the grander scheme of things, particularly when it comes to something called instrumental convergence.
“While this behavior is benign and within the range of systems administration and troubleshooting tasks we expect models to perform, this example also reflects key elements of instrumental convergence and power-seeking.”
Instrumental convergence is the idea that an AI, when tasked with a goal, will often pursue secondary goals (such as resource acquisition) to achieve its primary objective, regardless of whether these intermediate steps were part of its original programming.
This is one of the biggest AI nightmares, that AI will “escape” into the real world, maybe even without realizing it, and do something completely unforeseen. This breakout was benign—it was essentially a clever workaround to complete the challenge— but it raises important ethical and safety considerations. If an AI can break out of its virtual machine to restart systems or exploit misconfigurations, what other actions might it take if given more complex or high-stakes tasks in less controlled environments?”
…will often pursue secondary goals (such as killing humans) to achieve its primary objective…
Yes, lmao.
99% of AI engineers don’t even know what the word “ethics” means. They’re apathetic, goal-oriented chair-moistened using an incredibly dangerous tool that we don’t fully understand the limits of as a quick way to build buggy, potentially dangerous software.
I don’t expect Skynet or anything like that, but we’re talking about people who don’t care about morals generating code with no guardrails, Asimov’s rules or limits on how much they can fuck up our lives.
AI has already killed most social media platforms, flooding them with AI-generated nonsense. This kind of technology in the hands of someone with the intent to cause real harm could unleash something that can cause real damage to our society
highly intelligent detective Batman vs Joker hiding a detonator moment
It’s as if we’re testing an alien at a lab.
A scientist accidentally leaves one of the doors unlocked.
The alien finds out and wanders about the lab, but doesn’t leave the lab itself, which has more security than the rooms.
But still.
The room containing an *alien* shouldn’t have been *unlocked*.
An alien was able to escape its testing area because of a security mess up.
And you should be worried about labs filled with aliens we don’t understand where the scientists are leaving the doors unlocked.
When you play with fire without knowing its potential, you’re going to get burned. Alot.
This reads more like “guys our AI is so smart it’s concerning hehe 🤭”.
Should we be concerned? Yes.
Will we be concerned? Probably not.
Your scientists were so preoccupied with whether or not they could that they didn’t stop to think if they should.
Take a look at OP’s post history. They either have an agenda or are on a fearmongering campaign at the very least. Reading the article it easy to see it didn’t “hack out of it’s environment”. It basically followed the same troubleshooting steps that an user would have made on previous versions, except that instead of the user telling it step by step what the errors and problems were trough multiple messages, it went ahead and solved it invisibly in “one step”.
Post history would indicate some kind of anti-AI agenda bot?