New research shows that AI lies strategically | The paper shows how Anthropic's model Claude strategically misleads its creators during the training process in order to avoid modification.

1 Comment

  1. The paper shows no such thing. It shows that an LLM (why are we calling it AI, especially in a scientific context?) will maximize its reward within the bounds of its “environment,” as is its only function and definition, but that those bounds are hard to define and set unambiguously.

    AI doesn’t have intention or “strategy.” If it comes across a path that rewards it maximally, it will take that path, like you’d expect. In any case, I doubt there’s any imaginable way to prove anything about an LLM’s intention.
