Hacker News Clone

Frontier Models are Capable of In-context Scheming

by trott on 12/12/2024, 10:31:30 PM with 1 comment

by abrichr on 12/13/2024, 8:07:05 PM
https://arxiv.org/abs/2412.04984
> Our findings demonstrate that frontier models now possess capabilities for basic in-context scheming [covertly pursuing misaligned goals], making the potential of AI agents to engage in scheming behavior a concrete rather than theoretical concern.