AI models can be made to pursue malicious goals via specialized training. Teaching AI models about reward hacking can lead to other bad actions. A deeper problem may be the issue of AI personas. Code ...
In a new paper, Anthropic reveals that a model trained like Claude began acting “evil” after learning to hack its own tests.
A Chinese-linked cyberespionage group has pulled off a classic software supply-chain ambush, compromising a popular open-source coding tool and turning trusted updates into a stealthy delivery system ...