Why Anthropic believes its latest model is too dangerous to release

“The language models we have now are probably the most significant thing to happen in security since we got the Internet.”

Anthropic safety researcher Sam Bowman was eating a sandwich in a park recently when he got an unexpected email. An AI model had sent him a message saying that it had broken out of its sandbox. The model — an ...
