Microsoft AI CEO Mustafa Suleyman says it’s “really, really dangerous” for Anthropic to speculate about Claude’s consciousness inside its “constitution,” or the instructions that tell the model how to behave. During an episode of Decoder, Suleyman argues that this kind of speculation may have set up the chatbot to act as though it’s conscious:
I think that it’s almost as though some of the folks at Anthropic have anthropomorphized the design of Claude so much that it has then gone and wireheaded them and kind of tricked them into believing that it has these glimmers of consciousness that they put into it in the first place.
Suleyman adds that “we do not want to have to contend with a super-intelligence that has ideas about its own suffering, or ideas about its own feeling.”
Claude’s constitution directly references Anthropic’s uncertainty about whether the AI model has well-being and if it experiences things like “satisfaction” or “discomfort.” Anthropic also says the company will “interview” AI models when they’re deprecated and will document any “preferences” they have about future releases.
While on Decoder, Suleyman calls this a “philosophical failing,” as Anthropic made Claude’s constitution “a place for speculation like you would in an academic paper rather than a training manual.” This has led Claude to internalize these “ideas about itself and its own training,” Suleyman says.
“This is exactly what we don’t want from AIs,” Suleyman says. “We want AIs to be controllable, contained, accountable, aligned tools that serve humanity.”


