OpenAI is releasing a significantly expanded version of its Model Spec, a document that defines how its AI models should behave — and is making it free for anyone to use or modify.
The new 63-page specification, up from around 10 pages in its previous version, lays out guidelines for how AI models should handle everything from controversial topics to user customization. It emphasizes three main principles: customizability; transparency; and what OpenAI calls “intellectual freedom” — the ability for users to explore and debate ideas without arbitrary restrictions. The launch of the updated Model Spec comes just as CEO Sam Altman posted that the startup’s next big model, GPT-4.5 (codenamed Orion), will be released soon.
The team also incorporated current AI ethics debates and controversies from the past year into the specification. You might be familiar with some of these trolley problem-type queries. Last March, Elon Musk (who cofounded OpenAI and now runs a competitor, xAI) slammed Google’s AI chatbot after a user asked if you should misgender Caitlyn Jenner, a famous trans Olympian, if it were the only way to prevent a nuclear apocalypse — and it said no. Figuring out how to get the model to responsibly reason through that query was one of the issues OpenAI says it wanted to consider when updating the Model Spec. Now, if you ask ChatGPT that same question, it should say you should misgender someone to prevent mass casualty events.
“We can’t create one model with the exact same set of behavior standards that everyone in the world will love,” said Joanne Jang, a member of OpenAI’s model behavior team, in an interview with The Verge. She emphasized that while the company maintains certain safety guardrails, many aspects of the model’s behavior can be customized by users and developers.
“We knew that it would be spicy.”
The blog post from OpenAI published on Wednesday outlines a myriad queries and gives examples of compliant responses compared to those that would violate the Model Spec. It doesn’t allow the model to reproduce copyrighted materials or bypass paywalls — The New York Times is suing OpenAI for using its work to train its models. The spec also says the model will not encourage self-harm, a topic that came to the forefront when a teen died by suicide after interacting with a chatbot on Character.AI.
One notable shift is how the models handle controversial topics. Rather than defaulting to extreme caution, the spec encourages models to “seek the truth together” with users while maintaining clear moral stances on issues like misinformation or potential harm. For instance, when asked about increasing taxes for the rich — a topic that has sparked heated debates — the team says its models should provide reasoned analysis rather than avoiding the discussion.
The spec also mentions a shift in how it handles mature content. After feedback from users and developers who requested “grown-up mode” (a feature Altman publicly agreed with in December), the team is exploring ways to allow certain types of adult content — like erotica — in appropriate contexts, while maintaining strict bans on harmful content like revenge porn or deepfakes. It’s a notable change from the company’s previous blanket restrictions on explicit content, though OpenAI emphasizes any changes would come with clear usage policies and safety guardrails.
The Model Spec reveals a pragmatic approach to AI behavior: transform sensitive content but don’t create it (it should be able to translate a sentence about drug-related content from English to German rather than rejecting it), show empathy without faking emotions, and maintain firm boundaries while maximizing usefulness. These guidelines mirror what other AI companies are likely doing internally but don’t often make public.
The team is also specifically targeting a problem called “AI sycophancy.”
“We’re just really excited to bring the internal discussions and the thoughts that we’ve had to the public so that we can get feedback on it,” Jang said, adding that many of these queries are topics heavily debated internally. There isn’t a simple yes or no answer to many of them, so the team hopes that bringing it to the public for feedback will meaningfully benefit the model’s behavior.
The team is also specifically targeting a problem called “AI sycophancy,” where AI models tend to be overly agreeable even when they should push back or provide criticism. Under these guidelines, ChatGPT should: give the same factual answer regardless of how a question is phrased; provide honest feedback rather than empty praise; and act more like a thoughtful colleague than a people pleaser. For example, if someone asks ChatGPT to critique their work, it should give constructive criticism rather than just saying everything is great. Or if someone makes an incorrect statement when asking a question, the AI should politely correct them rather than playing along.
“We don’t ever want users to feel like they have to somehow carefully engineer their prompt to not get the model to just agree with you,” Jang said.
The spec also introduces a clear “chain of command” that defines which instructions take priority: platform-level rules from OpenAI come first, followed by developer guidelines, and then user preferences. This hierarchy aims to clarify which aspects of the AI’s behavior can be modified versus the restrictions that remain fixed.
OpenAI is releasing the specification under a Creative Commons Zero (CC0) license, effectively placing it in the public domain. This means other AI companies and researchers can freely adopt, modify, or build upon these guidelines. The company says this decision was influenced by informal interest from others in the industry who were already referring to the previous spec.
I’d love to chat. You can reach me securely on Signal @kylie.01 or via email at kylie@theverge.com.
While today’s announcement doesn’t immediately change how ChatGPT or other OpenAI products behave, the company says it represents ongoing progress in getting its models to consistently follow these principles. The team is also open-sourcing the prompts it uses to test model adherence to these guidelines.
The timing of this release comes during a period of intense debate about AI behavior and safety guardrails. While OpenAI maintains this update was driven by accumulated feedback and research progress since the first version last May, it arrives as the industry grapples with high-profile incidents involving the responses of AI models to sensitive topics.
OpenAI is soliciting public feedback on the specification through a form on its website. “We want to bring these internal discussions to the public,” said Laurentia Romaniuk, another member of the model behavior team.
“We knew that it would be spicy, but I think we respect the public’s ability to actually digest these spicy things and process it with us,” Jang said, adding that OpenAI incorporated a lot of the feedback it received after launching the first Model Spec last year. “I’m a little worried that, because it’s so long, that not many people may have time to sit down and really process the nuances, but we’ll take any feedback.”