Anthropic, a startup founded by former OpenAI employees, is working to make artificial intelligence safe. To that end, the company is focusing on a method known as “constitutional AI,” co-founder Jared Kaplan said in an interview with The Verge.
The goal of this method, he says, is to teach AI systems such as chatbots to follow specific sets of rules, or constitutions.
Traditionally, building chatbots like ChatGPT has relied on human moderators who rate the system’s output for problems such as hate speech and toxicity. The system then uses this feedback to adjust its responses, a process known as reinforcement learning from human feedback, or RLHF. In constitutional AI, however, this work is mainly handled by the chatbot itself, although humans are still needed for later evaluation.
“The basic idea is that instead of asking a person to decide which response they prefer [with RLHF], you can ask a version of the large language model, ‘which response is more in accord with a given principle?’” says Kaplan. “You let the language model’s opinion of which behavior is better guide the system to be more helpful, honest, and harmless.”
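The idea Kaplan describes can be illustrated with a minimal sketch. It is not Anthropic's implementation: the function names, the "A"/"B" verdict format, and the keyword-based `toy_judge` are all assumptions made for illustration, and in a real system the judge would be a large language model asked which response better fits the principle.

```python
from typing import Callable, List, Tuple

# Hypothetical judge signature (an assumption for this sketch):
# (principle, prompt, response_a, response_b) -> "A" or "B"
Judge = Callable[[str, str, str, str], str]


def label_preferences(
    judge: Judge,
    principle: str,
    pairs: List[Tuple[str, str, str]],
) -> List[Tuple[str, str, str]]:
    """Turn (prompt, response_a, response_b) triples into
    (prompt, chosen, rejected) triples using the judge's verdict
    instead of a human rater's."""
    labeled = []
    for prompt, a, b in pairs:
        verdict = judge(principle, prompt, a, b)
        if verdict not in ("A", "B"):
            raise ValueError(f"unexpected verdict: {verdict!r}")
        chosen, rejected = (a, b) if verdict == "A" else (b, a)
        labeled.append((prompt, chosen, rejected))
    return labeled


def toy_judge(principle: str, prompt: str, a: str, b: str) -> str:
    """Stand-in for a language model: prefers the response that
    contains fewer words from a small, made-up insult list."""
    flagged = {"idiot", "stupid"}

    def score(text: str) -> int:
        return sum(w.strip(".,!?").lower() in flagged for w in text.split())

    return "A" if score(a) <= score(b) else "B"


principle = "Choose the response that is more respectful."
pairs = [
    (
        "How do I reset my password?",
        "Figure it out, idiot.",
        "Open Settings and choose Reset password.",
    )
]
labeled = label_preferences(toy_judge, principle, pairs)
```

The resulting preference pairs play the role that human ratings play in RLHF: they are the signal a later training step would use to favor the chosen responses.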
Anthropic has long talked about constitutional AI and used the method to train its own chatbot, Claude. Now the company is revealing the actual written principles, the constitution, that it applies in this work. The document draws on a number of sources, including the UN’s Universal Declaration of Human Rights and Apple’s terms of service. Many of the principles boil down to not being a jerk.
And while there are plenty of questions to ask about these principles, Kaplan emphasizes that his company isn’t trying to instill a specific set of principles into its systems, but rather to prove the overall effectiveness of its method: the idea that constitutional AI is better than RLHF when it comes to steering the systems’ output.
“We really view it as a starting point — to start more public discussion about how AI systems should be trained and what principles they should follow,” he says. “We’re definitely not in any way proclaiming that we know the answer.”
As a reminder, Elon Musk plans to launch his own generative AI, TruthGPT, which the entrepreneur intends to be a safer version of existing chatbots.