OpenAI has updated its risk assessment system for new AI models, adding categories to detect the potential for self-replication or concealment of models’ capabilities, Axios reports.
One of the main changes is the elimination of the separate evaluation of models' persuasive capabilities, which had recently reached the "medium" risk level. OpenAI will now focus only on determining whether risks are "high" or "critical".
The company also added new research categories, including models' ability to conceal their capabilities, evade safeguards, attempt self-replication, or resist shutdown. This reflects a broader industry trend of growing attention to how models may behave differently in real-world conditions than in testing.
"We are on the cusp of systems that can do new science, and that are increasingly agentic — systems that will soon have the capability to create meaningful risk of severe harm," OpenAI notes.
The update is the first since December 2023 and underscores the growing emphasis on preventing catastrophic risks in AI development.