
OpenAI and Anthropic conducted a mutual safety evaluation of each other's AI models

OpenAI and Anthropic have carried out the first-ever peer safety evaluation of each other's AI models, testing them for misuse risks, resistance to manipulation, and reliability, and have published the results, Engadget reports.

Anthropic tested OpenAI's models for sycophancy, willingness to support dangerous user requests, self-preservation, and the ability to circumvent safety checks. The company noted that the o3 and o4-mini models performed similarly to Anthropic's own models, while GPT-4o and GPT-4.1 raised concerns. It also found that sycophancy was present to some degree in most of the tested models, with the exception of o3. The newest model, GPT-5, which includes the Safe Completions feature, was not part of Anthropic's tests.

OpenAI, in turn, tested Anthropic's Claude models for instruction hierarchy, jailbreak resistance, hallucinations, and susceptibility to manipulation. The Claude models performed well on instruction-hierarchy tests and showed a high refusal rate in hallucination tests, declining to answer when their response risked being wrong.

The joint review comes amid tension between the companies: earlier this month, Anthropic cut off OpenAI's access to its tools over possible terms-of-service violations in the training of GPT models.
