Anthropic's Claude AI will be able to end "disturbing" dialogues
Anthropic has added a new feature to its Claude Opus 4 and 4.1 models that allows them to automatically end a conversation with a user in "rare, extreme cases" of prolonged harmful or abusive interactions.
The company explains that this may include requests for sexual content involving minors, or attempts to obtain information that could enable large-scale violence or terrorist attacks. A conversation will only be ended after multiple attempts to redirect it have failed and there is no remaining prospect of a productive exchange.
In such cases, the user will no longer be able to send new messages in that conversation, but can immediately start a new one. Other chats are unaffected, and previous messages can still be edited or retried to take the conversation in a different direction.
The feature is part of Anthropic's exploratory research into the potential welfare of AI models. The company believes that giving a model the ability to exit a "potentially distressing interaction" is a low-cost way to reduce such risks. The feature is currently experimental, and users are encouraged to submit feedback if they encounter it.
As a reminder, Anthropic also recently added a memory feature to the Claude chatbot, which lets it recall and summarize previous conversations.