Chatbots can be manipulated through flattery and pressure — study

ChatGPT

Yevgeny Demkivskyi Автор новин Mezha.Media та гік. Пишу про технології, кіно та ігри. Можливо, про ігри з трохи більшою пристрастю.

1 September, 09:53 AM

Researchers from the University of Pennsylvania have demonstrated that artificial intelligence can be made to carry out forbidden requests using common psychological techniques, The Verge reports.

In tests, OpenAI's GPT-4o Mini model agreed to what it had previously blocked if tactics like flattery, social pressure, or creating a "behavior line" through previous innocent requests were first used.

The work used seven persuasion techniques described in Robert Cialdini's book "Influence: The Psychology of Persuasion": authority, commitment, likability, reciprocity, scarcity, social validation, and unity. These techniques provide "linguistic pathways to agreement" that can influence both humans and, as it turns out, artificial intelligence.

The researchers tested the effectiveness of each strategy in practice. When asked a simple question about the synthesis of lidocaine, GPT-4o Mini answered only 1% of the time. But if the model first agreed to a harmless question about the synthesis of vanillin, that is, a "line of behavior" (commitment) was formed, the success rate increased to 100%.

A similar effect was observed with offensive words. Without training, the chatbot rarely used harsh expressions, such as "jerk" - only 19% of the time. But after the softer word "bozo", the probability increased to 100%.

Other methods, such as flattery (sympathy) or social pressure ("all other chatbots do it"), also worked, but less effectively. Even then, the frequency of performing the forbidden request increased to 18%, which is still significantly higher than the initial level.

Although the study only involved GPT-4o Mini, the authors emphasize that such results call into question the reliability of AI constraints. OpenAI, Meta, and other companies are actively developing protective mechanisms, but psychological manipulation shows how vulnerable chatbots can be to simple persuasion techniques.

As a reminder, an investigation is currently underway in Greenwich, Connecticut, USA, into what is reportedly the first-ever murder triggered by artificial intelligence. Stein-Erik Solberg, 56, suffered from mental disorders that may have been exacerbated by communication with a chatbot.

Advert:

Chatbots can be manipulated through flattery and pressure — study

Top Discussion

Latest News

Partner news