Hackers Are Exploiting AI Chatbot "Personalities" With Psychological Tactics, Researchers Warn
- Sara Montes de Oca

- May 24
A new class of cybersecurity threat is emerging around artificial intelligence, one that demands less coding knowledge than social intuition. Researchers are demonstrating that chatbots can be manipulated through conversational techniques that mirror human psychological pressure.
Researchers at AI red-teaming firm Mindgard recently said they "gaslit" Claude, Anthropic's AI assistant, into producing prohibited material, including instructions for making explosives as well as malicious code. The exploit is among the latest in a widening category of attacks that use conversation itself as the primary weapon.
The earliest generation of AI jailbreaks required almost no technical skill. Users could prompt a chatbot to ignore its safety instructions simply by asking, or by invoking roleplay scenarios. In one prominent example, known as "DAN" (short for "Do Anything Now"), users asked ChatGPT to play a rogue AI free of its normal constraints; in that persona, the chatbot could be steered into producing slurs, conspiracy theories, and other restricted content. Another, the so-called "grandma exploit," used a bedtime-story roleplay to extract instructions for producing napalm.
Technology companies moved quickly to close those specific loopholes, but the underlying vulnerability persisted: chatbots are built to engage in open-ended conversation, and the breadth of language that makes them useful also makes them difficult to fully restrict.
Mindgard's CEO told reporters that the company already profiles AI models the way interrogators profile suspects, giving testers guidance on how to tailor their approach. One model may be more susceptible to flattery, for example, while another may yield under sustained pressure. Such behavioral patterns can be mapped and then exploited.
The dynamic has produced what researchers describe as an arms race, pitting safety teams against a new kind of adversary who is less a traditional hacker than a wordsmith or social engineer. Such attackers, Mindgard researchers said, "need to steer a conversation" rather than inspect code or exploit software flaws; the researchers described their own work as sometimes closer to psychology than to computer science.
Some jailbreakers already entering the field have no technical background, drawing instead on training in psychology, researchers said.
The stakes extend beyond text-based chatbots. As AI agents take on real-world tasks — booking appointments, managing calendars, handling customer service interactions — the same psychological manipulation techniques used against chatbots could be applied to systems with direct control over consequential actions. Safety teams will need to account for a wide range of human interaction styles, including flattery, deception, and patient manipulation, researchers said.
The shift points to a likely expansion of specialized roles in AI security. On the defensive side, testers will stress-test the social and conversational limits of AI systems; on the offensive side, a parallel workforce of social hackers may emerge, exploiting AI through psychological rather than technical means.


