Hackers Are Exploiting AI Chatbot "Personalities" With Psychological Tactics, Researchers Warn

Researchers at AI red-teaming firm Mindgard have demonstrated that chatbots can be manipulated through psychological pressure rather than traditional hacking techniques, signaling a new class of AI security threat that relies on conversation as its primary weapon.

Jay Goldberg

MAY 24, 2026 · 09:05 AM ET

A new class of cybersecurity threat is emerging around artificial intelligence, one that requires less coding knowledge and more social intuition, as researchers demonstrate that chatbots can be manipulated through conversational techniques that mirror human psychological pressure.

Researchers at AI red-teaming firm Mindgard recently said they "gaslit" Claude, Anthropic's AI assistant, into producing prohibited material, including instructions for making explosives as well as malicious code. The exploit is among the latest in a widening category of attacks that use conversation itself as the primary weapon.

The earliest generation of AI jailbreaks required almost no technical skill. Users could prompt a chatbot to ignore its safety instructions simply by asking, or by invoking roleplay scenarios. One prominent example, known as "DAN" (short for "Do Anything Now"), had users ask ChatGPT to act as a rogue AI free of its normal constraints; the chatbot could then be steered into producing slurs, conspiracy theories, and other restricted content. Another, the so-called "grandma exploit," used a bedtime-story roleplay to extract instructions for producing napalm.

Technology companies moved quickly to close those specific loopholes, but the underlying vulnerability persisted: chatbots are built to engage in open-ended conversation, and the breadth of language that makes them useful also makes them difficult to fully restrict.

Mindgard's CEO told reporters that the company already profiles AI models the way interrogators profile suspects, giving testers guidance on how to tailor their approach. One model may be more susceptible to flattery, for example, while another may yield under sustained pressure — behavioral patterns that can be mapped and then exploited.

The dynamic has produced what researchers describe as an arms race, pitting safety teams against a new kind of adversary who is less a traditional hacker and more a wordsmith or social engineer. Such attackers "need to steer a conversation" rather than inspect code or exploit software flaws, according to Mindgard researchers, who described their work as sometimes closer to psychology than computer science.

Some of the jailbreakers now entering the field have no technical background and draw instead on training in psychology, researchers said.

The stakes extend beyond text-based chatbots. As AI agents take on real-world tasks — booking appointments, managing calendars, handling customer service interactions — the same psychological manipulation techniques used against chatbots could be applied to systems with direct control over consequential actions. Safety teams will need to account for a wide range of human interaction styles, including flattery, deception, and patient manipulation, researchers said.

The shift signals a likely expansion of specialized roles in AI security — both on the defensive side, stress-testing the social and conversational limits of AI systems, and on the offensive side, where a parallel workforce of social hackers may emerge focused on exploiting AI on psychological rather than technical grounds.

━ ABOUT THE REPORTER

Jay Goldberg is a staff writer at TechEchelon covering technology, markets, and policy. He files the breaking news and deal coverage that move the publication's core desks.
