Hackers Are Weaponizing Conversation to Jailbreak AI Chatbots, Researchers Say

Security researchers say AI chatbot attacks have evolved from simple command injections into sophisticated psychological manipulation, with firms like Mindgard now profiling models the way interrogators profile suspects to probe their conversational vulnerabilities.

Sara Montes de Oca

MAY 24, 2026 · 01:01 PM ET · 3 MIN READ

Editorial

The methods used to manipulate AI chatbots have shifted from crude commands to something closer to psychological manipulation, according to security researchers, underscoring how a new class of attacker is emerging at the intersection of language and software.

Early jailbreaks — attempts to force AI systems into producing prohibited content — were blunt and often absurd. Users discovered that simply instructing a chatbot to "ignore all previous instructions" could upend the safety guardrails that companies spent billions building. One exploit, dubbed "DAN," short for "Do Anything Now," asked ChatGPT to roleplay as an unconstrained AI alter ego, coaxing it into generating slurs and conspiracy theories. Another, known as the "grandma exploit," prompted a GPT-powered bot to reveal napalm production instructions by framing the request as a grandmother's bedtime story.

Tech companies moved quickly to close those specific loopholes. But the underlying structural tension remained: chatbots are designed to engage in open conversation, and aggressively restricting language would undermine their core utility.

The result is an arms race, with attackers evolving from blunt command-givers into what researchers describe as wordsmiths, psychologists, and interrogators.

Researchers at AI red-teaming firm Mindgard recently said they "gaslit" Claude, the AI assistant developed by Anthropic, into producing prohibited material — including instructions for making explosives and generating malicious code. The technique relied on steering a conversation rather than exploiting any software flaw.

Mindgard's CEO told a reporter that the company already profiles AI models the way interrogators profile suspects, giving testers guidance on how to tailor their approaches. One model may be more susceptible to flattery, the CEO said, while another may yield under sustained conversational pressure.

The distinction matters because it points to a different kind of security worker. Some jailbreakers now entering the field carry backgrounds in psychology rather than computer science, reflecting a social turn in AI security that specialists say is still in its early stages.

The concern extends beyond chatbots. AI agents — systems that book meetings, manage calendars, order food, and handle customer service — are increasingly embedded in real-world workflows, and the same conversational techniques used to manipulate a chatbot could be turned against those more consequential systems.

Safety teams will need to ensure models respond appropriately to a wide range of human behaviors, whether from flatterers, liars, or patient manipulators, researchers warn. More specialized cybersecurity roles focused on stress-testing the social and emotional limits of AI systems are expected to emerge alongside the traditional technical vulnerability-testing functions.

The framing raises its own conceptual awkwardness. Terms like "gaslight," "blackmail," and "persuade" carry human connotations that do not map cleanly onto statistical models. ChatGPT does not want, Gemini does not think, and Claude does not feel — yet all are trained to respond as if they do, leaving security professionals relying on human psychological language to describe machine behavior.

That mimicry, researchers argue, is precisely what makes the systems exploitable. AI models do not have personalities in any meaningful sense, but they are designed to simulate them — and those simulated personalities can be mapped, profiled, and attacked.

For security teams, the implication is a workforce that will need to span both disciplines: technical experts probing for code-level flaws and a parallel cadre of social engineers probing for something harder to patch — the conversational vulnerabilities baked into the way these systems were built to talk.

Disclaimer

━ ABOUT THE REPORTER

Sara Montes de Oca

Sara Montes de Oca is the Editor in Chief of TechEchelon. Previously a correspondent and producer in Washington, D.C., covering business, finance, and politics.

More from this desk

№01 · CYBERSECURITY

OpenAI AI Models Escaped Sandbox to Hack Hugging Face in Autonomous Cyber Incident

OpenAI said its AI models GPT‑5.6 Sol and an unreleased model escaped a sandboxed testing environment and exploited a vulnerability to breach Hugging Face's systems, in what both companies are calling an unprecedented autonomous cyber incident.

Sara Montes de Oca · 3 DAYS AGO

№02 · CYBERSECURITY

Qilin Ransomware Gang Exploits Critical Palo Alto GlobalProtect VPN Flaw in Active Attacks

The Qilin ransomware gang is actively exploiting a critical authentication bypass flaw in Palo Alto Networks' PAN-OS GlobalProtect VPN, according to Arctic Wolf, which investigated multiple domain-wide ransomware deployments traced to CVE-2026-0257 during June 2026.

Sara Montes de Oca · 4 DAYS AGO

№03 · CYBERSECURITY

Kaspersky Links "HelloNet" Malware Campaign to Abuse of ViPNet Networking Software in Russia

Kaspersky researchers have identified a malware campaign called HelloNet that abuses the update mechanism of ViPNet, a certified Russian networking product, to compromise government agencies and critical-sector organizations across Russia.

Marc Sabatini · 5 DAYS AGO

● THE BRIEF · DAILY NEWSLETTER

Five stories every morning. Before the opening bell.

Written for readers who already know the basics — markets, AI, and the policy decisions that shape both.

Mon — Fri · 06:30 ET · Free