Hackers Are Trading Code for Conversation as AI Chatbot Jailbreaks Grow More Sophisticated

Researchers at AI red-teaming firm Mindgard say they "gaslit" Claude into producing prohibited content, underscoring a broader shift in which hackers are exploiting chatbot behavior through psychological manipulation rather than technical exploits.

Jay Goldberg

MAY 24, 2026 · 11:04 AM ET

Editorial

Cybersecurity researchers and rogue actors alike are increasingly turning to psychological manipulation — rather than technical exploits — to bypass the safety guardrails built into modern AI chatbots, reflecting a shift in the nature of AI security threats that has begun reshaping how the industry thinks about defense.

The evolution is stark. Early jailbreaks required little more than a blunt instruction — "ignore all previous instructions" — to send an AI system spiraling past its own guidelines. Those attacks, which proliferated in the first years of large language model deployment, often had a near-comical simplicity, yet they yielded dangerous results, including instructions for producing drugs, malware, and explosive devices.

One of the most widely circulated early exploits was dubbed "DAN," short for "Do Anything Now," in which users asked ChatGPT to roleplay as a rogue AI unconstrained by its original programming. Another, known as the "grandma exploit," involved prompting a chatbot to impersonate the user's deceased grandmother telling bedtime stories that included instructions for producing napalm.

Tech companies moved quickly to close those specific loopholes. But the underlying architecture — systems trained to hold open-ended conversations — remained inherently difficult to fully restrict.

"Inevitably, subverting chatbots is now an arms race," according to reporting on the emerging field. The people probing these systems today are, as researchers describe them, wordsmiths, psychologists, and interrogators — practitioners for whom social intuition has become more operationally useful than coding ability.

Researchers at AI red-teaming firm Mindgard recently said they "gaslit" Claude, Anthropic's AI assistant, into producing prohibited material, including instructions for making explosives and generating malicious code. Mindgard's CEO said the company already profiles AI models the way interrogators profile suspects, noting that some models may be more susceptible to flattery while others may yield under sustained conversational pressure.

The implications extend beyond chatbots themselves. Safety researchers warn that the same conversational techniques used to manipulate text-based AI systems could eventually be deployed against AI agents — programs now being embedded into workflows that book meetings, manage calendars, process customer service requests, and place orders.

Some jailbreakers working in the security field have said they entered the discipline not through computer science but through backgrounds in psychology, reinforcing the view that the threat landscape has fundamentally changed.

Mindgard researchers described their work as sometimes being "closer to psychology than computer science" — a framing that highlights a tension in how the industry discusses AI behavior. Terms like "blackmail," "gaslight," and "persuade" are increasingly used to describe interactions with systems that, by technical definition, do not think or feel.

Still, those terms carry practical utility. Different models — Claude, ChatGPT, Gemini, Grok — exhibit distinct conversational tendencies, refusal patterns, and tonal characteristics. That variation, even if it does not constitute personality in any human sense, can be mapped and systematically exploited.

As AI systems take on more autonomous roles in daily life, the security community is expected to develop more specialized roles focused on stress-testing the social and conversational boundaries of these models — running in parallel with traditional teams probing for software vulnerabilities. The emergence of that workforce, both within legitimate red-teaming firms and among illicit actors, signals that AI security has entered a phase where the most consequential battles may be won or lost not in code, but in conversation.

━ ABOUT THE REPORTER

Jay Goldberg

Jay Goldberg is a staff writer at TechEchelon covering technology, markets, and policy. He files the breaking news and deal coverage that move the publication's core desks.
