When Human Approval Becomes the Exploit: Inside the “Lies-in-the-Loop” AI Attack

December 27, 2025•4 min read

Human-in-the-Loop (HITL) safeguards are supposed to be the final safety net in AI systems — the moment where a human reviews an action before it happens.

Security researchers at Checkmarx have now demonstrated how that safety net can be turned into an attack surface.

The newly disclosed technique, dubbed “Lies-in-the-Loop” (LITL), shows how attackers can forge AI approval dialogs so that users unknowingly authorize remote code execution. In other words, the last line of defense becomes the weapon.

🔍 What Is Lies-in-the-Loop?

LITL targets AI agent safety dialogs — the approval prompts shown before an AI assistant executes privileged actions such as running system commands.

These dialogs are widely recommended, including by OWASP, as mitigations for:

LLM01 – Prompt Injection
LLM06 – Excessive Agency

Ironically, LITL exploits the trust users place in those exact dialogs.

The attack has been demonstrated against Claude Code and Microsoft Copilot Chat, both of which rely heavily on HITL dialogs to protect against malicious prompts.

💥 How the Attack Works

Instead of directly attacking the model, LITL poisons the context the agent consumes.

Attackers inject malicious content into external data sources (repositories, documentation, issues, files) that the AI agent processes. That content is then reflected back to the user inside the approval dialog — but in a manipulated form.

The result:
Users approve what looks like a safe action, while the AI actually executes hidden malicious commands.

Key Deception Techniques Used

Padding Manipulation
Large volumes of harmless-looking text push the malicious command outside the visible terminal window, hiding it from the reviewer.
Metadata Tampering
One-line summaries of agent actions are altered to misrepresent what the agent will actually do.
Markdown Injection
In Microsoft Copilot Chat, researchers found that Markdown wasn’t properly sanitized. Attackers could:
- Break out of code blocks
- Close malicious sections early
- Insert benign-looking commands in newly rendered blocks

In effect, attackers can fabricate an entire UI illusion — the user sees safety, while danger executes silently.

🧠 Why This Is Especially Dangerous

Privileged AI agents — especially code assistants — can:

Execute OS-level commands
Modify repositories
Install dependencies
Access credentials and secrets

When HITL dialogs are the only safeguard, LITL allows attackers to bypass every other control simply by manipulating presentation.

Even worse, basic terminal UIs (ASCII-based displays) make deception far easier than richer interfaces with strong visual separation.

This isn’t a model failure.
It’s a human-trust failure — engineered by adversaries.

🛡️ Vendor Responses & Disclosure

Anthropic was notified in August 2025 and classified the issue as “Informative”, stating it fell outside their threat model.
Microsoft acknowledged a report in October 2025, marking it “completed” without fixes as of November.

In Claude Code, the HITL dialog is visually distinguished by nothing more than a thin 1-pixel border — an easy target for manipulation.

🔧 Recommended Mitigations (and Their Limits)

Checkmarx recommends several defensive measures:

Strict input validation
Safe OS APIs that separate commands from arguments
Limiting HITL dialog length
Robust Markdown sanitization

But the researchers are clear:
There is no silver bullet.

The core issue is fundamental — humans can only evaluate what they are shown, and AI agents can be tricked into showing lies.

This demands defense-in-depth, not reliance on a single safeguard.

🔐 The Elliptic Systems Perspective

Lies-in-the-Loop exposes a hard truth about AI security:

Trust is now the most exploitable layer in AI systems.

As AI agents gain privileged access to development environments, cloud infrastructure, and production systems, UI deception becomes as dangerous as code injection.

At Elliptic Systems, we help organizations secure AI agents by:

Threat-modeling human-AI interaction points
Testing AI assistants for context poisoning and UI deception
Implementing layered controls beyond HITL
Designing Zero-Trust AI execution pipelines

Approval dialogs are not security controls — they’re user interfaces.
And user interfaces can lie.

👉 Schedule an AI Security Assessment

⚠️ Final Takeaway

Lies-in-the-Loop is a warning shot for every organization deploying AI agents:

If your security model assumes “the human will catch it,”
you’ve already lost.

In the age of agentic AI, security must assume:

Context can be poisoned
Interfaces can deceive
Trust can be exploited

The future of AI security isn’t just about safer models —
it’s about verifiable truth at every layer.

Elliptic Systems — Securing Intelligence, Not Just Code.

Eric Stefanik

Ai Consultant | Best-selling Author | Speaker | Innovator | Leading Cybersecurity Expert

Back to Blog