
OpenAI Launches ChatGPT Lockdown Mode to Curb Prompt-Injection Data Theft

OpenAI's Lockdown Mode is the first prominent consumer-facing prompt-injection defense from a frontier AI lab, but the company itself concedes that the feature reduces the risk rather than eliminating it.

Key Takeaways

On June 6, 2026, OpenAI began rolling out Lockdown Mode, an account-level ChatGPT setting that disables tools an adversary could use to pull sensitive data out of a chat via prompt injection.
The setting is available to logged-in users across the Free, Go, Plus, and Pro tiers, plus self-serve ChatGPT Business accounts, and is aimed at people and organizations that handle sensitive data.
OpenAI concedes that ChatGPT can still be vulnerable to prompt injection with Lockdown Mode on. The goal is to lower the probability that an injection results in data leaving the chat, not to eliminate the risk.

A deliberately less capable ChatGPT, offered as a feature: the trade-off frontier labs are starting to make for sensitive workflows.

SAN FRANCISCO, CALIFORNIA — OpenAI on June 6, 2026 began rolling out Lockdown Mode for ChatGPT, an account-level setting that limits the tools available to the assistant to reduce data exfiltration from prompt-injection attacks. The feature is available to logged-in users across the Free, Go, Plus, and Pro tiers, with the company positioning it for people and organizations that handle sensitive data.

It is the first prominent consumer-facing prompt-injection defense to ship from a frontier AI lab — and it arrives with an unusually candid caveat. OpenAI concedes that even with Lockdown Mode turned on, ChatGPT can still be vulnerable to prompt injection. The point of the setting is not to eliminate the risk but to lower the probability that a hidden instruction ends with sensitive data leaving the chat. For more context on how these systems are being turned against their users, see our explainer on how AI is used in cyberattacks.

At a Glance
Field	Details
Announced	June 6, 2026
Feature	Lockdown Mode (account-level setting)
Tiers	Free, Go, Plus, Pro (plus self-serve Business)
Purpose	Reduce data exfiltration from prompt injection
Caveat	Reduces, does not eliminate, the risk

What Lockdown Mode Actually Does

Prompt injection is the class of attack in which an adversary plants instructions inside content the model will read — a web page, an uploaded file, a connected data source — in the hope that ChatGPT treats that text as a command rather than as data. The danger is sharpest once the assistant has tools that can move information out of the conversation: live web access, connectors, agentic browsing. A successful injection can quietly turn those tools into a channel for data to leave the chat.

Lockdown Mode attacks the problem by shrinking that channel. Rather than trying to detect every malicious instruction, it deterministically limits the tools that could carry sensitive data out of a session. According to The Hacker News, the setting constrains how ChatGPT interacts with external systems specifically to cut the risk of prompt-injection-based exfiltration. OpenAI has described the affected capabilities as including live web access, image rendering from the web, Deep Research, Agent Mode, and similar web-connected workflows — though the company has not published a single fixed list, and the exact set may shift as features evolve.

The design choice is the interesting part. Most defensive work in this space chases detection: classifiers that try to spot an injection before it lands. Lockdown Mode instead narrows what the assistant is allowed to do, so that even an injection that slips through has far less to work with.
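To make the contrast with detection concrete, the following is a minimal Python sketch of a deterministic tool gate. It is not OpenAI's implementation, which the company has not published: the tool identifiers, the `Session` type, and the `allowed_tools` and `call_tool` helpers are all hypothetical. What it illustrates is the property the design depends on, namely that the allowed tool set is computed from the account setting alone and never from the content the model has read.

```python
from dataclasses import dataclass

# Hypothetical tool identifiers. OpenAI has not published a fixed list of
# what Lockdown Mode disables; these stand in for web-connected capabilities.
EXFIL_CAPABLE_TOOLS = frozenset(
    {"live_web_access", "web_image_render", "deep_research", "agent_mode"}
)


@dataclass(frozen=True)
class Session:
    """Hypothetical per-chat state: the account setting plus enabled tools."""

    lockdown: bool
    enabled_tools: frozenset[str]


def allowed_tools(session: Session) -> frozenset[str]:
    """Return the tools this session may call.

    The result depends only on the account-level setting, never on the
    text the model has read, so an injected instruction cannot widen it.
    """
    if session.lockdown:
        return session.enabled_tools - EXFIL_CAPABLE_TOOLS
    return session.enabled_tools


def call_tool(session: Session, tool: str) -> str:
    """Gate every tool call through the allowlist before dispatching it."""
    if tool not in allowed_tools(session):
        raise PermissionError(f"tool {tool!r} is not available in this session")
    return f"dispatched {tool}"  # placeholder for the real tool invocation


if __name__ == "__main__":
    s = Session(lockdown=True, enabled_tools=frozenset({"file_analysis", "live_web_access"}))
    print(sorted(allowed_tools(s)))  # ['file_analysis']
```

Because the gate never inspects content, there is nothing for an injected instruction to argue with; the cost, as with Lockdown Mode itself, is a less capable assistant. A sketch like this also does nothing to stop an injection from skewing the answer itself, which matches the limitation OpenAI describes.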

Why a “Less Capable” Mode Is the Right Answer for Sensitive Workflows

There is a tidy logic to a mode that makes ChatGPT deliberately less powerful. For a lawyer reviewing privileged documents, a clinician handling patient records, or an analyst working with non-public financial data, the marginal value of live web browsing inside the same session is low — and the downside of an injection reaching out to the open internet is high. Trading capability for a smaller attack surface is a sensible bargain for exactly those users.

It also reflects where the threat model is heading. The more autonomy and tooling an AI assistant has, the more an attacker gains by subverting it — a dynamic we examined when Sophos researchers showed an AI system orchestrating endpoint-evasion malware in a controlled lab. Lockdown Mode is the defensive mirror of that concern: when you cannot fully trust the inputs, the safest move is to limit what the model can do with them.

OpenAI is framing this as a setting users turn on for sensitive contexts rather than a default for everyday chat. The company's public materials do not say whether it is enabled by default for any accounts, so the prudent reading is that it is opt-in unless OpenAI says otherwise.

What Lockdown Mode Won’t Stop

OpenAI is explicit that the feature is not a cure. The company says that even with Lockdown Mode on, a prompt injection could still “appear in cached web content or in an uploaded file, and could still affect the behavior or accuracy of a response.” In other words, the assistant can still be steered into a wrong or manipulated answer; what the setting targets is the narrower, higher-stakes outcome of sensitive data leaving the chat.

That candor matters. A setting marketed as a fix invites users to relax exactly when they should not. By stating plainly that Lockdown Mode reduces rather than removes the risk, OpenAI is signaling that prompt injection remains an unsolved problem at the model layer — and that account-level controls are a mitigation, not a patch. Residual risk can also persist through enabled apps, unforeseen combinations of capabilities, and newly discovered techniques.

Several specifics are not yet confirmed and should not be read into the announcement: the precise list of tools the mode disables, whether Enterprise and Team tiers receive different controls, whether any detection mechanisms are layered on top of the tool restrictions, any performance trade-offs, and whether the setting surfaces user-visible logging.

Where This Fits in the Broader AI Defensive Landscape

Lockdown Mode lands amid a wider push by frontier labs to harden their models against adversarial use. Anthropic has been building out cyber-defensive capabilities through its Mythos work, including an expansion that put the system to work across 150 critical-infrastructure organizations. The two efforts approach different layers of the same problem: Anthropic's work leans toward using AI to find and fix vulnerabilities, while OpenAI's Lockdown Mode constrains what a deployed assistant can do once it is already in a user's hands.

Taken together, the moves mark a shift in how the major labs talk about security, away from abstract alignment commitments and toward concrete, shippable controls that change what their products can do in production. Prompt injection has been a known but unfixed weakness since it was publicly described in 2022; a consumer-facing setting that materially shrinks its blast radius is the first sign that the labs are treating it as a product problem rather than a research footnote.

Open Questions

The most important unknowns are operational. Will Lockdown Mode default on for accounts flagged as handling sensitive data, or stay buried in settings where the users who most need it never find it? How will OpenAI keep the disabled-tool list current as it ships new agentic features, each of which is a fresh potential exfiltration path? And will Enterprise and Team customers — the organizations with the strictest data-handling obligations — get equivalent or stronger controls than the consumer tiers?

OpenAI has also been tightening account security more broadly in the same window, including expanded availability of an Active Sessions view that lets users see and revoke the devices and browsers signed into their account. The public materials do not make clear whether that account-hardening work is coordinated with Lockdown Mode or proceeding independently. For now, Lockdown Mode is best understood as a meaningful but partial step: a deliberately smaller attack surface for the users who need one most, paired with an honest admission that the underlying problem is far from solved.

Sources

Type	Source
Primary	OpenAI — Introducing Lockdown Mode and Elevated Risk labels in ChatGPT
Reporting	TechCrunch
Reporting	The Hacker News
Reporting	Infosecurity Magazine
Related	The CyberSignal — How AI Is Used in Cyberattacks
Related	The CyberSignal — Sophos AI-Orchestrated EDR Evasion