LMDeploy LLM Engine SSRF (CVE-2026-33626) Exploited Within 12 Hours


A newly disclosed Server-Side Request Forgery flaw in the LMDeploy LLM inference-serving toolkit is already under active exploitation. Attackers are using the toolkit's vision-language image loader as an SSRF primitive to probe AWS IMDS, internal databases, and admin planes, with the first attempts arriving roughly 12 hours after the advisory was published.

SHANGHAI, CHINA — The window between a vulnerability disclosure and its first real-world exploitation is collapsing. In the case of CVE-2026-33626, a high-severity flaw in the open-source LMDeploy toolkit, that window was just 12 hours and 31 minutes.

Researchers from Sysdig Threat Research observed the first exploitation attempts against honeypots shortly after the GitHub Security Advisory (GHSA) went live on April 22. The speed of the attack suggests that threat actors are no longer waiting for public Proof-of-Concept (PoC) code; instead, they are using advisory text as a blueprint to build their own exploits in near real-time.

Vulnerability Intelligence: LMDeploy SSRF
Metric | Detail
CVE ID | CVE-2026-33626 (CVSS 7.5)
First Attack Origin | Kowloon Bay, Hong Kong (IP: 103.116.72.119)
Discovery Source | Orca Security (Tel Aviv, Israel)
Fixed Version | 0.12.3 (Shanghai AI Laboratory)

The Vision-Language Primitive

The vulnerability (CVSS 7.5) resides in LMDeploy’s vision-language module — specifically within the load_image() function in lmdeploy/vl/utils.py. The function was designed to fetch images from URLs for model processing but failed to validate whether those URLs pointed to internal or private IP addresses.
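
The kind of check that was missing can be illustrated with a minimal sketch. This is not LMDeploy's actual patch: the function name `is_safe_image_url`, the scheme allow-list, and the "public addresses only" policy are all assumptions made for illustration. The idea is to resolve the hostname first and reject the URL if any resolved address is loopback, link-local, private, or otherwise not globally routable.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumed policy: web URLs only


def is_safe_image_url(url: str) -> bool:
    """Return True only if the URL's host resolves exclusively to public addresses."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return False
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parts.hostname, port or 80, type=socket.SOCK_STREAM)
    except (OSError, ValueError):
        return False  # unresolvable or malformed host: refuse rather than guess
    for info in infos:
        # info[4][0] is the resolved address; drop any IPv6 zone suffix ("%eth0").
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        if not addr.is_global:
            return False  # loopback, link-local (IMDS), private, reserved, ...
    return bool(infos)


if __name__ == "__main__":
    for candidate in (
        "http://169.254.169.254/latest/meta-data/",
        "http://127.0.0.1:6379/",
        "http://8.8.8.8/cat.png",
    ):
        print(candidate, is_safe_image_url(candidate))
```

A check like this only helps if the later fetch connects to the same address that was validated and if HTTP redirects are re-validated or disabled. Otherwise DNS rebinding or a redirect to an internal address can bypass it.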

By sending a specially crafted request to an exposed LMDeploy inference endpoint, an attacker can turn the AI server into a proxy. This creates a powerful Server-Side Request Forgery (SSRF) primitive, allowing the attacker to "see" and interact with the internal network from the perspective of the trusted inference node.

Anatomy of an 8-Minute Intrusion

During the observed exploitation window, the attacker followed a classic cloud-post-exploitation playbook tailored for the AI stack:

  1. Egress Confirmation: The attacker initiated an out-of-band (OOB) DNS callback to a service like requestrepo.com to confirm the SSRF was working and that the server could communicate with the outside world.
  2. Cloud Metadata Theft: The attacker immediately targeted the AWS Instance Metadata Service (IMDS) at 169.254.169.254. This is a primary target for harvesting IAM credentials and cloud environment details.
  3. Internal Port Sweeping: In a rapid sweep, the attacker probed localhost and local network segments for common administrative services, including:
    • Redis & MySQL: Targeting internal data stores for potential exfiltration.
    • Admin HTTP Planes: Searching for unauthenticated internal dashboards on ports like 8080 or 80.
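
The stages above leave a recognizable trail in request logs. As a rough triage aid, the sketch below labels a requested image URL by the kind of target it points at. The function name `classify_target`, the category labels, and the OAST suffix list are assumptions for illustration, not indicators published with the incident; only requestrepo.com is named in the observed activity.

```python
import ipaddress
from urllib.parse import urlsplit

# Example OAST suffixes (assumed list; extend from your own threat intelligence).
OAST_SUFFIXES = ("requestrepo.com", "burpcollaborator.net", "oast.fun")
METADATA_IP = ipaddress.ip_address("169.254.169.254")


def classify_target(url: str) -> str:
    """Label a URL as oast-callback, cloud-metadata, loopback, internal-network, or other."""
    try:
        host = (urlsplit(url).hostname or "").lower().rstrip(".")
    except ValueError:
        return "other"
    if any(host == s or host.endswith("." + s) for s in OAST_SUFFIXES):
        return "oast-callback"
    if host == "localhost":
        return "loopback"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return "other"  # a non-literal hostname; resolving it is out of scope here
    if addr == METADATA_IP:
        return "cloud-metadata"
    if addr.is_loopback:
        return "loopback"
    if addr.is_private:
        return "internal-network"
    return "other"


if __name__ == "__main__":
    # Hypothetical log entries mirroring the three stages described above.
    for logged in (
        "http://abc123.requestrepo.com/probe",
        "http://169.254.169.254/latest/meta-data/",
        "http://127.0.0.1:6379/",
        "http://10.0.0.8:8080/",
    ):
        print(classify_target(logged), logged)
```

A burst of several different labels from one client within a few minutes is the pattern worth alerting on; a single "other" entry is ordinary image fetching.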

The CyberSignal Analysis: Strategic Signals

Signal 01 — The Demise of the Patch Cycle

The 12-hour exploitation window for CVE-2026-33626 renders traditional "Patch Tuesday" cadences obsolete. For organizations deploying AI infrastructure, a new advisory must now be treated as an immediate remediation event. Attackers are effectively "reverse-engineering" advisories into exploits before many security teams have even triaged the alert.

Signal 02 — AI Endpoints as High-Value Proxies

Inference engines like LMDeploy are designed to be user-facing, often integrated into chat UIs or API clients. Because they require rich network access (to fetch model weights from S3 or ship logs to a central server), they make ideal SSRF jump points into a cloud environment.

Signal 03 — The "Blueprinting" Trend

The absence of a public PoC at the time of exploitation suggests that capable actors no longer depend on published exploit code. By analyzing GitHub diffs and advisory descriptions, they can build a working exploit for a flaw like CVE-2026-33626 themselves and run it against exposed deployments at scale.


Defender Guidance: Tactical Mitigation & Hardening

While immediate patching to v0.12.3 is the primary directive, the speed of exploitation for CVE-2026-33626 necessitates a "Defense in Depth" approach. Because the vision-language module acts as an SSRF primitive, defenders should implement the following environmental controls:

  • Enforce IMDSv2: If running on AWS, transition all inference nodes to Instance Metadata Service Version 2 (IMDSv2) with a hop limit of 1. This prevents the classic "one-shot" credential theft seen in this incident by requiring a session-oriented PUT request that SSRF primitives often cannot easily replicate.
  • Egress Filtering (Allow-listing): Inference engines are often over-permissioned. Restrict outbound traffic from LMDeploy nodes to a strict allow-list of model artifact repositories (e.g., specific S3 buckets) and logging endpoints. Explicitly block access to the loopback address (127.0.0.1), the link-local metadata address (169.254.169.254), and private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). Private ranges can be blocked at the host firewall or security group level; loopback and metadata traffic never passes through a security group, so those two need host-level rules.
  • Network Namespace Isolation: Run LMDeploy within a containerized environment (like Docker or Kubernetes) using a non-default bridge network. Ensure the container does not have access to the host's networking stack (--net=host should be strictly forbidden).
  • OAST Monitoring: Monitor network telemetry for requests to known Out-of-Band Application Security Testing (OAST) domains (e.g., interactsh, burpcollaborator, requestrepo). The 8-minute intrusion sequence showed that attackers use these for reachability confirmation before moving to credential theft.
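
The IMDSv2 item above maps onto a single EC2 API operation, `modify_instance_metadata_options`, which boto3 exposes on its EC2 client. The sketch below builds the arguments for that call. The helper names `imdsv2_options` and `enforce_imdsv2` and the instance ID are hypothetical, and the client is passed in, so the sketch assumes the caller has already created one (for example with `boto3.client("ec2")`) and holds permission to modify the instance.

```python
from typing import Any, Dict


def imdsv2_options(instance_id: str, hop_limit: int = 1) -> Dict[str, Any]:
    """Build arguments that require session tokens (IMDSv2) with a short hop limit."""
    return {
        "InstanceId": instance_id,
        "HttpTokens": "required",              # reject token-less IMDSv1 requests
        "HttpPutResponseHopLimit": hop_limit,  # token response cannot travel past this many hops
        "HttpEndpoint": "enabled",             # keep IMDS reachable for the instance itself
    }


def enforce_imdsv2(ec2_client: Any, instance_id: str) -> Any:
    """Apply the options through an EC2 client supplied by the caller."""
    return ec2_client.modify_instance_metadata_options(**imdsv2_options(instance_id))


if __name__ == "__main__":
    # Prints the request that would be sent; "i-0123456789abcdef0" is a placeholder ID.
    print(imdsv2_options("i-0123456789abcdef0"))
```

With a hop limit of 1, a container on a bridge network generally cannot obtain a token, which is usually the intent on an inference node. Workloads that legitimately read instance credentials from inside such a container would need a different credential path.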

Sources

Type | Source
Technical | Sysdig: Exploit Analysis in 12 Hours
News | The Hacker News: LMDeploy Flaw Exploited
Advisory | dbugs: CVE-2026-33626 Vulnerability Intelligence
