G360 Technologies

LLMjacking: The Credential Leak That Becomes an AI Bill


A team enables Amazon Bedrock for an internal assistant in late Q3. Adoption is modest but growing. In early Q4, a developer opens a support ticket: the assistant is returning errors and occasionally timing out. The on-call engineer suspects a model quota issue and checks the Bedrock console. Quotas are nearly exhausted. She assumes a misconfigured load test and files it for the morning.

“LLMjacking” describes a practical attack pattern: adversaries steal cloud credentials or API keys, then use them to invoke managed LLM services at the victim’s expense. Reporting and vendor writeups from 2024 through early 2026 document recurring tradecraft across providers, including reconnaissance against AI service APIs, high-volume inference abuse, and resale of hijacked access through reverse proxies.

The term and pattern emerged publicly in late 2024 from incident reporting that described stolen AWS access keys being used to abuse Bedrock and other hosted LLM services. Through 2025 and into early 2026, multiple sources treated LLMjacking as a distinct subcategory of cloud service hijacking, documenting it in mainstream industry reporting, threat detection reports, and technical incident analyses.

Across these sources, the defining feature is not a novel exploit in model infrastructure. It is the reuse of familiar cloud compromise paths, followed by targeted abuse of AI service APIs that carry high variable cost and are often governed primarily by identity and quota controls.

How the mechanism works

LLMjacking is typically described as a lifecycle with four stages: credential acquisition, service enumeration, access verification and quota probing, then sustained abuse and monetization.

1. Credential acquisition

Sources describe three common paths:

1. Exploitation of internet-facing applications to gain execution, then harvesting credentials from environment variables, configuration files, or instance metadata. Several reports highlight vulnerable Laravel deployments (CVE-2021-3129) as one such foothold leading to credential theft and later LLM abuse.

2. Leakage of static cloud keys or vendor API keys in public repositories, CI/CD logs, or misconfigured pipelines, followed by automated discovery and validation by scanners.

3. Phishing, credential stuffing, or purchase of valid cloud identities from credential markets, including developer and service accounts that already hold AI permissions.
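The second path, leaked static keys, is typically exploited by automated scanners rather than by hand. A minimal sketch of the pattern-matching step, assuming AWS-style access key IDs; real tooling also matches secret keys and vendor API keys, and validates each hit against the provider's API:

```python
import re

# AWS long-term (AKIA) and temporary (ASIA) access key IDs share a
# 4-letter prefix followed by 16 uppercase alphanumeric characters.
AWS_KEY_ID_RE = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

def find_candidate_keys(text: str) -> list[str]:
    """Return access-key-ID-shaped strings found in a text blob
    (a commit diff, a CI log, a pasted config file)."""
    return AWS_KEY_ID_RE.findall(text)
```

A hit here is only a candidate; the validation step that follows (GetCallerIdentity) is what turns a string match into a working credential.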

2. Enumeration of AI services and regions

Once a credential is obtained, actors validate the principal and enumerate AI capabilities using standard cloud APIs. Examples cited include AWS calls such as GetCallerIdentity and Bedrock model listing calls such as ListFoundationModels and ListCustomModels, along with equivalent enumeration of Azure OpenAI and GCP Vertex AI. Region selection also appears in incident reporting. Actors probe regions that support the target AI service to maximize throughput and avoid wasted calls.
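The region-by-region enumeration described above reduces to a small loop. A sketch with the client factory injected so the logic is self-contained and testable; with boto3 the factory would be `lambda r: boto3.client("bedrock", region_name=r)`, and `list_foundation_models` is Bedrock's listing call:

```python
def probe_bedrock_regions(make_client, regions):
    """Return regions where the principal can enumerate Bedrock
    foundation models. Failures (no access, or the service is not
    offered in the region) are skipped, mirroring how scanners
    avoid wasting further calls on dead regions."""
    reachable = []
    for region in regions:
        try:
            make_client(region).list_foundation_models()
        except Exception:
            continue
        reachable.append(region)
    return reachable
```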

3. Stealthy access verification and logging checks

A recurring technique in detailed writeups is deliberate misuse of model invocation parameters to trigger a predictable validation error. For AWS Bedrock, sources describe invoking InvokeModel with an intentionally invalid parameter value (for example, max_tokens_to_sample = -1) so the service returns a ValidationException. The distinction matters: a validation error indicates the principal can reach the service and has invocation rights, while AccessDenied would indicate missing permissions. Reports also describe queries to determine whether model invocation logging is enabled, including calls like GetModelInvocationLoggingConfiguration. Some tooling reportedly avoids keys where prompt and response logging is active, consistent with an attacker preference for minimizing visibility.
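The inference attackers draw from the probe can be written as a small decision table. A sketch assuming the error codes named in the reporting; the function name and labels are hypothetical, chosen for illustration:

```python
def classify_probe_result(error_code):
    """Infer what a deliberately malformed InvokeModel call reveals
    about a key, based on the error code the service returned."""
    if error_code is None:
        # The call succeeded outright (and consumed tokens).
        return "invocable"
    if error_code == "ValidationException":
        # The request passed authorization and reached parameter
        # validation: the principal holds invocation rights.
        return "invocable"
    if error_code == "AccessDeniedException":
        # The principal cannot invoke models at all.
        return "no-invoke-rights"
    return "indeterminate"
```

The asymmetry is the point: the probe confirms invocation rights without a successful, billable completion ever being generated.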

4. Sustained inference abuse and resale

After confirmation, actors ramp to high-volume invocations, sometimes across multiple regions and providers. The abuse can serve two operational goals:

1. Offloading compute costs for the attacker's own workloads, including generation of phishing content or other malicious outputs described in several sources.

2. Reselling access by placing a reverse proxy in front of a pool of stolen keys. Multiple reports describe "OAI Reverse Proxy" or similar tooling as a way to centralize credential inventory and expose a single service endpoint to downstream customers while distributing usage across compromised accounts.
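The resale mechanics reduce to credential rotation behind a single endpoint. A minimal sketch of the rotation logic reports attribute to such proxies; the class and method names here are hypothetical:

```python
from collections import deque

class KeyPool:
    """Round-robin over a pool of compromised keys, retiring keys
    that start failing (revoked, throttled, or logging-enabled)."""

    def __init__(self, keys):
        self._keys = deque(keys)

    def next_key(self):
        """Pick the front key, then rotate it to the back."""
        key = self._keys[0]
        self._keys.rotate(-1)
        return key

    def retire(self, key):
        """Drop a key that began returning AccessDenied."""
        try:
            self._keys.remove(key)
        except ValueError:
            pass
```

Spreading traffic this way keeps per-account usage below obvious thresholds and makes the operator resilient to individual revocations, which is part of why revoking one leaked key rarely ends the abuse.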

What the attacker sees

The defender experience described above spans days. The attacker’s side of the same event takes minutes and is largely automated.

A scanner ingests a newly discovered key, likely pulled from a public repository commit or a credential market. It calls GetCallerIdentity to confirm the key is valid and resolves the account ID and principal. It then calls ListFoundationModels against a set of target regions to identify which AI services the principal can enumerate.

Two regions return results. The tool issues an InvokeModel call with max_tokens_to_sample = -1. The service returns a ValidationException, not AccessDenied. The key has invocation rights. A call to GetModelInvocationLoggingConfiguration returns no active logging configuration. The key passes all checks.

The key is added to a proxy pool. From that point, the proxy routes inference requests from downstream customers through the compromised account, distributing load across a rotating set of stolen keys. The original account holder's quota absorbs the traffic. The attacker's customers pay the proxy operator a fraction of retail API pricing. The account holder pays the cloud bill.

No model-side exploit is required. The initial access comes from standard credential compromise paths, and the abuse uses legitimate AI service APIs. The primary impact can be cost and quota exhaustion, and some reporting also discusses follow-on goals such as data access or pivoting, depending on how the service is integrated.

Analysis

Two practical shifts explain why this attack class is gaining attention now. First, managed LLM services turn stolen credentials into immediate spend and operational disruption: unlike many cloud compromises where impact depends on data access or persistence, LLMjacking can cause tangible harm through quota exhaustion and billing spikes alone. Second, the attacker workflow requires no new tradecraft. Credential theft, enumeration, and service abuse are well-understood tactics. The novelty is the target: high-cost, API-mediated inference that can be monetized through resale or used as free capacity for other criminal workflows.

Implications for enterprises

Operational implications

1. FinOps and SecOps convergence becomes necessary. Multiple sources frame cost spikes as a primary detection vector, and in some environments it may be the earliest reliable signal. If AI spend is not monitored with short feedback loops, detection can lag behind impact.

2. Quota exhaustion becomes an availability risk. Sustained abuse can consume model quotas, breaking legitimate internal applications and creating incident pressure even without a conventional intrusion narrative.

3. Incident response needs AI-specific scoping. Published playbooks emphasize revoking keys and preserving forensic state, then scoping which models were accessed and what logging was enabled in order to assess exposure.
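The FinOps-as-detection point can be made concrete with even a crude baseline check. A sketch, assuming daily spend totals are already exported from billing data; the multiplier and dollar floor are illustrative placeholders, not recommendations:

```python
from statistics import mean

def spend_alert(trailing_daily_spend, today, factor=3.0, floor=50.0):
    """Flag today's AI service spend when it exceeds both `factor`
    times the trailing daily average and an absolute dollar floor
    (so near-zero baselines do not alert on any usage at all)."""
    baseline = mean(trailing_daily_spend) if trailing_daily_spend else 0.0
    return today > max(factor * baseline, floor)
```

The short feedback loop matters more than the sophistication: a daily check can notice runaway spend on day one rather than at the end of the billing cycle.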

Technical implications

1. Identity hardening is central. Sources consistently point to long-lived keys, overbroad IAM permissions, and weak credential hygiene as enabling conditions. This puts emphasis on reducing standing privileges for AI invocation, preferring short-lived credentials where possible, and tightening who can call model invocation APIs.

2. Telemetry coverage must include AI service APIs. Detection guidance repeatedly references monitoring InvokeModel activity, error patterns consistent with probing (including high ValidationException counts), unusual origin IPs and user agents, and calls related to logging configuration.

3. Logging configuration is a control surface. Since attackers may attempt to identify or avoid keys with invocation logging enabled, ensuring consistent logging and alerting on configuration changes is a meaningful defensive step.
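The probing pattern in point 2 translates directly into a detection heuristic over CloudTrail-style records. A sketch using simplified, flattened event fields (real CloudTrail nests the principal under `userIdentity`); the threshold is illustrative:

```python
from collections import Counter

def probing_suspects(events, threshold=5):
    """Return principals with repeated InvokeModel ValidationException
    errors in the window, a pattern consistent with access probing."""
    counts = Counter(
        e["principalId"]
        for e in events
        if e.get("eventName") == "InvokeModel"
        and e.get("errorCode") == "ValidationException"
    )
    return {p for p, n in counts.items() if n >= threshold}
```

In practice this would be correlated with the other signals listed above: new source IPs, unusual user agents, and reads of the logging configuration by the same principal.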

Risks and open questions

1. Spend impact figures are widely repeated but not always independently sourced. Estimates such as "$46,000 per day" derive from a specific cost model and are echoed across multiple writeups. Treat them as scenario-based modeling and validate exposure against your own quotas, regions, and usage patterns.

2. Data exposure depends on service integration. Some reporting describes follow-on goals beyond compute theft, including attempts to access LLM context windows or pivot into internal systems. Feasibility depends on how prompts are routed, what data is accessible to the calling principal, and what logging is in place.

3. Detection depends on baseline maturity. Environments with little legitimate LLM usage can treat any model invocation as a high-signal event. Heavily adopted environments need usage baselines and correlation rules to distinguish abuse from normal traffic.
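To validate exposure against your own environment rather than quoted figures, the underlying arithmetic is simple. A sketch of the scenario math behind such estimates; every input here is a placeholder assumption, not a real quota or price:

```python
def daily_abuse_cost(req_per_min, in_tokens, out_tokens,
                     price_in_per_1k, price_out_per_1k):
    """Upper-bound daily spend if an attacker saturates a request
    quota around the clock. Plug in your own quotas and price sheet."""
    requests_per_day = req_per_min * 60 * 24
    cost_per_request = (in_tokens / 1000) * price_in_per_1k \
                     + (out_tokens / 1000) * price_out_per_1k
    return requests_per_day * cost_per_request
```

Running this with your actual per-region quotas and the list price of the most expensive model your principal can invoke gives a worst-case daily figure specific to your account, which is the comparison that matters.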

Further reading

Sysdig: "LLMjacking: Stolen Cloud Credentials Used in New AI Attack" (Dec 16, 2025)

Permiso Security: incident reporting summarized as "AI LLM Hijack Breach" (Oct 2024)

Verizon: 2025 Data Breach Investigations Report

CrowdStrike: 2025 Global Threat Report

Microsoft: Digital Defense Report 2025

Red Canary: Threat Detection Report 2025 and Cloud Service Hijacking technique page (Feb 2026)

Snyk: "What is LLMjacking?" (Apr 8, 2025)

Practical DevSecOps: "LLM Jacking Explained" (Feb 2, 2026)

SecureLayer7: "Understanding LLMjacking" (Jan 19, 2026)

CSO Online: coverage of Pillar Security research on "Operation Bizarre Bazaar" (Jan 2026)

NCSC: "CTO at NCSC" weekly summary referencing Operation Bizarre Bazaar (Jan 2026)