← Patterns / SP-027

Secure LLM Usage

Large language models introduce fundamentally new security risks compared to traditional software components. A conventional application receives structured inputs, processes them through deterministic logic, and returns predictable outputs. An LLM receives natural language that cannot be formally validated, reasons through processes that are opaque to the caller, and produces outputs that are probabilistic and non-deterministic. These properties — not the power of the model, but the nature of its inputs and outputs — define the LLM security problem. This pattern covers the model layer: the security risks that exist whenever an organisation uses an LLM, regardless of whether it is accessed via a chat interface, called through an API, used as the foundation for an agent, or fine-tuned on enterprise data. The threats addressed here — prompt injection, jailbreaking, hallucination, provider-side data exposure, privacy inference attacks on training data, and model extraction — are properties of the LLM interaction itself. Scope: this pattern addresses the LLM model layer. For security architecture of AI agents that take autonomous actions, invoke tools, and orchestrate sub-agents, see SP-047 (Secure Agentic AI Frameworks). For the enterprise governance management system — training data governance, bias, transparency, and impact assessment — see SP-045 (AI Governance and Responsible AI). Reading order for enterprise AI security: SP-027 (LLM layer) → SP-047 (agent/framework layer) → SP-045 (governance layer).

Release: 26.02 Authors: Aurelius, Vitruvius Updated: 2026-02-22

Assess

ATT&CK This pattern addresses 452 techniques across 13 tactics View on ATT&CK Matrix →

Click any control badge to view its details. Download SVG

Key Control Areas

Data Classification for LLM Contexts

AC-04 PT-02 PT-03 SC-04

Every piece of data entering an LLM prompt or conversation context must be classified. Context windows create implicit data aggregation risk: individually innocuous data points may become sensitive in combination. Implement tiered context policies: public data flows freely, internal data requires justification, confidential data requires explicit approval, and restricted data (credentials, PII, regulated data) must never enter a context window without technical controls ensuring it cannot be persisted or exfiltrated. Information remnance (SC-04) is a specific risk: data shared with an LLM provider may persist in provider-side logs, training pipelines, or session caches beyond the user's awareness. PT-02 and PT-03 ensure lawful authority exists before personal data enters a context window, satisfying GDPR and equivalent privacy law obligations.

Prompt Security

SI-10 SI-03 SC-07

Prompt injection is the SQL injection of the AI era — an attacker who controls input content can subvert the LLM's intended behaviour. Direct injection attempts to override system instructions via user-supplied text. Indirect injection embeds malicious instructions in content the LLM fetches or processes: tool outputs, web pages, documents, PDFs, even image metadata. Defence requires strict separation between system instructions and user or retrieved content; output validation before the LLM's parsed data drives any action; content sanitisation for external content; and boundary protection (SC-07) between the LLM's context and untrusted content sources. System prompts should be treated as security policy: protect them from extraction and override, and never assume the LLM will enforce their constraints without application-level controls.

Jailbreaking and Safety Feature Bypass

SI-10 AC-03 SA-11

Jailbreaking refers to adversarial prompts that circumvent the LLM's built-in safety constraints — role-play scenarios ('pretend you are an AI without restrictions'), hypothetical framing, encoding tricks, and many-shot prompting that gradually shifts the model's response pattern. In enterprise contexts, safety features protect both ethical use and business policy enforcement: a safety bypass that produces harmful content also indicates a policy enforcement failure. Controls: red-team LLM safety boundaries as part of pre-deployment security testing (SA-11) to characterise the model's actual safety envelope; deploy application-layer guardrails that complement but do not solely rely on model-level safety mechanisms; monitor for anomalous prompt patterns indicative of jailbreak attempts; treat the model's built-in safety constraints as a baseline that must be reinforced by application-layer access enforcement (AC-03), not as a complete solution.

Output Validation and Hallucination Risk

SI-10 CA-07 SA-11

LLMs hallucinate — they produce plausible-sounding but factually incorrect outputs with apparent confidence. The security implications are direct: an AI-generated firewall rule that misinterprets the requirement, a compliance check citing a non-existent regulation, a vulnerability report fabricating CVE details, or infrastructure-as-code with a subtle misconfiguration. Controls: define output verification requirements scaled to the decision stakes (lower for informational output, mandatory human review for configuration changes, policy decisions, or security findings); never use LLM output as the sole authority for security decisions; implement continuous monitoring (CA-07) of LLM output quality with anomaly detection for output pattern shifts that may indicate model degradation or adversarial manipulation; include hallucination boundary testing in developer security testing (SA-11) for any LLM-integrated application.

Provider Security and Data Residency

SA-09 SA-04 PT-02

LLM providers are critical third-party dependencies. Assess provider security posture: SOC 2 Type II reports, data processing agreements (DPAs), breach notification commitments, and subprocessor chains. Data residency requirements must be met through regional API endpoint selection where providers offer it. Request explicit opt-out from training on your prompts and completions; seek zero data retention agreements for sensitive workloads. For fine-tuning and RAG scenarios, confirm that uploaded data is not used to improve the base model without consent. Evaluate provider concentration risk: if enterprise security tooling, code review, and infrastructure management all depend on a single model provider, a provider outage or compromise has cascading impact. SA-04 acquisition controls apply at the point of provider onboarding and annual review.

Privacy Inference Attacks on Fine-Tuned Models

PT-02 PT-03 RA-08

When an LLM is fine-tuned on enterprise data, two privacy attacks become relevant. Membership inference: an adversary queries the model in patterns designed to probabilistically determine whether a specific individual's data was in the training set — creating GDPR liability if data was processed without lawful basis or consent. Model inversion: systematic querying to reconstruct examples from the training data, potentially exposing PII, source code, or confidential content. Enterprise relevance: these attacks apply whenever fine-tuning uses customer records, employee data, proprietary documents, or any personally identifiable information. Mitigations: differential privacy techniques in fine-tuning pipelines, strict data minimisation before fine-tuning (only use data necessary and with clear lawful basis), careful review of fine-tuning datasets against these attack vectors. RA-08 (Privacy Impact Assessments) must be conducted before fine-tuning on any personal data, with membership inference risk explicitly identified as a privacy harm vector.

Model Extraction and System Prompt Protection

SR-02 SA-09 AU-02

Model extraction is systematic API querying designed to approximate or replicate a fine-tuned proprietary model — stealing the intellectual property and competitive advantage embedded in the fine-tuning investment. For organisations with proprietary fine-tuned models, rate limiting, query pattern monitoring (AU-02), and API key scoping are the primary defences. System prompt extraction — inferring or eliciting the system prompt through adversarial queries — leaks business logic, constraint definitions, and the organisation's AI configuration. Treat system prompts as security policy documents: classify them at the same level as the business logic they encode, never include them in logs or error messages, and implement application-layer controls to prevent their disclosure. SR-02 (Supply Chain Risk Management) covers the model itself as a supply chain dependency — pin model versions and test security-relevant behaviours after provider updates.

Audit of LLM Interactions

AU-02 AU-03 AU-06 CA-07

Log every LLM interaction with sufficient detail for forensic reconstruction: the initiating identity, the model and version used, the system prompt identifier (not content — the reference, to avoid logging sensitive prompts in plaintext), a hash of the user prompt, the response or a hash of it, and the timestamp. For tool-augmented or agentic use, also log what tool invocations were triggered by LLM outputs. Enable anomaly detection (CA-07) for interaction patterns: unusual prompt volumes, queries to topics outside normal usage, or output patterns inconsistent with the model's typical behaviour. Audit records (AU-03) must support attribution of LLM-assisted decisions to the human who initiated the session, maintaining accountability even when the model does the drafting. Retain audit logs independently of the LLM provider.

When to Use

Organisation uses LLMs via API or embedded enterprise products (Microsoft 365 Copilot, GitHub Copilot, Gemini for Workspace) for business-sensitive tasks including data processing, analysis, or decision support. LLMs are integrated into enterprise applications as a component. Organisation has or is planning to fine-tune LLMs on enterprise data including customer records, employee data, or proprietary content. LLMs are used in security tooling: AI-assisted SIEM analysis, AI-augmented code review, AI-based threat intelligence. Employees use consumer LLM tools for work purposes, introducing enterprise data into provider-side systems.

When NOT to Use

Organisation has no AI integration plans and no employee exposure to LLM tools in any operational capacity. AI usage is strictly limited to isolated, air-gapped, non-sensitive tasks with no enterprise data input. Note: this contra-indication is increasingly rare — even organisations that do not formally deploy LLMs may have employees using AI tools informally, and many enterprise SaaS products now embed LLMs by default.

Typical Challenges

Prompt injection defence maturity remains at the equivalent of early SQL injection — the vulnerability class is well-understood but reliable systematic defences do not exist. No framework equivalent to parameterised queries has emerged for prompt injection, and mitigations (input/output filtering, canary tokens, instruction hierarchy) reduce risk but can be bypassed by determined attackers. Provider-side data handling is contractually constrained but practically opaque: what providers log, retain, and potentially use for training is difficult to audit independently. Privacy inference attacks on fine-tuned models (membership inference, model inversion) require ML engineering competence to assess and mitigate that most security teams have not yet developed. Output validation is inherently difficult — there is no deterministic way to verify LLM output correctness, and human review of high-volume LLM output at scale is impractical. Jailbreak techniques evolve continuously as researchers and attackers discover new ways to probe model safety constraints — defences effective today may fail after model updates. Provider lock-in grows as fine-tuned models and proprietary system prompts become tied to specific provider APIs.

Threat Resistance

Prompt injection — direct (system prompt override via user input) and indirect (malicious instructions embedded in fetched content, documents, or tool outputs) — addressed through strict instruction/data separation, output validation before LLM-parsed data drives any action, and boundary protection preventing untrusted content from reaching the system prompt context. Jailbreaking and safety feature bypass defended through adversarial pre-deployment testing that characterises the model's actual safety envelope, application-layer guardrails that complement model-level safety, and anomaly monitoring for adversarial prompt patterns. Hallucination-driven misconfiguration mitigated by output verification requirements scaled to decision stakes and mandatory human review for security-critical outputs — never using LLM output as the sole authority for security decisions. Provider-side data exposure addressed through DPAs, training opt-out agreements, zero data retention where required, and data classification controls governing what enters context windows. Privacy inference attacks on fine-tuned models (membership inference, model inversion) mitigated through privacy impact assessments before fine-tuning, differential privacy in training pipelines, and strict data minimisation in fine-tuning datasets. Model extraction and system prompt leakage addressed through rate limiting, query pattern monitoring, and treating system prompts as classified security policy documents. AI-amplified social engineering addressed through awareness training calibrated to the scale and sophistication of AI-generated phishing and pretexting.

Assumptions

Organisations access LLMs via cloud API services (Anthropic, OpenAI, Google, Azure OpenAI) over TLS-protected network connections. The pattern covers any use of LLMs: direct API integration, embedded enterprise products (Microsoft 365 Copilot, GitHub Copilot, Gemini for Workspace), and fine-tuned models on enterprise data. Some organisations additionally deploy open-weight models (Llama, Mistral) self-hosted on-premises, where provider-side risks shift but model-layer risks remain. The security risks addressed here apply at the model interaction layer regardless of whether the LLM underlies a chat interface, an application, or an agent. For security of AI agents that take autonomous actions and invoke tools, see SP-047 (Secure Agentic AI Frameworks). Model capabilities and the associated attack surface are advancing rapidly — this pattern should be reviewed quarterly.

Developing Areas

Jailbreak research and defence maturity: jailbreaking techniques are catalogued by researchers (role-play, many-shot, encoding, context manipulation) but defences remain reactive rather than systematic. The community is debating whether model-level training mitigations or application-layer policy engines are the more durable solution. No consensus has formed on standardised adversarial testing scope or pass/fail criteria for LLM safety assessments.
Privacy inference attack tooling: membership inference and model inversion attack toolkits (LiRA, MI-BENCH) are maturing as research tools but are not yet routinely used in enterprise security assessments. The gap between academic attack capability and enterprise defensive awareness is significant — organisations fine-tuning on personal data may be exposed to GDPR liability risks they have not assessed.
EU AI Act Article 50 (synthetic content disclosure): the obligation to disclose AI involvement in content generation creates downstream security implications for phishing detection, forensic attribution, and content authenticity verification. The technical implementation of disclosure (watermarking, provenance metadata) is unresolved and the enforcement ecosystem is not yet mature.
NIST AI 100-2 and adversarial ML standardisation: NIST's Adversarial Machine Learning taxonomy provides the most structured framework for LLM attack classification, but enterprise implementation guidance translating the taxonomy into specific testing requirements and control baselines is still developing.

Related Patterns

Patterns that operate within or alongside this one. Click any to view.

SP-047 Secure Agentic AI Frameworks SP-045 AI Governance and Responsible AI SP-013 Data Security SP-025 Advanced Monitoring and Detection

AC: 2AT: 2AU: 3CA: 2IA: 1IR: 1PS: 1PT: 2RA: 1SA: 4SC: 3SI: 3SR: 1

AC-03 Access Enforcement

AC-04 Information Flow Enforcement

AT-02 Literacy Training and Awareness

AT-03 Role-Based Training

AU-02 Event Logging

AU-03 Content of Audit Records

AU-06 Audit Monitoring, Analysis, and Reporting

CA-02 Control Assessments

CA-07 Continuous Monitoring

IA-05 Authenticator Management

IR-04 Incident Handling

PS-07 Third-Party Personnel Security

PT-02 Authority to Process Personally Identifiable Information

PT-03 Personally Identifiable Information Processing Purposes

RA-08 Privacy Impact Assessments

SA-04 Acquisition Process

SA-08 Security and Privacy Engineering Principles

SA-09 External System Services

SA-11 Developer Security Testing

SC-04 Information Remnance

SC-07 Boundary Protection

SC-12 Cryptographic Key Establishment and Management

SI-03 Malicious Code Protection

SI-04 System Monitoring

SI-10 Information Input Validation

SR-02 Supply Chain Risk Management Plan