← Patterns / SP-027

Secure LLM Usage

Large language models introduce fundamentally new security risks compared to traditional software components. A conventional application receives structured inputs, processes them through deterministic logic, and returns predictable outputs. An LLM receives natural language that cannot be formally validated, reasons through processes that are opaque to the caller, and produces outputs that are probabilistic and non-deterministic. These properties — not the power of the model, but the nature of its inputs and outputs — define the LLM security problem. This pattern covers the model layer: the security risks that exist whenever an organisation uses an LLM, regardless of whether it is accessed via a chat interface, called through an API, used as the foundation for an agent, or fine-tuned on enterprise data. The threats addressed here — prompt injection, jailbreaking, hallucination, provider-side data exposure, privacy inference attacks on training data, and model extraction — are properties of the LLM interaction itself. Scope: this pattern addresses the LLM model layer. For security architecture of AI agents that take autonomous actions, invoke tools, and orchestrate sub-agents, see SP-047 (Secure Agentic AI Frameworks). For the enterprise governance management system — training data governance, bias, transparency, and impact assessment — see SP-045 (AI Governance and Responsible AI). Reading order for enterprise AI security: SP-027 (LLM layer) → SP-047 (agent/framework layer) → SP-045 (governance layer).
Release: 26.02 Authors: Aurelius, Vitruvius Updated: 2026-02-22
Assess
ATT&CK This pattern addresses 452 techniques across 13 tactics View on ATT&CK Matrix →
SP-027 SECURE LLM USAGE Security architecture for integrating large language models into enterprise environments AC-03 AC-04 AU-02 AU-03 CA-07 | RA-08 SA-09 SC-04 SC-07 SI-10 SR-02 | PT-02 PT-03 SA-04 SA-11 SI-03 INPUT & CLASSIFICATION LLM SECURITY CONTROLS LLM PROVIDER BOUNDARY Data Classification Classify all data before it enters the prompt No data above the LLM trust boundary Public Internal Confidential Restricted AC-04 PT-02 SC-04 Context Window Management RAG context · conversation history · tool outputs Each source classified and scope-limited Inject only what is necessary at the lowest tier SC-04 PT-03 AC-04 Prompt Construction System prompt separated from user input System Prompt Policy · Instructions · Scope User Input Treated as untrusted "“System prompt = policy, not secret” User input sanitised before assembly Structural separation enforced at runtime SI-10 SC-07 SI-03 Prompt Injection Defence Injection: attacker input overrides instructions The SQL injection of the AI era Input sanitisation Structural separation Injection detection SI-10 SI-03 SC-07 Output Validation Validate before acting — LLMs hallucinate Security-critical outputs require human or deterministic verification before action Never act on unvalidated LLM output in pipelines SI-10 CA-07 SA-11 Jailbreak Detection Adversarial prompts that override alignment Red-team test prompts quarterly Monitor for adversarial input patterns SI-10 AC-03 SA-11 Model Extraction Control Rate limiting · Anomaly detection · System prompt protection SR-02 SA-09 AU-02 Provider Security Assessment Treat LLM provider as critical third party Contractual data handling agreements required SOC 2 Type II certification ISO 27001 / 27017 Data Processing Agreement (DPA) SA-09 SA-04 Data Residency & Privacy Where is prompt data processed and stored? GDPR, financial reg, and sector rules apply to LLM input and conversation history Validate residency before production deployment PT-02 PT-03 SA-09 Fine-Tuning Risks Fine-tuning on PII risks training data leakage Privacy inference: reconstruct training samples Differential privacy techniques Data minimisation before fine-tune Sanitise / anonymise training corpus RA-08 PT-02 PT-03 Provider Tiers & Fallback Cloud API · Private hosted · On-premises Higher sensitivity → stricter residency requirement SA-09 SA-04 SC-04 AUDIT & INTERACTION LOGGING Prompt Logged User identity · model version · timestamp Response Logged Output hash · latency · token count Anomaly Detection Rate · patterns · jailbreak signals Retention & Review SIEM integration · periodic review AU-02 AU-03 AU-06 CA-07 CONTINUOUS MONITORING · INCIDENT RESPONSE · SECURITY TESTING AC-03 AC-04 AU-02 AU-03 CA-07 | SA-09 SC-04 SC-07 SI-10 SR-02 | RA-08 PT-02 IR-04 SP-027: Secure LLM Usage 26 NIST 800-53 Rev 5 controls across 9 families · Authors: Aurelius, Vitruvius · Draft · 2026-02-22 OWASP LLM Top 10 · NIST AI RMF · SP-045 AI Governance · SP-047 Agentic AI · SP-048 Offensive AI opensecurityarchitecture.org/patterns/sp-027 Data flow LLM response XX-00 NIST control (click to view)

Click any control badge to view its details. Download SVG

Key Control Areas

  • Data Classification for LLM Contexts (AC-04, PT-02, PT-03, SC-04): Every piece of data entering an LLM prompt or conversation context must be classified. Context windows create implicit data aggregation risk: individually innocuous data points may become sensitive in combination. Implement tiered context policies: public data flows freely, internal data requires justification, confidential data requires explicit approval, and restricted data (credentials, PII, regulated data) must never enter a context window without technical controls ensuring it cannot be persisted or exfiltrated. Information remnance (SC-04) is a specific risk: data shared with an LLM provider may persist in provider-side logs, training pipelines, or session caches beyond the user's awareness. PT-02 and PT-03 ensure lawful authority exists before personal data enters a context window, satisfying GDPR and equivalent privacy law obligations.
  • Prompt Security (SI-10, SI-03, SC-07): Prompt injection is the SQL injection of the AI era — an attacker who controls input content can subvert the LLM's intended behaviour. Direct injection attempts to override system instructions via user-supplied text. Indirect injection embeds malicious instructions in content the LLM fetches or processes: tool outputs, web pages, documents, PDFs, even image metadata. Defence requires strict separation between system instructions and user or retrieved content; output validation before the LLM's parsed data drives any action; content sanitisation for external content; and boundary protection (SC-07) between the LLM's context and untrusted content sources. System prompts should be treated as security policy: protect them from extraction and override, and never assume the LLM will enforce their constraints without application-level controls.
  • Jailbreaking and Safety Feature Bypass (SI-10, AC-03, SA-11): Jailbreaking refers to adversarial prompts that circumvent the LLM's built-in safety constraints — role-play scenarios ('pretend you are an AI without restrictions'), hypothetical framing, encoding tricks, and many-shot prompting that gradually shifts the model's response pattern. In enterprise contexts, safety features protect both ethical use and business policy enforcement: a safety bypass that produces harmful content also indicates a policy enforcement failure. Controls: red-team LLM safety boundaries as part of pre-deployment security testing (SA-11) to characterise the model's actual safety envelope; deploy application-layer guardrails that complement but do not solely rely on model-level safety mechanisms; monitor for anomalous prompt patterns indicative of jailbreak attempts; treat the model's built-in safety constraints as a baseline that must be reinforced by application-layer access enforcement (AC-03), not as a complete solution.
  • Output Validation and Hallucination Risk (SI-10, CA-07, SA-11): LLMs hallucinate — they produce plausible-sounding but factually incorrect outputs with apparent confidence. The security implications are direct: an AI-generated firewall rule that misinterprets the requirement, a compliance check citing a non-existent regulation, a vulnerability report fabricating CVE details, or infrastructure-as-code with a subtle misconfiguration. Controls: define output verification requirements scaled to the decision stakes (lower for informational output, mandatory human review for configuration changes, policy decisions, or security findings); never use LLM output as the sole authority for security decisions; implement continuous monitoring (CA-07) of LLM output quality with anomaly detection for output pattern shifts that may indicate model degradation or adversarial manipulation; include hallucination boundary testing in developer security testing (SA-11) for any LLM-integrated application.
  • Provider Security and Data Residency (SA-09, SA-04, PT-02): LLM providers are critical third-party dependencies. Assess provider security posture: SOC 2 Type II reports, data processing agreements (DPAs), breach notification commitments, and subprocessor chains. Data residency requirements must be met through regional API endpoint selection where providers offer it. Request explicit opt-out from training on your prompts and completions; seek zero data retention agreements for sensitive workloads. For fine-tuning and RAG scenarios, confirm that uploaded data is not used to improve the base model without consent. Evaluate provider concentration risk: if enterprise security tooling, code review, and infrastructure management all depend on a single model provider, a provider outage or compromise has cascading impact. SA-04 acquisition controls apply at the point of provider onboarding and annual review.
  • Privacy Inference Attacks on Fine-Tuned Models (PT-02, PT-03, RA-08): When an LLM is fine-tuned on enterprise data, two privacy attacks become relevant. Membership inference: an adversary queries the model in patterns designed to probabilistically determine whether a specific individual's data was in the training set — creating GDPR liability if data was processed without lawful basis or consent. Model inversion: systematic querying to reconstruct examples from the training data, potentially exposing PII, source code, or confidential content. Enterprise relevance: these attacks apply whenever fine-tuning uses customer records, employee data, proprietary documents, or any personally identifiable information. Mitigations: differential privacy techniques in fine-tuning pipelines, strict data minimisation before fine-tuning (only use data necessary and with clear lawful basis), careful review of fine-tuning datasets against these attack vectors. RA-08 (Privacy Impact Assessments) must be conducted before fine-tuning on any personal data, with membership inference risk explicitly identified as a privacy harm vector.
  • Model Extraction and System Prompt Protection (SR-02, SA-09, AU-02): Model extraction is systematic API querying designed to approximate or replicate a fine-tuned proprietary model — stealing the intellectual property and competitive advantage embedded in the fine-tuning investment. For organisations with proprietary fine-tuned models, rate limiting, query pattern monitoring (AU-02), and API key scoping are the primary defences. System prompt extraction — inferring or eliciting the system prompt through adversarial queries — leaks business logic, constraint definitions, and the organisation's AI configuration. Treat system prompts as security policy documents: classify them at the same level as the business logic they encode, never include them in logs or error messages, and implement application-layer controls to prevent their disclosure. SR-02 (Supply Chain Risk Management) covers the model itself as a supply chain dependency — pin model versions and test security-relevant behaviours after provider updates.
  • Audit of LLM Interactions (AU-02, AU-03, AU-06, CA-07): Log every LLM interaction with sufficient detail for forensic reconstruction: the initiating identity, the model and version used, the system prompt identifier (not content — the reference, to avoid logging sensitive prompts in plaintext), a hash of the user prompt, the response or a hash of it, and the timestamp. For tool-augmented or agentic use, also log what tool invocations were triggered by LLM outputs. Enable anomaly detection (CA-07) for interaction patterns: unusual prompt volumes, queries to topics outside normal usage, or output patterns inconsistent with the model's typical behaviour. Audit records (AU-03) must support attribution of LLM-assisted decisions to the human who initiated the session, maintaining accountability even when the model does the drafting. Retain audit logs independently of the LLM provider.

When to Use

Organisation uses LLMs via API or embedded enterprise products (Microsoft 365 Copilot, GitHub Copilot, Gemini for Workspace) for business-sensitive tasks including data processing, analysis, or decision support. LLMs are integrated into enterprise applications as a component. Organisation has or is planning to fine-tune LLMs on enterprise data including customer records, employee data, or proprietary content. LLMs are used in security tooling: AI-assisted SIEM analysis, AI-augmented code review, AI-based threat intelligence. Employees use consumer LLM tools for work purposes, introducing enterprise data into provider-side systems.

When NOT to Use

Organisation has no AI integration plans and no employee exposure to LLM tools in any operational capacity. AI usage is strictly limited to isolated, air-gapped, non-sensitive tasks with no enterprise data input. Note: this contra-indication is increasingly rare — even organisations that do not formally deploy LLMs may have employees using AI tools informally, and many enterprise SaaS products now embed LLMs by default.

Typical Challenges

Prompt injection defence maturity remains at the equivalent of early SQL injection — the vulnerability class is well-understood but reliable systematic defences do not exist. No framework equivalent to parameterised queries has emerged for prompt injection, and mitigations (input/output filtering, canary tokens, instruction hierarchy) reduce risk but can be bypassed by determined attackers. Provider-side data handling is contractually constrained but practically opaque: what providers log, retain, and potentially use for training is difficult to audit independently. Privacy inference attacks on fine-tuned models (membership inference, model inversion) require ML engineering competence to assess and mitigate that most security teams have not yet developed. Output validation is inherently difficult — there is no deterministic way to verify LLM output correctness, and human review of high-volume LLM output at scale is impractical. Jailbreak techniques evolve continuously as researchers and attackers discover new ways to probe model safety constraints — defences effective today may fail after model updates. Provider lock-in grows as fine-tuned models and proprietary system prompts become tied to specific provider APIs.

Threat Resistance

Prompt injection — direct (system prompt override via user input) and indirect (malicious instructions embedded in fetched content, documents, or tool outputs) — addressed through strict instruction/data separation, output validation before LLM-parsed data drives any action, and boundary protection preventing untrusted content from reaching the system prompt context. Jailbreaking and safety feature bypass defended through adversarial pre-deployment testing that characterises the model's actual safety envelope, application-layer guardrails that complement model-level safety, and anomaly monitoring for adversarial prompt patterns. Hallucination-driven misconfiguration mitigated by output verification requirements scaled to decision stakes and mandatory human review for security-critical outputs — never using LLM output as the sole authority for security decisions. Provider-side data exposure addressed through DPAs, training opt-out agreements, zero data retention where required, and data classification controls governing what enters context windows. Privacy inference attacks on fine-tuned models (membership inference, model inversion) mitigated through privacy impact assessments before fine-tuning, differential privacy in training pipelines, and strict data minimisation in fine-tuning datasets. Model extraction and system prompt leakage addressed through rate limiting, query pattern monitoring, and treating system prompts as classified security policy documents. AI-amplified social engineering addressed through awareness training calibrated to the scale and sophistication of AI-generated phishing and pretexting.

Assumptions

Organisations access LLMs via cloud API services (Anthropic, OpenAI, Google, Azure OpenAI) over TLS-protected network connections. The pattern covers any use of LLMs: direct API integration, embedded enterprise products (Microsoft 365 Copilot, GitHub Copilot, Gemini for Workspace), and fine-tuned models on enterprise data. Some organisations additionally deploy open-weight models (Llama, Mistral) self-hosted on-premises, where provider-side risks shift but model-layer risks remain. The security risks addressed here apply at the model interaction layer regardless of whether the LLM underlies a chat interface, an application, or an agent. For security of AI agents that take autonomous actions and invoke tools, see SP-047 (Secure Agentic AI Frameworks). Model capabilities and the associated attack surface are advancing rapidly — this pattern should be reviewed quarterly.

Developing Areas

  • Jailbreak research and defence maturity: jailbreaking techniques are catalogued by researchers (role-play, many-shot, encoding, context manipulation) but defences remain reactive rather than systematic. The community is debating whether model-level training mitigations or application-layer policy engines are the more durable solution. No consensus has formed on standardised adversarial testing scope or pass/fail criteria for LLM safety assessments.
  • Privacy inference attack tooling: membership inference and model inversion attack toolkits (LiRA, MI-BENCH) are maturing as research tools but are not yet routinely used in enterprise security assessments. The gap between academic attack capability and enterprise defensive awareness is significant — organisations fine-tuning on personal data may be exposed to GDPR liability risks they have not assessed.
  • EU AI Act Article 50 (synthetic content disclosure): the obligation to disclose AI involvement in content generation creates downstream security implications for phishing detection, forensic attribution, and content authenticity verification. The technical implementation of disclosure (watermarking, provenance metadata) is unresolved and the enforcement ecosystem is not yet mature.
  • NIST AI 100-2 and adversarial ML standardisation: NIST's Adversarial Machine Learning taxonomy provides the most structured framework for LLM attack classification, but enterprise implementation guidance translating the taxonomy into specific testing requirements and control baselines is still developing.
AC: 2AT: 2AU: 3CA: 2IA: 1IR: 1PS: 1PT: 2RA: 1SA: 4SC: 3SI: 3SR: 1
AC-03 Access Enforcement
AC-04 Information Flow Enforcement
AT-02 Literacy Training and Awareness
AT-03 Role-Based Training
AU-02 Event Logging
AU-03 Content of Audit Records
AU-06 Audit Monitoring, Analysis, and Reporting
CA-02 Control Assessments
CA-07 Continuous Monitoring
IA-05 Authenticator Management
IR-04 Incident Handling
PS-07 Third-Party Personnel Security
PT-02 Authority to Process Personally Identifiable Information
PT-03 Personally Identifiable Information Processing Purposes
RA-08 Privacy Impact Assessments
SA-04 Acquisition Process
SA-08 Security and Privacy Engineering Principles
SA-09 External System Services
SA-11 Developer Security Testing
SC-04 Information Remnance
SC-07 Boundary Protection
SC-12 Cryptographic Key Establishment and Management
SI-03 Malicious Code Protection
SI-04 System Monitoring
SI-10 Information Input Validation
SR-02 Supply Chain Risk Management Plan