
Offensive Security Testing

Offensive Security Testing validates that security controls actually work against realistic adversaries, not just against compliance checklists. Traditional penetration testing -- a scoped assessment of known attack surfaces over a few weeks -- has value but fundamental limitations: it tests whether specific vulnerabilities exist, not whether the organisation can detect and respond to a determined attacker. Red teaming, purple teaming, and intelligence-led testing frameworks like CBEST and TIBER-EU fill this gap.

The pattern defines six testing approaches on a spectrum of maturity. Vulnerability assessment identifies known weaknesses through automated scanning and manual verification -- it answers 'what could be exploited?' Penetration testing attempts to exploit those weaknesses to demonstrate impact -- it answers 'can it be exploited, and what happens?' Red teaming simulates a realistic adversary campaign with full tactics, techniques, and procedures against the live environment with minimal prior knowledge -- it answers 'can we detect and stop a determined attacker?' Blue teaming exercises focus on the defenders -- detection, triage, containment, and recovery against simulated attacks -- and answer 'how effective is our SOC?' Purple teaming combines red and blue in collaborative exercises where attackers and defenders work together in real time to test and improve specific detection capabilities -- it answers 'what should we detect, and can we?'

The sixth and most mature approach, intelligence-led testing (CBEST in the UK, TIBER-EU across Europe, AASE in Singapore), adds a structured threat intelligence phase. A threat intelligence provider analyses the organisation's threat landscape and produces a targeted threat assessment. An independent red team then simulates the specific threat actors and scenarios identified in the intelligence report, attacking live production systems. The blue team is not informed. This tests not just technical controls but the entire detection, escalation, and response chain under realistic conditions.

The critical architectural principle is that testing must be continuous, not episodic. Annual penetration tests provide a point-in-time snapshot that decays within weeks as the environment changes. Continuous validation through automated breach and attack simulation (BAS), regular purple team exercises, and periodic red team engagements maintains assurance that controls remain effective as the threat landscape and the technology estate evolve.
Release: 26.02 · Authors: Aurelius, Vitruvius · Updated: 2026-02-07
ATT&CK coverage: this pattern addresses 453 techniques across 13 tactics of the MITRE ATT&CK matrix.
[Pattern diagram -- SP-035: Offensive Security Testing; Red / Blue / Purple Teams; Testing Maturity Spectrum.] The diagram places the six testing approaches on a maturity spectrum from basic to advanced, framed by offensive security governance and oversight (CA-08, CA-02, RA-05, PM-14, AT-03, RA-03, CA-07, SA-11) above and continuous improvement and reporting (PM-14, CA-02, CA-07, AU-06, IR-04, RA-03, PM-16) below:

  • Vulnerability assessment (RA-05, CA-02): automated scanning, CVE detection, configuration audit, compliance checks. Broad coverage, shallow depth; tool-driven. Frequency: continuous.
  • Penetration test (CA-08, SA-11): manual exploitation, scope-defined testing, OWASP methodology, written report and findings. Targeted scope, expert-driven; point-in-time. Frequency: annual or more often.
  • Red team (CA-08, AT-03): adversary simulation, TTP-based operations aligned to MITRE ATT&CK, stealth and persistence. Goal-oriented, no scope limits; tests the full kill chain. Frequency: periodic.
  • Blue team (IR-04, SI-04): defensive operations, detection and response, SIEM / SOAR / EDR, threat hunting. Always-on defence; detects real and simulated attacks. Frequency: continuous.
  • Purple team (PM-14, CA-07): collaborative testing, joint red and blue exercises, real-time feedback loop, detection gap analysis. Amplifies both teams through shared objectives. Frequency: iterative.
  • Intelligence-led (CA-08, RA-03, PM-16): threat-intelligence-driven testing (CBEST / TIBER-EU), bespoke threat scenarios, regulator-mandated for critical infrastructure. The most mature level: real threat actors modelled, board-level assurance. Frequency: regulatory cycle.

A feedback loop connects the teams: red team attack findings and blue team detection evidence flow into the improvement cycle, which in turn improves both the red team's TTPs and the blue team's detections. Primary references: CBEST Intelligence-Led Testing (bankofengland.co.uk), TIBER-EU Framework (ecb.europa.eu), MITRE ATT&CK, opensecurityarchitecture.org.


Key Control Areas

  • Penetration Testing and Vulnerability Assessment (CA-08, RA-05, RA-03, CA-02, SA-11): The foundation of offensive testing. CA-08 mandates penetration testing including: scope definition (which systems, which attack vectors, what rules of engagement), execution (attempting to exploit vulnerabilities in a controlled manner), reporting (findings with evidence, risk ratings, and remediation guidance), and remediation verification (retesting after fixes are applied). RA-05 provides vulnerability monitoring and scanning: continuous automated scanning of infrastructure, applications, and configurations to identify known weaknesses. RA-03 conducts risk assessments that inform testing priorities: test the highest-risk systems most frequently and most rigorously. CA-02 performs control assessments that validate whether security controls are correctly implemented and operating as intended. SA-11 covers developer testing and evaluation: security testing integrated into the development lifecycle including static analysis, dynamic analysis, and dependency scanning. Penetration testing must cover external attack surface (internet-facing systems), internal attack surface (assuming a compromised insider or phished user), application layer (business logic, authentication, authorisation), and social engineering (phishing, vishing, physical access).
  • Red Team Operations and Adversary Simulation (CA-08, PM-16, RA-03, SI-04, AT-03): Red teaming simulates realistic adversary campaigns. CA-08 at the red team level encompasses: objective-based testing (achieve specific goals such as accessing crown jewel data rather than finding all vulnerabilities), covert operations (the blue team is not informed, testing real detection capability), realistic TTPs (using techniques observed in real adversary campaigns against the organisation's sector), and full kill chain simulation (initial access, persistence, lateral movement, privilege escalation, data exfiltration). PM-16 informs red team scenarios with threat intelligence: current threat actor profiles, sector-specific attack campaigns, and TTPs from recent incidents. RA-03 identifies the scenarios to simulate: the risk assessment determines which threat actors and scenarios pose the greatest risk, and those become the red team objectives. SI-04 is tested implicitly: every red team action that is not detected reveals a monitoring gap. AT-03 provides role-based training for red team operators, blue team analysts, and exercise coordinators. Red team engagements must have clear rules of engagement including: scope boundaries (out-of-scope systems, prohibited actions), safety controls (deconfliction channels, emergency stop procedures), evidence requirements (logging all actions for post-exercise analysis), and legal authorisation (signed engagement letter from appropriate authority). A machine-readable sketch of this rules-of-engagement structure follows this list.
  • Purple Team Collaboration and Detection Validation (CA-08, CA-07, AU-06, SI-04, PM-14): Purple teaming is the most effective approach for rapidly improving detection capability. CA-08 in the purple team context means collaborative testing where the red team executes specific TTPs while the blue team attempts to detect and respond in real time, with immediate feedback on what was detected and what was missed. CA-07 provides continuous monitoring validation: each purple team exercise tests whether monitoring tools, correlation rules, and alert logic detect the executed techniques. AU-06 validates audit record analysis: can analysts identify the attack pattern in the log data? Are the right logs being collected? Is the data sufficient for investigation? SI-04 tests system monitoring effectiveness for specific attack techniques: does EDR detect the payload? Does the SIEM correlate the events? Does the alert reach an analyst within the target time? PM-14 ensures purple team findings feed back into detection engineering: every missed detection becomes a new detection rule, every slow response becomes a process improvement. Purple teams should work through the MITRE ATT&CK matrix systematically, testing detection coverage for each relevant technique and sub-technique, building a heat map of detection capability that drives investment priorities (a minimal heat-map sketch follows this list).
  • Intelligence-Led Testing: CBEST, TIBER-EU, and AASE (PM-16, RA-03, CA-08, IR-04, CA-02): Intelligence-led frameworks represent the highest maturity level. PM-16 provides the threat intelligence foundation: an accredited threat intelligence provider analyses the organisation's threat landscape including likely adversaries, their capabilities, their objectives, and their most probable attack paths. RA-03 translates intelligence into test scenarios: specific attack narratives that the red team will execute against live production systems. CA-08 governs the red team execution: testing against production with no prior notification to the blue team, simulating the intelligence-led scenarios over a multi-week campaign. IR-04 is tested holistically: the organisation's incident detection, escalation, and response processes are exercised under realistic conditions. CA-02 provides the assessment phase: after the exercise, the control team (who coordinated between red and blue) leads a detailed debrief comparing the red team's actions with the blue team's detections, identifying gaps and improvement actions. CBEST (Bank of England) tests UK systemic financial infrastructure. TIBER-EU (European Central Bank) provides a harmonised framework across EU member states. Both require accredited providers and follow structured phases: scoping, intelligence, testing, and closure with remediation plans. AASE (Adversarial Attack Simulation Exercises, from the Association of Banks in Singapore) applies the same model to Singapore's financial sector.
  • Remediation Management and Continuous Improvement (CA-05, PM-04, RA-07, CM-04, SA-15): Findings without remediation waste the testing investment. CA-05 manages the plan of action and milestones: every finding from offensive testing becomes a tracked remediation item with owner, priority, target date, and verification criteria. PM-04 provides the plan of action and milestones process at the organisational level, aggregating findings across all testing activities. RA-07 defines risk responses: each finding is assessed and addressed through remediation (fix the vulnerability), mitigation (add compensating controls), acceptance (acknowledge the risk with executive sign-off), or transfer (insurance or outsourcing). CM-04 analyses security impact of changes: remediation actions are tested to verify they fix the vulnerability without introducing new issues. SA-15 ensures development processes incorporate lessons from offensive testing: patterns of vulnerability become secure coding standards, recurring issues become automated security gates. The remediation cycle must close the loop: findings are tracked until verified fixed by retesting, and systemic issues drive architectural changes rather than point fixes. A minimal tracking sketch follows this list.
  • Automated Breach and Attack Simulation (CA-07, CA-08, SI-04, RA-05, PM-14): Continuous automated validation between human-led exercises. CA-07 provides continuous monitoring that includes automated attack simulation: BAS platforms continuously execute known attack techniques against production controls to verify they remain effective. CA-08 at the automated level runs scripted penetration scenarios on a scheduled basis: automated external scans, internal network sweeps, and application security tests that maintain a continuous security baseline. SI-04 is validated continuously: BAS platforms test whether EDR, SIEM, and email security detect simulated attacks, alerting when previously effective detections stop working (due to configuration drift, signature updates, or infrastructure changes). RA-05 integrates with BAS to test whether newly discovered vulnerabilities are detectable by existing controls. PM-14 uses BAS results to maintain confidence between manual assessments: if automated tests show consistent detection, the organisation can have higher confidence in its security posture. BAS does not replace human-led red teaming -- it maintains the baseline between engagements and provides early warning when controls degrade. A minimal detection-regression sketch follows this list.
  • Legal, Ethical, and Governance Framework (PL-04, PS-06, PS-07, AC-01, PM-09): Offensive testing requires robust governance. PL-04 defines rules of behaviour for testing activities: what is permitted, what is prohibited, and how to handle unexpected findings (e.g., discovery of illegal content, evidence of ongoing compromise by a real adversary). PS-06 provides access agreements for testers: signed authorisation from appropriate authority (typically CISO or board), scope boundaries, data handling requirements, and confidentiality obligations. PS-07 governs external personnel security for third-party testing firms: vetting requirements, insurance, contractual obligations, and incident notification. AC-01 establishes access control policy for testing: testers receive controlled access, their activities are logged independently, and access is revoked immediately upon completion. PM-09 ensures the risk management strategy includes offensive testing as a core assurance activity, with appropriate budget allocation, executive reporting, and board-level visibility of findings.
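The rules-of-engagement structure described under red team operations lends itself to a machine-readable artefact that testers and the control group sign off together. A minimal sketch in Python, with entirely hypothetical scope values; the real artefact is agreed and signed before any testing begins (PS-06):

```python
# Hypothetical rules of engagement for a red team engagement (CA-08).
# Every value is illustrative, not a recommended scope.
RULES_OF_ENGAGEMENT = {
    "engagement_id": "RT-2026-01",
    "authorised_by": "CISO",  # backed by a signed engagement letter
    "window": ("2026-03-02", "2026-04-10"),
    "objectives": ["access crown-jewel data store", "demonstrate exfiltration path"],
    "in_scope": ["corp-network", "internet-facing apps", "phishing"],
    "out_of_scope": ["industrial control systems", "third-party SaaS tenants"],
    "prohibited_actions": ["destructive payloads", "denial of service"],
    "safety": {
        "deconfliction_channel": "out-of-band bridge to the control group",
        "emergency_stop": "agreed code word halts all activity immediately",
    },
    "evidence": "all operator actions logged for post-exercise analysis",
}

def action_permitted(target: str, action: str, roe: dict = RULES_OF_ENGAGEMENT) -> bool:
    """Pre-flight check an operator runs before each step of the campaign."""
    return (target in roe["in_scope"]
            and target not in roe["out_of_scope"]
            and action not in roe["prohibited_actions"])

print(action_permitted("corp-network", "credential harvesting"))  # True
print(action_permitted("industrial control systems", "scan"))     # False
```

Encoding the boundaries this way makes scope checks repeatable and auditable rather than dependent on each operator's memory of the engagement letter.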
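To illustrate the systematic ATT&CK coverage described under purple teaming, a minimal heat-map sketch. The technique IDs, tactics, and outcomes below are hypothetical exercise results, not a real coverage assessment; a production implementation would pull them from the purple team's exercise tracker.

```python
from collections import defaultdict

# Hypothetical purple team results: ATT&CK technique ID -> (tactic, outcome).
# Outcomes: "detected" (alert fired), "logged_only" (telemetry exists but no
# alert), "missed" (no evidence in any tool).
EXERCISE_RESULTS = {
    "T1566.001": ("initial-access",    "detected"),     # spearphishing attachment
    "T1059.001": ("execution",         "detected"),     # PowerShell
    "T1021.002": ("lateral-movement",  "logged_only"),  # SMB / admin shares
    "T1003.001": ("credential-access", "missed"),       # LSASS memory access
    "T1048.003": ("exfiltration",      "missed"),       # exfil over cleartext protocol
}

def coverage_by_tactic(results):
    """Aggregate per-technique outcomes into a per-tactic coverage heat map."""
    heat = defaultdict(lambda: {"detected": 0, "logged_only": 0, "missed": 0})
    for tactic, outcome in results.values():
        heat[tactic][outcome] += 1
    return heat

for tactic, counts in sorted(coverage_by_tactic(EXERCISE_RESULTS).items()):
    total = sum(counts.values())
    flag = "INVEST" if counts["missed"] else "OK"
    print(f"{tactic:18s} detected {counts['detected']}/{total}  {flag}")
```

Each "missed" cell in the real heat map becomes a detection engineering backlog item (PM-14), which is how exercise output drives investment priorities.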
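For the remediation loop, a minimal sketch of a tracked finding. The field names are illustrative and assume nothing about the organisation's actual tooling; map them onto whatever CA-05 / PM-04 register already exists.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Finding:
    """One remediation item produced by an offensive testing engagement."""
    finding_id: str
    title: str
    owner: str
    priority: str                 # e.g. "critical", "high", "medium", "low"
    response: str                 # RA-07: "remediate" | "mitigate" | "accept" | "transfer"
    target_date: date
    verified_fixed: bool = False  # flipped only after retest (CM-04)

def still_exposed(findings, today=None):
    """Findings not yet verified by retest and not formally risk-accepted."""
    today = today or date.today()
    return [f for f in findings if not f.verified_fixed and f.response != "accept"]

def overdue(findings, today=None):
    today = today or date.today()
    return [f for f in still_exposed(findings, today) if f.target_date < today]

# Hypothetical example: one overdue critical finding for executive reporting.
backlog = [Finding("PT-2026-014", "SQL injection in payments API", "app-team-3",
                   "critical", "remediate", date(2026, 1, 31))]
print([f.finding_id for f in overdue(backlog, today=date(2026, 2, 7))])
```

The point of the sketch is the closed loop: an item leaves `still_exposed` only through a verified retest or explicit executive risk acceptance, never by quietly ageing out.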
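Finally, a sketch of the BAS regression logic only, assuming per-technique detection results have already been collected from the SIEM/EDR; the platform integration calls are deliberately omitted because they differ per product.

```python
# Detection baseline from the last accepted BAS run: technique -> detected?
BASELINE = {"T1566.001": True, "T1059.001": True, "T1021.002": False}

def detection_regressions(baseline, latest):
    """Techniques detected at baseline but missed now -- the early-warning
    signal for configuration drift or broken log pipelines (CA-07, SI-04)."""
    return sorted(t for t, was_detected in baseline.items()
                  if was_detected and not latest.get(t, False))

# Hypothetical latest run: PowerShell detection has silently stopped working.
latest_run = {"T1566.001": True, "T1059.001": False, "T1021.002": False}
for technique in detection_regressions(BASELINE, latest_run):
    print(f"REGRESSION: {technique} detected at baseline but not in latest run")
```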

When to Use

This pattern is relevant for any organisation that wants evidence-based assurance of its security controls. It is essential for:

  • financial services organisations subject to CBEST, TIBER-EU, or equivalent regulatory testing requirements
  • critical national infrastructure operators
  • organisations handling sensitive personal or financial data
  • entities that have suffered breaches and need to validate remediation effectiveness
  • organisations with mature security operations wanting to test and improve detection capability
  • any organisation that relies on security compliance certifications and wants to validate that compliance translates to actual security.

When NOT to Use

Organisations without basic security controls in place (no patching, no access control, no monitoring) will get limited value from offensive testing -- the findings will be obvious and overwhelming. Fix the fundamentals first. Very small organisations with simple environments may find a standard vulnerability assessment and basic penetration test sufficient, without the overhead of red team or intelligence-led testing. Organisations without the capacity to remediate findings should not invest in expensive testing that produces reports which sit unactioned. If your SOC does not exist yet, testing it is pointless -- build it first (see SP-031), then test it.

Typical Challenges

Scope creep and scope gaps are mirror-image problems: testing too broadly increases risk and cost, while testing too narrowly creates a false sense of security. Rules of engagement that are too restrictive prevent realistic simulation, while rules that are too permissive create genuine risk to production systems. Red team findings can create political tension when they reveal that expensive security investments are not working as expected. The gap between finding vulnerabilities and fixing them is often measured in months, during which the organisation is knowingly exposed. Automated BAS tools require ongoing maintenance and tuning to avoid alert fatigue from simulated attacks. Intelligence-led testing is expensive (six-figure engagements) and resource-intensive, limiting frequency. Finding skilled red team operators is difficult; the talent market is extremely competitive. Testing cloud environments requires navigating cloud provider acceptable use policies and shared responsibility boundaries. Social engineering tests (phishing simulations) can damage employee trust if poorly communicated. Purple team exercises require mature relationships between red and blue teams; adversarial dynamics undermine collaboration.

Threat Resistance

Offensive Security Testing does not directly resist threats -- it validates that other controls do. Its value is meta-security: providing evidence-based assurance that the security architecture works as designed. By simulating real adversary techniques, it identifies gaps before real attackers exploit them. Specific validation includes: detection of initial access techniques (phishing, exploitation), validating IA-02, SI-04, and AU-06; detection of lateral movement, validating SC-07, AC-04, and SI-04; detection of privilege escalation, validating AC-06, AU-02, and CM-06; detection of data exfiltration, validating AC-04, SI-04, and SC-07; incident response effectiveness, validating IR-04, IR-05, and IR-08; and recovery capability, validating CP-09, CP-10, and CP-04. The ultimate measure of offensive testing effectiveness is reduced mean-time-to-detect (MTTD) and mean-time-to-respond (MTTR) in real incidents, and reduced blast radius when incidents occur. A minimal metric sketch follows.
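MTTD and MTTR are straightforward to compute once incident records carry attack-start, first-detection, and containment timestamps. A minimal sketch with hypothetical timestamps:

```python
from datetime import datetime

def mean_hours(deltas):
    """Average a list of timedeltas, expressed in hours."""
    return sum(d.total_seconds() for d in deltas) / len(deltas) / 3600

# Hypothetical incidents: (attack start, first detection, containment).
incidents = [
    (datetime(2026, 1, 3, 9, 0),    datetime(2026, 1, 3, 11, 30), datetime(2026, 1, 4, 8, 0)),
    (datetime(2026, 1, 19, 22, 15), datetime(2026, 1, 20, 6, 45), datetime(2026, 1, 20, 14, 0)),
]

mttd = mean_hours([detect - start for start, detect, _ in incidents])
mttr = mean_hours([contain - detect for _, detect, contain in incidents])
print(f"MTTD: {mttd:.1f}h  MTTR: {mttr:.1f}h")  # trend these across exercises and real incidents
```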

Assumptions

Executive sponsorship exists for offensive testing including acceptance of controlled risk to live systems. The organisation has security operations capability (SOC) whose effectiveness can be tested. Budget is available for external testing providers (red team, threat intelligence, penetration testing). Legal authorisation for testing can be obtained from appropriate authority. The technology estate is sufficiently documented to define meaningful test scopes. Remediation capacity exists to address findings within reasonable timeframes. For intelligence-led testing: the organisation is of sufficient scale and criticality to warrant CBEST/TIBER-EU-level investment.

Developing Areas

  • AI-assisted red teaming is fundamentally changing the economics of offensive testing. Large language models can generate phishing content, identify attack paths from configuration data, and automate reconnaissance at a pace that manual red teams cannot match. This simultaneously lowers the cost of testing and raises the bar for defenders, as adversaries gain access to the same AI-augmented capabilities. The methodology for incorporating AI into red team operations is still forming, with no established framework governing AI tool use during engagements.
  • Continuous Automated Red Teaming (CART) aspires to replace periodic human-led engagements with always-on adversary simulation but remains immature. Current CART implementations are essentially enhanced BAS platforms with limited ability to chain techniques creatively or adapt to novel defences. The gap between scripted attack simulation and genuinely adaptive adversary behaviour means CART complements but cannot replace human red teams for at least the next 3-5 years.
  • Purple teaming is transitioning from an occasional exercise to a standard operating practice, with dedicated purple team roles emerging in mature SOCs. The challenge is formalising the methodology: how to structure systematic coverage of MITRE ATT&CK techniques, how to measure detection improvement over successive exercises, and how to scale purple team practices across large organisations without creating a permanent dependency on expensive red team operators.
  • Offensive testing for AI and ML systems is a nascent discipline with no established methodology. Testing AI models for prompt injection, training data extraction, model inversion, and adversarial manipulation requires skills that traditional penetration testers do not possess. OWASP has published initial guidance (OWASP ML Top 10, LLM Top 10), but the tooling and practitioner ecosystem for AI red teaming is at least 3 years behind traditional application security testing.
  • Cloud-native attack simulation faces unique challenges around shared responsibility boundaries and provider acceptable use policies. Simulating lateral movement across AWS accounts, testing Kubernetes cluster escape, or exploiting cloud IAM misconfigurations requires careful coordination with cloud providers to avoid violating terms of service. The tooling for cloud-specific attack paths (Stratus Red Team, Pacu, GCPBucketBrute) is maturing but lacks the integration and automation of traditional infrastructure testing frameworks.
NIST controls referenced: AC: 2 · AT: 2 · AU: 2 · CA: 4 · CM: 2 · IR: 2 · PL: 1 · PM: 4 · PS: 2 · RA: 3 · SA: 3 · SC: 1 · SI: 1
AC-01 Policy and Procedures
AC-06 Least Privilege
AT-02 Literacy Training and Awareness
AT-03 Role-Based Training
AU-02 Event Logging
AU-06 Audit Record Review, Analysis, and Reporting
CA-02 Control Assessments
CA-05 Plan of Action and Milestones
CA-07 Continuous Monitoring
CA-08 Penetration Testing
CM-04 Impact Analyses
CM-06 Configuration Settings
IR-03 Incident Response Testing
IR-04 Incident Handling
PL-04 Rules of Behavior
PM-04 Plan of Action and Milestones Process
PM-09 Risk Management Strategy
PM-14 Testing, Training, and Monitoring
PM-16 Threat Awareness Program
PS-06 Access Agreements
PS-07 External Personnel Security
RA-03 Risk Assessment
RA-05 Vulnerability Monitoring and Scanning
RA-07 Risk Response
SA-09 External System Services
SA-11 Developer Testing and Evaluation
SA-15 Development Process, Standards, and Tools
SC-07 Boundary Protection
SI-04 System Monitoring