Security Metrics and Measurement
Key Control Areas
- Programme Metrics and Governance (PM-06, PM-09, PM-14): Establish a formal security metrics programme with defined ownership (typically the CISO office or GRC function), documented methodology, and governance cadence. Define three metric tiers: operational (daily/weekly, consumed by security teams), management (monthly/quarterly, consumed by security leadership), and executive (quarterly/annual, consumed by board and C-suite). For each metric: define the measure precisely (what is counted, how, and from what source); establish a baseline from current data before setting targets; set targets that are ambitious but achievable (SMART criteria); define collection frequency and automation requirements; and assign an owner responsible for data quality. Implement a metrics catalogue documenting every metric with its definition, data source, collection method, owner, consumers, and review date (an illustrative catalogue entry is sketched after this list). Review the catalogue quarterly to retire stale metrics and add new ones. PM-06 (Measures of Performance) is the primary NIST control — it requires organisations to develop, monitor, and report on security performance measures.
- Vulnerability Management Metrics (RA-05, SI-02, CA-07): Measure the organisation's ability to find and fix vulnerabilities before exploitation. Key metrics: Mean Time to Remediate (MTTR) by severity — track separately for critical (target: <7 days), high (<30 days), medium (<90 days), low (<180 days); Patch Coverage Rate — percentage of systems patched within SLA (target: >95% for critical); Vulnerability Density — vulnerabilities per asset, trended monthly; Scanning Coverage — percentage of assets scanned within the policy period (target: 100%); Known Exploited Vulnerability (KEV) closure rate — CISA KEV items remediated within the CISA timeline; Age Distribution — histogram of open vulnerability age showing whether the backlog is growing or shrinking; Risk-Accepted Exceptions — count and age of formally risk-accepted vulnerabilities with owner and review date. Avoid: total vulnerability count (meaningless without context), raw scan output volume, counting informational findings. Data sources: vulnerability scanner (Qualys, Tenable, Rapid7), CMDB for asset inventory, ticketing system for remediation tracking. A computation sketch for MTTR by severity and SLA compliance appears after this list.
- Detection and Response Metrics (IR-04, IR-05, AU-06, SI-04): Measure the SOC's ability to detect, investigate, and contain threats. Key metrics: Mean Time to Detect (MTTD) — elapsed time from initial compromise to detection (measure from the forensic timeline, not the alert timestamp); Mean Time to Respond (MTTR) — elapsed time from detection to the start of active response (triage and initial response actions); Mean Time to Contain (MTTC) — elapsed time from detection to confirmed containment of the threat; Alert-to-Incident Ratio — percentage of alerts that become confirmed incidents (indicates detection quality); False Positive Rate — percentage of alerts closed as false positive (target: <70%; if higher, the detection rules need tuning); Containment Rate — percentage of incidents contained before lateral movement; Coverage by MITRE ATT&CK — percentage of ATT&CK techniques with at least one detection rule (target: >60% for the initial access, execution, and persistence tactics). Avoid: raw alert count (more alerts is not better), incidents closed (incentivises premature closure), uptime of the SIEM (operational, not security). Data sources: SIEM/SOAR platform, EDR telemetry, incident management system, ATT&CK Navigator for coverage mapping. A sketch computing MTTD, MTTC, and alert-quality ratios appears after this list.
- Offensive Testing Metrics (CA-08, CA-02, RA-03): Measure the organisation's resilience through red team, penetration testing, and purple team exercises. Key metrics: Findings by Severity Trend — are critical/high findings decreasing over successive test cycles? Remediation Velocity — percentage of pen test findings remediated before the next test cycle; Simulated Dwell Time — how long the red team maintained access before detection (target: decreasing trend); Initial Access Success Rate — did the red team achieve initial access, and through what vector? Attack Path Length — number of steps from initial access to the objective (crown jewels); Purple Team Coverage — percentage of MITRE ATT&CK techniques tested with confirmed detection; Mean Time to Detect Red Team — measured from the red team exercise timeline and compared against real incident MTTD; Repeat Findings Rate — percentage of findings that appeared in the previous test (target: <10%; indicates remediation effectiveness). Avoid: counting total findings without severity weighting, comparing findings across tests with different scope or methodology, red team metrics without corresponding blue team detection metrics. Data sources: penetration test reports, red team after-action reports, purple team exercise logs, MITRE ATT&CK Navigator. A sketch computing the repeat findings rate across test cycles appears after this list.
- Awareness and Human Risk Metrics (AT-02, AT-03, PM-13): Measure the organisation's human-layer security through training effectiveness and behavioural indicators. Key metrics: Phishing Simulation Click Rate — percentage of employees who click simulated phishing links (target: <5%, industry average ~15-20%); Phishing Report Rate — percentage of employees who report simulated phishing to the SOC (more important than click rate — it measures the positive behaviour); Time to Report — elapsed time from phishing email delivery to the first employee report; Training Completion Rate — percentage of employees completing mandatory security awareness training within the required period (target: >95%); Repeat Clickers — percentage of employees who click phishing simulations more than once in 12 months (identifies individuals needing targeted intervention); Social Engineering Resistance — red team physical/vishing/pretexting success rate; Security Champion Coverage — percentage of development teams with a designated security champion. Avoid: training hours completed (activity, not effectiveness), quiz pass rates on trivial questions, click rate as the sole metric (punitive, and it does not measure reporting behaviour). Data sources: phishing simulation platform (KnowBe4, Proofpoint), LMS for training completion, security champion programme records. A sketch computing click rate, report rate, time to report, and repeat clickers appears after this list.
- Compliance and Control Effectiveness Metrics (CA-02, CA-05, PL-02): Measure the organisation's compliance posture and control health. Key metrics: Control Implementation Rate — percentage of required controls fully implemented versus partially or not implemented (by framework: NIST 800-53, ISO 27001, PCI DSS); Audit Finding Trend — count of audit findings by severity trended over successive audit cycles (internal and external); Plan of Action and Milestones (POA&M) — count and age of open remediation items, and percentage closed within the committed timeline; Framework Alignment Score — percentage of applicable controls meeting the target maturity level per framework; Exception Count — number of active risk acceptances and policy exceptions with owner and review date; Control Testing Coverage — percentage of controls tested in the current assessment period; Time to Close Audit Findings — elapsed time from finding issuance to verified remediation. Avoid: compliance percentage as a single number (hides the severity distribution), self-assessed scores without validation, counting policies published rather than controls implemented. Data sources: GRC platform, internal audit reports, external audit reports, OSA assessment scores. A sketch computing the control implementation rate and POA&M ageing appears after this list.
- Executive Reporting and Dashboard Design (PM-06, AU-02, PM-09): Translate operational and management metrics into executive-level risk communication. Design principles: no more than 5-7 metrics per executive dashboard (cognitive limit); use trend lines, not point-in-time snapshots; show performance against target, not just the current value; use traffic light (RAG) status sparingly — only where thresholds are well-defined and validated; include peer comparison where available. Standard executive metrics: Overall Risk Posture Score (composite, trended quarterly), Critical Vulnerability Exposure (count and age of unpatched critical/KEV items), Incident Impact (business hours lost, data records affected), Compliance Status by Framework (percentage meeting target maturity), Third Party Risk Summary (vendor tier distribution and assessment currency), Investment Effectiveness (security spend per employee trended, cost per incident resolved). Report cadence: board quarterly with an annual deep-dive, C-suite monthly, security leadership weekly. Every metric shown to the board must have a defined 'so what' — what action would we take if this metric deteriorated? A sketch of trend-versus-target reporting with validated RAG thresholds appears after this list.
- Benchmarking and Industry Comparison (PM-06, RA-03, PM-14): Context makes metrics meaningful — knowing your MTTR is 25 days is useful; knowing the industry median is 60 days makes it powerful. Implement three benchmarking layers: internal trending (compare against your own historical performance — most reliable), peer benchmarking (compare against your industry vertical — requires participation in information sharing), and maturity benchmarking (assess capability maturity against a framework such as NIST CSF). Sources of benchmark data: Verizon DBIR (incident frequency and attack patterns), Ponemon/IBM Cost of a Data Breach (financial impact benchmarks), SANS Security Awareness Report (phishing click rates by industry), BitSight/SecurityScorecard (external security posture ratings), OSA Assessment Tool (pattern maturity benchmarking across industry verticals). Publish internal benchmarks within the organisation to create accountability — teams that see peer comparison data improve faster than teams that see only their own metrics. Avoid: benchmarking against unvalidated self-reported survey data, comparing metrics across organisations with different definitions, and using benchmarks as absolute targets rather than directional indicators.
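The metrics catalogue described under Programme Metrics and Governance can be as lightweight as one structured record per metric. The Python sketch below is a minimal illustration only; the class name, field names, and example values are assumptions for this example, not a standard schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

@dataclass
class MetricDefinition:
    """One metrics-catalogue entry; fields mirror the attributes listed above."""
    name: str                 # e.g. "Critical vulnerability MTTR (days)"
    tier: str                 # "operational" | "management" | "executive"
    definition: str           # precisely what is counted, how, and from what source
    data_source: str          # system of record the value is drawn from
    collection: str           # frequency and whether collection is automated
    owner: str                # accountable for data quality
    consumers: list[str]      # who reads it and at what cadence
    baseline: float | None    # established from current data before targets are set
    target: float | None      # SMART target, set only once a baseline exists
    review_date: date         # when this entry is next reviewed for retirement

catalogue = [
    MetricDefinition(
        name="Critical vulnerability MTTR (days)",
        tier="management",
        definition="Mean days from detection to verified remediation, critical severity only",
        data_source="Vulnerability scanner plus ticketing system",
        collection="Weekly, automated",
        owner="Vulnerability Management Lead",
        consumers=["CISO", "IT operations leadership"],
        baseline=21.0,
        target=7.0,
        review_date=date(2026, 1, 31),
    ),
]
```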
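For the Vulnerability Management Metrics area, MTTR by severity and SLA compliance can be derived from a remediation export joined from the scanner and ticketing system. The sketch below assumes a simplified record layout with hypothetical dates.

```python
from datetime import date
from statistics import mean

# Hypothetical remediation records exported from the scanner/ticketing integration.
findings = [
    {"severity": "critical", "opened": date(2024, 5, 1),  "closed": date(2024, 5, 6)},
    {"severity": "critical", "opened": date(2024, 5, 3),  "closed": date(2024, 5, 14)},
    {"severity": "high",     "opened": date(2024, 4, 20), "closed": date(2024, 5, 15)},
    {"severity": "high",     "opened": date(2024, 5, 2),  "closed": None},  # still open
]
SLA_DAYS = {"critical": 7, "high": 30, "medium": 90, "low": 180}
AS_OF = date(2024, 6, 1)

def age_days(finding):
    """Days from opening to closure, or to the reporting date if still open."""
    return ((finding["closed"] or AS_OF) - finding["opened"]).days

def mttr(severity):
    """Mean time to remediate, closed findings of one severity only."""
    closed = [age_days(f) for f in findings if f["severity"] == severity and f["closed"]]
    return mean(closed) if closed else None

def sla_compliance(severity):
    """Share of findings of one severity closed within the SLA window."""
    relevant = [f for f in findings if f["severity"] == severity]
    on_time = [f for f in relevant if f["closed"] and age_days(f) <= SLA_DAYS[severity]]
    return len(on_time) / len(relevant) if relevant else None

print(f"Critical MTTR: {mttr('critical'):.1f} days (target <7)")
print(f"Critical SLA compliance: {sla_compliance('critical'):.0%} (target >95%)")
```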
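For Detection and Response Metrics, MTTD and MTTC should be computed from forensic incident timelines, while alert-quality ratios come from case dispositions. The timelines and counts below are hypothetical.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident timelines reconstructed from forensics, not alert timestamps.
incidents = [
    {"compromise": datetime(2024, 5, 1, 2, 0),   "detected": datetime(2024, 5, 3, 9, 0),
     "contained": datetime(2024, 5, 3, 17, 0)},
    {"compromise": datetime(2024, 5, 10, 14, 0), "detected": datetime(2024, 5, 10, 15, 30),
     "contained": datetime(2024, 5, 11, 8, 0)},
]

def hours(delta):
    return delta.total_seconds() / 3600

mttd = mean(hours(i["detected"] - i["compromise"]) for i in incidents)
mttc = mean(hours(i["contained"] - i["detected"]) for i in incidents)

# Alert quality from hypothetical SIEM case dispositions over the same period.
dispositions = {"total": 1200, "confirmed_incident": 180, "false_positive": 900}
alert_to_incident_ratio = dispositions["confirmed_incident"] / dispositions["total"]
false_positive_rate = dispositions["false_positive"] / dispositions["total"]

print(f"MTTD: {mttd:.1f} h, MTTC: {mttc:.1f} h")
print(f"Alert-to-incident ratio: {alert_to_incident_ratio:.0%}")
print(f"False positive rate: {false_positive_rate:.0%} (target <70%)")
```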
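For Offensive Testing Metrics, the repeat findings rate compares finding identifiers across successive test cycles. Treating non-recurrence as remediation is only a rough proxy, since scope and methodology can differ between tests; the identifiers below are hypothetical.

```python
# Hypothetical finding identifiers from two successive test cycles of comparable scope.
previous_cycle = {"default-creds-jenkins", "weak-tls-legacy-portal", "sqli-ordering-api", "xss-search"}
current_cycle = {"default-creds-jenkins", "xss-search", "idor-invoice-download"}

repeats = previous_cycle & current_cycle
repeat_findings_rate = len(repeats) / len(current_cycle)
# Non-recurrence is only a rough proxy for remediation velocity: scope and
# methodology differences between tests can hide or resurface findings.
not_recurring = 1 - len(repeats) / len(previous_cycle)

print(f"Repeat findings rate: {repeat_findings_rate:.0%} (target <10%)")
print(f"Prior-cycle findings not recurring: {not_recurring:.0%}")
```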
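For Awareness and Human Risk Metrics, click rate, report rate, time to report, and repeat clickers can all be derived from per-recipient campaign exports. The sketch below assumes a simplified export format with hypothetical users.

```python
from datetime import datetime

# Hypothetical per-recipient results from one phishing simulation campaign.
delivered_at = datetime(2024, 5, 7, 9, 0)
results = [
    {"user": "a.ahmed", "clicked": False, "reported": datetime(2024, 5, 7, 9, 12)},
    {"user": "b.brown", "clicked": True,  "reported": None},
    {"user": "c.chen",  "clicked": False, "reported": datetime(2024, 5, 7, 9, 4)},
    {"user": "d.diaz",  "clicked": False, "reported": None},
]

click_rate = sum(r["clicked"] for r in results) / len(results)
report_rate = sum(r["reported"] is not None for r in results) / len(results)
first_report = min((r["reported"] for r in results if r["reported"]), default=None)
time_to_first_report = first_report - delivered_at if first_report else None

print(f"Click rate: {click_rate:.0%} (target <5%)")
print(f"Report rate: {report_rate:.0%}")
print(f"Time to first report: {time_to_first_report}")

# Repeat clickers across a rolling 12-month window (campaign -> set of clickers).
campaigns = {"2024-Q1": {"b.brown", "d.diaz"}, "2024-Q2": {"b.brown"}}
clicks_per_user: dict[str, int] = {}
for clickers in campaigns.values():
    for user in clickers:
        clicks_per_user[user] = clicks_per_user.get(user, 0) + 1
repeat_clickers = sorted(u for u, n in clicks_per_user.items() if n > 1)
print(f"Repeat clickers needing targeted intervention: {repeat_clickers}")
```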
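For Compliance and Control Effectiveness Metrics, the control implementation rate and POA&M ageing can be computed from a GRC extract. The record layouts and values below are illustrative assumptions.

```python
from datetime import date

# Hypothetical control assessment extract, e.g. from a GRC platform.
controls = [
    {"id": "AC-02", "status": "implemented"},
    {"id": "RA-05", "status": "implemented"},
    {"id": "IR-04", "status": "partially implemented"},
    {"id": "CA-08", "status": "not implemented"},
]
implementation_rate = sum(c["status"] == "implemented" for c in controls) / len(controls)

# POA&M ageing and closure against the committed timeline.
poams = [
    {"id": "POAM-17", "opened": date(2024, 1, 10), "committed": date(2024, 4, 1), "closed": date(2024, 3, 20)},
    {"id": "POAM-23", "opened": date(2024, 2, 1),  "committed": date(2024, 5, 1), "closed": None},
]
AS_OF = date(2024, 6, 1)
open_items = [p for p in poams if p["closed"] is None]
open_ages = [(AS_OF - p["opened"]).days for p in open_items]
closed_items = [p for p in poams if p["closed"]]
on_time_rate = sum(p["closed"] <= p["committed"] for p in closed_items) / len(closed_items)

print(f"Control implementation rate: {implementation_rate:.0%}")
print(f"Open POA&M items: {len(open_items)}, ages in days: {open_ages}")
print(f"POA&M items closed within committed timeline: {on_time_rate:.0%}")
```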
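For Executive Reporting and Dashboard Design, the sketch below illustrates trend-versus-target presentation and a RAG status applied only where thresholds are explicit, together with a documented 'so what' action. The quarterly values, thresholds, and agreed response are hypothetical.

```python
# Hypothetical quarterly series for one executive metric: open critical/KEV exposure.
history = {"2023-Q3": 41, "2023-Q4": 36, "2024-Q1": 28, "2024-Q2": 19}
TARGET = 10  # agreed with the risk committee before reporting began

def rag(value, green_at, amber_at):
    """RAG status applied only because these thresholds are defined and validated."""
    if value <= green_at:
        return "GREEN"
    return "AMBER" if value <= amber_at else "RED"

quarters = list(history.items())
latest_quarter, latest = quarters[-1]
_, previous = quarters[-2]
trend = "improving" if latest < previous else "worsening" if latest > previous else "flat"

print(f"Critical/KEV exposure, {latest_quarter}: {latest} items (target <= {TARGET}, trend {trend})")
print(f"Status: {rag(latest, green_at=TARGET, amber_at=2 * TARGET)}")
# The 'so what': if this metric deteriorates for two consecutive quarters, the agreed
# response is an emergency KEV remediation sprint and an out-of-cycle risk committee report.
```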
When to Use
Any organisation with a security programme that reports to executive management or the board. Organisations seeking to justify security investment through demonstrated outcomes. Security teams that want to move from qualitative risk statements to quantitative evidence. Organisations with multiple security tools generating data that is not aggregated into a coherent view. Any entity implementing OSA patterns that wants to measure their effectiveness over time.
When NOT to Use
Very small organisations where security is a part-time function and measurement overhead exceeds the value of the insight. Organisations in the very early stages of building a security programme where the priority is implementing basic controls, not measuring them.
Typical Challenges
Goodhart's Law — metrics become targets, causing gaming behaviour (closing tickets without investigation, patching easy items first, sending obviously fake phishing emails). Vanity metrics — measuring activity (scans run, trainings delivered) rather than outcomes (risk reduced, capability improved). Green dashboard syndrome — aggregating metrics to a level where everything looks green while individual risk areas are red. Measurement without action — collecting and reporting metrics that nobody uses for decision-making. Data quality — metrics derived from incomplete asset inventories, inconsistent severity classifications, or manual data entry. Cost of collection — the measurement programme consumes resources that could be spent on actual security improvement. Translation loss — operational metrics lose fidelity as they are aggregated and simplified for executive audiences. Baseline absence — setting targets without historical data leads to arbitrary thresholds. Lagging indicator dependency — most security metrics measure past performance, not predictive risk.
Threat Resistance
Addresses measurement gaming through outcome-focused metric design and multi-layered measurement (activity alone is never sufficient). Reduces executive misinterpretation through structured translation from operational to executive metrics with defined 'so what' actions. Prevents green dashboard syndrome through severity-weighted metrics and drill-down requirements. Ensures measurement drives action through governance cadence requiring documented response to metric deterioration. Provides industry context through benchmarking to prevent both complacency (we are fine) and false alarm (everything is broken).
Assumptions
The organisation has a security programme with multiple operational functions (vulnerability management, SOC, GRC, awareness). Data sources exist for metric collection (vulnerability scanner, SIEM, ticketing system, GRC platform). Security leadership has a reporting obligation to executive management or the board. The organisation wants to move beyond compliance-driven security toward risk-driven security with measurable outcomes.
Developing Areas
- OCSF (Open Cybersecurity Schema Framework) adoption for security data normalisation is gaining momentum as organisations struggle to aggregate metrics across heterogeneous tool stacks. Without a common schema, correlating vulnerability data from Qualys with incident data from Splunk and access data from Okta requires custom ETL pipelines per tool pair. OCSF promises vendor-neutral event normalisation, but adoption is in early stages -- fewer than 20 security vendors have published OCSF mappings, and the schema itself is still evolving with quarterly releases adding new event classes. A normalisation sketch appears after this list.
- The shift from activity-based metrics to outcome-based metrics is well understood conceptually but poorly implemented in practice. Most security teams can measure what they did (patches applied, alerts triaged, trainings delivered) but struggle to measure what changed as a result (risk reduced, exposure decreased, capability improved). Outcome measurement requires causal attribution that is methodologically difficult -- did MTTR improve because of SOAR automation, or because the threat landscape was quieter this quarter? Emerging approaches use synthetic control methods and A/B testing adapted from product analytics, but these techniques are foreign to most security teams. A simple cohort-comparison sketch appears after this list.
- Security data lake analytics maturity is improving as cloud-native SIEM platforms (Chronicle, Sentinel, Panther) offer long-term retention and ad-hoc query capabilities that legacy SIEMs could not. The ability to run arbitrary analytical queries across years of security telemetry enables retrospective metric calculation, trend analysis, and threat hunting that periodic reporting cannot match. However, the cost of storing and querying petabytes of security data, combined with the analytics skills required to extract meaningful insights, limits this capability to large enterprises.
- Risk quantification using FAIR (Factor Analysis of Information Risk) is gaining board-level interest as executives demand financial language for cyber risk. FAIR enables statements like 'our annualised loss expectancy from ransomware is $4.2M' rather than 'ransomware risk is rated High'. However, FAIR implementations require input data (threat event frequency, vulnerability prevalence, loss magnitude distributions) that most organisations cannot reliably estimate. The result is often false precision -- a quantified number that implies more confidence than the underlying data supports. A toy Monte Carlo sketch appears after this list.
- Cross-organisation benchmarking remains hampered by inconsistent metric definitions and reluctance to share data. When Organisation A reports MTTR of 15 days and Organisation B reports 25 days, the difference may reflect measurement methodology (when does the clock start?), severity classification (what counts as critical?), or scope (all assets or production only?) rather than actual performance difference. Industry initiatives (FS-ISAC benchmarking, CIS metrics) are working toward standardised definitions, but participation rates remain below the threshold needed for statistically meaningful peer comparison in most sectors.
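As a rough illustration of the normalisation problem described in the OCSF item above, the sketch below maps a hypothetical vendor EDR alert into a common event shape. The field names are loosely modelled on OCSF-style attributes and are illustrative only; the published OCSF schema is the authority for actual classes and attributes.

```python
# Field names below are illustrative, OCSF-like placeholders, not the authoritative schema.
def normalise_edr_alert(raw: dict) -> dict:
    """Map a hypothetical vendor EDR alert into a common, OCSF-like event shape."""
    severity_map = {"low": 2, "medium": 3, "high": 4, "critical": 5}  # illustrative mapping
    return {
        "class_name": "Detection Finding",                 # target event class (illustrative)
        "time": raw["detected_at"],                        # normalise timestamps upstream
        "severity_id": severity_map.get(raw["sev"], 0),    # vendor severity -> common scale
        "message": raw["title"],
        "device": {"hostname": raw["host"]},
        "metadata": {"product": {"name": raw["vendor"], "feature": "EDR"}},
        "raw_data": raw,                                   # keep the original for forensics
    }

alert = {
    "detected_at": "2024-05-07T09:12:00Z",
    "sev": "high",
    "title": "Credential dumping via LSASS access",
    "host": "ws-0423",
    "vendor": "ExampleEDR",
}
print(normalise_edr_alert(alert))
```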
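One low-effort way to approach the attribution problem described above is a cohort comparison borrowed from A/B testing: if SOAR playbooks were rolled out to only some alert categories, compare triage times with and without automation over the same period, so external factors affect both groups. The sketch below is deliberately simple and uses hypothetical numbers; it is not a rigorous causal method.

```python
from statistics import mean

# Hypothetical triage times (minutes) for comparable alert categories in the same quarter:
# one cohort handled by SOAR playbooks, one handled manually. Because both cohorts share
# the period, a quieter threat landscape affects both and largely cancels out.
triage_minutes_automated = [12, 9, 15, 11, 8, 14]
triage_minutes_manual = [35, 42, 28, 51, 39, 33]

reduction = mean(triage_minutes_manual) - mean(triage_minutes_automated)
print(f"Automated cohort: {mean(triage_minutes_automated):.0f} min per alert")
print(f"Manual cohort: {mean(triage_minutes_manual):.0f} min per alert")
print(f"Estimated reduction attributable to automation: ~{reduction:.0f} min per alert")
```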
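To make the FAIR discussion concrete, the sketch below runs a toy Monte Carlo simulation of annualised loss: a random number of loss events per year, each with a lognormally distributed magnitude. Every parameter is an illustrative assumption; real FAIR analyses derive these from calibrated estimates and report the ranges alongside the result, which is the main defence against false precision.

```python
import random
from statistics import mean, quantiles

random.seed(42)  # reproducible illustration

# Toy FAIR-style simulation: annual loss is the sum of losses from a random number of
# loss events per year. Every parameter below is an illustrative assumption.
SIMULATIONS = 100_000
EVENTS_PER_YEAR = (0, 3)          # min/max plausible ransomware loss events per year
LOSS_MU, LOSS_SIGMA = 13.0, 1.0   # parameters of ln(loss in USD), illustrative

annual_losses = []
for _ in range(SIMULATIONS):
    events = random.randint(*EVENTS_PER_YEAR)
    annual_losses.append(sum(random.lognormvariate(LOSS_MU, LOSS_SIGMA) for _ in range(events)))

ale = mean(annual_losses)                  # annualised loss expectancy
p90 = quantiles(annual_losses, n=10)[-1]   # 90th percentile annual loss
print(f"Simulated ALE: ${ale:,.0f}")
print(f"90th percentile annual loss: ${p90:,.0f}")
```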