Detecting Fault Injection at Runtime

Beyond surviving a glitch, a device can notice it is being attacked. Here is how runtime fault detection works and what to do when it triggers.

Surviving a single glitch is one defense. Noticing that someone is glitching you is another, and a powerful one. Fault injection takes many tuning attempts, so a device that detects faults and responds can turn the attacker’s necessary search into a self-defeating process. Here is how runtime fault detection works and what to do when it fires.

Detection Versus Resistance

Glitch-resistant code makes a single fault insufficient. Fault detection goes further: it notices that a fault occurred and lets the device react, by locking down, wiping secrets, or counting the event toward an escalating response. The two are complementary, and high-assurance devices use both.

The reason detection is so valuable is the economics of fault injection. An attacker rarely lands the right glitch on the first try; they sweep timing and parameters across many attempts to find the window. A device that recognizes those attempts and responds can deny the attacker the patient, repeated access the attack depends on, which raises the cost far more than passive resistance alone.

The Detection Mechanisms

A device detects faults through a handful of complementary mechanisms, each catching what the others miss:

Mechanism	What it catches	Typical response
Hardware fault sensors	Out-of-range voltage, clock, or temperature; decapsulation and probing	Hardware fault, then reset or key wipe
Software control-flow checks	A skipped step or unreachable path (the flow counter comes out wrong)	fault_detected() handler
Redundant / integrity checks	A bit flipped in a flag, key, or config word	Refuse to act on the corrupted value
Persistent fault counter	The many tuning attempts an attack needs, across resets	Escalate: delay, lockout, then key erase

Hardware Fault Sensors

Many secure microcontrollers and secure elements include sensors aimed squarely at fault injection: voltage monitors that flag out-of-range supply, frequency monitors that catch clock anomalies, temperature sensors, and light or mesh sensors that detect decapsulation and probing. When a sensor trips, it raises a hardware fault.

// enable the security sensors a part provides, and route them to a fault handler
SEC->TAMPER_CTRL = VOLT_MON_EN | CLK_MON_EN | TEMP_MON_EN | MESH_EN;
SEC->TAMPER_RESP = RESP_RESET | RESP_WIPE_KEYS;   // action on trip
NVIC_EnableIRQ(TAMPER_IRQn);

These sensors exist precisely because glitching manipulates voltage, clock, temperature, or the physical package. Enabling them, and configuring what happens when they trip, is often a matter of setting the right registers, yet they ship disabled by default on many parts, which is a common gap an assessment finds.

Software Fault Detection

Not every fault trips a hardware sensor, so firmware adds its own detection through the redundancy it already uses to resist glitches. An impossible value in a multi-bit flag, a control-flow path that should be unreachable, or a mismatch between two independent checks are all signals that a fault corrupted execution.

// a control-flow integrity check: a counter that must equal the steps taken
volatile uint32_t flow = 0;
flow += step_a();   // each returns a known increment
flow += step_b();
flow += step_c();
if (flow != EXPECTED_TOTAL) fault_detected();   // a skipped step is caught

A running control-flow counter that must reach a known total catches a glitch that skipped a step, because the total comes out wrong. These software checks cost a few instructions and turn the corruption that fault injection causes into something the device can see and act on, even when no hardware sensor noticed.
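The "impossible value in a multi-bit flag" signal mentioned above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the constants `AUTH_GRANTED` and `AUTH_DENIED`, the `flag_state_t` type, and `classify_flag()` are hypothetical names chosen for this example. The two valid patterns are bitwise complements, so no small number of flipped bits can turn one into the other.

```c
#include <stdint.h>

/* Hypothetical encodings: the two valid values are bitwise complements,
   so they differ in all 32 bit positions. */
#define AUTH_GRANTED 0x3CA5965AU
#define AUTH_DENIED  0xC35A69A5U

typedef enum { FLAG_DENIED, FLAG_GRANTED, FLAG_FAULT } flag_state_t;

/* Classify a stored flag. Any value other than the two valid patterns is
   reported as a fault instead of being read as "denied". */
flag_state_t classify_flag(uint32_t stored)
{
    if (stored == AUTH_GRANTED) return FLAG_GRANTED;
    if (stored == AUTH_DENIED)  return FLAG_DENIED;
    return FLAG_FAULT;
}
```

A caller would route `FLAG_FAULT` to the same `fault_detected()` path as the control-flow check, so corrupted data and skipped steps feed one response.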

The Fault Counter

Because fault injection requires many attempts, a persistent fault counter is one of the strongest responses. Each detected fault increments a counter in non-volatile memory, and crossing a threshold triggers escalating action, from delays to lockout to key erasure. The attacker’s own repeated attempts drive the device toward shutting them out.

// persistent, escalating response to repeated faults
uint16_t fc = nv_read(FAULT_COUNT);
if (fc < UINT16_MAX) fc++;          // saturate so the count cannot wrap back to zero
nv_write(FAULT_COUNT, fc);
if (fc > 50)  wipe_keys();          // sustained attack -> destroy secrets
else if (fc > 10) lock_for(60);     // suspicious -> cool-down lockout

The counter must persist across resets, because attackers reset constantly while tuning, and a counter that zeroed on reset would be useless. Stored in non-volatile memory, it accumulates across the whole attack campaign, so the search an attacker needs becomes the very thing that locks the device or destroys its secrets.
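To make the escalation tiers easier to test in isolation, the policy can be separated from the storage. The sketch below is one possible arrangement under stated assumptions: `nv_fault_count` is a plain variable standing in for a real non-volatile cell, the `fault_response_t` type and both function names are hypothetical, and the delay threshold of 3 is an illustrative value (the 10 and 50 thresholds follow the snippet above).

```c
#include <stdint.h>

typedef enum {
    FAULT_RESP_NONE,
    FAULT_RESP_DELAY,
    FAULT_RESP_LOCKOUT,
    FAULT_RESP_WIPE
} fault_response_t;

/* Stand-in for a non-volatile cell. On real hardware this would live in
   flash, EEPROM, or another store that survives reset. */
uint16_t nv_fault_count = 0;

/* Map the accumulated count to a response tier. */
fault_response_t response_for(uint16_t count)
{
    if (count > 50) return FAULT_RESP_WIPE;
    if (count > 10) return FAULT_RESP_LOCKOUT;
    if (count > 3)  return FAULT_RESP_DELAY;   /* assumed threshold */
    return FAULT_RESP_NONE;
}

/* Record one detected fault and return the response it warrants. */
fault_response_t record_fault(void)
{
    if (nv_fault_count < UINT16_MAX) nv_fault_count++;  /* saturate, never wrap */
    return response_for(nv_fault_count);
}
```

Keeping `response_for()` a pure function of the count means the thresholds can be unit-tested on a host without any hardware attached.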

Integrity Checks on Critical Data

Faults can corrupt data as well as control flow, so security-critical values deserve integrity protection. A checksum over a key, a configuration word, or a permission flag, or a duplicated copy of it, lets the device detect a flipped bit and refuse to act on the corrupted value. This pairs naturally with the redundant encodings used for glitch resistance: a flag stored twice, once normal and once inverted, is checked for consistency before use, so the data integrity check and the fault detector are the same mechanism viewed two ways.
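The normal-plus-inverted storage described above might look like the following minimal sketch. The `protected_word_t` type and the two function names are hypothetical, and the example assumes a 32-bit value held in ordinary memory.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t value;     /* the protected word itself */
    uint32_t inverted;  /* its bitwise complement, written at the same time */
} protected_word_t;

/* Store a value together with its complement. */
void protected_store(protected_word_t *w, uint32_t v)
{
    w->value = v;
    w->inverted = ~v;
}

/* Load the value only if both copies agree. Returns false, leaving *out
   untouched, when a bit has flipped in either copy. */
bool protected_load(const protected_word_t *w, uint32_t *out)
{
    if ((w->value ^ w->inverted) != 0xFFFFFFFFU) return false;
    *out = w->value;
    return true;
}
```

A `false` return is both the integrity failure and the fault signal, so the caller can refuse the value and report the event in one step.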

Responding Proportionately

What a device does on detection should match the threat and the product. The options run from gentle to severe: log the event, insert a delay, lock out for a cooldown, require re-authentication, or in the strongest response, erase the keys that protect the device’s secrets. The right choice depends on what is at stake and what false positives would cost.

Proportionality matters because environmental noise (a brownout, a hot day, a marginal supply) can trip a sensitive detector, and a device that wiped its keys on a single voltage dip would be a support nightmare. The usual pattern is to tolerate isolated events and escalate only on a pattern, calibrating thresholds against the device’s real operating envelope so detection stays tight enough to catch an attack and loose enough to ignore a noisy power supply. Testing across the full environmental range is how production devices keep fault detection both effective and quiet during normal use.
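One way to tolerate isolated events while still escalating on a pattern is a leaky-bucket score: each detected event adds to the score, and each uneventful interval drains it slightly. The sketch below is an illustration only. The `fault_bucket_t` type, the function names, and the values of `EVENT_WEIGHT` and `ESCALATE_AT` are assumptions for this example and would need calibrating against the device's real operating envelope.

```c
#include <stdint.h>
#include <stdbool.h>

#define EVENT_WEIGHT  4U   /* score added per detected event (assumed) */
#define ESCALATE_AT  12U   /* score at which to escalate (assumed) */

typedef struct { uint16_t score; } fault_bucket_t;

/* Call on each detected event. Returns true once the accumulated score
   reaches the escalation threshold. */
bool bucket_on_event(fault_bucket_t *b)
{
    if (b->score <= UINT16_MAX - EVENT_WEIGHT)
        b->score = (uint16_t)(b->score + EVENT_WEIGHT);
    else
        b->score = UINT16_MAX;                 /* saturate */
    return b->score >= ESCALATE_AT;
}

/* Call once per clean interval, for example an uneventful hour of operation. */
void bucket_on_clean_interval(fault_bucket_t *b)
{
    if (b->score > 0) b->score--;
}
```

With these values a single brownout drains away after four clean intervals, while three events in quick succession trigger escalation. The drain rate should be slow compared with the pace of a glitch campaign, and the score would need to live in non-volatile memory for the same reason the fault counter does.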

The Wipe-on-Tamper Response

For devices holding high-value secrets, the strongest response is to destroy the keys when a sustained attack is detected, rendering the protected data permanently unreadable. This is standard in payment hardware and secure elements, where the secret is worth more than the device. The design requirement is that the wipe be fast and complete, clearing the actual key material and any derived copies before an attacker can freeze the state and extract it. A wipe that leaves a recoverable remnant, or is slow enough to be interrupted, gives a false sense of protection, so the erase path itself deserves careful design and testing.
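For key copies held in RAM, one common pitfall is that a compiler may remove a plain zeroing loop or `memset` when the buffer is never read again. A minimal sketch of a wipe that avoids this writes through a `volatile` pointer and then reads back to confirm. The function names `secure_wipe()` and `is_wiped()` are hypothetical, and this covers only RAM copies; erasing keys held in flash, OTP, or a key store is part-specific and not shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Zero a buffer through a volatile pointer so the stores are not discarded
   as dead writes by the optimizer. */
void secure_wipe(void *buf, size_t len)
{
    volatile uint8_t *p = (volatile uint8_t *)buf;
    while (len--) *p++ = 0;
}

/* Read the buffer back and report whether every byte is zero. */
bool is_wiped(const void *buf, size_t len)
{
    const volatile uint8_t *p = (const volatile uint8_t *)buf;
    uint8_t acc = 0;
    for (size_t i = 0; i < len; i++) acc |= p[i];
    return acc == 0;
}
```

The read-back gives the erase path something concrete to verify during testing: a wipe routine that returns before `is_wiped()` holds for every key copy has left a remnant.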

Testing the Detection

Detection that has never been tested is an assumption. The way to validate it is the attacker’s way: point a fault-injection rig at the device and confirm that the detectors fire, the counter escalates, and the response (lockout or wipe) actually happens. Anything that does not trigger is a blind spot to close.

# sweep glitch parameters and confirm the device detects and responds
for v in voltage_steps:
    inject_glitch(v)
    status = read_device_state()
    assert status in ('locked', 'wiped', 'fault_logged'), f"undetected at {v}"

A sweep that the device catches at every effective glitch setting confirms the detection works. A setting that produces a successful bypass without tripping any detector is exactly the gap an attacker would find, and finding it yourself, on the bench, is the entire point of testing the countermeasure rather than trusting it.

Detection as Part of a Layered Defense

Runtime fault detection is one layer, not a complete defense. It sits alongside glitch-resistant code that survives a single fault, hardware sensors that catch physical tampering, and the broader protections of encrypted firmware, secure key storage, and verified boot. The combination is what frustrates a capable attacker: glitch-resistant code means one fault is not enough; detection means many faults get noticed; the fault counter means the attempts needed to tune the attack are the attempts that lock the device.

For a product where physical attack is in the threat model, that combination is worth designing in, because passive resistance alone lets an attacker keep trying indefinitely. Enable the hardware sensors the part provides, add software control-flow and data integrity checks, keep a persistent fault counter, and define a proportionate, tested response. A device that detects, counts, and responds has changed the question from whether a glitch is possible to whether the attacker can keep trying long enough to land one, which is a far better position to defend from.

Where This Fits

Assessing whether a device detects and responds to fault injection, and testing those countermeasures with a real glitch rig, is part of a hardware-focused product security assessment. If you want your fault-detection and tamper-response designed or validated, that is the kind of work we do at Berkner Tech.
