Designing Firmware That Survives a Glitch

You cannot stop a glitch, but you can make one insufficient. Here are the firmware patterns that survive fault injection.

Firmware design patterns that make a single fault-injection glitch insufficient to flip a security decision

You cannot keep an attacker from injecting a fault into your hardware. What you can do is write firmware so that a single fault is not enough to matter. Glitch-resistant code assumes something will go wrong at the worst possible moment and refuses to fail open, so one corrupted instruction does not hand over the device.

Accept That the Fault Will Happen

Hardware countermeasures such as internal clocks, voltage monitors, and glitch detectors raise the cost of fault injection but do not make it impossible. A determined attacker with the right rig will eventually land a glitch on the instruction they want. The realistic goal is therefore not prevention but making that one successful glitch insufficient to achieve anything.

That also bounds the problem. A single glitch typically corrupts one operation: it skips an instruction, scrambles a register, or flips a branch. The classic effect is skipping the instruction that acts on a failed check, so the device proceeds as if the check had passed. If you can ensure that no single skipped or corrupted instruction flips a security decision, you have defended against the common case, and forcing the attacker to land two or three precise faults in the same run raises the difficulty enormously.

The Defensive Patterns

Every pattern below works toward the same bound, that no single instruction is load-bearing for a security decision:

Pattern	Defends against	The technique
Fail closed	A skipped instruction leaving the device open	Default to locked; only a passed check opens it
Check more than once	One glitch skipping the single check	Verify critical conditions two or three times, in separated code
Redundant value encodings	One bit flip flipping a flag	Multi-bit patterns; any other value is a detected fault
Random delays	A fixed glitch offset landing every run	Jitter before sensitive ops to destroy the timing reference
Double-check the branch	A glitched branch taking the wrong path	Re-confirm the condition inside the success path
Fault counter	The many attempts tuning a glitch takes	Escalate, lock or wipe, after repeated detected faults

Fail Closed by Default

The safe outcome must be the default and the permissive outcome must require positive action. A glitch that skips code should leave the device locked, not unlocked, because skipping is the most common glitch effect and skipped code should never be what was keeping the device secure.

// fragile: the secure state depends on code running to set it
int authorized = 1;                 // default open
if (verify() != OK) authorized = 0; // a skipped instruction leaves it open

// resilient: default closed, only a passed check opens it
int authorized = 0;                 // default closed
if (verify() == OK) authorized = 1; // a skip leaves it closed

In the fragile version, a glitch that skips the line setting authorized to zero leaves the device open. In the resilient version, skipping the line that opens it leaves the device closed. Same logic, opposite failure mode, and the difference is entirely in which outcome is the default.
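The difference can be made concrete with a small host-side model. This is a sketch under stated assumptions, not firmware: `skip_check` is a hypothetical flag standing in for a glitch that skips the `if` line, `verify_result` stands in for whatever `verify()` would have returned, and `OK` is assumed to be 0.

```c
#include <stdbool.h>

#define OK 0   /* assumed success code; the article does not define it */

/* Fragile: starts open and relies on the check running to close it. */
int fragile_authorized(int verify_result, bool skip_check)
{
    int authorized = 1;                       /* default open */
    if (!skip_check) {                        /* models a glitch skipping this line */
        if (verify_result != OK) authorized = 0;
    }
    return authorized;
}

/* Resilient: starts closed; only a check that ran and passed opens it. */
int resilient_authorized(int verify_result, bool skip_check)
{
    int authorized = 0;                       /* default closed */
    if (!skip_check) {                        /* same modelled glitch */
        if (verify_result == OK) authorized = 1;
    }
    return authorized;
}
```

With a failing `verify_result` and `skip_check` set, the fragile function returns 1 and the resilient one returns 0, which is the whole argument in two calls.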

Check More than Once

A single check is a single point of failure for a glitch. Verifying a critical condition more than once, ideally in separated code so a glitch tuned to one check does not coincide with the others, means one fault skips at most one of the checks and the remaining ones still catch the attacker.

// verify twice, with work in between so one glitch hits only one check
if (verify_signature(img) != OK) lockdown();
process_unrelated_work();
if (verify_signature(img) != OK) lockdown();   // second, independent gate
boot(img);

A single value that says authorized is fragile; two or three independent confirmations, separated in time, are not, because a glitch lands on one moment and one instruction. The attacker would need to glitch each check precisely in the same run, which is far harder than landing a single fault.

Use Redundant Value Encodings

Critical flags should not be a simple zero or one, because a single flipped bit turns one valid value into the other. Representing a flag with a multi-bit pattern means a single corrupted bit produces an invalid value you can detect, rather than silently flipping authorized-false into authorized-true.

// fragile: DENY is 0 and any nonzero value grants, so one bit flip opens the device
//   if (state) grant();
// resilient: distinct multi-bit patterns; anything else is a detected fault
#define ST_GRANTED  0xA5C30F69
#define ST_DENIED   0x5A3CF096
if (state == ST_GRANTED)      grant();
else if (state == ST_DENIED)  deny();
else                          fault_detected();   // glitched value caught

With far-apart patterns, no single bit flip turns DENIED into GRANTED; it turns DENIED into a value that is neither, which the code treats as a detected fault and a reason to lock down. The encoding itself becomes a glitch detector for the value that matters most.
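How far apart the two patterns are can be checked rather than assumed. The sketch below uses the constants from the snippet above; `hamming_distance` and `classify_state` are hypothetical helper names introduced for illustration. The two constants happen to be bitwise complements, so they differ in all 32 bit positions.

```c
#include <stdint.h>

#define ST_GRANTED 0xA5C30F69u
#define ST_DENIED  0x5A3CF096u

/* Number of bit positions in which a and b differ. */
unsigned hamming_distance(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;
    unsigned count = 0;
    while (diff != 0) {
        diff &= diff - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Classify a possibly corrupted state word:
   1 = granted, 0 = denied, -1 = neither pattern (treat as a fault). */
int classify_state(uint32_t state)
{
    if (state == ST_GRANTED) return 1;
    if (state == ST_DENIED)  return 0;
    return -1;
}
```

Flipping any single bit of `ST_DENIED` yields a word that `classify_state` reports as -1, never 1. A distance check like this is a reasonable thing to assert at build time whenever new state constants are introduced.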

Add Random Delays

Fault injection depends on landing the glitch at a precise moment relative to the target instruction. Inserting random delays before sensitive operations moves that moment around from run to run, so an attacker cannot simply replay a fixed offset and has to search anew each time.

// randomize timing before the critical check so a fixed glitch offset misses
random_delay(rng_byte() & 0x3F);   // 0..63 cycles of jitter
if (verify_signature(img) != OK) lockdown();

Random delays do not stop a glitch; they make it less reliable by destroying the attacker’s timing reference. Combined with the other patterns, jitter raises the number of attempts needed to land a useful fault, which for many threat models pushes the attack past the point of practicality.
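The snippet above leaves `random_delay` and `rng_byte` undefined. One possible shape for them is sketched below, with loud assumptions: the xorshift generator is only a stand-in so the example runs on a host, and real firmware should draw from a hardware random number generator, since a predictable sequence gives the attacker the timing reference back. The delay is counted in loop passes, which are not the same as clock cycles.

```c
#include <stdint.h>

/* Stand-in entropy source for illustration only (xorshift32).
   Assumption: real firmware replaces this with a hardware TRNG read. */
static uint32_t prng_state = 0x1234ABCDu;

uint8_t rng_byte(void)
{
    prng_state ^= prng_state << 13;
    prng_state ^= prng_state >> 17;
    prng_state ^= prng_state << 5;
    return (uint8_t)(prng_state >> 24);
}

/* Busy-wait for `iterations` loop passes and return how many ran.
   The volatile counter keeps the compiler from deleting the loop. */
uint32_t random_delay(uint32_t iterations)
{
    volatile uint32_t spin = 0;
    while (spin < iterations) {
        spin++;
    }
    return spin;
}
```

Masking the random byte with `0x3F`, as the original call does, bounds the delay to at most 63 passes, so the worst-case boot-time cost is small and known.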

Double-Check the Branch, Not Just the Value

Attackers can glitch the branch instruction itself, not only the comparison that feeds it. Defensive code confirms it actually took the path it intended, for example by re-checking the condition inside the success path before doing anything irreversible, so a glitched branch is caught after the fact.

// confirm we are really on the success path before acting
// (assumes fault_detected() locks down and never returns)
if (verify(img) == OK) {
    if (verify(img) != OK) fault_detected();  // re-confirm inside the branch
    boot(img);
}

Re-confirming the condition inside the branch that was supposed to require it catches a glitch that forced the branch the wrong way. It feels redundant because it is, deliberately, and that redundancy is exactly what denies a single corrupted branch instruction the ability to carry the device into a state it should never have reached.

Detect, Respond, and Protect the Boot Chain

The patterns above do more than resist faults; they create opportunities to detect them. An impossible value, a control-flow path that should be unreachable, or a mismatch between two checks each signals that something injected a fault, and the firmware can respond by locking down, wiping secrets, or counting the event. A fault counter in non-volatile memory that escalates after repeated detected faults is a strong addition: fault injection usually takes many attempts to tune, so a device that locks itself or wipes its keys after a pattern of faults turns the attacker’s necessary search into a self-defeating process.
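A fault counter of the kind described could look roughly like this. Everything here is an assumption made for illustration: the threshold `FAULT_LIMIT`, the function names, and the static variable that stands in for a non-volatile storage cell so the sketch runs on a host.

```c
#include <stdint.h>

#define FAULT_LIMIT 3u   /* hypothetical escalation threshold */

typedef enum { RESP_LOG = 0, RESP_LOCKED = 1 } fault_response_t;

/* Stand-in for a counter stored in non-volatile memory. */
static uint32_t nv_fault_count = 0;

uint32_t nv_read_fault_count(void)            { return nv_fault_count; }
void     nv_write_fault_count(uint32_t value) { nv_fault_count = value; }

/* Record one detected fault, persist the new count, then decide the response. */
fault_response_t record_fault(void)
{
    uint32_t count = nv_read_fault_count();
    if (count < UINT32_MAX) {
        count++;                      /* saturate rather than wrap back to zero */
    }
    nv_write_fault_count(count);      /* persist before acting on the decision */
    return (count >= FAULT_LIMIT) ? RESP_LOCKED : RESP_LOG;
}

/* Checked early in boot: has the device already escalated? */
int device_is_locked(void)
{
    return nv_read_fault_count() >= FAULT_LIMIT;
}
```

The counter is itself a security-critical value, so a production version would likely store it with a redundant encoding and check it more than once, for the same reasons as any other flag in this article.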

Glitch resistance matters most in the boot chain, where the checks that decide whether to run code live. Each stage should verify the next with these redundant, fail-closed patterns. The transitions between stages, the moments when control is handed off, are exactly where a glitch is aimed, so they deserve the most defensive attention. A secure boot that is logically correct but reduces each verdict to a single branch is fragile by design, as glitch demonstrations against real devices repeatedly show.

Test Your Own Resistance

Designing for glitch resistance is incomplete without testing it. A fault-injection rig like a ChipWhisperer, pointed at your own device, reveals which checks fall to a single glitch and which hold, and treating that as part of pre-production testing finds the load-bearing single instructions before an attacker does. The test is the same one an attacker runs: sweep the glitch timing across the boot and the critical checks, and watch for any impossible outcome. A check that never yields to a single fault across a thorough sweep is doing its job; one that falls is a concrete bug to fix with the patterns here, verified by re-running the sweep.
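The logic of a sweep can be rehearsed on a host before any hardware is involved. The sketch below is a deliberately simplified model, not a substitute for a real rig: it assumes a glitch skips exactly one numbered step, whereas real faults can also corrupt data or registers. The function names and step numbering are hypothetical.

```c
#include <stdbool.h>

#define NO_SKIP (-1)   /* pass as `skip` to model a run with no glitch */

/* Boot flow with one gate. Returns true if boot(img) would be reached.
   Step 0: if (verify != OK) lockdown(); */
bool boot_single_check(bool image_valid, int skip)
{
    if (skip != 0 && !image_valid) return false;   /* step 0: lockdown */
    return true;                                   /* boot(img) */
}

/* Boot flow with two gates separated by unrelated work.
   Step 0: first gate, step 1: unrelated work, step 2: second gate. */
bool boot_double_check(bool image_valid, int skip)
{
    if (skip != 0 && !image_valid) return false;   /* step 0: first gate */
    /* step 1: unrelated work; skipping it has no security effect */
    if (skip != 2 && !image_valid) return false;   /* step 2: second gate */
    return true;                                   /* boot(img) */
}

/* Sweep a single skipped step across steps 0..n_steps-1 with an invalid
   image and count the offsets at which the device boots anyway. */
int count_bypasses(bool (*boot_fn)(bool, int), int n_steps)
{
    int bypasses = 0;
    for (int skip = 0; skip < n_steps; skip++) {
        if (boot_fn(false, skip)) bypasses++;
    }
    return bypasses;
}
```

In this model the single-check flow is bypassed at one offset and the double-check flow at none. A sweep on real hardware asks the same question, only with the glitch offset and width taking the place of the step index.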

Assume the Fault, Design for It

None of these patterns prevents a fault, and none is sufficient alone, but together they make a single successful glitch insufficient to flip a decision. When I review security-critical firmware, the question is not whether a glitch is possible but whether one well-placed glitch is enough, and good design ensures the answer is no.

Where This Fits

Reviewing firmware for fault-injection resistance, and testing it with a glitch rig the way an attacker would, is part of a hardware-focused product security assessment. If you want help hardening your boot chain and critical checks against faults, that is the kind of work we do at Berkner Tech.
