Designing Firmware That Survives a Glitch

You cannot keep an attacker from injecting a fault into your hardware. What you can do is write firmware so that a single fault is not enough to matter. Glitch-resistant code assumes something will go wrong at the worst possible moment and refuses to fail open, so one corrupted instruction does not hand over the device.
Accept That the Fault Will Happen
Hardware countermeasures such as internal clocks, voltage monitors, and glitch detectors raise the cost of fault injection but do not make it impossible. A determined attacker with the right rig will eventually land a glitch on the instruction they want. The realistic goal is not prevention; it is making that one successful glitch insufficient to achieve anything.
That mindset shift is the whole subject. Once you assume a fault can and will skip or corrupt one instruction at the worst moment, you start writing security decisions so that no single instruction is load-bearing. The firmware stops being a sequence of trusting steps and becomes a structure that tolerates one of its steps being sabotaged.
What One Glitch Actually Does
A single glitch typically corrupts one operation: it skips an instruction, scrambles a register, or flips a branch. The classic and most useful effect for an attacker is skipping the instruction that acts on a failed check, so the device proceeds as if the check passed. Understanding that the unit of damage is usually one instruction is what makes the defenses concrete.
It also bounds the problem. If you can ensure that no single skipped or corrupted instruction flips a security decision, you have defended against the common case, and forcing the attacker to land two or three precise faults at once raises the difficulty enormously. The patterns below all work toward that bound.
Fail Closed by Default
The first principle is that the safe outcome must be the default and the permissive outcome must require positive action. A glitch that skips code should leave the device locked, not unlocked, because skipping is the most common glitch effect and skipped code should never be what was keeping the device secure.
// fragile: the secure state depends on code running to set it
int authorized = 1;                    // default open
if (verify() != OK) authorized = 0;    // a skipped instruction leaves it open

// resilient: default closed, only a passed check opens it
int authorized = 0;                    // default closed
if (verify() == OK) authorized = 1;    // a skip leaves it closed
In the fragile version, a glitch that skips the line setting authorized to zero leaves the device open. In the resilient version, skipping the line that opens it leaves the device closed. Same logic, opposite failure mode, and the difference is entirely in which outcome is the default.
Check More Than Once
A single check is a single point of failure for a glitch. Verifying a critical condition more than once, ideally in separated code so a glitch tuned to one check does not coincide with the others, means one fault skips at most one of the checks and the remaining ones still catch the attacker.
// verify twice, with work in between so one glitch hits only one check
if (verify_signature(img) != OK) lockdown();
process_unrelated_work();
if (verify_signature(img) != OK) lockdown();   // second, independent gate
boot(img);
A single value that says authorized is fragile; two or three independent confirmations, separated in time, are not, because a glitch lands on one moment and one instruction. The attacker would need to glitch each check precisely in the same run, which is far harder than landing a single fault.
Use Redundant Value Encodings
Critical flags should not be a simple zero or one, because a single flipped bit turns one valid value into the other. Representing a flag with a multi-bit pattern means a single corrupted bit produces an invalid value you can detect, rather than silently flipping authorized-false into authorized-true.
// fragile: one bit flip turns DENY (0) into something nonzero
// resilient: distinct multi-bit patterns; anything else is a detected fault
#define ST_GRANTED 0xA5C30F69
#define ST_DENIED  0x5A3CF096
if (state == ST_GRANTED) grant();
else if (state == ST_DENIED) deny();
else fault_detected();   // glitched value caught
With far-apart patterns, no single bit flip turns DENIED into GRANTED; it turns DENIED into a value that is neither, which the code treats as a detected fault and a reason to lock down. The encoding itself becomes a glitch detector for the value that matters most.
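The claim about far-apart patterns can be checked mechanically. The sketch below reuses the two constants from the example, which are bitwise complements of each other, and counts how many bits separate them. The helper names `hamming_distance` and `single_flip_reaches` are hypothetical, introduced only for this illustration.

```c
#include <stdint.h>

#define ST_GRANTED 0xA5C30F69u
#define ST_DENIED  0x5A3CF096u

/* Count the bits that differ between two 32-bit words. */
static unsigned hamming_distance(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;
    unsigned count = 0;
    while (diff != 0) {
        diff &= diff - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Return 1 if flipping exactly one bit of `from` yields `to`, else 0. */
static int single_flip_reaches(uint32_t from, uint32_t to)
{
    for (unsigned bit = 0; bit < 32; bit++) {
        if ((from ^ (UINT32_C(1) << bit)) == to)
            return 1;
    }
    return 0;
}
```

Because the two patterns differ in every one of their 32 bits, `single_flip_reaches(ST_DENIED, ST_GRANTED)` returns 0, whereas a plain 0/1 flag pair is one flip apart. A check like this could run as a host-side unit test whenever the state constants change.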
Add Random Delays
Fault injection depends on landing the glitch at a precise moment relative to the target instruction. Inserting random delays before sensitive operations moves that moment around from run to run, so an attacker cannot simply replay a fixed offset and has to search anew each time.
// randomize timing before the critical check so a fixed glitch offset misses
random_delay(rng_byte() & 0x3F);   // 0..63 cycles of jitter
if (verify_signature(img) != OK) lockdown();
Random delays do not stop a glitch; they make it less reliable by destroying the attacker’s timing reference. Combined with the other patterns, jitter raises the number of attempts needed to land a useful fault, which for many threat models pushes the attack past the point of practicality.
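The snippet above calls `random_delay` and `rng_byte` without defining them. One possible shape for them is sketched here, with loud assumptions: the xorshift generator is only a host-side placeholder (a fixed-seed software generator is predictable, so real firmware would read a hardware random number source instead), and the delay is counted in loop iterations rather than exact clock cycles.

```c
#include <stdint.h>

/* Placeholder entropy source (assumption): a xorshift32 generator.
 * On a real device this would be replaced by a hardware TRNG read. */
static uint32_t rng_state = 0x1234ABCDu;

static uint8_t rng_byte(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return (uint8_t)(rng_state & 0xFFu);
}

/* Spin for `iterations` loop passes and return how many were executed.
 * The counter is volatile so the compiler cannot delete the loop. */
static uint32_t random_delay(uint8_t iterations)
{
    volatile uint32_t spin = 0;
    for (uint8_t i = 0; i < iterations; i++)
        spin++;
    return spin;
}
```

Returning the iteration count gives the caller something to compare against the requested delay, so a glitch that cuts the loop short shows up as a mismatch instead of passing silently.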
Double-Check the Branch, Not Just the Value
Attackers can glitch the branch instruction itself, not only the comparison that feeds it. Defensive code confirms it actually took the path it intended, for example by re-checking the condition inside the success path before doing anything irreversible, so a glitched branch is caught after the fact.
// confirm we are really on the success path before acting
if (verify(img) == OK) {
    if (verify(img) != OK) fault_detected();   // re-confirm inside the branch
    boot(img);
}
Re-confirming the condition inside the branch that was supposed to require it catches a glitch that forced the branch the wrong way. It feels redundant because it is, deliberately, and that redundancy is exactly what denies a single corrupted branch instruction the ability to carry the device into a state it should never have reached.
Detect and Respond to Faults
The patterns above do not just resist faults; they also create opportunities to detect them. An impossible value, a control-flow path that should be unreachable, or a mismatch between two checks each signals that a fault was injected, and the firmware can respond by locking down, wiping secrets, or counting the event.
A fault counter in non-volatile memory that escalates after repeated detected faults is a strong addition, because fault injection usually takes many attempts to tune. A device that locks itself, or wipes its keys, after detecting a pattern of faults turns the attacker’s necessary search into a self-defeating process.
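A minimal sketch of such an escalating counter follows. Everything here is an assumption made for illustration: `nvm_fault_count` is an ordinary variable standing in for a word of non-volatile storage, `FAULT_LIMIT` is an arbitrary threshold, and `wipe_keys` only sets a flag where real firmware would erase key material.

```c
#include <stdint.h>

#define FAULT_LIMIT 3u   /* hypothetical threshold before escalation */

/* Stand-in for a counter kept in non-volatile memory (assumption). */
static uint32_t nvm_fault_count = 0;

/* Stand-in for erasing key material (assumption). */
static int keys_wiped = 0;
static void wipe_keys(void) { keys_wiped = 1; }

/* Call from every fault-detected path.
 * Returns 1 once the device has escalated, 0 otherwise. */
static int record_fault(void)
{
    if (nvm_fault_count < UINT32_MAX)
        nvm_fault_count++;              /* saturate instead of wrapping to 0 */

    if (nvm_fault_count >= FAULT_LIMIT) {
        wipe_keys();
        return 1;
    }
    return 0;
}
```

On real hardware the increment would need to be committed to storage before any reset or lockdown response, so that an attacker who cuts power right after a detected fault does not get the attempt for free.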
Protect the Whole Boot Chain
Glitch resistance matters most in the boot chain, where the checks that decide whether to run code live. Each stage should verify the next with the redundant, fail-closed patterns, and the transitions between stages, the moments control is handed off, are exactly where a glitch is aimed, so they get the most defensive attention.
A secure boot that is logically correct but reduces each verdict to a single branch is fragile by design, as glitch demonstrations against real devices repeatedly show. Applying these patterns to every verify-then-continue decision in the boot path is what turns a checkbox secure boot into one that survives an attacker with a glitch rig.
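As a rough illustration of combining the patterns at one stage handoff, the sketch below gates a boot stage on two verification calls that must both return a multi-bit pass value, with the verdict defaulting to fail. The names `stage_gate`, `verify_fn`, `V_PASS`, `V_FAIL`, and the stub verifiers are all hypothetical. This is a sketch under those assumptions, not a vetted implementation.

```c
#include <stdint.h>

#define V_PASS 0xA5C30F69u   /* multi-bit "verified" verdict */
#define V_FAIL 0x5A3CF096u   /* multi-bit "rejected" verdict */

/* Hypothetical signature for one stage's image check. */
typedef uint32_t (*verify_fn)(const void *image);

/* Stand-in verifiers for exercising the gate on a host machine. */
static uint32_t stub_pass(const void *image)    { (void)image; return V_PASS; }
static uint32_t stub_fail(const void *image)    { (void)image; return V_FAIL; }
static uint32_t stub_garbage(const void *image) { (void)image; return 0x1u; }

/* Returns V_PASS only when two separate verifications both say V_PASS.
 * Any other combination, including corrupted values, stays V_FAIL. */
static uint32_t stage_gate(verify_fn verify, const void *image)
{
    volatile uint32_t first  = V_FAIL;   /* volatile: force real loads and compares */
    volatile uint32_t second = V_FAIL;
    uint32_t verdict = V_FAIL;           /* default closed */

    first  = verify(image);
    second = verify(image);

    if (first == V_PASS) {
        if (second == V_PASS) {
            if (first == second)         /* re-confirm inside the success path */
                verdict = V_PASS;
        }
    }
    return verdict;
}
```

The caller would compare the result against `V_PASS` and treat every other value as a reason to lock down. Optimizing compilers can merge or remove checks they consider redundant, which is why the intermediate values are `volatile` here; inspecting the generated assembly is the only way to confirm the duplicated checks survive in the binary.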
Test Your Own Resistance
Designing for glitch resistance is incomplete without testing it. A fault-injection rig like a ChipWhisperer, pointed at your own device, reveals which checks fall to a single glitch and which hold. Treating that as part of pre-production testing finds the load-bearing single instructions before an attacker does.
The test is the same one an attacker runs: sweep the glitch timing across the boot and the critical checks, and watch for any impossible outcome. A check that never yields to a single fault across a thorough sweep is doing its job. One that falls is a concrete bug to fix with the patterns here, verified by re-running the sweep.
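Before a device ever reaches a rig, the same sweep idea can be approximated on a host machine. The sketch below is a deliberately crude model, assuming a glitch skips exactly one source-level statement: each guarded statement is wrapped in a hypothetical `STEP` macro, and the sweep skips each step in turn while the check is failing. It is no substitute for hardware testing, since real faults act on machine instructions and can corrupt data as well as skip code.

```c
#include <stdbool.h>

#define NO_SKIP (-1)

/* Run `stmt` unless it is the step the simulated glitch skips. */
#define STEP(stmt) do { if (step++ != skip) { stmt; } } while (0)

/* Fragile pattern: default open, the check must run to close it. */
static bool fragile_decision(bool check_passes, int skip)
{
    int step = 0;
    bool authorized = true;
    STEP(if (!check_passes) authorized = false);   /* step 0 */
    return authorized;
}

/* Resilient pattern: default closed, only a passed check opens it. */
static bool resilient_decision(bool check_passes, int skip)
{
    int step = 0;
    bool authorized = false;
    STEP(if (check_passes) authorized = true);     /* step 0 */
    return authorized;
}

typedef bool (*decision_fn)(bool check_passes, int skip);

/* Skip each step in turn while the check fails and count the runs
 * that wrongly end up authorized. */
static int sweep_single_skips(decision_fn decide, int step_count)
{
    int breaches = 0;
    for (int skip = 0; skip < step_count; skip++) {
        if (decide(false, skip))
            breaches++;
    }
    return breaches;
}
```

In this model, sweeping `fragile_decision` over its single step reports one breach, while `resilient_decision` reports none, which matches the fail-closed argument made earlier. Passing `NO_SKIP` runs a routine with no simulated fault.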
Assume the Fault, Design for It
Glitch-resistant firmware starts from the premise that you cannot stop an attacker from injecting a fault, so the code must make one fault harmless. Fail closed so skipped instructions leave the device locked, encode critical flags as multi-bit patterns so a flipped bit is detectable, check important conditions more than once in separated code, and add random delays so the attacker cannot reliably time the glitch.
None of these prevents a fault, and none is sufficient alone, but together they make a single successful glitch insufficient to flip a decision. When I review security-critical firmware, the question is not whether a glitch is possible but whether one well-placed glitch is enough, and good design ensures the answer is no.
Where This Fits
Reviewing firmware for fault-injection resistance, and testing it with a glitch rig the way an attacker would, is part of a hardware-focused product security assessment. If you want help hardening your boot chain and critical checks against faults, that is the kind of work we do at Berkner Tech.