Diffing Firmware Versions to Find Security Patches

Vendors often fix security bugs without saying so. Comparing the firmware before and after a patch shows you exactly what they changed, and a changed function is a strong hint about the vulnerability that used to be there. Done well, it turns a quiet update into a public disclosure.
Why Silent Patches Are Worth Chasing
A release note that says “bug fixes and improvements” hides as much as it reveals. If a security issue was fixed, the patch itself is the most precise description of the bug that exists, more precise than any advisory. Reading it tells you the bug class, the affected function, and whether your own fielded units are still exposed.
This is routine work in vulnerability research and in third-party component review. It is also how you verify that a fix you paid a vendor for is real rather than cosmetic, which is a question worth answering before you sign off on a release.
Line Up the Two Images
Extract both versions to the same point: the same filesystem, the same binaries on disk. A raw byte diff of whole images is useless because timestamps, build paths, and compression shift everything. Get to comparable files first with binwalk.
binwalk -e firmware_v1.bin
binwalk -e firmware_v2.bin
# then compare the two extracted root filesystems
diff -rq _firmware_v1.bin.extracted/squashfs-root \
        _firmware_v2.bin.extracted/squashfs-root
Files .../v1/bin/httpd and .../v2/bin/httpd differ
Files .../v1/lib/libauth.so and .../v2/lib/libauth.so differ
Only in .../v2/etc: patch_notes.txt
Now you have a short list: two changed binaries, httpd and libauth.so, plus one new file, in an otherwise identical filesystem. The binaries are where the interesting work is, and a security fix in something named libauth is hard to ignore.
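Extracted root filesystems often contain symlinks and device nodes, which can make a recursive diff noisy. As a hedged alternative, the sketch below compares regular files only and prints relative paths. The function name changed_files and its M/A/D markers are illustrative and not part of any tool.

```shell
#!/bin/sh
# changed_files OLD_ROOT NEW_ROOT
# Prints one line per regular file that differs between the two trees:
#   M path   content differs
#   A path   only in NEW_ROOT
#   D path   only in OLD_ROOT
changed_files() {
    old=$1
    new=$2
    # Walk the new tree: report added and modified files.
    (cd "$new" && find . -type f | sort) | while IFS= read -r f; do
        if [ ! -f "$old/$f" ]; then
            echo "A ${f#./}"
        elif ! cmp -s "$old/$f" "$new/$f"; then
            echo "M ${f#./}"
        fi
    done
    # Walk the old tree: report files the update removed.
    (cd "$old" && find . -type f | sort) | while IFS= read -r f; do
        [ -f "$new/$f" ] || echo "D ${f#./}"
    done
}

# Usage:
#   changed_files _firmware_v1.bin.extracted/squashfs-root \
#                 _firmware_v2.bin.extracted/squashfs-root
```

Because find -type f does not follow symlinks, each binary is compared once, under its real path.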
Tools That Match Functions Across Builds
Binary diffing tools match functions between two builds by structure and flag the ones that changed, so they see through recompilation noise that a byte diff cannot. Common choices are Diaphora, an open-source plugin that runs inside IDA; radare2’s radiff2; and BinDiff, long the well-known commercial option and now free and open source. All three answer the same question: which functions changed, and how much.
The trick that makes them work is that they compare control-flow graphs, not raw bytes. A function that was recompiled but not changed scores as identical, while a function that gained a check scores as similar-but-different, which is exactly the signal you want.
Diff Functions, Not Bytes
A quick first pass with radiff2 gives a similarity score per function. Sort by score and the patched functions rise to the top. The listings in this article are trimmed to the score and the function names; raw radiff2 output also carries addresses and sizes, so adjust the column numbers to match what your version prints.
radiff2 -AC v1/lib/libauth.so v2/lib/libauth.so | sort -k1 -n | head
0.71200 check_session_token check_session_token
0.93400 parse_login parse_login
1.00000 hash_password hash_password
1.00000 load_users load_users
Similarity 1.00000 means unchanged. The function that dropped to 0.712, check_session_token, is the one to read. A large similarity drop in a function whose name involves sessions or auth is about as clear a pointer to the bug as you will get.
Read What the Patch Added
Open the changed function in both versions side by side. Security patches have recognizable shapes. A new bounds test points to a buffer overflow. A new comparison or a constant-time check points to an authentication or token-handling gap. Here the patched version added a length guard the old one lacked.
// v1: no check, attacker controls len
memcpy(session->token, input, len);
// v2: bounded copy added by the patch
if (len > sizeof(session->token))
return ERR_BAD_REQUEST;
memcpy(session->token, input, len);
That single added guard tells you the old build had a stack or heap overflow reachable through the session token. You now know the bug class, the function, and the input that reaches it, without the vendor ever publishing a word about it.
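If your decompiler can export each version of a function as text, a plain diff isolates the lines the patch added, and filtering those for conditionals surfaces new guards quickly. This is a minimal sketch: the helper name added_checks and the export file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# added_checks OLD_TEXT NEW_TEXT
# Prints conditionals that appear only in NEW_TEXT, i.e. checks the patch added.
added_checks() {
    # In diff's default output, lines present only in the second file start with "> ".
    diff "$1" "$2" |
        sed -n 's/^> //p' |
        grep -E '^[[:space:]]*if[[:space:]]*\('
}

# Usage (file names are hypothetical decompiler exports):
#   added_checks v1_check_session_token.c v2_check_session_token.c
```

Decompiler output shifts between builds for reasons unrelated to the fix, so treat the result as a pointer to where to read rather than as the finding itself.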
Confirm the Path Is Reachable
A changed function is a lead, not a finding, until you confirm an attacker can reach the vulnerable code. Trace the input backward from the vulnerable call to a network or serial entry point, and note whether authentication sits in the way.
# call path from the network handler down to the vulnerable copy (v1)
http_accept -> route_request -> /api/session -> check_session_token -> memcpy(...)
reachable pre-auth: the /api/session endpoint runs before login
-> unauthenticated remote overflow in firmware v1
Reachability is what separates a critical from a low. The same bug behind authentication, or only callable locally, rates far lower, and the call path is what lets you say which one it is honestly rather than guessing.
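When the call graph is too large to trace by eye, the search can be automated. The sketch below assumes you have exported the graph from your disassembler as a text file of "caller callee" pairs, one per line. That file format and the function name call_path are assumptions made for illustration. It runs a breadth-first search and prints the shortest chain from an entry point to the target.

```shell
#!/bin/sh
# call_path EDGES ENTRY TARGET
# EDGES is a text file with one "caller callee" pair per line.
# Prints the shortest call chain from ENTRY to TARGET; exits 1 if none exists.
call_path() {
    awk -v src="$2" -v dst="$3" '
        { succ[$1] = succ[$1] " " $2 }            # build the adjacency list
        END {
            queue[1] = src; seen[src] = 1; head = 1; tail = 1
            while (head <= tail) {
                cur = queue[head++]
                if (cur == dst) {                 # found: walk parents back to src
                    path = cur
                    while (cur != src) { cur = parent[cur]; path = cur " -> " path }
                    print path
                    exit 0
                }
                n = split(succ[cur], callee, " ")
                for (i = 1; i <= n; i++) {
                    if (!(callee[i] in seen)) {
                        seen[callee[i]] = 1
                        parent[callee[i]] = cur
                        queue[++tail] = callee[i]
                    }
                }
            }
            exit 1                                # target is not reachable
        }' "$1"
}

# Usage (edges.txt is a hypothetical export):
#   call_path edges.txt http_accept memcpy
```

A path existing in the graph is necessary but not sufficient: you still have to read each hop to see whether an authentication check or input filter sits in the way.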
Reconstruct the Bug
Reasoning backward from the fix to the original flaw is usually quick once you see the added check. A new length test means the field was unbounded. A new null check means a crash on absent input. A new permission test means a function anyone could call. Each fix names the bug it was patching.
Writing that reconstruction down, with the before-and-after snippets, is what makes the finding credible and actionable, whether you are reporting it to the vendor or deciding if your own fielded units are exposed and need an out-of-band update.
Why a Silent Patch Still Exposes the Fleet
A quiet security fix protects only the units that actually install it. The diff reveals the bug, and any device still running the old version remains exploitable with what is now a known issue. For products with slow, manual, or optional updates, that exposure window can last years.
The lesson cuts both ways. As a defender, assume attackers read your updates as carefully as your customers do, so the moment a fix ships, the bug it fixes is effectively public. That argues for fixes that do not depend on users opting in, and against sitting on a security patch while half the fleet stays vulnerable.
Make Diffing Part of Your Own Release Process
The same tooling an attacker uses is a quality gate you can run yourself. Diff each release against the previous one and confirm that the only changed functions are the ones you meant to change. An unexpected diff in a security-sensitive function is either an accidental regression or a supply-chain problem.
Catching that before shipping is far cheaper than reading about it later. A short diff report attached to each release, reviewed by someone who knows what changed and why, closes a gap that most teams never think to look at.
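One way to wire this into a build pipeline is sketched below. It assumes the diff output has already been trimmed to "score function_name" lines and that the release owner maintains a plain-text allowlist of functions the release intended to change. Both file formats and the name unexpected_changes are assumptions made for illustration.

```shell
#!/bin/sh
# unexpected_changes LISTING ALLOWLIST
#   LISTING:   "score function_name" per line (trimmed diff output)
#   ALLOWLIST: one function name per line that this release meant to change
# Prints changed functions missing from the allowlist; exits 1 if any exist.
unexpected_changes() {
    awk '
        FILENAME == ARGV[1] { allowed[$1] = 1; next }    # load the allowlist
        $1 < 1.0 && !($2 in allowed) { print $2; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$2" "$1"
}

# Usage in a release script (file names are hypothetical):
#   unexpected_changes libauth_diff.txt intended_changes.txt || {
#       echo "unreviewed function changes, blocking release" >&2
#       exit 1
#   }
```

The non-zero exit status lets a CI job fail on any change nobody signed off on, which forces the review the diff report is meant to trigger.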
Normalize Before You Diff
Two builds of the same source can differ in ways that have nothing to do with a fix: different base addresses, reordered functions, changed build timestamps. Stripping that noise first keeps the diff focused on logic. Rebasing both binaries to the same load address and ignoring symbol churn does most of the work.
# load both at the same base address so offsets line up
# (-B sets the base address; 0x10000 is only an example value)
radiff2 -AC -B 0x10000 v1/lib/libauth.so v2/lib/libauth.so | awk '$1<0.99'
0.71200 check_session_token
0.88500 refresh_cookie
# only two functions below the 0.99 threshold after normalization
After normalizing, the candidate list usually shrinks to a handful of functions. That short list is the whole point: it turns “something changed in a 2 MB binary” into “read these two functions.”
A Worked Example: an Off-by-One in the Parser
A concrete case makes the method click. Suppose refresh_cookie also dropped in similarity. Reading both versions shows the patch changed a comparison operator, the signature of an off-by-one.
// v1: writes one byte past the buffer on a full-length cookie
for (i = 0; i <= len; i++) out[i] = in[i];
// v2: the patch fixed the bound
for (i = 0; i < len; i++) out[i] = in[i];
A single character changed, <= became <, and that is the entire fix. The old loop copied len + 1 bytes, so a full-length cookie wrote one byte past the end of out, a classic off-by-one that corrupts whatever sits next to the buffer. The diff handed you the bug, the trigger condition, and the proof in a few lines.
When the Diff Looks Huge
Sometimes the tool reports dozens of changed functions and the real fix is buried among them. That usually means the vendor changed compiler flags or libraries between releases, so everything looks a little different. The move is to filter by name and by the kind of change you expect, rather than reading all of them.
# focus on security-relevant names among many changed functions
radiff2 -AC v1/bin/httpd v2/bin/httpd | awk '$1<0.95' | grep -iE 'auth|parse|copy|len|token|verify|session'
0.640 parse_content_length
0.812 verify_auth_header
# the two functions worth reading, filtered from 40+ changed
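Filtering on the vocabulary of vulnerabilities, the parse, copy, length, and auth functions, cuts a forty-function diff down to the two that matter. Compiler noise is real, but it tends to spread small changes across many functions, while a fix concentrates a larger change in a few with telling names. The name filter depends on symbols surviving in the binary; on a stripped build, fall back on the similarity score alone.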
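Before filtering, it helps to know whether you are facing a targeted patch or a toolchain change. The sketch below counts how many functions in a trimmed "score name" listing scored below 1.0. The function name changed_ratio is illustrative, and any cutoff you apply to its result is a judgment call, not a rule.

```shell
#!/bin/sh
# changed_ratio LISTING
# LISTING holds "score function_name" lines (trimmed diff output).
# Prints how many functions scored below 1.0 out of the total.
changed_ratio() {
    awk '
        NF >= 2 { total++; if ($1 < 1.0) changed++ }
        END {
            if (total == 0) { print "0/0 functions changed"; exit }
            printf "%d/%d functions changed (%d%%)\n",
                   changed, total, int(100 * changed / total)
        }' "$1"
}

# Usage (httpd_diff.txt is a hypothetical saved listing):
#   changed_ratio httpd_diff.txt
```

A handful of changed functions suggests a focused fix you can read in full. A large share suggests compiler or library churn, which is when name and score filters earn their keep.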
Diffing the Kernel and Bootloader Too
Application binaries are the obvious target, but the same method applies to the kernel and the bootloader. A changed kernel module or a patched U-Boot is just as revealing, and those components are often the ones with the longest exposure windows because they update least often.
When the filesystem diff flags a .ko module or the bootloader region, treat it exactly like a changed application binary: match functions, read what the patch added, and reason back to the bug. The lower in the stack the fix sits, the more it tends to matter.
Turning a Diff Into a Field Advisory
The end product of a diff is rarely just “they fixed something.” It is a decision about your own fleet. Once you have the bug class, the affected function, and proof the path is reachable, you can state plainly which firmware versions are vulnerable and what an owner should do about it.
# confirm which fielded versions carry the vulnerable code
# (v2 is the patched build; a count of 1 means the function differs from it)
for v in v1 v2 v3; do
printf '%s: ' "$v"; radiff2 -AC "$v/lib/libauth.so" v2/lib/libauth.so | awk '$2 == "check_session_token" && $1 < 1.0 { n++ } END { print n + 0 }'
done
v1: 1 vulnerable (function differs from fixed)
v2: 0 patched
v3: 0 patched
That table is the advisory in miniature: version one carries the flaw, versions two and three do not. For a product team, that is the difference between a vague worry and a clear instruction to push an update to a known set of units. The diff did not just find the bug, it scoped the exposure.
Where This Fits
Patch diffing supports vulnerability research, third-party component assessment, and verifying that a fix you paid for is actually a fix. If you want to know what your updates reveal, or whether a vendor’s quiet patch left your fielded units exposed, that is the kind of work we do at Berkner Tech.