Berkner Tech

Tapping a Parallel Memory Bus

Tapping a wide parallel memory bus to recover firmware from address and data lines

Serial flash sends data on one line, which makes it easy to read. Parallel memory moves many bits at once across address and data buses, which is faster for the device and much harder for an attacker. It still gives up its secrets with enough channels and care.

When Data Moves on Many Wires

A serial SPI flash is a four-wire conversation, and a cheap clip plus a programmer reads it in minutes. Parallel memory is a different problem. It spreads an address across a dozen or more pins, returns data across eight or sixteen more, and coordinates the whole thing with control signals.

That width is a performance choice, not a security one, but it has a security side effect: capturing it is far more work than reading a serial part. Understanding why, and the shortcut that usually beats it, is the point here.

Why Parallel Is Harder

To read a parallel bus live you have to capture the address lines and the data lines at the same instant, then line them up to know which address produced which data. A 16-bit-wide flash with a 20-bit address is 36 signals before control lines, far past a four-channel hobby analyzer.

The timing is unforgiving too. The data is valid only during a narrow window relative to the control strobes, so the capture has to be both wide and fast. This is the wall that stops casual attempts at a parallel bus.
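To put a number on "fast", here is a minimal sketch of the sample rate a capture would need. It assumes you have read a data-valid window off the datasheet timing diagram; the 50 ns figure and the target of four samples per window are illustrative assumptions, not values from any specific part, and `min_sample_rate_hz` is a hypothetical helper.

```python
def min_sample_rate_hz(valid_window_ns: float, samples_in_window: int = 4) -> float:
    """Sample rate whose period fits `samples_in_window` times into the window."""
    if valid_window_ns <= 0 or samples_in_window < 1:
        raise ValueError("window must be positive and sample count at least 1")
    sample_period_ns = valid_window_ns / samples_in_window
    return 1e9 / sample_period_ns  # convert a period in ns to a rate in Hz


# Assumed 50 ns data-valid window, four samples wanted inside it.
rate = min_sample_rate_hz(50.0, 4)
print(f"{rate / 1e6:.0f} MHz")  # prints "80 MHz"
```

The rate has to hold on every channel at once, which is why both the width and the speed of the analyzer matter.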

Identify the Parallel Part First

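Confirm you are actually looking at parallel memory before committing to the hard path. A part with 32 or more pins, an address bus, a separate data bus, and chip-enable, output-enable, and write-enable control lines is parallel NOR. Parallel NAND looks different: it multiplexes commands, addresses, and data over a shared 8- or 16-bit I/O bus, steered by command-latch and address-latch signals. The datasheet’s pinout settles it.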

# markings and pinout point to a parallel NOR flash
M29F040B  ->  4 Mbit parallel NOR, 32-pin
# A0-A18 address, DQ0-DQ7 data, plus CE#, OE#, WE#
Example output
-> 19 address lines + 8 data lines + 3 control = 30 signals to capture live
-> or: remove the chip and read it in a programmer (far simpler)

Thirty signals to capture in sync is the live-tap cost. Seeing that number is usually the moment you start considering whether removing the chip is the smarter move, which it often is.
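The signal count follows directly from the part's density and data width, so it can be worked out before opening the datasheet pinout. The sketch below reproduces the arithmetic for the 4 Mbit, 8-bit example; the function names are hypothetical, and the three control lines (CE#, OE#, WE#) are taken from the example above.

```python
def address_lines(density_bits: int, data_width: int) -> int:
    """Number of address pins needed to select every word in the array."""
    words = density_bits // data_width
    if words <= 0 or words & (words - 1):
        raise ValueError("this sketch expects a power-of-two word count")
    return words.bit_length() - 1  # log2 of a power of two


def live_tap_signals(density_bits: int, data_width: int, control_lines: int = 3) -> int:
    """Total signals a live capture must record in sync."""
    return address_lines(density_bits, data_width) + data_width + control_lines


four_mbit = 4 * 1024 * 1024
print(address_lines(four_mbit, 8))     # prints 19 (A0-A18)
print(live_tap_signals(four_mbit, 8))  # prints 30
```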

The Channel-Count Problem

Capturing this needs a logic analyzer with enough channels and enough speed to sample every line through the access window. That is a real instrument, not a hobby board, and setting up thirty-plus probe connections on a dense board is finicky and error-prone.

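# a high-channel analyzer configured for a parallel capture
# (illustrative: fx2lafw-class devices offer at most 16 channels, so replace DRIVER
#  with the sigrok driver for a 32+ channel instrument; channel names depend on it)
sigrok-cli -d DRIVER --channels A0-A18,DQ0-DQ7,CE,OE,WE \
  --config samplerate=24MHz --samples 5M -o capture.sr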
Example output
saved 5M samples across 30 channels to capture.sr
# now the hard part: correlate address to data to rebuild the memory map

Even with the capture in hand, the raw samples are not firmware. They are millions of bus transactions that still have to be turned back into an ordered image, which is the next problem.
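The first step in that conversion is splitting each raw sample back into its fields. The sketch below assumes a hypothetical packing in which every sample is one integer with A0-A18 in bits 0-18, DQ0-DQ7 in bits 19-26, and CE#, OE#, WE# in bits 27-29. A real export will follow your own probe-to-channel mapping, so adjust the shifts to match.

```python
from typing import NamedTuple

ADDR_BITS = 19  # A0-A18
DATA_BITS = 8   # DQ0-DQ7


class BusSample(NamedTuple):
    addr: int
    data: int
    ce_n: int  # active-low chip enable
    oe_n: int  # active-low output enable
    we_n: int  # active-low write enable


def unpack_sample(word: int) -> BusSample:
    """Split one packed sample into address, data, and control fields."""
    addr = word & ((1 << ADDR_BITS) - 1)
    data = (word >> ADDR_BITS) & ((1 << DATA_BITS) - 1)
    ctrl = word >> (ADDR_BITS + DATA_BITS)
    return BusSample(addr, data, ctrl & 1, (ctrl >> 1) & 1, (ctrl >> 2) & 1)


# Address 0x1234, data 0xA5, CE# low, OE# low, WE# high (a read in progress).
word = 0x1234 | (0xA5 << ADDR_BITS) | (0b100 << (ADDR_BITS + DATA_BITS))
sample = unpack_sample(word)
print(hex(sample.addr), hex(sample.data), sample.ce_n, sample.oe_n, sample.we_n)
```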

Correlating Address and Data

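The value of a parallel capture is matching the address being driven to the data that comes back during each read. Post-processing walks the capture, samples the data bus at the end of each output-enable strobe, once the outputs have settled, and writes the byte to the address that was present. Sampling at the start of the strobe would catch the bus before the data is valid.

# pseudo-reconstruction: at the end of each read strobe, record data at the current address
for each rising_edge(OE) while CE is low and WE is high:
    addr = read_bits(A0..A18) just before the edge
    data = read_bits(DQ0..DQ7) just before the edge   # outputs have settled by now
    image[addr] = data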
Example output
reconstructed 4 Mbit image, 92% of addresses observed
# gaps where the CPU never read certain regions during capture

A live capture only ever sees the addresses the device actually read while you watched. Regions the firmware never touched stay blank, so a live tap often yields a partial image, which is another reason the direct read tends to win.
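The reconstruction and its coverage gap can be sketched in a few lines. This is a minimal illustration, not a production decoder: it assumes the capture has already been decoded into `(addr, data, ce_n, oe_n, we_n)` tuples, treats the last sample before a read strobe ends as the settled value, and fills unseen addresses with 0xFF. The tuple layout and the `rebuild_image` name are assumptions made for the example.

```python
from typing import Iterable, Tuple

IMAGE_SIZE = 1 << 19  # 512 KB for 19 address lines
ERASED = 0xFF         # filler for addresses never observed

Sample = Tuple[int, int, int, int, int]  # addr, data, ce_n, oe_n, we_n


def is_read(sample: Sample) -> bool:
    """True while the chip is selected and driving data (CE# low, OE# low, WE# high)."""
    _, _, ce_n, oe_n, we_n = sample
    return ce_n == 0 and oe_n == 0 and we_n == 1


def rebuild_image(samples: Iterable[Sample], size: int = IMAGE_SIZE):
    """Return (image, coverage) built from the reads seen in the capture."""
    image = bytearray([ERASED]) * size
    seen = bytearray(size)  # 1 where at least one read was observed
    prev = None
    for sample in samples:
        # A read ends when the bus leaves the read state; the previous
        # sample is the last one taken while the outputs were driven.
        if prev is not None and is_read(prev) and not is_read(sample):
            addr, data = prev[0], prev[1]
            image[addr] = data
            seen[addr] = 1
        prev = sample
    return bytes(image), sum(seen) / size


capture = [
    (0x10, 0x00, 0, 1, 1),  # address set up, OE# still high
    (0x10, 0x3C, 0, 0, 1),  # OE# low, outputs still settling
    (0x10, 0xA5, 0, 0, 1),  # outputs settled
    (0x10, 0xA5, 0, 1, 1),  # OE# rises, read ends
]
image, coverage = rebuild_image(capture, size=32)
print(hex(image[0x10]), f"{coverage:.1%}")
```

This version only closes a read when OE# or CE# is released. A system that holds OE# low and steps the address between reads needs address-change detection too, and skew between address lines makes that harder to do reliably.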

The Desolder Shortcut

More often than not, removing the chip and reading it in a dedicated programmer is easier than tapping a live bus. A socketed part gives a complete, clean image with none of the timing headaches and none of the coverage gaps a live capture suffers from.

Hot air and a little practice lift a parallel chip onto an adapter without damage. The trade is that the device stops running, which matters only if you specifically needed to watch live accesses. For recovering the firmware, the static read is simpler and more complete.

Reading It in a Programmer

A universal programmer that supports the part reads the whole chip over its own controlled bus, no correlation required. The result is a full image in one operation, the parallel equivalent of dropping a SPI chip into a CH341A.

# read a parallel NOR chip in a universal programmer
minipro -p "M29F040B@DIP32" -r dump.bin
Example output
# illustrative; exact text depends on the minipro version and programmer
Reading Code...  4.19s
# complete 512 KB image, every address, no gaps

One command, a complete image, every address present. That is why the socketed read usually beats the live tap: it is faster to set up, it cannot miss a region, and it does not fight the bus timing.
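"Complete" is worth checking rather than assuming. The sketch below shows one way to sanity-check a dump: confirm the size matches the part, confirm the image is not one repeated byte, and confirm two separate reads agree. The `check_dump` helper and its messages are hypothetical, and the 512 KB expectation comes from the 4 Mbit example part.

```python
import hashlib

EXPECTED_SIZE = 512 * 1024  # 4 Mbit part, 8 bits wide


def check_dump(first: bytes, second: bytes, expected_size: int = EXPECTED_SIZE) -> list:
    """Return a list of problems found; an empty list means the dump looks sane."""
    problems = []
    if len(first) != expected_size:
        problems.append(f"size {len(first)} != expected {expected_size}")
    if len(set(first)) <= 1:
        problems.append("image is a single repeated byte (blank chip or bad contact?)")
    if hashlib.sha256(first).digest() != hashlib.sha256(second).digest():
        problems.append("two reads differ (unstable read)")
    return problems


# Synthetic stand-in for two reads of the same chip.
good = bytes(range(256)) * 2048
print(check_dump(good, good))                     # prints []
print(check_dump(b"\xff" * EXPECTED_SIZE, good))  # reports a blank image and a mismatch
```

In practice the two inputs would be the contents of two dump files read with the part reseated in between.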

NOR Versus NAND on a Parallel Bus

Parallel NOR reads back as a clean linear image. Parallel NAND is messier, because it carries spare areas, ECC bytes, and bad-block markers alongside the data, so a raw dump needs post-processing before a filesystem appears.

Knowing which you have changes the plan. NOR is read-and-go. NAND means a reader that understands page-and-spare layout, then stripping ECC and reassembling pages before tools recognize anything. The extra step is the same whether you tapped the bus or read the chip.
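As a rough illustration of that extra step, the sketch below strips the spare area from a raw NAND dump. It assumes a layout of 2048 data bytes followed by 64 spare bytes per page. That geometry is common but is only an assumption here, and some controllers interleave ECC inside the page instead. It does not correct bit errors or skip bad blocks; it only shows the page-and-spare reassembly.

```python
PAGE_SIZE = 2048  # assumed data bytes per page
SPARE_SIZE = 64   # assumed spare (out-of-band) bytes per page


def strip_spare(raw: bytes, page_size: int = PAGE_SIZE, spare_size: int = SPARE_SIZE) -> bytes:
    """Drop the spare bytes that follow each page and return the data bytes only."""
    stride = page_size + spare_size
    if len(raw) % stride:
        raise ValueError("dump length is not a whole number of page+spare units")
    out = bytearray()
    for offset in range(0, len(raw), stride):
        out += raw[offset:offset + page_size]
    return bytes(out)


# Three synthetic pages: 0xAA data bytes, 0xEE spare bytes.
raw = (b"\xaa" * PAGE_SIZE + b"\xee" * SPARE_SIZE) * 3
data = strip_spare(raw)
print(len(raw), len(data))  # prints "6336 6144"
```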

What It Reveals

The prize is the same as any flash read: the bootloader, the firmware, the configuration, and any secrets stored alongside them. The wider bus changed the effort to get there, not the value of what is there.

That is the core point for an attacker and a defender alike. A parallel layout is a performance and cost decision that happens to raise the bar for extraction, but it is not a security control, and the firmware behind it is exactly as exposed as the protections around it allow.

Choosing the Easier Path

Faced with a parallel bus, the practical decision is whether to tap it live or remove the chip and read it in a programmer. Tapping preserves the running system and shows which addresses are accessed, which a few analyses need. Desoldering takes the device offline and puts the part under a hot-air tool, but it needs far less setup than a thirty-probe capture and yields a clean, complete image.

Most of the time the chip in a socket wins, because the goal is the firmware, not the spectacle of capturing a fast bus. The wider bus raises the effort but rarely changes the outcome.

What This Means for Defenders

A parallel bus should not be treated as protection. If the firmware on a parallel part is sensitive, it needs the same controls as anything else: encryption, a key kept off the readable medium, and verified boot so a modified image will not run.

Betting on the difficulty of tapping a wide bus is betting that an attacker will not simply lift the chip, which they will. The effort is a speed bump, and a determined attacker drives over it with a programmer.

Estimating the Effort Before You Start

A quick estimate keeps you from committing to the hard path needlessly. Count the address and data lines from the datasheet, add the control signals, and compare against your analyzer’s channel count and speed. If the signal count exceeds your gear, or the bus runs faster than you can sample cleanly, the live tap is off the table before you solder a single probe.

# rough feasibility check for a live parallel capture
addr_lines=19; data_lines=8; control=3
channels_needed=$((addr_lines + data_lines + control))   # = 30
echo "need $channels_needed channels sampled at >= 4x the fastest bus signal to capture cleanly"
Example output
need 30 channels sampled at >= 4x the fastest bus signal to capture cleanly
# a typical 16-channel analyzer cannot do this -> read the chip in a programmer

When the arithmetic says thirty channels and you own sixteen, the decision is made: remove the chip and read it in a programmer. Doing this estimate first turns a frustrating, doomed live-tap attempt into a five-minute calculation that points you at the path that actually works.

What a Parallel Layout Means for Defenders

For a product team, the lesson is that a parallel bus is not a security feature, even though it raises an attacker’s effort. The firmware on a parallel part is exactly as exposed as the controls around it allow, because an attacker who declines the hard live tap simply removes the chip and reads it in a programmer instead.

So a sensitive firmware image on parallel memory needs the same protections as one on serial flash: encryption with a hardware-held key, and verified boot so a modified image will not run. Treating the wide bus as a deterrent is a mistake, because the desolder-and-read shortcut neutralizes it, and the real defenses are the cryptographic ones that do not depend on how the bytes are wired.

Where This Fits

Assessing how exposed a device’s storage is, parallel or serial, is a core part of a hardware penetration test. If you want a clear read on what your firmware and stored secrets give away, that is the kind of work we do at Berkner Tech.

