Streaming Merge Sort

Pipelined merge tree for streaming data

Unlike static sorting networks that apply a fixed set of comparisons in parallel, streaming merge sort processes data one element per clock cycle through a cascade of merge levels. Each level doubles the sorted run size: level 0 merges pairs of single elements into runs of 2, level 1 merges runs of 2 into runs of 4, and so on. FIFOs buffer data between levels, allowing the pipeline to sustain 1 element/cycle throughput. End-of-stream markers allow sorting arbitrary-length sequences, not just powers of two.

Properties

PropertyValue
ArchitecturePipelined merge tree (log2 N levels)
Throughput1 element / clock cycle (sustained)
LatencyO(N log N) cycles for N elements
Stream sizesArbitrary (end-of-stream support)
FPGA resources3,171 LUTs, 3,808 FFs, 14 BRAM36 + 4 BRAM18
Max frequency300 MHz (ZU9EG)
Best forLarge streaming datasets, variable-length sequences

Sorting Networks vs Streaming Merge Sort

AspectSorting NetworksStreaming Merge Sort
Data flowAll elements in parallelOne element per cycle
ComparisonsFixed, data-independentData-dependent merge
MemoryRegisters onlyFIFOs (BRAM)
ScalabilityArea grows as O(N log2 N)Area grows as O(N log N)
Best size range4 – 64 elements64 – 10,000+ elements
View on GitHub

Resources