Streaming Merge Sort - FPGA Algorithms

Unlike static sorting networks that apply a fixed set of comparisons in parallel, streaming merge sort processes data one element per clock cycle through a cascade of merge levels. Each level doubles the sorted run size: level 0 merges pairs of single elements into runs of 2, level 1 merges runs of 2 into runs of 4, and so on. FIFOs buffer data between levels, allowing the pipeline to sustain 1 element/cycle throughput. End-of-stream markers allow sorting arbitrary-length sequences, not just powers of two.

Size:

Properties

Property	Value
Architecture	Pipelined merge tree (log₂ N levels)
Throughput	1 element / clock cycle (sustained)
Latency	O(N log N) cycles for N elements
Stream sizes	Arbitrary (end-of-stream support)
FPGA resources	3,171 LUTs, 3,808 FFs, 14 BRAM36 + 4 BRAM18
Max frequency	300 MHz (ZU9EG)
Best for	Large streaming datasets, variable-length sequences

Sorting Networks vs Streaming Merge Sort

Aspect	Sorting Networks	Streaming Merge Sort
Data flow	All elements in parallel	One element per cycle
Comparisons	Fixed, data-independent	Data-dependent merge
Memory	Registers only	FIFOs (BRAM)
Scalability	Area grows as O(N log² N)	Area grows as O(N log N)
Best size range	4 – 64 elements	64 – 10,000+ elements

View on GitHub

Properties

Sorting Networks vs Streaming Merge Sort

Resources