← Back to FPGA Algorithms
Unlike static sorting networks that apply a fixed set of comparisons in parallel,
streaming merge sort processes data one element per clock cycle through a cascade
of merge levels. Each level doubles the sorted run size: level 0 merges pairs of
single elements into runs of 2, level 1 merges runs of 2 into runs of 4, and so on.
FIFOs buffer data between levels, allowing the pipeline to sustain 1 element/cycle
throughput. End-of-stream markers allow sorting arbitrary-length sequences, not just
powers of two.
Properties
| Property | Value |
| Architecture | Pipelined merge tree (log2 N levels) |
| Throughput | 1 element / clock cycle (sustained) |
| Latency | O(N log N) cycles for N elements |
| Stream sizes | Arbitrary (end-of-stream support) |
| FPGA resources | 3,171 LUTs, 3,808 FFs, 14 BRAM36 + 4 BRAM18 |
| Max frequency | 300 MHz (ZU9EG) |
| Best for | Large streaming datasets, variable-length sequences |
Sorting Networks vs Streaming Merge Sort
| Aspect | Sorting Networks | Streaming Merge Sort |
| Data flow | All elements in parallel | One element per cycle |
| Comparisons | Fixed, data-independent | Data-dependent merge |
| Memory | Registers only | FIFOs (BRAM) |
| Scalability | Area grows as O(N log2 N) | Area grows as O(N log N) |
| Best size range | 4 – 64 elements | 64 – 10,000+ elements |
View on GitHub