LZ77 factorization algorithms

[back to the main page]

This webpage is devoted to algorithms computing Lempel-Ziv factorization (also known as Lempel-Ziv or LZ77 parsing). We present C++ implementations of different algorithms varying in speed and space consumption.


Name Spacea Complexityb Comments Paper Code
KKP3 13n O(n) The fastest algorithm if the text is not highly repetitive. [1][2] download
KKP2 9n O(n) The most space-efficient O(n)-time algorithm for integer alphabet.
KKP1s 5n O(n) A version of KKP2 that streams the suffix array from the disk.
ISA9 9n O(n log σ) One of the fastest algorithms at this memory usage. [1][3] download
ISA6s 6n O(n log σ) Slower, but more space-efficient version of ISA9.
ISA6r 6n O(n log σ) A version of ISA6s specialized for highly repetitive inputs.
LZscan n+O(n/d) O(nd × log(n/d)) Parameter d controls the time/space trade-off. [4] download

a Summarizes the practical space consumption assuming 32-bit integers and byte alphabet.
b Assuming the input text is of length n and consists of symbol drawn from an alphabet of size σ.

News

See also

References

  1. Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi. Lazy Lempel-Ziv Factorization Algorithms.
    ACM Journal of Experimental Algorithmics (JEA), Volume 21(1), 2016, Article 2.4.
    [ACM]
  2. Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi. Linear Time Lempel-Ziv Factorization: Simple, Fast, Small.
    In Proc. 24th Symposium on Combinatorial Pattern Matching (CPM 2013), Springer, 2013, pp. 189-200.
    [Springer]
  3. Dominik Kempa, Simon J. Puglisi. Lempel-Ziv Factorization: Simple, Fast, Practical.
    In Proc. 2013 Algorithm Engineering and Experimentation (ALENEX 2013), SIAM, 2013, pp. 103-112.
    [SIAM]
  4. Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi. Lightweight Lempel-Ziv Parsing.
    In Proc. 12th Symposium on Experimental Algorithms (SEA 2013), Springer, 2013, pp. 139-150.
    [Springer]