LZ77 factorization algorithms
[back to the main page]This webpage is devoted to algorithms computing Lempel-Ziv factorization (also known as Lempel-Ziv or LZ77 parsing). We present C++ implementations of different algorithms varying in speed and space consumption.
Name | Spacea | Complexityb | Comments | Paper | Code |
KKP3 | 13n | O(n) | The fastest algorithm if the text is not highly repetitive. | [1][2] | download |
KKP2 | 9n | O(n) | The most space-efficient O(n)-time algorithm for integer alphabet. | ||
KKP1s | 5n | O(n) | A version of KKP2 that streams the suffix array from the disk. | ||
ISA9 | 9n | O(n log σ) | One of the fastest algorithms at this memory usage. | [1][3] | download |
ISA6s | 6n | O(n log σ) | Slower, but more space-efficient version of ISA9. | ||
ISA6r | 6n | O(n log σ) | A version of ISA6s specialized for highly repetitive inputs. | ||
LZscan | n+O(n/d) | O(nd × log(n/d)) | Parameter d controls the time/space trade-off. | [4] | download |
a Summarizes the practical space consumption assuming 32-bit integers and byte alphabet.
b Assuming the input text is of length n and consists of symbol drawn from an alphabet of size σ.
News
- 2014-01-16: Fixed a bug in ISA6s algorithm reported here.
See also
References
-
Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi.
Lazy Lempel-Ziv Factorization Algorithms.
ACM Journal of Experimental Algorithmics (JEA), Volume 21(1), 2016, Article 2.4.
[ACM] -
Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi.
Linear Time Lempel-Ziv Factorization: Simple, Fast, Small.
In Proc. 24th Symposium on Combinatorial Pattern Matching (CPM 2013), Springer, 2013, pp. 189-200.
[Springer] -
Dominik Kempa, Simon J. Puglisi.
Lempel-Ziv Factorization: Simple, Fast, Practical.
In Proc. 2013 Algorithm Engineering and Experimentation (ALENEX 2013), SIAM, 2013, pp. 103-112.
[SIAM] -
Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi.
Lightweight Lempel-Ziv Parsing.
In Proc. 12th Symposium on Experimental Algorithms (SEA 2013), Springer, 2013, pp. 139-150.
[Springer]