Better external memory (P)LCP array construction
[back to the main page]We present a C++ implementation of the two new external memory algorithms, called EM-SparsePhi and EM-SuccinctIrreducible, constructing the LCP array given the text and its suffix array (and for EM-SuccinctIrreducible, the Burrows-Wheeler transform (BWT)) as input.
- Both algorithms are parallelized using OpenMP (the provided Makefile produces both sequential and parallel versions). The new sequential algorithms are two times faster than previously fastest algorithm and the parallel versions improve the speed up to another factor of two.
- Both algorithms are in-place, i.e., the input (text and suffix array, and for EM-SuccinctIrreducible, the Burrows-Wheeler transform) is treated as read-only and the working disk space never exceeds the size of the final output (the LCP array), i.e., to run the algorithms it is sufficient that there is enough space to accommodate the output LCP array.
- Both algorithms work for inputs over large alphabets. While it is possible to split large characters into multiple bytes, construct suffix and LCP arrays for the resulting text over the byte alphabet, and then post-process to construct the desired arrays, this requires much more time and disk space than using algorithms that can handle large alphabets natively.
- Internally, EM-SuccinctIrreducible computes the LCP array by first computing the PLCP bitvector, and then converting it into LCP array. For applications, where the PLCP bitvector is sufficient, we provide a separate program that can compute the PLCP bitvector. This allows significant time reduction, compared to the LCP array computation.
Details of the
algorithms and experimental evaluation can be found in
[1] and [2].
See the README provided in the packages for more information.
News
- 2017-06-04: New versions (0.2.0) of algorithms available
- 2016-08-20: Initial release
Downloads
- EM-SparsePhi 0.2.0 (current version)
- EM-SparsePhi 0.1.0
- EM-SuccinctIrreducible 0.2.0 (current version)
- EM-SuccinctIrreducible 0.1.0
Notes
- Tested on Linux/PC.
- No external dependencies (libraries, cmake, etc).
Implemented by
- Juha Kärkkäinen
- Dominik Kempa
See also
- Parallel external memory suffix array construction
- Lightweight external memory suffix array construction algorithm
References
-
Juha Kärkkäinen, Dominik Kempa.
Engineering External Memory LCP Array Construction: Parallel, In-Place and Large Alphabet.
In Proc. 16th International Symposium on Experimental Algorithms (SEA 2017), LIPIcs, 2017, pp. 17:1-17:14.
[LIPIcs] -
Juha Kärkkäinen, Dominik Kempa.
Faster External Memory LCP Array Construction.
In Proc. 24th European Symposium on Algorithms (ESA 2016), LIPIcs, 2016, pp. 61:1-61:16.
[LIPIcs]