AbstractWe investigate the time complexity of the pattern matching problem for strings which are succinctly described in terms of straight-line programs, in which the constants are symbols and the only operation is the concatenation. Most strings of descriptive size n are of exponential length with respect to n. A complicated algorithm for the equality-test (testing if two succinctly described strings are the same) in O(n4) time was constructed by Plandowski. Our main aim is to extend this result and we show that a much stronger problem of the pattern-matching for succinctly described strings can be solved with similar complexity: in O(n4 log n) time. However, Plandowski's algorithm is not used in this paper, and our algorithm gives independent constructive proof. The crucial point in our algorithm is the succinct representation of all periods of a (possibly long) string described in this manner. We also show a (rather straightforward) result that a very simple extension of the pattern-matching problem for shortly described strings is NP-complete.
Categories and Subject Descriptors: E.4 [Coding and Information Theory]; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems
Additional Key Words and Phrases: pattern matching, text search, compression, straight-line program
Selected references
- Martin Farach and Mikkel Thorup. String matching in Lempel-Ziv compressed strings. In Proceedings of the Twenty-Seventh Annual ACM Symposium on the Theory of Computing, pages 703-712, Las Vegas, Nevada, 29 May-1 June 1995.