vmakinen@cs.Helsinki.FI
Department of Computer Science, P.O Box 26 (Teollisuuskatu 23)
FIN-00014 University of Helsinki, Finland.
Abstract. Suffix array is a widely used full-text index that allows fast searches
on the text. It is constructed by sorting all suffixes of the text in the
lexicographic order and storing pointers to the suffixes in this order.
Binary search is used for fast searches on the suffix array.
Compact suffix array is a compressed form of the suffix array that
still allows binary searches, but the search times are also dependent on
the compression. In this paper, we give efficient methods for constructing and
querying compact suffix arrays. We also study practical issues, such as the trade off
between compression and search times, and show how to reduce the space requirement
of the construction. Experimental results are provided in comparison with other
search methods. With a large text corpora, the index took $1.6$ times the size of the text, while the
searches were only two times slower than from a suffix array.