Transposition Invariant String Matching
Veli Mäkinen, Gonzalo Navarro and Esko Ukkonen
Given strings A and B over an alphabet S subset of U
where U is some numerical universe closed under addition and subtraction,
and a distance function d(A,B) that gives the score of the best (partial)
matching of A and B, the transposition invariant distance is
min {d(A+t,B), t in U} where A+t = (a(1)+t)(a(2)+t)...(a(m)+t).
We study the problem of computing the transposition invariant distance for
various distance (and similarity) functions d, including Hamming
distance, longest common subsequence (LCS), Levenshtein
distance, and their versions where the exact matching condition is replaced
by an approximate one.
For all these problems we give algorithms whose time complexities are close
to the known upper bounds without transposition invariance, and for some we
achieve these upper bounds. In particular, we
show how sparse dynamic programming can be used to solve transposition
invariant problems, and its connection with multidimensional range-minimum
search. As a byproduct, we give improved sparse dynamic programming algorithms
to compute LCS and Levenshtein distance.