Matching Numeric Strings under Noise
Veli Mäkinen, Gonzalo Navarro and Esko Ukkonen
Abstract.
Numeric string is a sequence of symbols from an alphabet Sigma
subset of U, where U is some numerical universe closed under
addition and subtraction. Given two numeric strings A=a(1)...a(m) and
B=b(1) \cdots b(n) and a distance function d(A,B) that gives the
score of the best (partial) matching of A and B, the
transposition invariant distance is min {d(A+t,B), t in U}, where
A+t = (a(1)+t)(a(2)+t)...(a_(m)+t). The corresponding matching
problem is to find occurrences j of A in B where
d(A+t,B(j'...j)) is smaller than some given threshold and
B(j'...j) is a substring of B.
In this paper, we give efficient algorithms for matching numeric strings ---
with and without transposition invariance --- under noise; we consider
distance functions d(A,B) such that symbols a in A and
b in B can be matched if |b-a| <= delta, or the
kappa largest differences |b-a| can be discarded.