U N I V E R S I T Y O F H E L S I N K
I
D E P A R T M E N T O F C O M P U T E R S C I E N C E |
and
Date Thursday, September 11th Place
   Department of Computer Science,
Teollisuuskatu 23 room A414Time 14:15 - 16
A. Amir: Pattern Matching with SwapsLet a text string $T$ of $n$ symbols and a pattern string $P$ of $m$ symbols from alphabet $\Sigma$ be given. A {\em swapped version} $T'$ of $T$ is a length $n$ string derived from $T$ by a series of {\em local swaps}, (i.e.\ $t'_{\ell} \leftarrow t_{\ell +1}$ and $t'_{\ell +1} \leftarrow t_{\ell}$) where each element can participate in {\em no more than one swap}.
The {\em Pattern Matching with Swaps} problem is that of finding all locations $i$ for which there exists a swapped version $T'$ of $T$ where there is an exact matching of $P$ in location $i$ of $T'$. It has been an open problem whether swapped matching can be done in less than $O(mn)$ time.
We show the first algorithm that solves the pattern matching with swaps problem in time $o(mn)$. We present an algorithm whose time complexity is $O(n m^{1/3} \log m \log^2 \sigma)$ for a general alphabet $\Sigma$, where $\sigma = min(m,|\Sigma|)$.
G. Landau: Matching for Run-Length Encoded StringsMeasuring the similarity between two strings, through such standard measures as Hamming distance, edit distance, and longest common subsequence, is one of the fundamental problems in pattern matching. We consider the problem of finding the longest common subsequence of two strings. A well-known dynamic programming algorithm computes the longest common subsequence of strings $X$ and $Y$ in $O(|X| \cdot |Y|)$ time. In this paper, we develop significantly faster algorithms for a special class of strings which emerge frequently in pattern matching problems.
A string $S$ is run-length encoded if it is described as an ordered sequence of pairs $(\sigma,i)$, each consisting of an alphabet symbol $\sigma$ and an integer $i$. Each pair corresponds to a run in $S$ consisting of $i$ consecutive occurrences of $\sigma$. Such a run-length encoded string can be significantly shorter than the expanded string representation.
In this paper, we present the first algorithm which finds the longest common subsequence of strings $X$ and $Y$ in time polynomial in the size of the compressed strings. Our final algorithm runs in $O(k l \log ( k l ) )$ time, where $k$ and $l$ are the compressed lengths of strings $X$ and $Y$.
(Joint work with Alberto Apostolico and Steven Skiena)