Thursday, March 17 2016
14:00 - 15:00

Room 327

Indexing Parameterized Text Succinctly

Sharma V. Thankachan

Georgia Institute of Technology

The fields of succinct data structures and compressed text indexing have
seen quite a bit of progress over the last 15 years. An important
achievement, primarily using techniques based on the Burrows-Wheeler
Transform (BWT), was that of obtaining full functionality of suffix tree in
the optimal number of bits.
In some of the variants to the original text indexing problem, either
suffix links are not defined or the order preserving property of suffix
links does not hold.
Specifically, the relative ordering between two suffixes in the subtree of
an internal node and that of the suffixes obtained by truncating the first
character of these suffixes are not necessarily the same. For such
variants, the BWT does not translate to a text index, and throughout the
advancement of the field of succinct data structures, such problems have
remained largely unsolved.
We achieve a positive breakthrough on one such problem, namely the
Parameterized Pattern Matching problem.

In parameterized matching, a pattern matches some location in the text if
there is one-to-one correspondence between the alphabet symbols of pattern
to those of the text. More specifically, assume that the text $\T$ contains
$n$ characters from a static alphabet $\Sigma_s$ and a parameterized
alphabet $\Sigma_p$, where $\Sigma_s \cap \Sigma_p = \varnothing$ and
$|\Sigma_s \cup \Sigma_p|=\sigma$. A pattern $P$ matches a substring $S$
of $\T$ iff the static characters match exactly, and there exists a
one-to-one function that renames the parameterized characters in $S$ to
that in $P$. Previous indexing solution [Baker, STOC 1993], known as
Parameterized Suffix Tree, requires $O(n\log n)$ bits of space, and can
find all $occ$ occurrences of a pattern $P$ in $O(|P|\log \sigma+ occ)$
time. In this paper, we present the first succinct index that occupies $n
\log \sigma + O(n)$ bits of space and answers queries in $O(|P|\log\sigma+
occ\log n)$ time.



Download as iCalendar

Done