On compressing and indexing data

Ferragina, Paolo and Manzini, Giovanni (2002) On compressing and indexing data. Technical Report del Dipartimento di Informatica . Università di Pisa, Pisa, IT.

Postscript (GZip) - Published Version
Available under License Creative Commons Attribution No Derivatives.
Download (228Kb)

Official URL: http://compass2.di.unipi.it/TR/files/TR-02-01.ps.g...

Abstract

In this paper we design two compressed data structures for the full-text indexing problem. These data structures support efficient substring searches within the indexed text $T$ using roughly the same space required to store $T$ in compressed form. Our first compressed data structure retrieves the $occ$ occurrences of a pattern $P[1,p]$ in $T[1,n]$ in $O(p + occ\log^{1+\epsilon} n)$ time and uses at most $5n H_k(T) + o(n)$ bits of storage, where $H_k(T)$ is the $k$-th order empirical entropy of $T$. This space occupancy is $Theta(n)$ bits in the worst case and $o(n)$ bits for compressible texts. Our data structure exploits the relationship between the suffix array and the Burrows-Wheeler compression algorithm. Our second compressed data structure achieves $O(p+occ)$ query time and uses $O(n H_k(T)\log^\epsilon n) + o(n)$ bits of storage. In the worst case the space occupancy is $o(n\log n)$ bits which is asymptotically smaller than the space occupancy of suffix trees and suffix arrays. This second data structure exploits the interplay between two compressors: the Burrows-Wheeler algorithm and the LZ78 algorithm.

Item Type:	Book
Uncontrolled Keywords:	Full-text indexing, Text compression, Pattern searching, Indexing data structure, Suffix tree, Suffix array, Burrows-Wheeler transform, Lempel-Ziv algorithms.
Subjects:	Area01 - Scienze matematiche e informatiche > INF/01 - Informatica
Divisions:	Dipartimenti (until 2012) > DIPARTIMENTO DI INFORMATICA
Depositing User:	dott.ssa Sandra Faita
Date Deposited:	17 Dec 2014 16:58
Last Modified:	17 Dec 2014 16:58
URI:	http://eprints.adm.unipi.it/id/eprint/2072

Repository staff only actions

View Item