UnipiEprints
Università di Pisa
Sistema bibliotecario di ateneo

Self-adjusting Data Structures for External Memory String Access

Ciriani, Valentina and Ferragina, Paolo and Luccio, Fabrizio and Muthukrishnan, S. (2001) Self-adjusting Data Structures for External Memory String Access. Technical Report del Dipartimento di Informatica . Università di Pisa, Pisa, IT.

[img] Postscript (GZip) - Published Version
Available under License Creative Commons Attribution No Derivatives.

Download (179Kb)

    Abstract

    Data warehouses are increasingly storing and managing large scale string data, and dealing with large volume of transactions that update and search string databases. Motivated by this context, we initiate the study of {\em self-adjusting} data structures for string dictionary operations, that is, data structures that are designed to be efficient on an entire sequence of operations rather than individual string operations. Furthermore, we study this problem in the external memory model where string data is too massive to be stored in main memory and has to reside in disks; each access to a disk page fetches $B$ items, and the cost of the operations is the number of pages accessed. We show that given $n$ strings $S_{1},\ldots,S_{n}$ of total length $\sum_i |S_i|=N$, a sequence of $m$ string searches $S_{i_{1}},S_{i_{2}},\ldots,S_{i_{m}}$ takes $ O(\sum_{j=1}^{m}(\frac{|S_{i_{j}}|}{B})+\sum _{i=1}^{n}(n_{i}\log_{B}\frac{m}{n_{i}}))$ amortized expected I/Os, where $n_{i}$ is the number of times $S_{i}$ is queried. Inserting or deleting a string $S$ takes $O(\frac{|S|}{B}+\log_{B}n)$ amortized expected I/Os. This result is the analog of what is known as the Static Optimality Theorem~\cite{Sleator-Tarjan} proved by Sleator and Tarjan in their classic splay trees paper for a dictionary of numerical values; here, it has been generalized, for the first time to string data, to string operations, and to the external memory model. This performance is achieved not by traditional ``splay'' operations on search trees as in~\cite{Sleator-Tarjan}, but by designing a novel self-adjusting data structure based on the well-known skip lists. In addition, we introduce the paradigm of using the main memory (or a part thereof) persistently across operations, in the manner of a cache, to further improve the performance of our self-adjusting skip list. This is quite reasonable in applications where string operations are large in volume.

    Item Type: Book
    Uncontrolled Keywords: Self-adjusting Data Structures, String, Skip List, External Memory
    Subjects: Area01 - Scienze matematiche e informatiche > INF/01 - Informatica
    Divisions: Dipartimenti (until 2012) > DIPARTIMENTO DI INFORMATICA
    Depositing User: dott.ssa Sandra Faita
    Date Deposited: 17 Dec 2014 17:20
    Last Modified: 17 Dec 2014 17:20
    URI: http://eprints.adm.unipi.it/id/eprint/2070

    Repository staff only actions

    View Item