UnipiEprints
Università di Pisa
Sistema bibliotecario di ateneo

Masking Patterns in Sequences: a New Class of Motif Discovery with Don't Cares

Battaglia, Giovanni and Cangelosi, Davide and Grossi, Roberto and Pisanti, Nadia (2008) Masking Patterns in Sequences: a New Class of Motif Discovery with Don't Cares. Technical Report del Dipartimento di Informatica. Università di Pisa, Pisa, IT.

Available under License Creative Commons Attribution No Derivatives.


    Abstract

    In this paper, we introduce a new notion of motifs, called \emph{masks}, that succinctly represent the repeated patterns of an input sequence $T$ of $n$ symbols drawn from an alphabet $\Sigma$. We show how to build the set of all maximal masks of length~$L$ and quorum~$q$ in $O(2^L n)$ time and space in the worst case. We show analytically that our algorithms outperform the approach of enumerating all the $(|\Sigma|+1)^L$ potential candidate patterns and checking each one in~$T$ in constant time, after a polynomial-time preprocessing of~$T$. Our algorithms are also cache-friendly, attaining $O(2^L\, \mathit{sort}(n))$ block transfers, where $\mathit{sort}(n)$ is the cache-oblivious complexity of sorting $n$ items.
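
To make the comparison in the abstract concrete, the following is a minimal sketch of the naive baseline it mentions: enumerating all $(|\Sigma|+1)^L$ candidate patterns over the alphabet extended with a don't-care symbol, and keeping those that occur at least $q$ times in $T$. This is not the authors' mask algorithm, whose definitions are not given in the abstract. The wildcard character `.`, the function names, and the direct window scan (used here instead of a preprocessed constant-time check) are assumptions made for illustration.

```python
from itertools import product

DONT_CARE = "."  # assumed wildcard symbol; not taken from the report


def matches(pattern, window):
    """True if every solid (non-wildcard) position of pattern equals window."""
    return all(p == DONT_CARE or p == c for p, c in zip(pattern, window))


def naive_motifs(text, length, quorum):
    """Brute-force baseline: test all (|alphabet| + 1) ** length candidates.

    Returns a dict mapping each pattern that occurs at least `quorum`
    times in `text` to its number of occurrences.
    """
    alphabet = sorted(set(text))
    # All length-L windows of the input sequence.
    windows = [text[i:i + length] for i in range(len(text) - length + 1)]
    result = {}
    for candidate in product(alphabet + [DONT_CARE], repeat=length):
        count = sum(matches(candidate, w) for w in windows)
        if count >= quorum:
            result["".join(candidate)] = count
    return result


if __name__ == "__main__":
    # 4 symbols plus the wildcard and L = 3 give 5 ** 3 = 125 candidates.
    print(naive_motifs("abcabd", 3, 2))
```

On the toy input `abcabd` with $L = 3$ and $q = 2$, the sketch examines 125 candidates and reports four patterns, among them `ab.`, which occurs twice. The number of candidates grows with the alphabet size, whereas the $O(2^L n)$ bound stated in the abstract does not depend on $|\Sigma|$.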

    Item Type: Book
    Uncontrolled Keywords: Motifs, Masks
    Subjects: Area01 - Scienze matematiche e informatiche > INF/01 - Informatica
    Divisions: Dipartimenti (until 2012) > DIPARTIMENTO DI INFORMATICA
    Depositing User: dott.ssa Sandra Faita
    Date Deposited: 04 Dec 2014 14:19
    Last Modified: 04 Dec 2014 14:19
    URI: http://eprints.adm.unipi.it/id/eprint/2211
