Battaglia, Giovanni and Cangelosi, Davide and Grossi, Roberto and Pisanti, Nadia (2008) Masking Patterns in Sequences: a New Class of Motif Discovery with Don't Cares. Technical Report del Dipartimento di Informatica. Università di Pisa, Pisa, IT.
Other (GZip). Available under License Creative Commons Attribution No Derivatives.
Abstract
In this paper, we introduce a new notion of motifs, called \emph{masks}, that succinctly represent the repeated patterns of an input sequence $T$ of $n$ symbols drawn from an alphabet $\Sigma$. We show how to build the set of all maximal masks of length~$L$ and quorum~$q$ in $O(2^L n)$ time and space in the worst case. We analytically show that our algorithms perform better than enumerating all the potential $(|\Sigma|+1)^L$ candidate patterns and checking each of them in constant time in~$T$, after a polynomial-time preprocessing of~$T$. Our algorithms are also cache-friendly, attaining $O(2^L\, \mathit{sort}(n))$ block transfers, where $\mathit{sort}(n)$ is the cache-oblivious complexity of sorting $n$ items.
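To make the baseline mentioned in the abstract concrete, the following is a minimal sketch of the naive approach only: it enumerates every candidate pattern of length $L$ over $\Sigma$ plus one don't-care symbol, $(|\Sigma|+1)^L$ candidates in total, and keeps those that occur at least $q$ times in $T$. It is not the mask-based algorithm of the report, and it checks each candidate by a linear scan rather than in constant time. The wildcard character `*` and the names `matches` and `naive_frequent_patterns` are assumptions introduced here for illustration.

```python
from itertools import product

DONT_CARE = "*"  # hypothetical wildcard symbol, not taken from the report


def matches(pattern, window):
    """Return True if every solid symbol of pattern equals the aligned symbol of window."""
    return all(p == DONT_CARE or p == w for p, w in zip(pattern, window))


def naive_frequent_patterns(text, alphabet, length, quorum):
    """Brute-force baseline: try all (|alphabet| + 1) ** length candidate patterns.

    Returns a dict mapping each pattern with at least `quorum` occurrences
    in `text` to its number of occurrences.
    """
    # All substrings of the given length, one per starting position.
    windows = [text[i:i + length] for i in range(len(text) - length + 1)]
    symbols = list(alphabet) + [DONT_CARE]
    frequent = {}
    for candidate in product(symbols, repeat=length):
        count = sum(matches(candidate, w) for w in windows)
        if count >= quorum:
            frequent["".join(candidate)] = count
    return frequent
```

For example, `naive_frequent_patterns("abab", "ab", 2, 2)` examines $3^2 = 9$ candidates and keeps `ab`, `a*`, `*b` and `**`. The number of candidates grows with the alphabet size, which is the cost the $O(2^L n)$ bound in the abstract avoids.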
Item Type: | Book |
---|---|
Uncontrolled Keywords: | Motifs, Masks |
Subjects: | Area01 - Scienze matematiche e informatiche > INF/01 - Informatica |
Divisions: | Dipartimenti (until 2012) > DIPARTIMENTO DI INFORMATICA |
Depositing User: | dott.ssa Sandra Faita |
Date Deposited: | 04 Dec 2014 14:19 |
Last Modified: | 04 Dec 2014 14:19 |
URI: | http://eprints.adm.unipi.it/id/eprint/2211 |