Battaglia, Giovanni and Grossi, Roberto and Cangelosi, Davide and Pisanti, Nadia (2009) Masking Patterns in Sequences: A New Class of Motif Discovery with Don't Cares. Theoretical Computer Science, 410 (43). pp. 4327-4340. ISSN 0304-3975
Abstract
We introduce a new notion of motifs, called masks, that succinctly represents the repeated patterns for an input sequence $T$ of $n$ symbols drawn from an alphabet $\Sigma$. We show how to build the set of all frequent maximal masks of length $L$ in $O(2^L n)$ time and space in the worst case, using the Karp-Miller-Rosenberg approach. We analytically show that our algorithm performs better than the method that enumerates and checks, in constant time each, all the potential $(|\Sigma| + 1)^L$ candidate patterns in $T$, after a polynomial-time preprocessing of $T$. Our algorithm is also cache-friendly, attaining $O(2^L \, \mathrm{sort}(n))$ block transfers, where $\mathrm{sort}(n)$ is the cache complexity of sorting $n$ items.
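To give a feel for the comparison made in the abstract, the following minimal sketch evaluates the two quantities it names: the 2^L n term of the stated bound and the number of candidate patterns of length L over the alphabet plus one extra symbol (presumably the don't care). It ignores constant factors and the polynomial-time preprocessing of the baseline, so it is only a rough illustration, not the paper's analysis. The function names and the sample values (n = 10^6, alphabet size 4) are assumptions chosen for the example.

```python
from typing import Optional


def mask_bound(n: int, L: int) -> int:
    """The 2^L * n term from the stated O(2^L n) bound (constants ignored)."""
    return (2 ** L) * n


def candidate_count(sigma: int, L: int) -> int:
    """Number of length-L strings over sigma symbols plus one extra symbol."""
    return (sigma + 1) ** L


def crossover_length(n: int, sigma: int, max_L: int = 64) -> Optional[int]:
    """Smallest L <= max_L at which the candidate count exceeds 2^L * n.

    Returns None if no such L exists in the searched range.
    """
    for L in range(1, max_L + 1):
        if candidate_count(sigma, L) > mask_bound(n, L):
            return L
    return None


if __name__ == "__main__":
    # Hypothetical setting: a DNA-like alphabet of 4 symbols, one million characters.
    n, sigma = 10 ** 6, 4
    L = crossover_length(n, sigma)
    print(f"n={n}, |alphabet|={sigma}: candidates exceed 2^L * n from L={L}")
```

Under these assumed values the candidate count overtakes 2^L n once L reaches 16. For a one-symbol alphabet it never does, since both terms then grow as 2^L.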