Suffix tree characterization of maximal motifs in biological sequences

Federico, Maria and Pisanti, Nadia (2009) Suffix tree characterization of maximal motifs in biological sequences. Theoretical Computer Science, 410 (43). pp. 4391-4410. ISSN 0304-3975

    Abstract

    Finding motifs in biological sequences is one of the most intriguing problems for string algorithm designers, owing both to its numerous applications in molecular biology and to the challenging aspects of the computational problem itself. Indeed, biological sequences require working with approximations (that is, identifying fragments that are not necessarily identical, but merely similar according to a given notion of similarity), and this complicates the problem. Existing algorithms run in time linear in the input size. Nevertheless, the output size can be very large because of the approximation (namely, exponential in the approximation degree). This often makes the output unreadable and slows down the inference itself. A high degree of redundancy has been detected in the set of motifs that satisfy traditional requirements, even for exact motifs. Moreover, it has been observed many times that only a subset of these motifs, namely the maximal motifs, could be enough to provide the information of all of them. In this paper, we aim at removing such redundancy. We extend some notions of maximality already defined for exact motifs to the case of approximate motifs with Hamming distance, and we give a characterization of maximal motifs on the suffix tree. Given that this data structure is used by a whole class of motif extraction tools, we show how these tools can be modified to include the maximality requirement without changing the asymptotic complexity.
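
    As a reader's aid, the notion of an approximate occurrence under Hamming distance mentioned in the abstract can be illustrated with a minimal sketch. It is a brute-force scan, not the suffix-tree method of the paper, and the function names `hamming` and `approx_occurrences` as well as the sample sequence are assumptions made here for illustration only.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def approx_occurrences(motif: str, text: str, max_mismatches: int) -> list[int]:
    """Start positions where a window of text is within max_mismatches
    substitutions of motif (illustrative brute force, O(len(text) * len(motif)))."""
    m = len(motif)
    return [
        i
        for i in range(len(text) - m + 1)
        if hamming(motif, text[i : i + m]) <= max_mismatches
    ]


if __name__ == "__main__":
    sequence = "ACGTACCTAGGT"  # hypothetical sample sequence
    print(approx_occurrences("ACGT", sequence, 0))  # exact occurrences only
    print(approx_occurrences("ACGT", sequence, 1))  # one substitution allowed
```

    Allowing even a single mismatch turns one exact occurrence into three approximate ones in this sample, which hints at why the set of reported motifs grows quickly with the approximation degree.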

    Item Type: Article
    Uncontrolled Keywords: Motif discovery, maximal motifs, suffix tree
    Subjects: Area01 - Mathematical and computer sciences > INF/01 - Computer science
    Divisions: Departments (until 2012) > Department of Computer Science
    Depositing User: Dr. Nadia Pisanti
    Date Deposited: 01 Feb 2010
    Last Modified: 20 Dec 2010 11:49
    URI: http://eprints.adm.unipi.it/id/eprint/645
