UnipiEprints
Università di Pisa
Sistema bibliotecario di ateneo

Lossless Filter for Long Multiple Repetitions with Edit Distance

Peterlongo, Pierre and Pisanti, Nadia and Pereira do Lago, Alair and Sagot, Marie-France (2006) Lossless Filter for Long Multiple Repetitions with Edit Distance. Technical Report del Dipartimento di Informatica . Università di Pisa, Pisa, IT.

[img] Postscript (GZip) - Published Version
Available under License Creative Commons Attribution No Derivatives.

Download (238Kb)

    Abstract

    Identifying local similarity between two or more sequences, or identifying repetitions occurring at least twice in a sequence, is an essential part in the analysis of biological sequences and of their phylogenetic relationship. Finding fragments that are conserved among several given sequences, or inside a unique sequence, while allowing for a certain number of insertions, deletions, and substitutions, is however known to be a computationally expensive task, and consequently exact methods can usually not be applied in practice. The filter we introduce in this paper, called Ed'Nimbus, provides a possible solution to this problem. It can be used as a preprocessing step to any multiple alignment method, eliminating an important fraction of the input that is guaranteed not to contain any approximate repetition. It consists in the verification of a strong necessary condition. This condition concerns the number and order of exactly repeated words shared by the approximate repetitions. The efficiency of the filter is due to this condition, that we show how to check in a fast way. The speed of the method is achieved thanks also to the use of a simple and efficient data structure, that we describe in this paper, as well as its linear time and space construction. Our results show that using Ed'Nimbus allows us to sensibly reduce the time spent in a local multiple alignment.The Ed'Nimbus software is freely available on the web:http://igm.univ-mlv.fr/$\sim$peterlon/officiel/ednimbus/

    Item Type: Book
    Uncontrolled Keywords: repeats, local multiple alignment
    Subjects: Area01 - Scienze matematiche e informatiche > INF/01 - Informatica
    Divisions: Dipartimenti (until 2012) > DIPARTIMENTO DI INFORMATICA
    Depositing User: dott.ssa Sandra Faita
    Date Deposited: 09 Dec 2014 13:05
    Last Modified: 09 Dec 2014 13:05
    URI: http://eprints.adm.unipi.it/id/eprint/2164

    Repository staff only actions

    View Item