Università di Pisa
Sistema bibliotecario di ateneo

A compression-based approach for coding sequences identification in prokaryotic genomes

Menconi, Giulia and Marangoni, Roberto (2005) A compression-based approach for coding sequences identification in prokaryotic genomes. Technical Report del Dipartimento di Informatica. Università di Pisa, Pisa, IT.

PDF (GZip) - Published Version
Available under License Creative Commons Attribution No Derivatives.


    Identifying coding regions in genomic sequences is the first step toward further analysis of the biological functions carried out by the different functional elements in a genome. This paper presents a novel method for classifying coding and non-coding regions in prokaryotic genomes, based on a suitably defined compression index of a DNA sequence. The approach has been applied to some complete prokaryotic genomes, obtaining optimal scores of correctly recognized coding and non-coding regions. Several false-positive and false-negative cases have been investigated in detail, revealing that the approach can fail in the presence of highly structured coding regions (e.g., genes coding for modular proteins) or quasi-random non-coding regions (e.g., regions hosting non-functional fragments of copies of functional genes, or regions hosting promoters or other protein-binding sequences).
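The abstract does not define the compression index, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes a generic compressor (zlib) as a stand-in, and it measures compressibility as compressed size divided by sequence length. The names `compression_index` and `window_indices`, the window size, and the step are all hypothetical choices made for illustration. No classification threshold is set.

```python
import random
import zlib


def compression_index(seq: str) -> float:
    """Compressed size in bytes divided by sequence length.

    Assumption: zlib stands in for whatever compressor the report uses.
    Lower values mean the sequence is more compressible (more structured).
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    data = seq.upper().encode("ascii")
    return len(zlib.compress(data, 9)) / len(data)


def window_indices(seq: str, window: int = 500, step: int = 250):
    """Return (start, index) pairs for each full window along the sequence.

    The window and step defaults are illustrative, not taken from the report.
    """
    return [
        (start, compression_index(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    structured = "ACGT" * 500  # highly repetitive toy sequence
    quasi_random = "".join(rng.choice("ACGT") for _ in range(2000))
    print("structured:  ", compression_index(structured))
    print("quasi-random:", compression_index(quasi_random))
```

On these toy inputs the repetitive sequence compresses far better than the quasi-random one, which is the kind of contrast a compression-based classifier would exploit. How the index maps to coding versus non-coding regions is left to the report itself.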

    Item Type: Book
    Uncontrolled Keywords: Compression, Coding regions, Genes, Prokaryote.
    Subjects: Area01 - Scienze matematiche e informatiche > INF/01 - Informatica
    Divisions: Dipartimenti (until 2012) > DIPARTIMENTO DI INFORMATICA
    Depositing User: dott.ssa Sandra Faita
    Date Deposited: 09 Dec 2014 11:19
    Last Modified: 09 Dec 2014 11:19
    URI: http://eprints.adm.unipi.it/id/eprint/2136
