RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads.

Publication Dbxref

PMID:23376349

Structured Abstract Part

MOTIVATION
Repetitive DNA makes up large portions of plant and animal nuclear genomes, yet it remains the least-characterized genome component in most species studied so far. Although the recent availability of high-throughput sequencing data provides necessary resources for in-depth investigation of genomic repeats, its utility is hampered by the lack of specialized bioinformatics tools and appropriate computational resources that would enable large-scale repeat analysis to be run by biologically oriented researchers.
RESULTS
Here we present RepeatExplorer, a collection of software tools for characterization of repetitive elements, which is accessible via a web interface. A key component of the server is the computational pipeline using a graph-based sequence clustering algorithm to facilitate de novo repeat identification without the need for reference databases of known elements. Because the algorithm uses short sequences randomly sampled from the genome as input, it is ideal for analyzing next-generation sequence reads. Additional tools are provided to aid in classification of identified repeats, investigate phylogenetic relationships of retroelements and perform comparative analysis of repeat composition between multiple species. The server allows analysis of several million sequence reads, which typically results in identification of most high- and medium-copy repeats in higher plant genomes.
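
The idea of graph-based read clustering described above can be illustrated with a toy sketch. This is not the RepeatExplorer pipeline: the shared k-mer test is a simple stand-in for a real read similarity search, the clusters are plain connected components, and the names `kmers`, `cluster_reads`, `k` and `min_shared` are hypothetical choices made for this illustration only.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(reads, k=8, min_shared=3):
    """Group reads into connected components of a shared-k-mer graph.

    Nodes are reads; two reads are joined by an edge when they share at
    least `min_shared` k-mers (an assumed, simplified similarity test).
    Returns lists of read indices, largest cluster first.
    """
    parent = list(range(len(reads)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    profiles = [kmers(r, k) for r in reads]
    # All-to-all comparison: quadratic, so only suitable for toy inputs.
    for a, b in combinations(range(len(reads)), 2):
        if len(profiles[a] & profiles[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters = {}
    for i in range(len(reads)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)


if __name__ == "__main__":
    # Three overlapping reads drawn from one made-up "repeat" sequence,
    # plus two unrelated reads.
    repeat = "ACGTTGCAAGCTTAGGCTAACGGATCCGTA"
    reads = [
        repeat[0:20],
        repeat[5:25],
        repeat[10:30],
        "TTTTTTTTTTGGGGGGGGGG",
        "CCCCCATATATATATGCGCG",
    ]
    for cluster in cluster_reads(reads):
        print(len(cluster), cluster)
```

In this sketch, reads sampled from the same repeat fall into one large cluster because they overlap one another, while reads from unrelated sequence remain singletons. No reference database of known elements is consulted at any point, which is the property the abstract highlights.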

Title

RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads.

Publication Type

Journal Article

Additional Publication Type(s)

Research Support, Non-U.S. Gov't

Series Name

Bioinformatics (Oxford, England)

Volume

29

Publication Year

2013

Issue

6

Page Numbers

792-3

DOI

10.1093/bioinformatics/btt054

Journal Abbreviation

Bioinformatics

EISSN

1367-4811

Publication Date

2013 Mar 15

Unique Local Identifier

Novák P, Neumann P, Pech J, Steinhaisl J, Macas J. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics (Oxford, England). 2013 Mar 15; 29(6):792-3.

Citation

Novák P, Neumann P, Pech J, Steinhaisl J, Macas J. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics (Oxford, England). 2013 Mar 15; 29(6):792-3.

ISSN

1367-4811

Language Abbr

eng

Publication Model

Print-Electronic

Authors

Novák P, Neumann P, Pech J, Steinhaisl J, Macas J

Language

English

Elocation

10.1093/bioinformatics/btt054

URL

https://doi.org/10.1093/bioinformatics/btt054

Journal Country

England

Database Reference Annotations

PMID:23376349

Analysis

Repeat Explorer 2013 Analysis

Is Obsolete

False