DART: Database for Active Regions with Tools

OVERVIEW

The DART database is a relational database
implemented in MySQL on a Linux server. There
are tables for recording basic active region
information such as chromosome, location, strand,
sequence, and genome build number. Other tables
and relations are used to define higher level objects
such as sets of active regions and classes of sets.
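
The layering of regions, sets, and classes described above can be
sketched as a small relational schema. This is only an illustration:
it uses Python with SQLite in place of Perl and MySQL so that it runs
without a server, and every table name, column name, and sample value
is an assumption rather than the actual DART schema.

```python
import sqlite3

# Hypothetical schema: all table and column names are assumptions,
# not the actual DART tables.
SCHEMA = """
CREATE TABLE active_region (
    accession  TEXT PRIMARY KEY,
    build      TEXT NOT NULL,      -- genome build
    chromosome TEXT NOT NULL,
    start_pos  INTEGER NOT NULL,
    end_pos    INTEGER NOT NULL,
    strand     TEXT CHECK (strand IN ('+', '-')),
    sequence   TEXT
);
CREATE TABLE set_class (
    accession TEXT PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE region_set (
    accession       TEXT PRIMARY KEY,
    name            TEXT NOT NULL,
    class_accession TEXT REFERENCES set_class(accession)
);
CREATE TABLE set_member (
    set_accession    TEXT REFERENCES region_set(accession),
    region_accession TEXT REFERENCES active_region(accession),
    PRIMARY KEY (set_accession, region_accession)
);
"""

def regions_in_set(conn, set_accession):
    """Return the regions of one set, ordered by genomic position."""
    return conn.execute(
        "SELECT r.accession, r.chromosome, r.start_pos, r.end_pos "
        "FROM active_region r "
        "JOIN set_member m ON m.region_accession = r.accession "
        "WHERE m.set_accession = ? "
        "ORDER BY r.chromosome, r.start_pos",
        (set_accession,),
    ).fetchall()

# Made-up sample rows, inserted out of order on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO active_region VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("AR0000002", "hg18", "chr1", 5000, 5400, "-", None),
        ("AR0000001", "hg18", "chr1", 1000, 1250, "+", None),
    ],
)
conn.execute("INSERT INTO set_class VALUES ('CL0000001', 'example class')")
conn.execute(
    "INSERT INTO region_set VALUES ('RS0000001', 'example set', 'CL0000001')"
)
conn.executemany(
    "INSERT INTO set_member VALUES (?, ?)",
    [("RS0000001", "AR0000001"), ("RS0000001", "AR0000002")],
)
```

Keeping set membership in its own table lets one region belong to
several sets, which is one plausible way to support the sets and
subsets mentioned above.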

Software and Web pages access the DART database
through library routines written in Perl. These library
routines support functions such as defining a genome
build number, reannotating active regions for a new
genome build, inserting active regions, defining sets
and their attributes, and defining classes of sets.
As objects are entered into DART, the library routines
assign a unique accession number to each object created
or inserted.
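
As a rough illustration of accession assignment, the sketch below
hands out one identifier per object from a separate counter for each
object type. It is written in Python rather than Perl, and the
prefixes, the zero-padded format, and the class name are all
assumptions; the source does not describe the actual DART accession
format.

```python
class AccessionAllocator:
    """Hand out unique accession strings, one counter per object type."""

    # Hypothetical prefixes; DART's real accession format is not documented here.
    PREFIXES = {"region": "AR", "set": "RS", "class": "CL"}

    def __init__(self):
        self._last = {kind: 0 for kind in self.PREFIXES}

    def next_accession(self, kind):
        if kind not in self.PREFIXES:
            raise ValueError(f"unknown object type: {kind}")
        self._last[kind] += 1
        return f"{self.PREFIXES[kind]}{self._last[kind]:07d}"

allocator = AccessionAllocator()
first_region = allocator.next_accession("region")
second_region = allocator.next_accession("region")
first_set = allocator.next_accession("set")
```

In a real multi-user database the counter would normally live in the
database itself (for example, an auto-increment column) so that
concurrent inserts cannot receive the same number.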

The Web pages used to navigate and view DART
objects are implemented mainly as Perl CGI scripts.
The user is offered a variety of options for viewing
a single active region, sets and subsets of active
regions, intersecting active regions, etc. Public
domain Perl libraries are used to construct and
display graphs on certain DART web pages. URLs
are constructed for sending DART data to public
browsers such as the UCSC Genome Browser.
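
A link of the kind described above can be sketched as follows. The
example is in Python rather than Perl and is not the DART code; it
relies on the hgTracks entry point of the UCSC Genome Browser and its
db and position parameters. The function name and the sample
coordinates are made up for illustration.

```python
from urllib.parse import urlencode

UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"

def ucsc_position_url(db, chrom, start, end):
    """Build a UCSC Genome Browser link for one region (hypothetical helper)."""
    query = urlencode({"db": db, "position": f"{chrom}:{start}-{end}"})
    return f"{UCSC_HGTRACKS}?{query}"

# Sample values only; the assembly and coordinates are illustrative.
url = ucsc_position_url("hg19", "chr1", 1000, 2000)
```

urlencode percent-encodes the colon in the position value, which the
browser decodes on arrival.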

The Active Region Comparer (ARC) provides a web-based
interface for comparing, integrating, filtering, and
annotating sets of genomic intervals. ARC analyzes
uploaded or imported datasets by calculating summary
statistics, such as mean and median fragment length.
It also performs combinatorial operations to generate
new datasets containing the genomic regions common to k
out of n files (k between 1 and n). All ARC functions
are implemented in Perl and the interface is generated
using Perl CGI.
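
The summary statistics and the k-out-of-n operation can be illustrated
with a short sketch. It is written in Python, not the Perl used by
ARC, and it assumes half-open (chromosome, start, end) intervals and a
coverage sweep over the merged intervals of each file; the source does
not say how ARC implements the operation, and all function names here
are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

def merge(intervals):
    """Merge overlapping or touching intervals within one file."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]

def length_stats(intervals):
    """Return (mean, median) interval length for one dataset."""
    lengths = [end - start for _, start, end in intervals]
    return mean(lengths), median(lengths)

def common_to_k(files, k):
    """Return the regions covered by at least k of the given files."""
    deltas = defaultdict(lambda: defaultdict(int))
    for intervals in files:
        # Merging first means each file adds at most 1 to the depth.
        for chrom, start, end in merge(intervals):
            deltas[chrom][start] += 1
            deltas[chrom][end] -= 1
    result = []
    for chrom in sorted(deltas):
        depth, open_at = 0, None
        for pos in sorted(deltas[chrom]):
            depth += deltas[chrom][pos]
            if depth >= k and open_at is None:
                open_at = pos
            elif depth < k and open_at is not None:
                result.append((chrom, open_at, pos))
                open_at = None
    return result

# Three made-up files with one interval each.
files = [
    [("chr1", 100, 200)],
    [("chr1", 150, 250)],
    [("chr1", 180, 300)],
]
```

With k = 1 the result is the union of all files, and with k = n it is
the intersection; intermediate values give the regions supported by at
least k of the files.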

ARC annotates genomic intervals by retrieving annotations
from locally installed Ensembl Core databases. For each
region, the annotations page can perform operations that
identify overlapping exons, sequence data, neighboring
transcripts, and other genomic features. ARC can also filter
each dataset on properties such as nucleotide length and
sequence GC content. The annotation data are imported via the
Ensembl Perl API, and the results are generated and saved
using Perl CGI.
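
Filtering on length and GC content can be sketched as below. The
example is in Python rather than Perl, does not use the Ensembl API,
and assumes each region already carries its sequence; the function
names, parameters, and thresholds are hypothetical.

```python
def gc_content(sequence):
    """Fraction of G and C among the called bases (A, C, G, T)."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0  # no called bases, e.g. a run of N
    return (seq.count("G") + seq.count("C")) / called

def filter_regions(regions, min_length=0, max_length=None, min_gc=0.0, max_gc=1.0):
    """Keep (name, sequence) pairs whose length and GC content are in range."""
    kept = []
    for name, sequence in regions:
        length = len(sequence)
        if length < min_length:
            continue
        if max_length is not None and length > max_length:
            continue
        if not (min_gc <= gc_content(sequence) <= max_gc):
            continue
        kept.append((name, sequence))
    return kept

# Made-up sequences for illustration.
regions = [
    ("r1", "GGCCGGCC"),
    ("r2", "ATATATAT"),
    ("r3", "ATGC"),
]
gc_rich = filter_regions(regions, min_length=6, min_gc=0.5)
```

Ambiguous bases such as N are excluded from the denominator here; that
is a design choice of this sketch, not a documented ARC behavior.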

ARC also supports the visualization of user-annotated tracks
by exporting data to the UCSC Genome Browser. The tool
allows the user to view multiple exported files simultaneously
as custom tracks, and it uses a nearest neighbor clustering
algorithm to create hyperlinks that facilitate observation of
closely spaced regions in the UCSC browser. Once uploaded as a
custom track, the datasets can be analyzed by the tools
available at UCSC and can be exported by UCSC to other web tools,
such as Galaxy. These features allow for efficient analysis of
genomic regions, simplifying processes such as comparing replicate
datasets or categorizing genomic elements based upon tissue
or cell type.
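
One way to group closely spaced regions into a single browser window
is sketched below. The source does not specify which nearest-neighbor
variant ARC uses, so this Python sketch assumes a simple rule: a
region joins the current cluster when the gap to it is at most
max_gap bases. The function names, the max_gap parameter, and the
sample coordinates are all hypothetical.

```python
def cluster_by_gap(regions, max_gap):
    """Group sorted regions whose gap to the current cluster is <= max_gap."""
    clusters = []
    for chrom, start, end in sorted(regions):
        last = clusters[-1] if clusters else None
        if last and last["chrom"] == chrom and start - last["end"] <= max_gap:
            last["end"] = max(last["end"], end)
            last["count"] += 1
        else:
            clusters.append({"chrom": chrom, "start": start, "end": end, "count": 1})
    return clusters

def position_label(cluster):
    """Format a cluster as a chrom:start-end string suitable for a link."""
    return f"{cluster['chrom']}:{cluster['start']}-{cluster['end']}"

# Made-up regions: two close together, one far away, one on another chromosome.
regions = [
    ("chr1", 5000, 5100),
    ("chr1", 100, 200),
    ("chr2", 120, 180),
    ("chr1", 250, 300),
]
labels = [position_label(c) for c in cluster_by_gap(regions, max_gap=100)]
```

Each label spans every region in its cluster, so a single hyperlink
can show neighboring regions together.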

Yale Bioinformatics