Yale University Library

Bioinformatics Tools: Gene Prediction / Annotation

This guide contains a curated set of resources and tools to help you analyze your research data. It also lists the medical library workshops offered at Yale University on many of these bioinformatics tools.

Visualization / Genome Browsers

Genome browsers integrate genomic sequence and annotation data from different sources and provide an interface for users to browse, search, retrieve and analyze these data. These are the main genome browsers:

University of California Santa Cruz genome browser

Ensembl genome browser

NCBI's Genome Browser

NCBI's Genome Workbench

The Vertebrate Genome Annotation (VEGA) database is a repository for high-quality gene models produced by the manual annotation of vertebrate genomes.

Genome Databases

NCBI's Genome database organizes information on genomes, including sequences, maps, chromosomes, assemblies, and annotations.

Genomes Online Database (GOLD) is a web resource for comprehensive access to information about genome and metagenome sequencing projects around the world, and their associated metadata.

Ab initio and Gene Prediction Tools


GENEID is a program to predict genes, exons, splice sites and other signals along a DNA sequence.

JIGSAW is a program that predicts gene models using the output from other annotation software. It uses a statistical algorithm to identify patterns of evidence corresponding to gene models.

AUGUSTUS is an open source program that predicts genes in eukaryotic genomic sequences. Its protein profile extension (PPX) uses protein-family-specific conservation, supplied as a block profile, to identify members of a protein family and their exon-intron structure. By incorporating mRNA alignments, EST alignments, conservation and other sources of information, AUGUSTUS can predict alternative splicing and alternative transcripts, as well as the 5'UTR and 3'UTR, including introns.

EuGene is an open integrative gene finder for eukaryotic and prokaryotic genomes. It is characterized by its ability to integrate arbitrary sources of information into its prediction process, including RNA-Seq, protein similarities, homologies and various statistical sources of information.

PseudoPipe is a standalone computational pipeline for pseudogene annotation.
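
As a rough illustration of what "signals along a DNA sequence" means in the GENEID entry above, the sketch below locates candidate start codons, stop codons, and canonical splice-site dinucleotides by exact string matching. This is a toy example and not the algorithm used by GENEID or any other tool listed here; real gene finders score such signals with statistical models. The function name `scan_signals`, the `SIGNALS` table, and the example sequence are hypothetical.

```python
# Toy signal scanner: NOT the algorithm of GENEID or any listed tool.
# It only shows what locating candidate sequence signals can look like.

SIGNALS = {
    "start_codon": ("ATG",),
    "stop_codon": ("TAA", "TAG", "TGA"),
    "donor_site": ("GT",),     # canonical 5' splice-site dinucleotide
    "acceptor_site": ("AG",),  # canonical 3' splice-site dinucleotide
}


def scan_signals(sequence):
    """Map each signal name to the sorted 0-based positions where it starts."""
    seq = sequence.upper()
    hits = {name: [] for name in SIGNALS}
    for name, patterns in SIGNALS.items():
        for pattern in patterns:
            start = seq.find(pattern)
            while start != -1:
                hits[name].append(start)
                # Continue one base further so overlapping matches are found.
                start = seq.find(pattern, start + 1)
        hits[name].sort()
    return hits


if __name__ == "__main__":
    example = "ATGGTAAGTCCAGGATTGA"  # hypothetical sequence for illustration
    for name, positions in scan_signals(example).items():
        print(name, positions)
```

Exact matching like this reports every occurrence, most of which are not functional sites, which is why prediction programs combine signal scores with other evidence.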

Peak Calling

Genome wide Event finding and Motif discovery (GEM) links binding event discovery and motif discovery with positional priors in the context of a generative probabilistic model of ChIP data and genome sequence. It resolves ChIP data into explanatory motifs and binding events at unsurpassed spatial resolution. The two steps improve each other: binding event locations inform motif discovery, and discovered motifs inform binding event predictions.

SPP is an R package designed for the analysis of ChIP-seq data generated on the Illumina platform.