
Yale University Library

Bioinformatics Tools: Variation

This guide contains a curated set of resources and tools to help you with your research data analysis. It also lists the medical library workshops offered at Yale University that cover many of these bioinformatics tools.


The goal of the 1000 Genomes Project is to find most genetic variants that have frequencies of at least 1% in the populations studied.
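To make the 1% frequency cutoff concrete, here is a minimal sketch of how an allele frequency might be computed from allele counts and compared against that threshold. The function names and the counts are hypothetical, chosen only for illustration; they are not part of any 1000 Genomes Project tool.

```python
def allele_frequency(alt_allele_count: int, total_alleles: int) -> float:
    """Return the fraction of sampled alleles that are the alternate allele."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return alt_allele_count / total_alleles


def meets_threshold(alt_allele_count: int, total_alleles: int,
                    threshold: float = 0.01) -> bool:
    """Check whether a variant reaches the frequency threshold (default 1%)."""
    return allele_frequency(alt_allele_count, total_alleles) >= threshold


# Hypothetical counts: 100 diploid individuals give 200 sampled alleles.
print(meets_threshold(3, 200))  # 3/200 = 1.5%, at or above the cutoff
print(meets_threshold(1, 200))  # 1/200 = 0.5%, below the cutoff
```

In a diploid sample the denominator is twice the number of individuals, which is why 100 individuals yield 200 alleles here.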

Variant Analysis Tools

GWAS Central (previously the Human Genome Variation database of Genotype-to-Phenotype information) is a database of summary-level findings from genetic association studies, both large and small.

Function Based Prioritization of Sequence Variants (FunSeq) can be used to automatically score and annotate the disease-causing potential of SNVs, particularly non-coding ones. It can be used on cancer and personal genomes, and a downloadable version of the tool is also available.

Variant Annotation and Discovery Tools

Variant Annotation Tool (VAT) is a computational framework to functionally annotate variants in personal genomes using a cloud-computing environment.

The Genome Analysis Toolkit (GATK) is a software package developed at the Broad Institute to analyse next-generation resequencing data.

CNVNATOR is a tool for CNV discovery and genotyping from depth of read mapping.

BREAKSEQ is a pipeline for annotation, classification and analysis of SVs at single-nucleotide resolution.

PEMer is a computational and simulation framework for discovering SVs by paired-end read mapping.

Tools for Predicting the Effects of Single Amino Acid Substitutions

The Variant Effect Predictor (VEP) determines the effect of variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions. 

Sorting Intolerant From Tolerant (SIFT) is an online, sequence homology-based tool that sorts intolerant from tolerant amino acid substitutions and predicts whether an amino acid substitution in a protein will have a phenotypic effect.

Polymorphism Phenotyping v2 (PolyPhen-2) is a tool that predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. It uses sequence conservation, structure and SWISS-PROT annotation.

SNPs3D is a website that assigns molecular functional effects of non-synonymous SNPs based on structure and sequence analysis. It contains three modules: SNP Analysis, Gene-Gene Network, and Disease Candidate Gene.

Align-GVGD is a freely available, web-based program that combines the biophysical characteristics of amino acids and protein multiple sequence alignments to predict where missense substitutions in genes of interest fall in a spectrum from enriched deleterious to enriched neutral.

Other SNP Analysis

SNPnexus was designed to simplify and assist in the selection of functionally relevant Single Nucleotide Polymorphisms (SNP) for large-scale genotyping studies of multifactorial disorders.

HaploReg is a tool for exploring annotations of the noncoding genome at variants on haplotype blocks, such as candidate regulatory SNPs at disease-associated loci.

Variation Databases


dbSNP is the NCBI database of single nucleotide polymorphisms (SNPs) and also includes information on insertions/deletions, microsatellites, and non-polymorphic variants.

The database of Genotypes and Phenotypes (dbGaP) archives data and results from studies of the interaction of genotype and phenotype, e.g. genome-wide association studies, medical sequencing, and molecular diagnostic assays, as well as associations between genotype and non-clinical traits.

ClinVar is an archive of reports of relationships among medically important variants and phenotypes.

ALFRED is a database that provides allele frequency data on DNA polymorphisms.

Allele Frequency Net provides immunogenetic gene frequencies in worldwide populations.

Findbase is an online resource documenting frequencies of pathogenic genetic variations leading to inherited disorders in various populations worldwide. Database records include the population, the ethnic group and/or the geographic region, the gene name and its variation parameters, and the rare allele frequencies, accompanied by links to the respective Online Mendelian Inheritance in Man (OMIM) entries. It includes the following modules:

Phenotype Databases

The Online Mendelian Inheritance in Man (OMIM) is a freely available compendium of human genes and genetic phenotypes.

The Genetic Testing Registry (GTR) provides a central location for voluntary submission of genetic test information by providers. The scope includes the test's purpose, methodology, validity, evidence of the test's usefulness, and laboratory contacts and credentials.