Bioinformatics Tools for Research and Discovery at Yale University: Text Mining

This guide contains a curated set of resources and tools to help you with your research data analysis. It also lists the Medical Library workshops offered at Yale University on many of these bioinformatics tools.

WORKSHOP: Tools for Mining the Biomed Literature

Workshop: Novel online tools for mining the biomedical literature: The rapid growth of experimental and computational biomedical data is being accompanied by an increase in the number of biomedical publications discussing these results. This makes retrieving relevant scientific information and identifying connections between findings a challenging task. New literature-mining tools (e.g., KNALIJ, Quertle, NextBio, iHOP, SemMed, GoPubMed) can help researchers sort through this abundance of literature and can serve as discovery and hypothesis-generating tools. This workshop introduces how to use some of these literature-mining tools to answer research questions.


Visualization and Statistics Tools for Mining the Biomedical Literature

PubNet (Publication Network Graph Utility) is a web-based tool that extracts several types of relationships returned by PubMed queries and maps them onto networks, allowing for graphical visualization, textual navigation, and topological analysis.

Semantic Medline is a web application that uses natural language processing to extract semantic predications from a PubMed search. Results are presented as an interrelated network of concepts.

Quertle retrieves information from the biomedical literature by using its own semantic database of 300 million relationships.

NextBio (Literature) uses a tag cloud approach to help users discover the most important concepts resulting from a query.

Coremine presents search results as a graphic network that describes relationships discovered through text mining. Relationship networks provide an overview of a topic by clustering important terms. The network is also a navigational tool that can help searchers explore concepts related to their search term.

Chilibot searches the PubMed literature database (abstracts) for specific relationships between proteins, genes, or keywords, presenting the results as networked relationships.
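
Several of the tools above present results as networks of related terms. One simple way to picture such a network is to link two terms whenever they appear in the same abstract. The Python sketch below illustrates only that general idea; it is not the method used by any of the listed tools, and the abstracts, term names, and the `cooccurrence_edges` function are hypothetical placeholders.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(abstracts, terms):
    """Count how often each pair of terms appears in the same abstract."""
    edges = Counter()
    lowered_terms = [t.lower() for t in terms]
    for text in abstracts:
        text = text.lower()
        # Simple substring matching is an assumption made for brevity;
        # real tools rely on far more careful entity recognition.
        present = sorted(t for t in lowered_terms if t in text)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical toy data for illustration only.
abstracts = [
    "geneA is discussed together with geneB in this abstract.",
    "A second abstract mentions geneB and geneA again.",
    "This abstract mentions geneC and geneB only.",
]
terms = ["geneA", "geneB", "geneC"]

edges = cooccurrence_edges(abstracts, terms)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Each counted pair can be read as one edge in a network, with the count as its weight, so frequently co-mentioned terms end up with the strongest links.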

NCBI Resources

PubMed is the NCBI interface for biomedical literature from MEDLINE, life science journals and online books.

Medical Subject Headings (MeSH) is the NLM controlled vocabulary thesaurus used for indexing articles for PubMed.

Nucleotide Database is a collection of DNA and RNA sequences from several sources, including GenBank, RefSeq, TPA and PDB.

Genome Sequencing Projects "organizes information on genomes including sequences, maps, chromosomes, assemblies, and annotations." Organized by organism.

Protein Sequence Database is "a collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB."
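
PubMed and MeSH can also be queried programmatically. The sketch below builds a search URL for NCBI's E-utilities `esearch` endpoint, which this guide does not otherwise cover, restricting the search to a MeSH heading with the `[MeSH Terms]` field tag. The helper name `pubmed_search_url` and the example heading are illustrative assumptions; the code only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(mesh_term, retmax=20):
    """Build an ESearch URL for PubMed records indexed with a MeSH heading."""
    params = {
        "db": "pubmed",                       # search the PubMed database
        "term": f"{mesh_term}[MeSH Terms]",   # limit the query to MeSH indexing
        "retmax": retmax,                     # maximum number of IDs to return
        "retmode": "json",                    # request a JSON response
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# "Data Mining" is used here only as an example heading.
url = pubmed_search_url("Data Mining")
print(url)
```

Opening the printed URL in a browser, or fetching it with an HTTP client, should return the matching PubMed IDs. NCBI's usage guidelines for E-utilities apply to any automated requests.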

Purchase Request

To submit a collections purchase request to the library, click here: Purchase Request. A library selector will be in touch with you.


Rolando Garcia-Milian
Yale Medical Library, 333 Cedar St, Room L-111, New Haven CT 06520

Science Research Support Librarian - Life Sciences

Lori Bronars
219 Prospect St., PO Box 208111 New Haven, CT 06520-8111

Center for Science and Social Science Information C38 Kline Biology Tower
Yale University

non-CSSSI office hours:
Tuesdays 9-10 am 1st floor break room OML
Fridays 1:30-2:30 pm BASS 205
203 432 6213