Generative AI for Research: What to Consider: Tools

Overview

There are a variety of ways that researchers at Yale use generative AI tools. This may include training their own models, fine-tuning existing models, or using out-of-the-box tools through chat interfaces or APIs.

This information mostly aims to address the use of existing models and tools. Where possible, we recommend that researchers make use of the tools Yale provides access to.

An Overview of GAI Tools

GAI tools are, at their core, machines that are very, very good at producing something that might plausibly exist in the universe of things they've "seen" (training data). How exactly this is achieved can vary, but many of the strengths and pitfalls are shared across generative AI tools and models.

This means, for example, that Large Language Models (LLMs) can produce chunks of text that convincingly mimic real sentiments, ideas, or even knowledge, and that image generators can create pictures you might reasonably believe were photographs.
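As a toy illustration of this core idea, the sketch below counts which words follow which in a tiny training corpus and then greedily emits the most "plausible" continuation. Everything here (the corpus, the function names) is invented for illustration; real LLMs use neural networks over tokens, not word counts, but the principle of producing the most probable continuation of what they've "seen" is the same.

```python
from collections import defaultdict, Counter

# Tiny illustrative "training data" -- the universe of things the model has seen.
training_text = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased the dog ."
)

# Count, for each word, which words follow it in the training data.
follows = defaultdict(Counter)
words = training_text.split()
for current, nxt in zip(words, words[1:]):
    follows[current][nxt] += 1

def generate(start, length=5):
    """Greedily emit the continuation seen most often in training."""
    out = [start]
    for _ in range(length):
        candidates = follows[out[-1]].most_common(1)
        if not candidates:
            break
        out.append(candidates[0][0])
    return " ".join(out)

print(generate("the"))
```

Note that the output is always built from fragments of the training data, whether or not the resulting sentence is true or sensible; the same is true, at vastly greater scale, of the models discussed in this guide.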

It also means that such models may underperform on tasks for which the statistically probable answer, based on their training data, is not a correct or useful one. For example, you may have seen GAI tools struggle to count the letters in "strawberry" or to reason about very basic physics. You may also have noticed GAI tools providing incorrect information in a confident tone. For more on these issues and their impact on research, see the output tab of this guide.

These tools rely on very large sets of training data (although recent inquiry suggests that smaller datasets might suffice). Many companies have assembled such datasets by trawling the web, a practice that may burden web servers and sweep in incorrect, copyrighted, or even AI-generated content.

Evaluating a Tool

While GAI tools rely on similar technologies and, in many cases, similar training data, it is nonetheless worth looking into the specifics of a tool you're interested in using before you embark on your research.

For example, as described in the input tab of this guide, the provost has issued guidance that medium- and high-risk data cannot be used in external tools, while some Yale-managed tools may be appropriate for use with this data.

Some models, including those provided in Yale tools such as Clarity, may have been trained or fine-tuned using labor practices you might wish to avoid in your suite of research tools.

In addition, you may wish to consider the probable training data of a tool in identifying what biases or inaccuracies it might be more or less prone to.

While there is no one route to knowing everything there is to know about a given tool, the following may offer starting points for your research:

  • A tool's "about" page or other official website content
  • The published research of employees and former employees who worked on a tool
  • Online speculation by experts about a given tool
  • As with anything else, always consider the source: academics writing for peer-reviewed publications and journalists writing for respected papers generally must back their speculation with more evidence than contributors to Stack Overflow or individual blogs, though the latter may still be valuable resources

This kind of research is a key skill in librarianship; a librarian in your discipline may be able to help you identify relevant sources on tools of interest to you.

Retrieval Augmented Generation (RAG)

Retrieval-augmented generation (RAG) describes a strategy for building GAI tools that connects a model to a predetermined pool of sources (for example, scholarly articles). Relevant sources are retrieved from that pool, used to enrich the original prompt with additional information, and returned alongside the tool's output.

This can sometimes produce more useful results for researchers than GAI tools not augmented in this way. For example, a RAG-based tool might be able to provide citations for a researcher to follow up on to verify the output it provides or to suggest additional sources for a literature review.
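The retrieve-then-augment step described above can be sketched as follows. This is a deliberately minimal illustration under stated assumptions: the document pool, the `retrieve` and `build_prompt` names, and the keyword-overlap scoring are all invented for this example. A real RAG tool would typically use an embedding model and a vector database for retrieval, and would send the augmented prompt to a language model.

```python
import re

# A hypothetical pool of sources the tool is allowed to draw on.
documents = [
    {"title": "Smith 2021", "text": "Transformer models scale with data and compute."},
    {"title": "Lee 2022", "text": "Retrieval improves factual accuracy in generation."},
    {"title": "Park 2020", "text": "Citation practices vary widely across disciplines."},
]

def tokenize(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, docs, k=2):
    """Rank documents by word overlap with the query (a stand-in for semantic search)."""
    query_words = tokenize(query)
    scored = sorted(
        docs,
        key=lambda d: len(query_words & tokenize(d["text"])),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, sources):
    """Enrich the researcher's question with retrieved source text, labeled by title."""
    context = "\n".join(f"[{s['title']}] {s['text']}" for s in sources)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

query = "Does retrieval improve generation accuracy?"
sources = retrieve(query, documents)
prompt = build_prompt(query, sources)
# The model would answer from the augmented prompt, and the retrieved
# sources would be returned to the researcher as citations to verify.
```

Because the sources are surfaced alongside the output, a researcher can follow up on the citations directly, which is the main advantage noted above; it does not, however, guarantee the model describes those sources accurately.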

However, these tools can still hallucinate or confabulate, even about the content of the sources they return, overgeneralize results, or select sources in a biased manner.

The library may be able to support projects for which researchers might otherwise wish to use a RAG tool in other ways—for example, the medical library offers an evidence synthesis and literature review service, and all researcher-facing librarians can recommend databases and strategies for identifying relevant research.

Contact Us

If you have any questions about methods, resources, projects, or more, feel free to book a consultation.