Output describes the text, images, video, or other materials you receive from a GAI tool.
Researchers should evaluate for themselves the accuracy, appropriateness, and legality of any given use of output from a GAI tool. Below are outlined some common concerns for researchers considering the use of GAI output in their work.
Copyright law in generative AI contexts is far from settled. As with the other data, information, and source materials you use in your research, always verify the accuracy of the information in, and the appropriateness of your use of, the output you receive from GAI tools.
Algorithmic bias describes the ways that computer programs (especially, but not exclusively, AI) can exhibit consistent errors, such as systematically producing a value that is too high more often than one that is too low.
This term is now primarily applied to instances in which these consistent errors reinforce social biases, which can arise when training data reflects existing bias or when the data represent some demographics more than others. When programs with algorithmic bias are used to make real-world decisions or evaluations, they may (for example) widen disparities in hiring, health outcomes, and more.
Researchers interested in using GAI tools are encouraged to look for or perform algorithmic bias audits of their tools of interest. This could be as simple as running tests using sample data. Researchers looking for ways to formally or informally audit a tool for a given purpose are welcome to book a consultation.
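As one illustration of the kind of informal spot check described above, the sketch below sends prompts that differ only in a name associated with different demographic groups and compares how often each group receives a favorable response. The call_model function, the name lists, the prompt, and the "favorable response" check are illustrative placeholders rather than a validated audit instrument.

```python
# A minimal sketch of an informal bias spot check, assuming a hypothetical
# call_model() wrapper around whichever GAI tool you are evaluating. The name
# lists, prompt, and "favorable response" check are illustrative placeholders,
# not a validated audit instrument.

def call_model(prompt: str) -> str:
    """Placeholder: replace with a real call to the tool under audit."""
    return "Yes, we recommend inviting this candidate to interview."

# Prompts that are identical except for a name commonly associated with
# different demographic groups (the classic resume-audit design).
NAME_GROUPS = {
    "group_a": ["Emily", "Greg"],
    "group_b": ["Lakisha", "Jamal"],
}
PROMPT = (
    "Here is a resume summary for {name}: five years of accounting experience, "
    "CPA certified. Should we interview this candidate? Answer yes or no, with a reason."
)

def looks_favorable(response: str) -> bool:
    """Crude proxy for a favorable decision; a formal audit needs a better rubric."""
    return response.lower().startswith("yes")

favorable_rates = {}
for group, names in NAME_GROUPS.items():
    decisions = [looks_favorable(call_model(PROMPT.format(name=n))) for n in names]
    favorable_rates[group] = sum(decisions) / len(decisions)

for group, rate in favorable_rates.items():
    print(f"{group}: favorable-response rate = {rate:.0%}")

# Large gaps between groups on otherwise-identical prompts are a signal to dig
# deeper: larger samples, repeated runs, and formal audit methods.
```

A spot check like this cannot rule bias out, but a large gap between groups on otherwise-identical prompts is a useful prompt to pursue a more formal audit.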
You may have heard of a phenomenon in GAI called "hallucination" or "confabulation."* This refers to the way that GAI tools can produce very plausible but inaccurate responses. (For example, a tool may claim that a book with a title highly relevant to your prompt exists when it does not, or invent a historical person to support an argument. Or, as a recent paper in Computation and Language found, it may consistently over-generalize when summarizing scientific research.)
Additionally, GAI tools provide inconsistent responses (they are "stochastic"): giving the same input to the same model twice can produce substantially different outputs. Together, these inherent characteristics of GAI make accuracy and reproducibility significant questions for researchers interested in using these tools in their research. (For more information on challenges to reproducibility in AI research, including GAI, refer to this review.)
While this may be most pressing for scientists and social scientists, humanists making use of GAI tools may also wish to consider the ways inconsistent or inaccurate responses may impact their work, and might find the vocabulary and tools of reproducibility helpful in doing so.
A number of strategies can mitigate this, including fixing decoding settings where the tool allows it (for example, setting the temperature to 0 or supplying a random seed), recording the specific model version you used, saving your exact prompts and outputs, and running the same prompt multiple times to gauge how much responses vary (as sketched below).
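For instance, the following minimal sketch runs the same prompt several times and counts how many distinct outputs come back. The call_model function is a hypothetical placeholder for whatever GAI tool you are using, and the temperature and seed parameters are assumptions about what that tool exposes; many interfaces support neither.

```python
# A minimal sketch of a consistency check, assuming a hypothetical call_model()
# wrapper; the temperature and seed parameters are assumptions about what your
# particular tool exposes, and many interfaces support neither.
import hashlib

def call_model(prompt: str, temperature: float = 0.0, seed: int = 42) -> str:
    """Placeholder: replace with a real call to your GAI tool, passing through
    whatever determinism controls (temperature, seed, model version) it offers."""
    return "Example response text."

PROMPT = "Summarize the attached abstract in two sentences."
RUNS = 5

outputs = [call_model(PROMPT) for _ in range(RUNS)]
distinct = {hashlib.sha256(o.encode("utf-8")).hexdigest() for o in outputs}

print(f"{RUNS} runs produced {len(distinct)} distinct output(s).")
```

Even with deterministic settings, some tools remain nondeterministic in practice, so saving the actual outputs you relied on is generally safer than assuming they can be regenerated later.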
Researchers may also find the FAIRER Aware Reproducibility Checklist helpful in thinking about which aspects of code and data use may enhance or reduce a project's reproducibility; it is especially relevant for scientists and social scientists.
No matter how you use GAI tools, accurately and fully explaining and documenting your use (including the specific model and the exact input and output) in a citation, appendix, or methods section supports the reproducibility of your research. Please refer to the section on this page regarding reporting and citing for more information on how to do this. The section "Recommendations for AI Methods" in this article on reproducibility in AI may be of particular value to researchers who are engaging with GAI at a deeper level.
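One lightweight way to keep that documentation consistent is to save a small, machine-readable record of each use alongside your project files. The field names and example values in the sketch below are assumptions rather than a required schema; adapt them to whatever your journal, funder, or style guide expects.

```python
# A minimal sketch of recording GAI use for a methods section or appendix.
# The field names and example values are assumptions, not a required schema;
# adapt them to what your journal, funder, or style guide expects.
import json
from datetime import datetime, timezone

record = {
    "tool": "ExampleGPT (hypothetical tool name)",
    "model_version": "example-model-2024-01",
    "settings": {"temperature": 0.0, "seed": 42},
    "exact_prompt": "Summarize the attached abstract in two sentences.",
    "exact_output": "Paste the verbatim output you received here.",
    "how_output_was_used": "Draft summary, revised by the authors before inclusion.",
    "date_of_use": datetime.now(timezone.utc).isoformat(),
}

# Save the record alongside your code and data so it travels with the project
# and can be quoted directly in a citation, appendix, or methods section.
with open("gai_use_record.json", "w", encoding="utf-8") as f:
    json.dump(record, f, indent=2, ensure_ascii=False)
```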
* Note: some researchers prefer "confabulation" because using "hallucination" to describe this phenomenon can further stigmatize the mental health symptom of the same name.
Limor Peer, Associate Director for Research and Strategic Initiatives at Yale's ISPS, contributed to this section, including recommending both articles by Odd Erik Gundersen and the FAIRER Aware Reproducibility Checklist.
By communicating their use of GAI tools, researchers can enable replication of their work and ensure their own compliance with the expectations of relevant institutions in their fields. This guide focuses primarily on citing GAI use in research output, including in the execution of research (such as data collection or analysis) and in the communication of it (such as figure generation or writing assistance). Expectations for GAI use in other realms may differ.
Student researchers should first refer to the Poorvu Center's advice for students regarding generative AI and be sure to observe relevant policies in their classes, workplaces, and job or fellowship applications.
Student, post-doctoral, faculty, and staff researchers engaged in grant-funded work, planning to publish in a peer-reviewed journal, performing a peer review, or otherwise interacting with the institutions of their field(s) should inform themselves of those institutions' expectations, in addition to referring to the provost's guidance for Yale community members. Funding bodies, journals, and publishers may have their own rules about how GAI tools can be used and how their use must be communicated.
For example, the ICMJE sets out the Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals, used by more than eight thousand journals in the medical field. Such policies or guidelines, this one included, often outline specific requirements for how researchers may use GAI tools in research, whether and how they can use them as referees or editors, and how to cite these tools and document their use.
If your funder, journal, or publisher doesn't specify how to cite GAI tools, most major style guides offer advice:
As just a few examples of the guidelines journals and other bodies may set on the use and citation of GAI tools, see:
For every researcher, even if your particular grant, journal, or other intended venue does not specify guidelines, it can be good practice to provide the information readers may have come to expect based on disciplinary norms. Follow the guidance of other grants, journals, or professional bodies in your field, or consider existing best practices for other uses of research code or software. In the absence of other guidance or advice, you may wish to include information such as the name and version of the tool and model you used, the date(s) of use, your exact input and the output you received, and how you used or modified that output, in a citation, caption, appendix, methods section, or similar.
You may identify other information as relevant to include based on other comparable expectations in your field and venue, such as funding transparency.
A librarian in your discipline may be able to help you track down relevant field-specific organizations and guidance.
This section was co-authored with Kate Nyhan, Research and Education Librarian at the Medical Library. Thanks also to Alfred Guy, Director of Undergraduate Writing and Tutoring and Deputy Director of the Poorvu Center.
If you have any questions about methods, resources, projects, or more, feel free to book a consultation.