Table of Contents
November: Digitizing World Christianity
November: EliScholar Debut
September: Digital Humanities
August: Interview: First Year
July: Rolling out Summon
February: Hydra project management
September: What's Happening?
Flowchart tool helps to get project started
The Kissinger Papers project is gearing up for digitization of the materials to start on June 1, but Rebecca Hirsch, the Kissinger Project Archivist, has been working on preparations for the digitization portion of the project since June 2013. These preparations have included using LucidChart software to create flow charts of the workflows involved. The accompanying illustration is the flow chart for the QC of the hard drives of image files we will receive from the digitization vendor.
These flow charts are being used as the basis for a workflow management system, which consists of InfoPath forms managed by SharePoint workflows. This tool will track all parts of the digitization project, from shipping the materials to the vendor to the successful completion of QC and the return of the materials from the vendor. The workflow tool team includes Rebecca Hirsch, Kevin Glick, Robert Klingenberger and consultant John Coffey.
In addition to the creation of the workflow management tool, we have been in dialogue with a number of other institutions to ensure best practices and standards are followed during the project. Based on the information gathered, the Kissinger team had extensive discussions with other YUL departments, including Preservation, IT and Digital Initiatives, about what file formats should be created. The project finally settled on a cropped, full color TIFF, a text file with OCR for full-text search, and greyscale PDFs for downloading and printing. MSSA hopes that the work done for the Kissinger project will help guide future YUL digitization projects.
--submitted by Rebecca Hirsch, April 2014
Digitization Workflow Control Tool
In March 2012, in preparation for the digitization and reprinting of the majority of the Kissinger Collection, the Department of Manuscripts and Archives (MSSA) began working with the Preservation Department to examine the workflow implications. The volume of materials to be digitized is huge: some 850,000 pages, which will result in 1.7 million digital images, each of which will have four versions. Existing digitization workflow principles will apply, but the tools needed to manage a workflow of this size and complexity did not exist.
A tool previously envisioned by the Preservation Department to manage the workflow for its own reformatting operations served as the basis for the solution, as it includes shipping of original materials and quality control of image files. Starting in June 2013, MSSA, Preservation and Library IT, along with John Coffey, an IT and business practices consultant, began examining technology options including Activiti, Taverna, the Digital Asset Factory from Bibliotheca Alexandrina, and solutions involving tools already slated to be supported by Yale ITS and/or Library IT.
The chosen tool is based on cloud-based SharePoint 2013, InfoPath forms, a SQL 2012 database, and SQL Server Reporting Tools. Added to this mix is a set of tools created by Qdabra, which specializes in web services and automated workflow tools based on SharePoint, InfoPath and their own Database Accelerator (DBXL) web services.
The required Kissinger Project functionality is currently in development and Beta Testing.
--submitted by Robert Klingenberger, April 2014
Digitization Projects at the Divinity Library
The Divinity Library is engaged in four grant-funded projects to digitize portions of its Day Missions Collection. The Day Missions Collection, founded by Yale professor George Edward Day in 1891, consists of books, periodicals, pamphlets, reports, photographs, and archival collections that document the missionary movement and world Christianity. Largest of the current initiatives is an NEH-funded effort to digitize 350,000 pages of annual reports and periodicals from the Day Collection. This project, which continues work begun in 2010 with Arcadia funds, is among the collections currently being ingested via Ladybird into the new Yale University Library Digital Collections repository.
NEH funding is also supporting the digitization and description of 3,000 photographs to be added to existing Divinity Library content in the International Mission Photography Archive (IMPA: http://www.usc.edu/impa), a project hosted by the University of Southern California. Previous grants from Getty and Mellon for the IMPA project have enabled the Divinity Library to digitize and describe more than 10,000 photographs from personal papers collections, organizational archives, and its missionary postcard collection.
A substantial number of photographs in the Divinity IMPA content come from the archives of the United Board for Christian Higher Education in Asia (UBCHEA), which served as a support agency for colleges and universities founded by Protestant mission agencies in China between 1880 and 1950. These visual images will now be complemented by digital access to selections from the archives of the UBCHEA. The UBCHEA had previously funded the microfilming of its archives and now has provided funding to digitize from the microfilm. A total of ninety reels of microfilm have been digitized to date.
The fourth digitization project currently in progress is funded by the Center for Christian Studies of Shantou University, which is seeking to gather research resources related to mission and church work in the Shantou area of China. Two collections of papers of missionaries who worked in this area have been digitized, the Abbie Sanderson Papers and the Ellison and Lottie Hildreth Papers.
--submitted by Martha Smalley, November 2013
Yale University Library Debuts EliScholar
A Repository Showcasing Yale Scholarship
This summer saw the soft launch of EliScholar, Yale University Library’s institutional repository. The purpose of an institutional repository is to freely display and archive faculty and student research in a variety of formats. The creation of EliScholar is important in that it supports the University’s mission “to create, preserve and disseminate knowledge.” Highlighting faculty publications, student research, conference proceedings, monographs, and open access journals is just one component of EliScholar. Anyone in the world can freely access materials that are created by Yale departments, schools, centers or institutes and are hosted in EliScholar. The institutional repository makes content more visible and highly discoverable by search engines such as Google and Google Scholar which is beneficial for professors and graduate students. It also provides researchers with an easy way to keep up with Yale scholarship.
EliScholar already contains a variety of materials. One of the earliest additions was the Nepal Studies Newsletter, part of the Yale Himalaya Initiative. The Yale Medicine Thesis Digital Library Project, which contains theses and dissertations from the Yale School of Medicine going back to 1952, is one of the largest projects to date. EliScholar includes several publications from the Yale University School of Nursing, including alumni/ae newsletters, yearbooks, bulletins and even a scanned copy of a commemorative napkin. Past issues of Nota Bene and YUL Annual Reports are also included, and plans are underway to include the Yale Bulletin and other historical Yale publications. In addition to digitized print content, EliScholar can be used as a platform to host conference information and submissions. One example is the recent “Day of Data,” for which posters were submitted and posted to EliScholar, where they are available to view. Future plans include student paper prize winners, newsletters from the School of Management and the School of Forestry, and an occasional paper series from the Council on East Asian Studies.
Another component of EliScholar is SelectedWorks™. SelectedWorks™ allows faculty and graduate students to easily create and maintain their own online academic presence. By providing template-like tools for individuals to use, scholars can easily build a webpage that is professional and capable of hosting both audio and visual materials. Like EliScholar, SelectedWorks™ boosts faculty and graduate students’ profiles in search engine rankings. A gallery of SelectedWorks™ pages appears on EliScholar’s homepage.
In the midst of issues surrounding self-publishing, open access and digital archiving, EliScholar is a stepping stone toward meeting research and scholarly needs at Yale.
To see what EliScholar is and for additional information, go to http://elischolar.library.yale.edu.
--submitted by Kelly Barrick, November 2013
Digital Humanities & Digitized Archives in the Library
Yale University Library has a long history of engagement with digital collection, preservation, and cataloging. With the arrival of a Librarian for Digital Humanities Research, the Library is expanding its services to the Yale scholarly community by offering consultation, training, and support for digital humanities projects directly. One of the motivating factors behind this new set of services is the recognition that the amount of digitized material now available to humanities researchers has spurred interest in complementing traditional scholarly techniques with new forms of analysis.
As an example of how we might "read" an archive from afar, consider the magazine Vogue. Published continuously for over 120 years, the issues in this collection number over two thousand. Since Yale University Library has a Perpetual Access License to the Vogue Archive, any member of the University community can browse the magazine itself online (http://search.proquest.com/vogue/). But since the individual pages number over 400,000, we were curious whether any patterns were observable in the data algorithmically. Taking a sample of all covers published every ten years, and averaging their pixels mathematically, shows interesting trends in the way that Vogue presented its face to the world:
In this visualization, we can see the shift from monochromatic illustrated covers (1900s) to color illustration (1910s-1930s) and finally to color photography (1940s-2000s). But we also notice a ghostly pattern in the covers from 1970 and 1980 -- a repeated positioning of the cover model's face in the same position and angle over the course of a year. This contrasts with the data from, for example, 1930, where no such clear pattern is evident. The pixel averages show us, in dramatic visual form, a change that has been well-documented by historians of American fashion: the avant-garde covers of Vogue from the first half of the 20th century, with unique illustrations by internationally-recognized artists, stand in contrast to the more formulaic approach of the 1970s and ’80s.
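For readers curious about what "averaging pixels" involves, the following is a minimal sketch in Python. It is not the code used for the Vogue visualization, whose tooling is not described here. It assumes each cover has already been decoded and resized to the same dimensions, and it represents an image as plain nested lists of (red, green, blue) values; the function name and the tiny sample "covers" are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue), each 0-255
Image = List[List[Pixel]]      # rows of pixels


def average_images(images: List[Image]) -> Image:
    """Return the per-pixel mean of equally sized RGB images."""
    if not images:
        raise ValueError("need at least one image")
    height, width = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all images must share the same dimensions")

    count = len(images)
    averaged: Image = []
    for y in range(height):
        row = []
        for x in range(width):
            # Sum each colour channel across all covers, then divide.
            sums = [0, 0, 0]
            for img in images:
                for channel in range(3):
                    sums[channel] += img[y][x][channel]
            row.append(tuple(round(s / count) for s in sums))
        averaged.append(row)
    return averaged


# Two hypothetical covers, each one row of two pixels.
covers = [
    [[(200, 0, 0), (10, 10, 10)]],
    [[(100, 0, 100), (10, 10, 10)]],
]
composite = average_images(covers)
```

Where many covers share a feature in the same place (a face at the same position and angle, for instance), that feature survives the averaging; where covers differ, the result blurs toward a uniform tone.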
There are other ways of "reading" an archive with Digital Humanities techniques, including measuring the frequency of words in issues from different periods. Below, a comparison of issues from 1896, 1950 and 2013 shows the changing language of Vogue: in 1896, words such as "mrs", "miss" and "daughter" hint at the editors' interest in social lives of the "four hundred families" in the New York City area. In 2013, terms such as "beyoncé", "boutiques" and "manicure" are proportionally more common.
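A word-frequency comparison of this kind can be sketched in a few lines of Python. This is an illustrative sketch only, not the method used for the Vogue comparison: the function names are hypothetical, the two sample strings are invented stand-ins for the OCR text of an issue, and a real analysis would likely also filter out common stop words.

```python
import re
from collections import Counter
from typing import Dict, List


def relative_frequencies(text: str) -> Dict[str, float]:
    """Map each lower-cased word to its share of all words in the text."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def distinctive_words(target: str, baseline: str, top: int = 3) -> List[str]:
    """Words whose share of the target text most exceeds their share of the baseline."""
    target_freq = relative_frequencies(target)
    baseline_freq = relative_frequencies(baseline)
    gains = {w: f - baseline_freq.get(w, 0.0) for w, f in target_freq.items()}
    # Largest gain first; ties broken alphabetically for a stable result.
    return sorted(gains, key=lambda w: (-gains[w], w))[:top]


# Invented sample text standing in for two issues.
issue_1896 = "Mrs Smith and Miss Jones and the daughter of Mrs Brown"
issue_2013 = "Beyoncé visits boutiques and the manicure boutiques"

early_terms = distinctive_words(issue_1896, issue_2013, top=1)
recent_terms = distinctive_words(issue_2013, issue_1896, top=1)
```

Comparing shares rather than raw counts matters because issues from different eras differ greatly in length.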
The archives of Vogue are, of course, just an example -- many humanities collections can be interrogated in these and other ways. If you have an idea for a similar project, or questions about these kinds of methods, reach out to the Librarian for Digital Humanities Research to start a conversation.
--submitted by Peter Leonard, September 2013
Interview: First Year
A Conversation with Susan Gibbons, University Librarian (SG), answering questions from Elizabeth Beaudin, Director of Digital Initiatives (EB)
EB: It is one year since you sent email to the Yale University Library community announcing a digital strategy. The areas covered included policy and governance (Digital Initiatives Advisory Board), technical infrastructure (Digital Infrastructure Working Group), and metadata standards (Digital Assets Metadata Committee), the need for a digital preservation strategy, and legal guidance related to digitization decisions. That is a great deal to achieve. How are things going?
SG: Thus far, the work on developing our digital strategy has exceeded my expectations. A key factor in the success was the decision to use five current digitization projects as case studies. This decision transformed a theoretical exercise of designing an end-to-end digitization workflow into a very concrete, detailed study of five digitization projects, and all of the unexpected gaps, hiccups, false presumptions, as well as under-utilized talent and extraordinary expertise, rose to the surface. The policies and procedures designed by the Digital Initiatives Advisory Board have provided a framework through which we can better understand the work involved in a digitization project and make an informed decision about whether we can and should make an organizational commitment to a project. Our technical infrastructure is taking shape and while there is still a lot of work to do, we now have an understanding of the infrastructure gaps and are taking the initial steps in addressing them. The metadata standards are still under development, but the recent development of a metadata “scrum” session has demonstrated how we can make good decisions with short deadlines. The digital preservation strategy is moving forward on several fronts including the hire of a preservation librarian and the decision to use Archivematica. On the legal front, we now have a standing meeting with University Counsel every other month as a forum to discuss the complicated copyright and intellectual property issues involved in digitization. We’ve come a very long way in just a year.
EB: Several colleagues have asked why we should spend time on this strategy when budgets and the overall economic climate do not allow for major initiatives. How do you reconcile such strategic planning during challenging economic times?
SG: While we may not have the funds for major, new digitization initiatives, we do have significant, on-going digitization projects that require our best efforts. Kissinger Papers, Fortunoff Video Archive for Holocaust Testimonies, and the Jonathan Edwards Papers project are just a few examples. We must do right by those obligations and that requires a digitization strategy. Moreover, if we hope to be competitive in acquiring funding in the future from granting agencies for new digitization projects, we will need to be able to demonstrate that we have a sound digitization strategy in place. Finally, when funding is scarce, you want to make sure that you are fully optimizing the use of the funding that you have; therefore a concerted effort on strategy helps to ensure that what funding is available for digitization projects is used in the most cost effective manner.
EB: What is one unexpected benefit that has emerged in the last year from the activities surrounding this strategic plan?
SG: I think the efforts that have gone into the digital initiatives work of this past year have helped to form relationships across YUL that might not have formed otherwise. I would guess that everyone involved would agree that they have formed new professional relationships as a result of this initiative.
Arcadia Year 5 Projects Announced
The following projects have been selected for funding in Arcadia Grant Year 5 (September 1, 2013 – August 31, 2014).
Preserving and Conserving the North African Jewish Collection (Christine McCarthy / Nanette Stahl)
- Approximately 2,000 manuscripts (15,000 pages) from the Jewish community of North Africa were cataloged in Arcadia Year 4. During that time, the Conservation department identified six manuscripts that require major conservation intervention to prevent significant text loss from iron gall ink deterioration. These fragile materials will receive conservation at the Conservation Center for Art and Historic Artifacts (CCAHA) in Philadelphia. The remaining collection will be reviewed at CCAHA as part of a conservation condition survey.
Catalog African Language Materials (Joan Swanekamp)
- Since 2007 Lisbet Rausing and the foundation she formed (Arcadia) have been funding projects for the cataloging of Yale’s African collection. This project will focus on the remaining 1500-2000 titles, which come from more than 60 languages, including approximately 400 titles in Afrikaans and 200 titles in Amharic. Most titles are unique copies or among only a few copies known to exist.
Catalog Scandinavian and Dutch Language Materials (Joan Swanekamp)
- A one-year project focused on cataloging materials in Scandinavian and Dutch languages (Norwegian, Swedish, Danish, Icelandic, and Dutch) to improve subject analysis and classification. A large percentage of the materials from this project will be new and unique to OCLC WorldCat.
Preserving Unique Films in the Benny Goodman Collection (Remi Castonguay / Francesca Livermore)
- The Gilmore Music Library proposed a project focusing on the preservation and digitization of films and accompanying soundtracks in the Benny Goodman Collection. A recent preservation survey found that many of the films were in active degradation. The poor physical state of these films prevents their access by patrons and the absence of action will result in their extinction. This project will produce analog duplicates of these films.
Spatial Access to Spatial Resources: the Open Geoportal Hydra head (Susan Powell)
- Using 7,000 previously digitized maps, this project aims to 1) create item-level metadata for the collection; 2) set up capacity to serve geodata; 3) build a robust, well-functioning spatial search portal Hydra head to improve user discovery and integrate with other Yale University Library digital infrastructure; and 4) develop sustainable workflows for metadata creation and ingest, with principles for geocoding non-explicitly spatial collections.
Congratulations to all!
Rolling out Summon
An update from Library IT, Enterprise Systems:
YUL acquired Summon, a search and discovery tool, from Serials Solutions in May 2013. Library IT, Acquisitions/Electronic Resources, and public services staff from Medical, CSSSI, SML, and the Law School are configuring Summon for an initial debut in August 2013. Staff may preview Summon and use the Feedback link to submit questions to the implementation group.
The first objective is configuring as many of our licensed electronic databases, journals and eBooks as possible, so their content is discoverable in a Summon search. Over time, Summon will replace federated searching in Metalib because of the improvements Summon offers, such as simultaneously searching more content through a central index, providing search-related data such as the number of citations from Web of Science, and recommending databases and LibGuides for additional information.
The most significant changes you will notice over the next few months will be the addition of more of our licensed holdings, integration of Summon into the Library website and LibGuides, and adjustments to the searchable A-Z lists for locating and linking directly to native databases and e-journal interfaces. A-Z lists will remain available, as Summon does not index every licensed e-resource at YUL and direct database searching may be necessary or preferable in some disciplines. It is important to note that records from Orbis and Morris will not be included in Summon searches. The long-term goal is to develop a Blacklight interface for centralized searching of major library resources including Orbis and Morris, Summon, the website, and digital repositories.
In addition to the configuration of the public search/discovery tool, many internal workflows and procedures are undergoing evaluation as we integrate a new application into the library. Public Services staff are currently preparing materials to introduce Summon to students and to create best practices. Much work remains, but we welcome your early testing and feedback.
-- submitted by Melissa Wisner, July 2013
Discovery at Yale via Summon
Discovery of YUL collections to change
Yale University Library is embarking on a complete overhaul of how its collections -- physical and digital, owned and licensed -- are searched and delivered to its patrons. The implementation of a new Discovery system will happen in phases, and staff can expect to see changes starting this summer.
Find Articles: Change from Metalib to Summon
Summon provides a single index of full text from many e-journal articles, ProQuest newspapers, and digitized books in the HathiTrust digital library. Summon will replace the former Metalib quickset searches and will be linked under Find Articles from the library’s home page. Compared with the federated search approach from Metalib, the Solr index created by Summon will be faster, more comprehensive, and will provide better relevance ranking.
Orbis and MORRIS records will NOT be searched through Summon. Orbis and MORRIS will be searched through Blacklight in a second phase of the Discovery project. Yale will first implement Blacklight as a discovery layer for digitized materials, and then later as the front end for searching MORRIS, Orbis, and Summon (e-articles) along with digitized materials.
Yale's Summon search is already being configured (but it isn't done yet) http://yale.summon.serialssolutions.com
- May 2013: Purchase Summon
- June and July 2013:
  - Implementation and testing of Summon
  - By mid-July, Summon live for staff
- September 2013:
  - Summon will be linked from www.library.yale.edu under the Find Articles link. It will mainly be a search of articles and some ebooks.
  - Summon will power the A to Z list of databases, with the front end in LibGuides.
  - Begin implementation of Blacklight as the front end for MORRIS and Orbis records, along with digitized materials, based on code available from Columbia as part of Clio Beta.
- December 2013: Staff test Blacklight catalog search.
- January 2014: Blacklight becomes a beta search alternative for MORRIS and Orbis, live to the public.
- September 2014:
  - Blacklight becomes the default search of MORRIS and Orbis, many Yale digitized resources and Summon articles, with a search box on www.library.yale.edu. Results are presented in a “bento box.” (Editorial note: See the Glossary for a definition of 'bento box'.)
Where to Learn More
- Serials Solutions content coverage: http://www.serialssolutions.com/en/services/summon/content
- Summon at Yale: Implementation news will be posted here
- Blacklight Project: http://projectblacklight.org/
- Columbia Library Search includes Summon and Voyager catalog records
-- submitted by Katie Bauer, June 2013
In Focus: Hán Nôm Handwritten and Woodblock Manuscripts
Hán Nôm Handwritten and Woodblock Manuscripts: Digitization and Open Access
This year's third and final digitization project with Arcadia funding is the digitization of Hán Nôm Handwritten and Woodblock Manuscripts held in the Maurice Durand Collection.
Within this fascinating collection, a scholar can find handwritten and woodblock texts in Hán Nôm, a writing method for the Vietnamese language adapted from and incorporating modified Chinese characters in use from the 13th until the 20th century. The Maurice Durand collection contains Hán Nôm texts, which are divided into two groups: Series 1 contains 35 hand sewn woodblock or handwritten brush ink volumes; and Series 2 is made up of 169 bi-lingual parallel-script notebooks handwritten in fountain pen. These cover classical Vietnamese literature and historical texts in Hán Nôm and Quốc Ngữ, the modern Romanized Vietnamese script. Support from the Arcadia grant to the Southeast Asia Collection will provide for the digitization of both series and give patrons open access to the collection's distinctive content.
Special treatment of the originals has been necessary due to the rare and fragile nature of the original hand sewn woodblock and hand brushed texts in Series 1. The Northeast Document Conservation Center (NEDCC) in Andover, MA will provide careful dis-binding and digitization. YUL’s Preservation unit will conduct initial image quality control upon completion of the digitization process to ensure that each page has been digitized properly. After this review, NEDCC will rebind the material using the original Asian style of side sewing and rebinding.
Series 2 is in better physical condition than the materials in Series 1. The notebooks are not considered rare and do not have the same antiquarian book value as the materials in Series 1. An outsourcing vendor will digitize this material. As with the Series 1 images, YUL’s Preservation unit will complete a quality control check before turning the collection over for cataloging enhancement.
Hương Phan, a native Vietnamese speaker, will provide a more in-depth quality control check, comparing the images to the original material. Additionally, Ms. Phan will augment existing metadata that will be visible to scholars in the project’s open access collection, made available via the Hydra digital repository that YUL IT is developing for the Arcadia-funded projects.
[The first page from Bài Văn Sách, one of the handwritten notebooks in Series 2.]
-- submitted by Richard Richie, Elizabeth Beaudin, May 2013
In Focus: Digitization of Persian Titles
Persian Titles: Conservation and Digitization
Through the Arcadia grant, the South Asia Collection at Yale University is expanding its digital presence. Twenty Persian books will be cataloged, digitized, and preserved. Both the digital images and the online catalog records will be made accessible by the end of summer 2013.
While most Persian works are currently published in Iran, these 20 older Persian philology texts selected for this project either originate from India, are rare European translations, or are reprints. All of the selected items have fewer than 26 holdings listed in WorldCat, which places them all in the “endangered” category according to current Yale preservation standards. This unusual collection holds important information for any scholar interested in the literary aspects of the Mughal Empire in India or the history of Muslims in India.
To facilitate the process of cataloging, conserving, digitizing, and providing access, the South Asia Collection is collaborating with several library units including Preservation and Conservation, Cataloging, Yale University Library Information Technology (YUL IT), the Beinecke Rare Book and Manuscript Library, and the Near East Collection.
Thanks to the efforts in Cataloging, all twenty items have already been cataloged and their MARC records have been uploaded to WorldCat, allowing them to be moved from the older Yale classification system and into the current standard Library of Congress call number system. As a result, these books can be indexed for use in modern search interfaces.
Prior to digitization, YUL’s Preservation and Conservation units treated the books, and then sent nineteen of the books to a vendor to be digitized. The twentieth, now held at the Beinecke because of its age, will be digitized in-house. When the books return from being digitized, the YUL Conservation Lab will perform post-digitization treatment and then return the collection to the shelves for circulation. Just as the records are currently visible in WorldCat, the digital images will be preserved and made visible in the Hydra repository being constructed by YUL IT for Arcadia-funded projects and other digital collections.
[A sampling of the books to be digitized. Photo credit Sarah Calhoun.]
[Notes from Conservator: "Before conservation treatment" - book was disbound to facilitate image capture; pages, including manuscript leaves were collated; treated to remove previous damaging repairs and reduce staining; textblock will be reconstructed post-imaging and rebound or housed after consultation with the South Asian Studies Librarian.
Handwritten notes read: "Chronogram: Hafiz shed dazzling light upon his learned train / As a pure lamp with heav'nly rays supplied / Thrice take thou from Mosella's Earth its richest grain / Where sleeps the poet, mark the year he died."]
-- submitted by Sarah Calhoun, Elizabeth Beaudin, April 2013
In Focus: Arabic and Persian Medical Texts
Arabic and Persian Medical Texts: Transmission of knowledge between civilizations
Medical knowledge created by Hippocrates, Galen, and other medical authorities traveled across time and space thanks to Arabic and Persian authors such as Ibn Sīnā (Avicenna), al-Rāzī (Rhazes), and others. These scholars from the past adopted, translated, and augmented Greek and Roman medical knowledge, transmitting this corpus to Western societies during the Renaissance. New medical practices such as inoculation against smallpox were also developed in Arabic countries and adopted by the British in the early 18th century.
Through digitization and dissemination made possible by the Arcadia fund, a unique set of Arabic and Persian medical manuscripts and books, including early translations of Arabic works, will soon be available online. While Yale’s Medical Historical library is primarily known for its Western emphasis, its Arabic and Persian collection is relatively unknown to scholars. Building on a joint project between the Yale University Library and the School of Oriental and African Studies (SOAS), University of London, that digitized a small selection of Arabic and Persian manuscripts from the collection, the Arcadia-funded project will augment digitized materials already available in the Arabic and Middle Eastern Electronic Library (AMEEL) project (http://www.library.yale.edu/ameel/). The digitization in this Arcadia-funded phase includes 77 books and manuscripts containing 22,000 pages.
Along with other Arcadia grant projects employing digitization, the Medical Historical project is a case study for the new Digital Initiatives strategy to address policy and governance issues concerning digitization projects for the Library as a whole. For this reason, the Medical Historical library is collaborating with YUL’s Preservation department who will provide quality control of images, and with YUL IT as they implement the Hydra / Fedora repository that will house the images and metadata and make this valuable collection accessible via a Blacklight interface.
The Medical Historical library has a number of outreach efforts planned to publicize this project, such as sharing text with the Medical Heritage Library (MHL) collection at the Internet Archive (http://archive.org/details/medicalheritagelibrary), linking assets with AMEEL repository, and submitting images to the World Digital Library (http://www.wdl.org/en/).
An exhibit entitled “Unveiling Medicine’s Past: Medical Historical Collections Online,” opening in the Medical Library in April 2013, will showcase some of the items scanned as part of this project, including Abū Saʿīd al-Maghribī’s colorfully illustrated treasury.
A page from Abū Saʿīd al-Maghribī’s illustrated treasury.
-- submitted by Melissa Grafe, Elizabeth Beaudin, March 2013
Hydra project management
In this newsletter I thought I would share a behind the scenes look at the project management for Hydra.
For starters, we use Agile methodologies [http://en.wikipedia.org/wiki/Agile_management]. In software development, Agile project management allows you to react and adapt quickly to changing requirements while keeping focus on user-based requirements. As a result we spend more time focusing on the user experience than planning the backend systems that will deliver the public interfaces. A simpler way to put it is that traditional software shops can spend a great deal of time planning things they know how to do and little time thinking about the user experience. So in our Agile shop we assume we know how to do things like making data connections and creating software packages that talk to one another. We focus more on what are called User Stories [http://en.wikipedia.org/wiki/User_story], which are then used to plan and test the product we develop.
In general, the stories are very specific so that they isolate a set of related tasks assigned to a programming group. A simple story may go something like, “As a manager I will receive daily statistics on the rate of ingest into Hydra so that I can project the time it takes to ingest materials.” This story tells the programmers that statistics gathering is needed and causes us to include additional systems for logging. It keeps us from thinking too much about the implementation of a logging mechanism and lets us focus on displaying logs to end users in a way that accomplishes something specific.
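To make the example story concrete, here is a minimal Python sketch of the kind of projection such a manager might want from daily ingest statistics. The function name, the daily counts and the backlog size are all made up for illustration; none of them come from the actual Hydra implementation.

```python
from math import ceil
from statistics import mean


def projected_days_remaining(daily_ingest_counts, objects_remaining):
    """Estimate the days left from the average daily ingest rate.

    Hypothetical helper for illustration only; not part of Hydra.
    """
    if not daily_ingest_counts:
        raise ValueError("need at least one day of statistics")
    rate = mean(daily_ingest_counts)  # average objects ingested per day
    if rate <= 0:
        raise ValueError("ingest rate must be positive")
    # Round up: a partial day of work still occupies a calendar day.
    return ceil(objects_remaining / rate)


# Made-up numbers: three days of ingest counts and a backlog of 30,000 objects.
days = projected_days_remaining([1200, 1500, 1300], 30000)
print(days)  # 23
```

The point of the story is that the programmers start from this end result, the number a manager can act on, and only then decide what logging is needed to produce it.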
Agile is more of an umbrella of other tools we use such as: Lean, SCRUM, KanBan, TDD and Extreme Programming.
Lean is a software development practice we use as part of being Agile [http://en.wikipedia.org/wiki/Lean_software_development]. One of the debates in software shops, for more than thirty years, has been the proper size and formation of a programming team. Some software development managers prefer a large team of ten or more programmers with a lot of overlap. I tend to be on the side that holds the maximum team size should be six. It may be helpful to offer some information about programmers. A slightly unfair, very stereotypical but pretty honest view of a programmer is someone who has a big chip on his/her shoulder and cannot be convinced his/her view is wrong (I fit this description very well). So when you have a large group with strong, inflexible opinions, work can drag out way too long or, in many cases, never get done because it is stuck in a never-ending planning process. In a Lean shop this is fixed by establishing specific roles, so there is little overlap in whose opinion is needed, and by keeping the group small, so collaboration is required. The benefits are the ability to deliver fast and eliminate waste, and you end up with a team that works well together.
Regarding waste, one of the largest wastes I experience in software design is creating applications, routines, or workflows that are not needed. Historically, software was developed using an approach where a spec for the complete package was written out and all parts were designed before a single line of code was written. I used to use what is called the Bordeaux methodology, where 70% of the time was planning, 15% programming and 15% debugging and re-programming. The term Bordeaux was coined because most Bordeaux wines use a mix of grapes in the proportions 70% cabernet sauvignon, 15% merlot and 15% cabernet franc. This methodology focused heavily on setting all the requirements up front and could take twelve months or more to complete a project from start to finish. The problem is that the amount of waste created was astronomical. By the time nine months went by, the end users had forgotten what they wanted or had come up with new ideas, and while the Bordeaux numbers seemed to hold, it became more like 50% planning, 20% programming and 30% going back and reprogramming everything, not to mention the typical deadline extension of six or more months.
But Lean alone does not solve the problems of scope creep and meeting deadlines. This is where SCRUM comes in [http://en.wikipedia.org/wiki/Scrum_(development)]. Our development cycle may last eighteen months, but SCRUM and Lean allow us to break this up into what are called sprints. Typically a sprint is a two-week development session where the team drops everything and focuses on a set list of deliverables. This allows us to build a user-focused solution that is planned, programmed and delivered in a short time span, which in turn prevents the scope creep and waste of the old Bordeaux methodology. While that is a huge benefit in itself, there is an even better outcome. At the end of the two weeks we have delivered working code on which to build the next two-week development phase, and we gather the requirements for that phase at the latest point in time possible. This means that we deliver what is needed right now and not what we think will be needed a year from now.
As for SCRUM, it is really simple and can be applied to any type of project. We start with a post-it note for every task required to be completed in the two-week period and post them all on a wall. They are organized into groups for tasks to be done, tasks in progress and completed tasks. In addition there is a section for tasks that are blocked and cannot be completed until something else is done. Each day we meet for fifteen minutes and move the post-it notes from one place to another while answering three questions for the team: What did you do since the last stand-up? What will you do before the next stand-up? Is anything blocking you? As the project manager (or in SCRUM, ScrumMaster) it is my job to make sure the tasks are being completed, and if anything is blocked, it is my job to unblock it. When all the parts are put together, the programming team can focus just on the task at hand for that day, and all the noise and distractions, such as “what do I do next” or “I don’t know what to do while I am waiting for something to happen,” are taken away. In addition to completing the task list, at the end of a sprint we discuss what went right and what went wrong. So not only are we iteratively developing better software, we are also iteratively becoming a better programming team.
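For readers who think in code, the wall of post-it notes can be pictured as a very small data structure. The following Python sketch is only an illustration of the four groups described above; the class, its method names and the sample tasks are hypothetical and are not a tool the team uses.

```python
# The four groups of post-it notes on the wall.
COLUMNS = ("to do", "in progress", "blocked", "done")


class SprintBoard:
    """A wall of post-it notes: each task sits in exactly one column."""

    def __init__(self, tasks):
        # Every task starts the sprint in the "to do" group.
        self.column_of = {task: "to do" for task in tasks}

    def move(self, task, column):
        """Move one post-it note, as happens at the daily stand-up."""
        if column not in COLUMNS:
            raise ValueError(f"unknown column: {column}")
        if task not in self.column_of:
            raise KeyError(task)
        self.column_of[task] = column

    def tasks_in(self, column):
        return [t for t, c in self.column_of.items() if c == column]

    def sprint_complete(self):
        return all(c == "done" for c in self.column_of.values())


# Hypothetical sprint with three tasks.
board = SprintBoard(["write ingest test", "add logging", "build stats page"])
board.move("add logging", "in progress")
board.move("build stats page", "blocked")
print(board.tasks_in("blocked"))  # ['build stats page']
```

In this picture, the ScrumMaster's job amounts to keeping `tasks_in("blocked")` empty so that every note can eventually reach the "done" column.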
Project management does not do all the work to make a better software package. To accomplish this, you need some more tools; this is where TDD (Test Driven Development) and Extreme Programming come into play.
TDD [http://en.wikipedia.org/wiki/Test-driven_development] is a simple methodology which, in one respect, makes bug-free code. Basically, as a software developer you are asked to make the bits and bytes do something, and for the most part we are successful at getting it right the first time around. But the typical pattern is to write a bunch of code and then test to see if it works. The change with TDD is that the tests we would normally run after the code is written are written before the code. This creates two positive outcomes. First, in order to create the tests, the programmers must fully understand the problem at hand. Second, the code created works the first time around.
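As a small illustration of that test-first order, here is a sketch using Python's built-in unittest module. The function `ingest_rate` and its expected behavior are invented for this example and are not taken from the Hydra code base; what matters is that the tests appear, and were conceived, before the function they exercise.

```python
import unittest


# Step 1: the tests are written first and describe the behavior we want.
class IngestRateTest(unittest.TestCase):
    def test_rate_is_objects_per_hour(self):
        self.assertEqual(ingest_rate(objects=600, hours=4), 150)

    def test_zero_hours_is_rejected(self):
        with self.assertRaises(ValueError):
            ingest_rate(objects=600, hours=0)


# Step 2: only then is the code written, with just enough logic to pass.
def ingest_rate(objects, hours):
    """Return objects ingested per hour (hypothetical example function)."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return objects / hours
```

Saved to a file, the tests can be run with Python's unittest runner. Before step 2 exists they fail, which is expected; the code is considered finished when they pass.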
TDD alone is not enough either. There is a risk large enough to require a backup system. A single developer on a TDD task could get caught in an endless loop of creating tests that, no matter what, result in failure: writing a new test, writing new code and then failing again, and again, and again. This is where Extreme Programming, and in particular its practice of pairing two programmers on one task, comes in [http://www.extremeprogramming.org/].
It may seem contrary for a Lean programming shop to put two people on the same exact task. I even struggled at first to understand why having two people on the same task took fewer FTE hours than one person working on it alone. There are a number of reasons this works. Recall what I said earlier about programmers having some serious chips on their shoulders: when you pair them up to complete one thing, what starts as competition results in teamwork and a better product. Another common problem in software development is a single programmer getting stuck and needing another programmer to look and see what is missing. By pairing programmers, the second set of eyes on the code is already there and the likelihood of getting stuck is greatly reduced. Lastly, not all programmers are equal; some have strengths where others have weaknesses. Over time, pairing up programmers provides the sort of on-the-job training we need to improve our skills.
Taken together, all these methodologies help accomplish short-term development cycles, but they leave longer-term scheduling open. So one last element of Agile is brought in, KanBan [http://en.wikipedia.org/wiki/Kanban]. This somewhat unlikely method, originally used to improve the manufacturing of Toyota automobiles in the early 1970s, is exactly what has been missing from Agile software development shops for years. For us, a slightly different Wikipedia article might make more sense here [http://en.wikipedia.org/wiki/Kanban_(development)].
The missing link that KanBan fills is the overall work scheduling process. In the Hydra implementation I’ve identified approximately twelve sprints we must complete. KanBan helps me to plan out when the sprints will take place. Similar to SCRUM, KanBan involves a bunch of post-it notes on a wall. The difference is that in SCRUM a post-it note may represent a single task that could take an hour or less. In KanBan the post-it note represents work units of any size, so a post-it could represent a task like fixing an application that broke or a series of sprints. The system allows me to see with just a quick glance the tasks that represent our daily work and the tasks that will complete large projects.
This is just a brief overview of the Agile methods used by the Hydra implementation team. I see it more as a change to how we work than just us adopting a new system to manage our work. The result allows us to adapt quickly to user needs and shorten our development cycles to deliver features and upgrades quickly.
-- submitted by Michael Friscia, February 2013
Hydra: a digital repository solution for YUL
Under the leadership of Michael Dula, the Digital Infrastructure Deployment Working Group has selected Hydra as our new digital repository solution.
Hydra is an open source solution for handling the presentation and preservation of digital assets which bundles a large number of open source software packages using Ruby on Rails to tie everything together. At its core are the Fedora Commons repository software for preservation and Blacklight for presentation of the digitized materials.
Ruby on Rails is a dynamic web application framework, similar to but more robust and secure than PHP, which offers rapid application development and follows modern programming design patterns like Model-View-Controller and Convention over Configuration. Combined with the existing Hydra components, Rails and its rapid application development principles allow us to move ideas into production faster than other web languages.
In becoming a Hydra adopter we join a multi-institutional collaboration working on solutions for the same problems we all face with handling digitized materials. A partial list of partners includes Stanford University, University of Virginia, Columbia University, Penn State University and Northwestern University (a full list can be found here: http://projecthydra.org/community-2-2/partners-and-more/).
Besides the benefit of having a large community of developers working on the same problems we face here at Yale, we also benefit from working within a common platform that has gone through significant testing and is now in production as the repository solution at a number of institutions. Some notable implementations include the new Stanford Library search, SearchWorks (http://searchworks.stanford.edu/), the Rock and Roll Hall of Fame (http://library.rockhall.com/home) and University of Virginia, Libra (http://libra.virginia.edu/).
In the coming year we will begin implementing Hydra as a digital repository that will ultimately replace Rescue Repository as well as many of our digital collection interfaces. This transition will start with the Year 4 Arcadia digitization projects and several selected collections but eventually will include most library digital collections. While we have not set firm dates, we expect to have our web front end to Hydra ready for use next summer. Our first rounds of ingest are being handled primarily by Ladybird, which will be one of several options for ingest into our Hydra implementation.
--submitted by Michael Friscia, November 2012