Emory Libraries has been awarded a 2023 Lyrasis Catalyst Fund Grant. The Catalyst Fund is an award program that provides support for new ideas and innovative projects from members of Lyrasis, a non-profit member organization serving the global landscape of academic and public libraries, scholarly research, archives, museums, and galleries. With a theme of community-driven projects and projects with community impact, the Catalyst Fund is now in its seventh year of investing more than $800,000 in projects to support and promote scalable innovation for the benefit of the community at large.
Emory’s project, entitled “Increasing Accessibility of Audiovisual Content Using Whisper,” is one of five projects chosen for the 2023 award. It will assess the viability of an open-source AI software tool as a solution to captioning and transcribing audiovisual content in Emory Libraries collections. The result will increase discoverability, searchability, and accessibility of these materials for all users.
Many libraries and universities share the challenge of ensuring that digitized audiovisual content is accessible to all audiences. Proactively captioning digitized AV content improves accessibility for end users, supports diversity, equity, and inclusion, and brings the content closer to compliance with the Web Content Accessibility Guidelines. Providing transcripts of AV content in collections also improves searchability and discoverability, thereby increasing access to underrepresented voices and communities within collections.
Our project will be led by audiovisual conservator Nina Rao, head of Digitization Services Kyle Fenton, and head of Metadata Services Simon O’Riordan, with guidance from Emory University Archives Oral History coordinator Jonathan Coulis. Emory’s Catalyst Fund project will test Whisper, a recently released open-source AI software tool, assessing its viability as a solution to this challenge. Whisper performs multilingual speech recognition and speech translation while keeping all content and data local, a key feature for addressing data privacy concerns. The project will test Whisper across a large sample of AV content to assess its accuracy and determine the feasibility of deploying it within Emory’s AV digitization workflow.
With an emphasis on community-driven collections, the project will test both the overall performance of Whisper and the equity of that performance on content representing a multitude of contexts. These include technical and subject-specific vocabularies, regional dialects, multi-speaker content, environmental noise, and a range of sound and production qualities. To carry out this innovative work, the project team will have several Emory student assistants edit and prepare captions, then test the synchronization of captioned and transcribed content.
The success of this project will advance the discoverability and accessibility of digitized AV content by making it captioned and text-searchable. It will also provide a model for addressing this challenge at Emory and for adopting proactive captioning and transcription practices at academic libraries, special collections, research facilities, museums, and archives of all sizes.
—Nina Rao, audiovisual conservator, Access and Resources Division, Emory Libraries