Scholars working with historical documents know the value of a clean transcription. Being able to easily read and search for text in handwritten or early print documents opens up new possibilities for research and teaching. While reliable transcriptions can be labor-intensive to create, modern tools like Transkribus can greatly accelerate the process.
Transkribus is a platform that uses artificial intelligence (AI) to recognize text in both handwritten and printed documents and provides a robust interface for manual transcription. AI models for various languages are available, and it’s even possible to train your own model to recognize a particular language or style of handwriting. The platform offers two products: the fully featured Transkribus desktop tool and the web-based interface Transkribus Lite. The desktop tool is necessary for text recognition, while Transkribus Lite allows users to transcribe manually in a browser.
Most features of Transkribus, including model training, layout analysis and manual transcription, are completely free. Only text recognition has a cost, and the pricing depends on the recognition engine and models used. New users receive 500 free credits, which cover approximately 400-500 handwritten pages or 2,500-3,000 printed pages. If you are interested in learning how to use Transkribus, explore their helpful series of how-to guides.
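To get a rough sense of what a project might cost in credits, the page ranges quoted above can be turned into a back-of-the-envelope estimate. The short Python sketch below is only an illustration: the function name `estimate_credits` is hypothetical, and the rates are derived solely from the 500-credit figures in this post, not from Transkribus's actual price list, which varies by engine and model.

```python
FREE_CREDITS = 500

# Pages that 500 credits are said to cover, as (low, high) ranges.
# These come from the figures quoted in this post, not from an official price list.
PAGES_PER_500_CREDITS = {
    "handwritten": (400, 500),
    "printed": (2500, 3000),
}


def estimate_credits(pages, kind):
    """Return a (best_case, worst_case) credit estimate for a number of pages."""
    low_pages, high_pages = PAGES_PER_500_CREDITS[kind]
    best_case = pages * FREE_CREDITS / high_pages   # most pages per credit
    worst_case = pages * FREE_CREDITS / low_pages   # fewest pages per credit
    return best_case, worst_case


if __name__ == "__main__":
    best, worst = estimate_credits(200, "handwritten")
    print(f"200 handwritten pages: roughly {best:.0f} to {worst:.0f} credits")
```

Under these assumptions, a 200-page handwritten ledger would fit comfortably within the free allowance, while a very large printed collection might not.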
Part of what makes Transkribus a compelling platform is its attention to ergonomics. The desktop interface is made up of three main panels (administrative, page image and transcription) that can be hidden or resized as needed for specific tasks. The page image panel zooms and pans intuitively, allowing deep engagement with the source material. The thickness and color of the layout bounding boxes are fully customizable to maximize readability. Buttons are laid out sensibly and efficiently. It is a well-designed tool that makes for a pleasant working experience.
Another important feature of Transkribus is its capacity for collaboration. Documents can be organized into collections and shared with other users. All documents are maintained on Transkribus’s servers and can be accessed in the desktop tool or Transkribus Lite. This means it’s possible to distribute work to team members who may not be able to install the desktop software on their machines for administrative or technical reasons.
At the Emory Center for Digital Scholarship (ECDS), we are using Transkribus to generate data about the content and spatial position of text in a variety of documents. We then export this data so that we can overlay text on page images in Readux, a platform for reading, annotation and publishing. We have processed handwritten ledgers from the David J. Sencer CDC Museum, handwritten copybooks of the African-American poet Phillis Wheatley, and printed 19th- and 20th-century songbooks for the Sounding Spirit initiative.
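For readers curious what this kind of exported data can look like, here is a minimal Python sketch that reads text lines and their positions from a PAGE XML file, one of the formats Transkribus can export. It is an illustration under stated assumptions, not a description of the ECDS pipeline: the sample document, the function names `bounding_box` and `extract_lines`, and the namespace version are all assumptions, so check the root element of your own export before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumed PAGE namespace; confirm against the root element of your own export.
PAGE_NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

# A tiny, made-up PAGE XML document used only for illustration.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="ledger_001.jpg" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <Coords points="100,100 1900,100 1900,400 100,400"/>
      <TextLine id="r1l1">
        <Coords points="120,110 1800,110 1800,180 120,180"/>
        <TextEquiv><Unicode>Received of Mr. Smith</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def bounding_box(points):
    """Convert a PAGE 'x,y x,y ...' polygon string to an (x, y, width, height) box."""
    pairs = [tuple(int(v) for v in pair.split(",")) for pair in points.split()]
    xs, ys = zip(*pairs)
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


def extract_lines(page_xml):
    """Return one dict per TextLine with its id, transcribed text and bounding box."""
    root = ET.fromstring(page_xml)
    lines = []
    for line in root.iterfind(".//pc:TextLine", PAGE_NS):
        coords = line.find("pc:Coords", PAGE_NS)
        text = line.find("pc:TextEquiv/pc:Unicode", PAGE_NS)
        if coords is None:
            continue  # skip lines without position data
        lines.append({
            "id": line.get("id"),
            "text": (text.text or "") if text is not None else "",
            "bbox": bounding_box(coords.get("points")),
        })
    return lines


if __name__ == "__main__":
    for entry in extract_lines(SAMPLE):
        print(entry)
```

Each resulting entry pairs a line of transcription with a rectangle on the page image, which is the kind of information a viewer needs in order to position text over a scan.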