Examining historical sources in the form of printed and handwritten texts is a central part of scholarly work in the humanities, as well as in the cultural and human sciences. These sources are frequently accessible only as scans, which severely limits their usefulness because automatic indexing techniques such as full-text search and methods of quantitative analysis cannot be applied to images. For these purposes, machine-processable full text must first be extracted from the digitized material, and methods for automatic text recognition of prints (OCR) and manuscripts (HTR) are becoming increasingly important in this process. Old prints and manuscripts in particular can still be very difficult to process for a variety of reasons. Fortunately, historical OCR/HTR has made significant strides in recent years, leading to the development of several high-performance solutions.
OCR4all, a freely available open-source program created by the University of Würzburg's Center for Philology and Digitization (ZPD), aims to enable users of all skill levels to independently and accurately index complex printed materials and manuscripts. OCR4all is a single application that covers the entire text recognition workflow and includes all the necessary tools. It is simple to install and use thanks to its user-friendly graphical user interface.
In addition to introducing OCR4all and its features through a live demonstration, the lecture covers the fundamentals of automatic text recognition. It also shows how OCR4all is applied to various materials and how well it performs on them, and it gives an overview of recent work as well as an outlook on future developments.
Speaker: Christian Reul
This event is part of the series "Digital Humanities – How does it work?" organized by the Department for Digital Scholarship Services.