In the humanities and cultural studies, OCR (Optical Character Recognition) and HTR (Handwritten Text Recognition) remain difficult tasks. OCR4all gives all users a free, easy-to-use tool for running their own OCR workflows. This workshop covers the fundamental concepts of OCR and gives a brief overview of OCR4all, addressing questions such as:
- What kinds of files and data are necessary for OCR?
- How does the OCR or HTR workflow in OCR4all vary with the source material and the expected (human) effort?
- How much of the workflow can be automated for the material at hand?
- What is an OCR model, and how can one train a specific text recognition model?
- What level of recognition accuracy can be expected?
- How much effort should go into producing the texts, given how they will be used later?
These and other topics will be discussed and explained so that, by the end of the session, all participants can work independently on challenging OCR tasks.
Participants may work with the sample texts provided or bring their own materials. There are no prerequisites for this training, and all skill levels are welcome.
Speaker: Florian Langhanki (JMU)
The number of participants is limited to 15, so please register at forschungsdienste@sub.uni-hamburg.de.
This event is part of the series "Digital Humanities – How does it work?" of the Department for Digital Scholarship Services.
Universität Hamburg
Adeline Scharfenberg