EVENTS

Our events in the areas of Big Data and Research Innovation cover a diverse set of topics, such as Future, Strategy, Technology, Applications, and Management.

If you feel that your event or event series should be part of this event calendar, just contact us!

Tuesday, November 25th, 2025 | 16:00 - 18:00

Lecture (on-site): "Towards machine-readable Jawi Newspapers using bespoke AI models"

Asia-Africa-Institute (AAI), room O-222

In the century between the 1870s and the 1970s, hundreds of Malay-language periodicals circulated around the Malay-speaking world. These periodicals chronicle a fascinating era and have been the focus of intense study by scholars such as William Roff and Ian Proudfoot. Many of these periodicals have been digitized, and comprehensive collections are held at the National Library of Singapore, as well as in other libraries and archives.
Given the availability and size of the collections, the opportunity is ripe for systematic digital analysis. Projects elsewhere in the world have demonstrated the power of analysing historical newspapers at scale using computational methods. Examples include "Living With Machines" (a partnership between the British Library and several universities in the UK) and "Oceanic Exchanges" (a partnership between Finland, Germany, Mexico, the Netherlands, the United Kingdom, and the United States).

What is holding back a similar study of Malay-language newspapers?
  – The main obstacle is the script. The majority of these periodicals were published in Jawi, an adaptation of the Perso-Arabic script for the Malay language, which poses significant challenges for digital processing. For one thing, typical Optical Character Recognition (OCR) pipelines do not work well for Jawi.
  – Another challenge is that most contemporary Malay readers, including many historians who would be interested in these collections, are less familiar with Jawi than with Rumi (the Romanized version of Malay most commonly used today). Automatic transliteration from Jawi to Rumi is itself a complex task, as vowels are often not marked in Jawi.
  – In addition, spelling conventions have changed over time, and there are many different approaches to transliterating the same word.

To address these challenges, the "Computational Heritage" research group at the National University of Singapore has developed specialized AI models for both Jawi OCR and Jawi-to-Rumi transliteration. In this talk, I will describe our progress so far, the challenges we still face, and the future directions of our work.

Institutions

  • Department for Languages and Cultures of Southeast Asia, Asia-Africa-Institute (AAI), UHH

Universität Hamburg
Adeline Scharfenberg