In the century between the 1870s and the 1970s, hundreds of Malay-language periodicals circulated throughout the Malay-speaking world. These periodicals chronicle a fascinating era and have been studied intensively by scholars such as William Roff and Ian Proudfoot. Many of them have been digitized, and comprehensive collections are held at the National Library of Singapore, as well as in other libraries and archives.
Given the availability and size of the collections, the time is ripe for systematic digital analysis. Projects elsewhere in the world have demonstrated the power of analysing historical newspapers at scale using computational methods. Examples include "Living With Machines" (a partnership between the British Library and several universities in the UK) and "Oceanic Exchanges" (a partnership among researchers in Finland, Germany, Mexico, the Netherlands, the United Kingdom, and the United States).
What is holding back a similar study of Malay-language newspapers?
– The main obstacle is the script. The majority of these periodicals were published in Jawi, an adaptation of the Perso-Arabic script for the Malay language, which poses significant challenges for digital processing. For one thing, typical Optical Character Recognition (OCR) pipelines do not work well for Jawi.
– Another challenge is that most contemporary Malay readers, including many historians who would be interested in these collections, are less familiar with Jawi than with Rumi (the Romanized form of Malay most commonly used today). Automatic transliteration from Jawi to Rumi is itself a complex task, as vowels are often not written in Jawi.
– In addition, spelling conventions have changed over time, and the same word can be transliterated in many different ways.
To address these challenges, the "Computational Heritage" research group at the National University of Singapore has developed specialized AI models for both Jawi OCR and Jawi-to-Rumi transliteration. In this talk, I will describe our progress so far, the challenges we still face and the future directions of our work.
Institutions
Universität Hamburg
Adeline Scharfenberg