This workshop may appeal to you if you are a language activist, or if you work with language activists or on low-resource languages.
As part of our research, we currently aim to provide NLP tools and models tailored to language organisations and communities.
In this regard, we are pleased to invite you or language activists from your network to our first 2-hour online workshop on tools for low-resource languages. We are currently focusing on languages for which some digital texts are available.
This work is part of an ERC Proof of Concept Grant, which focuses on creating tools for language activists. Additionally, our research group has recently received an ERC Advanced Grant to develop Large Language Models for languages with fewer digital resources. Both grants enable longer-term collaboration with language communities.
This first session will be in two parts:
- we will present our tool for parallel sentence mining (i.e., finding translation pairs between two monolingual corpora) for low-resource languages. This task is an essential step towards developing a dedicated machine translation system or enabling a large language model to support your language. Details on the tool can be found at https://lnkd.in/ejHBwTDT
- we will show the diversity of NLP tools that could be extended to other languages. These will range from spell checkers to speech recognition, with a strong focus on machine translation and chatbots.
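The announcement describes parallel sentence mining only at a high level and does not say how the linked tool works. As a rough illustration, one widely used approach (ratio-margin scoring over multilingual sentence embeddings, keeping only mutual best matches) can be sketched in Python as below. This is an assumption-laden toy, not necessarily the method behind the tool: the sentences, the hand-made 3-d vectors, and the `k` and `threshold` values are all invented for this example, and a real pipeline would obtain the vectors from a multilingual sentence encoder.

```python
import math

# Hypothetical toy data: each sentence is paired with a hand-made 3-d vector.
# In a real pipeline the vectors would come from a multilingual sentence
# encoder; the numbers below are invented purely for illustration.
SRC = [
    ("The house is red.", [1.0, 0.1, 0.0]),
    ("I like tea.", [0.0, 1.0, 0.1]),
    ("The weather is nice.", [0.1, 0.0, 1.0]),  # has no translation in TGT
]
TGT = [
    ("Ich mag Tee.", [0.05, 0.95, 0.1]),
    ("Das Haus ist rot.", [0.95, 0.15, 0.05]),
    ("Der Zug ist spät.", [0.5, 0.5, 0.5]),  # distractor, similar to everything
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_mean(query, others, k):
    """Mean cosine similarity of `query` to its k nearest vectors in `others`."""
    sims = sorted((cosine(query, o) for o in others), reverse=True)[:k]
    return sum(sims) / len(sims)


def mine_pairs(src, tgt, k=2, threshold=1.3):
    """Ratio-margin mining: keep mutual best matches whose margin >= threshold.

    `threshold=1.3` is an illustrative value chosen for this toy data only.
    """
    src_vecs = [v for _, v in src]
    tgt_vecs = [v for _, v in tgt]
    # Average similarity of each sentence to its k nearest neighbours
    # in the other corpus.
    src_nn = [knn_mean(v, tgt_vecs, k) for v in src_vecs]
    tgt_nn = [knn_mean(v, src_vecs, k) for v in tgt_vecs]
    # margin(x, y) = cos(x, y) / mean of the two neighbourhood averages
    margin = [
        [cosine(sv, tv) / ((src_nn[i] + tgt_nn[j]) / 2) for j, tv in enumerate(tgt_vecs)]
        for i, sv in enumerate(src_vecs)
    ]
    pairs = []
    for i, row in enumerate(margin):
        j = max(range(len(row)), key=row.__getitem__)  # best target for source i
        best_src = max(range(len(margin)), key=lambda r: margin[r][j])
        if best_src == i and row[j] >= threshold:  # mutual best + margin filter
            pairs.append((i, j, row[j]))
    return pairs


if __name__ == "__main__":
    for i, j, score in mine_pairs(SRC, TGT):
        print(f"{SRC[i][0]!r} <-> {TGT[j][0]!r} (margin {score:.2f})")
```

Run as a script, this sketch pairs the first two English sentences with their German translations and discards the third sentence, whose best candidate (the distractor) does not clear the margin threshold. Margin scoring, rather than raw cosine similarity, is a common design choice because it normalises each score by how similar both sentences are to their neighbourhoods, which penalises "hub" sentences that are moderately similar to everything.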
If you are interested in attending the workshop or would like to stay in touch for future updates, please fill out this form: https://lnkd.in/eeiThqgg.
Join Zoom Meeting
Meeting ID: 651 5946 8381
Passcode: 178965
This workshop may appeal to you if you are a language activist, or if you work with language activists or on low-resource languages.
As part of our research, we currently aim to provide NLP tools and models tailored to language organisations and communities.
In this regard, we are pleased to invite you or language activists from your network to our first 2-hour online workshop on tools for low-resource languages. We are currently focusing on languages for which some digital texts are available.
This work is part of an ERC Proof of Concept Grant, which focuses on creating tools for language activists. Additionally, our research group has recently received an ERC Advanced Grant to develop Large Language Models for languages with fewer digital resources. Both grants enable longer-term collaboration with language communities.
This first session will be in two parts:
- we will present our tool for parallel sentence mining (i.e., finding translation pairs between two monolingual corpora) for low-resource languages. This task is an essential step towards developing a dedicated machine translation system or enabling a large language model to support your language. Details on the tool can be found at https://lnkd.in/ejHBwTDT
- we will show the diversity of NLP tools that could be extended to other languages. These will range from spell checkers to speech recognition, with a strong focus on machine translation and chatbots.
If you are interested in attending the workshop or would like to stay in touch for future updates, please fill out this form.
Join Zoom Meeting
Meeting ID: 684 1114 2377
Passcode: 025895
Following the very positive response to the Summer School 2023 “Data Science for Explainable and Trustworthy AI”, another summer school will be organised in 2024 on the topic of “Generative AI”.
The rise of Generative AI, especially with the advancements in Large Language Models (LLMs), marks a transformative era in artificial intelligence that is expanding across all disciplines. LLMs aim to bridge the communication gap between machines and humans, paving the way for models that can grasp the nuances of human language and generate outputs in various formats that mimic human cognition and creativity.
The turning point for Generative AI came with the adoption of neural networks, particularly transformer-based architectures, which have become its backbone. These models stand out for their ability not only to learn from large corpora and datasets but also to generate original, contextually rich content. But we are only at the beginning. The emerging models raise challenges related to ethics, reliability, the way we experiment with them, the scope of their inferences, and their application to more specialised domains, among others. All this has created a vibrant field of work and opened the door to a community that we hope will find the right forum in this Summer School.
Machine learning methods for language processing are currently on everyone's lips. It is still unclear how and where exactly systems such as ChatGPT will be used in research. In the Digital Humanities, however, texts have long been analysed automatically using rule-based and statistical methods. For researchers, it remains important to develop an understanding of these methods in order to choose the appropriate technique in each case and, in particular, to take the methods' weaknesses into account.
In his talk, Hans Ole Hatzel first introduces the fundamentals of computer-assisted text processing, explaining a range of established techniques, from tokens and types to word embeddings and sentiment analysis. Some of these methods are illustrated with examples from the Digital Humanities, showing not only the methods themselves but also how they are applied in practice. The talk closes with an outlook on the use of Large Language Models, the technology behind ChatGPT, in the Digital Humanities.
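The talk abstract starts from the distinction between tokens (every running word in a text) and types (the distinct word forms). As a minimal sketch that is not taken from the talk itself, the Python snippet below splits a made-up sentence into tokens, counts its types, computes the type-token ratio, and derives a simple dictionary-based sentiment score. The regular-expression tokenizer and the three-word polarity lexicon are simplifying assumptions invented for this example, far cruder than the methods used in actual Digital Humanities research.

```python
import re
from collections import Counter

# Hypothetical example sentence and toy polarity lexicon, invented for illustration.
TEXT = "The night was dark and the sea was dark and cold, but the morning was bright."
LEXICON = {"dark": -1, "cold": -1, "bright": 1}


def tokenize(text):
    """Naive tokenizer: lowercase the text and keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


def token_type_stats(tokens):
    """Return (number of tokens, number of types, type-token ratio)."""
    types = Counter(tokens)  # each distinct word form is one type
    return len(tokens), len(types), len(types) / len(tokens)


def lexicon_sentiment(tokens, lexicon):
    """Dictionary-based sentiment: sum the polarity of every token found in the lexicon."""
    return sum(lexicon.get(token, 0) for token in tokens)


if __name__ == "__main__":
    tokens = tokenize(TEXT)
    n_tokens, n_types, ttr = token_type_stats(tokens)
    print(f"{n_tokens} tokens, {n_types} types, type-token ratio {ttr:.3f}")
    print("most frequent types:", Counter(tokens).most_common(2))
    print("sentiment score:", lexicon_sentiment(tokens, LEXICON))
```

On the sample sentence this reports 16 tokens but only 10 types, and a sentiment score of -2. A purely lexicon-based score like this ignores context and negation, which is one of the weaknesses of such methods that the abstract says researchers need to keep in mind.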
Speaker: Hans Ole Hatzel (UHH)
Universität Hamburg
Adeline Scharfenberg