This workshop may appeal to you if you are a language activist, or if you work with language activists or with low-resource languages.
As part of our research, we currently aim to provide NLP tools and models tailored to language organisations and communities.
In this regard, we are pleased to invite you or language activists from your network to our first 2-hour online workshop on tools for low-resource languages. We are currently focusing on languages for which some digital texts are available.
This work is part of an ERC Proof of Concept Grant focused on creating tools for language activists. Our research group has also recently received an ERC Advanced Grant to develop Large Language Models for languages with fewer digital resources. Both grants enable longer-term collaboration with language communities.
This first session will be in two parts:
- We will present our tool for parallel sentence mining (i.e., finding translation pairs across two monolingual corpora) for low-resource languages. This task is an essential step towards developing a dedicated machine translation system or enabling a large language model to support your language. Details on the tool can be found at https://lnkd.in/ejHBwTDT
- We will survey the range of NLP tools that could be extended to further languages, from spell checkers to speech recognition, with a strong focus on machine translation and chatbots.
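To make the mining step concrete, here is a toy sketch of the idea. This is not the workshop's actual tool: the character-trigram "embedding" and cosine scoring below are deliberately simplified stand-ins for the multilingual neural sentence encoders that real mining systems use.

```python
import math
from collections import Counter

def embed(sentence: str) -> Counter:
    """Toy 'embedding': character trigram counts. Real systems use
    multilingual neural sentence encoders instead."""
    s = f"  {sentence.lower()}  "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mine_pairs(src: list[str], tgt: list[str], threshold: float = 0.2):
    """Pair each source sentence with its most similar target sentence,
    keeping only pairs whose similarity clears the threshold."""
    tgt_vecs = [embed(t) for t in tgt]
    pairs = []
    for s in src:
        sv = embed(s)
        scores = [cosine(sv, tv) for tv in tgt_vecs]
        best = max(range(len(tgt)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((s, tgt[best], scores[best]))
    return pairs
```

With a real multilingual encoder in place of `embed`, the same loop structure recovers translation pairs across languages; the threshold filters out source sentences that have no counterpart in the target corpus.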
If you are interested in attending the workshop or would like to stay in touch for future updates, please fill out this form: https://lnkd.in/eeiThqgg.
Join Zoom Meeting
Meeting ID: 651 5946 8381
Passcode: 178965
Join Zoom Meeting / Meeting ID: 684 1114 2377 / Passcode: 025895
As the 2023 Summer School "Data Science for Explainable and Trustworthy AI" met with a very positive response, another edition will be organised in 2024 on the topic of "Generative AI".
The rise of Generative AI, especially with the advancements in Large Language Models (LLMs), marks a transformative era in artificial intelligence that is expanding across all disciplines. LLMs aim to bridge the communication gap between machines and humans, paving the way for models that can grasp the nuances of human language and generate outputs in various formats that mimic human cognition and creativity.
The critical moment for Generative AI came with the adoption of neural networks, particularly transformer-based architectures, which have become its backbone. These models stand out for their profound ability to digest and learn from extensive corpora and datasets, and also to generate original, contextually rich content. But we are just at the beginning. The emerging models present challenges related to ethics, reliability, experimental methodology, the scope of their inferences, and their application to more specialised domains. All of this has created a vibrant field of work and opens the door to a community that we hope will find the right forum in this Summer School.
The 2025 edition of the EuADS Summer School is dedicated to Automated Data Science (AutoDS) and will cover important branches of this research field in a tutorial style. With the increasing complexity of data science projects and the limited availability of human expertise, the idea of automating or partially automating the work of a data scientist has come to the fore in recent years. AutoDS aims to streamline the data science workflow, making processes such as data pre-processing, feature engineering, model selection, evaluation and deployment faster and more accessible. By reducing manual intervention, AutoDS enables both non-experts and data scientists to work more efficiently, scale projects, and make data science accessible to a broader audience. It leverages tools ranging from automated machine learning (AutoML) frameworks to automated visualisation and interpretability techniques, enabling efficient model tuning, robust evaluation and easy deployment.

Despite its advantages in efficiency and scalability, challenges remain: automating subtasks that are context-dependent and require human interaction, ensuring model interpretability, reducing dependence on data quality, and addressing ethical concerns related to bias in automated models. These and other issues will be addressed in a series of five tutorials delivered by leading experts in the field.
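As a minimal sketch of the model-selection step mentioned above (purely illustrative; real AutoML frameworks search far larger spaces of pipelines and hyperparameters with much smarter strategies), an automated loop can fit several candidate models and keep whichever scores best on held-out data:

```python
def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b (one feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    b = my - a * mx
    return lambda x: a * x + b

def mse(model, xs, ys):
    """Mean squared error of a fitted model on a data split."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def auto_select(xs, ys, candidates):
    """Fit every candidate on a training split and return the
    (name, model) pair with the lowest validation error."""
    split = int(0.7 * len(xs))
    train_x, val_x = xs[:split], xs[split:]
    train_y, val_y = ys[:split], ys[split:]
    fitted = [(name, fit(train_x, train_y)) for name, fit in candidates]
    return min(fitted, key=lambda nm: mse(nm[1], val_x, val_y))

xs = list(range(10))
ys = [2 * x + 1 for x in xs]          # perfectly linear data
name, model = auto_select(xs, ys, [("mean", fit_mean), ("linear", fit_linear)])
print(name)                            # prints "linear"
```

The same select-by-validation-score pattern underlies real AutoML systems, which additionally automate feature engineering, hyperparameter tuning and pipeline construction.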
The Summer School emphasizes the interdisciplinary nature of data science and is primarily aimed at PhD students, postdoctoral and early-career researchers with a basic grounding in data science, statistics, machine learning, AI, or related fields, and an interest in interdisciplinary research and applications.
Machine learning methods for language processing are currently on everyone's lips. It is still unclear how and where exactly systems such as ChatGPT will come to be used in research. However, rule-based and statistical methods have long been used to analyse texts automatically, including in the Digital Humanities. It remains important for researchers to develop an understanding of these methods, so that they can apply the appropriate technique in each case and, in particular, take the methods' weaknesses into account.

In his talk, Hans Ole Hatzel first covers the foundations of computer-assisted text processing, explaining a range of established techniques, from tokens and types to word embeddings and sentiment analysis. Several methods are illustrated with examples from the Digital Humanities in order to show, beyond the methods themselves, how they are applied in practice. The talk closes with an outlook on the use of Large Language Models, the technology behind ChatGPT, in the Digital Humanities.
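The tokens-versus-types distinction mentioned in the abstract can be illustrated in a few lines of Python (the naive tokenizer below is for illustration only; real pipelines use dedicated tokenizers, e.g. from spaCy or NLTK):

```python
import re

def tokenize(text: str) -> list[str]:
    """Naive tokenizer: lowercase and split on word characters."""
    return re.findall(r"\w+", text.lower())

text = "the cat sat on the mat and the dog sat too"
tokens = tokenize(text)        # every running word
types = set(tokens)            # distinct word forms
ttr = len(types) / len(tokens)  # type-token ratio

print(len(tokens), len(types), round(ttr, 2))  # prints "11 8 0.73"
```

The type-token ratio is a simple measure of lexical diversity; as the example shows, repeated words ("the", "sat") count once as types but every time as tokens.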
Speaker: Hans Ole Hatzel (UHH)
Artificial Intelligence is transforming how we approach chemical research and synthesis. By teaching language models to understand and generate the language of chemistry, we have developed complementary AI systems that bridge the gap between computational design and experimental reality.
Our large language model system, ChemCrow, represents one of the first demonstrations of an AI system directly controlling robotic synthesis platforms, successfully executing the synthesis of compounds including organocatalysts and chromophores.
Complementing this, our small language model system, Saturn, currently the most sample-efficient molecular design algorithm, enables precise molecular generation with built-in synthesizability constraints. Saturn’s innovations include direct optimization against retrosynthetic predictions and integration of building block availability, ensuring that generated molecules are practically accessible.
Our work demonstrates how different scales of language models can work together to transform chemical research, from initial molecular design through to physical synthesis, potentially revolutionizing drug discovery, catalysis, and materials development.
Institution
Universität Hamburg
Adeline Scharfenberg