natural language processing

Events

Tuesday, February 4th, 2025 | 18:00

Low-res lang tools - workshop on NLP tools for language communities

Online, registration is required

This workshop may appeal to you if you are a language activist, work in collaboration with language activists or with low-resource languages.

As part of our research, we currently aim to provide NLP tools and models tailored to language organisations and communities.
In this regard, we are pleased to invite you or language activists from your network to our first 2-hour online workshop on tools for low-resource languages. We are currently focusing on languages for which some digital texts are available.

This work is part of an ERC Proof of Concept Grant, which focuses on creating tools for language activists. Additionally, our research group has recently received an ERC Advanced Grant to develop Large Language Models for languages with fewer digital resources. Both grants enable more long-term collaboration with language communities.

This first session will be in two parts:
- we will present our tool for parallel sentence mining (i.e., finding translation pairs between two monolingual corpora) for low-resource languages. This task is an essential step towards developing a dedicated machine translation system or enabling a large language model to support your language. Details on the tool can be found at https://lnkd.in/ejHBwTDT
- we will show the diversity of possible NLP tools that could be extended to other languages. These will range from spell checkers to speech recognition, but with a strong focus on machine translation and chatbots.
If you are interested in attending the workshop or would like to stay in touch for future updates, please fill out this form: https://lnkd.in/eeiThqgg.
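To give a feel for what parallel sentence mining involves, here is a minimal sketch. It is not the workshop tool's method, which is not described in this announcement. It assumes each sentence has already been turned into a vector by some multilingual sentence encoder, and it keeps only pairs that are mutual nearest neighbours above a similarity threshold. The function names, the threshold and the toy vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def mine_pairs(src_vecs, tgt_vecs, threshold=0.8):
    """Return (src_index, tgt_index) pairs that are mutual nearest neighbours."""
    def best(vec, pool):
        # Index of the vector in `pool` most similar to `vec`.
        return max(range(len(pool)), key=lambda j: cosine(vec, pool[j]))

    pairs = []
    for i, s in enumerate(src_vecs):
        j = best(s, tgt_vecs)
        # Keep the pair only if the match holds in both directions
        # and the similarity is high enough.
        if best(tgt_vecs[j], src_vecs) == i and cosine(s, tgt_vecs[j]) >= threshold:
            pairs.append((i, j))
    return pairs

# Made-up 3-dimensional "sentence embeddings" for two monolingual corpora.
src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tgt = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
pairs = mine_pairs(src, tgt)
print(pairs)
```

The third source sentence has no close counterpart in the target corpus, so it is left unpaired. A real pipeline would replace the toy vectors with encoder output and use an approximate nearest-neighbour index for large corpora.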

Join Zoom Meeting
Meeting ID: 651 5946 8381
Passcode: 178965

Institutions

TUM Heilbronn

Tags machine learning, natural language processing, big data, large language models, workshop

Thursday, February 6th, 2025 | 10:00 a.m.

Low-res lang tools - workshop on NLP tools for language communities

Online, registration is required

This workshop may appeal to you if you are a language activist, work in collaboration with language activists or with low-resource languages.

As part of our research, we currently aim to provide NLP tools and models tailored to language organisations and communities.
In this regard, we are pleased to invite you or language activists from your network to our first 2-hour online workshop on tools for low-resource languages. We are currently focusing on languages for which some digital texts are available.
This work is part of an ERC Proof of Concept Grant, which focuses on creating tools for language activists. Additionally, our research group has recently received an ERC Advanced Grant to develop Large Language Models for languages with fewer digital resources. Both grants enable more long-term collaboration with language communities.

Institutions

TUM Heilbronn

Tags machine learning, natural language processing, big data, large language models, workshop

Monday, May 19th, 2025 | 18:00 - 19:00

Foundation models for wireless communications and sensing

online

This talk presents the Large Wireless Model (LWM), the world’s first foundation model for wireless channels. Inspired by the success of foundation models in NLP, speech, and vision, LWM is a transformer-based model pre-trained in a self-supervised fashion on large-scale diverse wireless datasets. It learns rich, universal contextualized channel embeddings (features) that potentially enhance performance across a wide range of downstream tasks. I will present the model’s architecture, its self-supervised pre-training approach, and training datasets. I will also demonstrate its gains in tasks like sub-6GHz to mmWave beam prediction, LoS/NLoS classification, and localization. These gains highlight the LWM’s ability to learn from large-scale wireless data and enable complex machine learning tasks with limited data in wireless communication and sensing systems.

Finally, we introduce an ITU AI/ML 5G competition which provides a modular setup, where participants can innovate on scenario design, feature extraction, and lightweight downstream models, pushing the frontiers of robustness, generalizability, and interpretability. By contributing improved scores and model refinements, the challenge also opens doors for discussion on formats, reproducible simulations, and alignment with 6G use cases. The outcomes are expected to influence real-world deployments, research reproducibility, and standard frameworks for wireless AI.

Learning Objectives:

- Describe the architecture and self-supervised training approach of the Large Wireless Model (LWM).
- Explain how LWM generates contextualized channel embeddings and how they contribute to wireless communication and sensing tasks.
- Analyze the performance of LWM in downstream tasks such as beam prediction, LoS/NLoS classification, and localization.
- Evaluate the role of large-scale data and foundation models in improving generalizability and efficiency in wireless AI applications.
- Design innovative approaches for feature extraction or scenario modeling and apply them in the ITU AI/ML 5G challenge.

Institutions

AI for Good

Tags natural language processing, ai, large wireless model, wireless datasets

Monday, August 7th, 2023 | 9:45 - 17:00

Named Entity Recognition for Humanities Scholars with Stanford CoreNLP

Staats- und Universitätsbibliothek, Room BT17a

How can recurring units, such as personal names or titles of literary works, be located and annotated automatically in large text corpora? How can an initial content-based indexing of literary texts be carried out digitally, and in what ways can machine learning methods be put to productive use in humanities research scenarios?

We will pursue these and other questions that come with the use of digital text analysis methods in the workshop "Named Entity Recognition for Humanities Scholars with Stanford CoreNLP". In a hands-on setting, you will get to know a selected tool that is used in the Digital Humanities for named entity recognition, that is, the automatic classification and annotation of recurring entities such as persons, works, places and organizations. After a brief introduction to named entity recognition, the focus is on the practical application of the method.

You can either work directly with your own texts or draw on prepared materials. No prior technical knowledge is required to take part. Just bring an internet-capable laptop, texts relevant to your research (optional) and a healthy dose of curiosity about digital text analysis methods.

Speaker: Marie Flüh (UHH). The number of participants is limited to 15, so please register at forschungsdienste@sub.uni-hamburg.de.

Institutions

Referat für Digitale Forschungsdienste, State and University Library Hamburg Carl von Ossietzky

Tags digital humanities, natural language processing, hands on, named entity recognition

Wednesday, August 16th, 2023 | 17:00 - 18:30

Natural Language Processing for Digital Humanities - Fundamentals and Latest Developments

Staats- und Universitätsbibliothek, Room BT17a

Machine learning methods for language processing are currently on everyone's lips. It is still unclear how and where exactly systems such as ChatGPT will be used in research. Texts have, however, long been analyzed automatically with rule-based and statistical methods, including in the Digital Humanities. For researchers, it remains important to develop an understanding of the methods in order to apply the appropriate technique in each case and, in particular, to take the weaknesses of the methods into account.

In his talk, Hans Ole Hatzel first covers the fundamentals of computer-assisted text processing and explains various established techniques, from tokens and types to word embeddings and sentiment analysis. Some methods are illustrated with examples from the Digital Humanities to show not only the methods themselves but also how they are applied in practice. The talk closes with an outlook on the use of large language models, the technology behind ChatGPT, in the Digital Humanities.
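As a small illustration of the token/type distinction mentioned in the abstract, the following sketch counts both for a sentence. It is not material from the talk. The tokenizer is a deliberately naive assumption (lowercased runs of word characters), and the function name is hypothetical.

```python
import re
from collections import Counter

def tokens_and_types(text):
    """Split text into lowercase word tokens and count each distinct type."""
    # Naive tokenizer, assumed for illustration: runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    return tokens, Counter(tokens)

tokens, types = tokens_and_types("The cat saw the other cat.")
print(len(tokens), "tokens,", len(types), "types")
print("type-token ratio:", len(types) / len(tokens))
```

Every occurrence of a word is a token, while each distinct word form counts once as a type, so the sentence above has six tokens but only four types.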

Speaker: Hans Ole Hatzel (UHH)

Institutions

Tags digital humanities, natural language processing, chatGPT, large language models

Thursday, May 22nd, 2025 | 14:30 - 16:00

Optimizing zero-shot segmentation of Remote Sensing imagery using LangRS: A hands-on workshop

online

LangRS is a Python library designed to bring together natural language processing (NLP) and Remote Sensing (RS) imagery segmentation. It is built on top of Segment Anything Geospatial (SamGEO) and deploys techniques that improve upon it. This hands-on workshop introduces LangRS and showcases its potential to optimize zero-shot segmentation through pre-processing and post-processing techniques.

Participants will explore how to use LangRS to segment RS imagery. The workshop includes interactive demonstrations and practical exercises covering:

- Introduction to LangRS: Understand the core functionalities and architecture of LangRS and its applications in geospatial analysis.
- Data Preparation: Learn to preprocess and structure remote sensing data for seamless integration with LangRS.
- Hands-On with LangRS: Utilize LangRS to segment and analyze satellite imagery through natural language prompts.
- Advanced Applications: Explore complex use cases such as land cover classification and object extraction using LangRS.
- Visualization Techniques: Visualize geospatial outputs and insights derived from LangRS to support data-driven decision-making.

By the end of the workshop, participants will gain practical experience in using LangRS to enhance geospatial data analysis workflows, making geospatial insights more accessible and actionable.

Target Audience

This workshop is ideal for geospatial data scientists, remote sensing analysts, computer vision researchers, and professionals interested in integrating AI with geospatial data.

Prerequisites

- A Google Colab account.
- Basic understanding of Python programming and geospatial data concepts is recommended.

Institutions

AI for Good

Tags natural language processing, ai, hands on, python, zero-shot segmentation, remote sensing, segment anything geospatial

Monday, May 27th, 2024 | 11:00 - 12:00

Research talk of Minh Duc Bui

Von-Melle-Park 5, 20146 Hamburg, Room 3126

Part 1: "Adapter Fairness": "Current natural language processing (NLP) research tends to focus on only one or, less frequently, two dimensions -- e.g., performance, privacy, fairness, or efficiency -- at a time, which may lead to suboptimal conclusions and often overlooking the broader goal of achieving trustworthy NLP. Work on adapter modules focuses on improving performance and efficiency, with no investigation of unintended consequences on other aspects such as fairness. To address this gap, we conduct experiments on three text classification datasets by either (1) finetuning all parameters or (2) using adapter modules."

Part 2: "Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget": "Compared to standard language model (LM) pretraining (i.e., from scratch), Knowledge Distillation (KD) entails an additional forward pass through a teacher model that is typically substantially larger than the target student model. As such, KD in LM pretraining materially slows down throughput of pretraining instances vis-a-vis pretraining from scratch. Scaling laws of LM pretraining suggest that smaller models can close the gap to larger counterparts if trained on more data (i.e., processing more tokens), and under a fixed computation budget, smaller models are able to process more data than larger models. We thus hypothesize that KD might, in fact, be suboptimal to pretraining from scratch for obtaining smaller LMs, when appropriately accounting for the compute budget."

Part 3: Most likely, Duc will also discuss the ideas we have for his research stay with us (~Cross-cultural Hate Speech). Feedback is highly welcome!
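The compute-budget argument in Part 2 can be made concrete with a rough calculation. The sketch below is not taken from the talk. It assumes the common rule of thumb that training costs about 6 FLOPs per parameter per token (forward plus backward pass) and that the teacher's extra forward pass costs about 2 FLOPs per parameter per token. The budget and model sizes are hypothetical numbers chosen for illustration.

```python
def tokens_within_budget(budget_flops, student_params, teacher_params=0.0):
    """Tokens a student model can train on under a fixed FLOP budget.

    Assumes ~6 FLOPs per parameter per token for the student
    (forward + backward) and ~2 FLOPs per parameter per token for
    the teacher's additional forward pass in knowledge distillation.
    """
    flops_per_token = 6 * student_params + 2 * teacher_params
    return budget_flops / flops_per_token

BUDGET = 1e18   # hypothetical compute budget in FLOPs
STUDENT = 1e8   # hypothetical 100M-parameter student
TEACHER = 1e9   # hypothetical 1B-parameter teacher

scratch = tokens_within_budget(BUDGET, STUDENT)
distilled = tokens_within_budget(BUDGET, STUDENT, TEACHER)
print(f"from scratch: {scratch:.2e} tokens")
print(f"with KD:      {distilled:.2e} tokens")
print(f"ratio:        {scratch / distilled:.2f}x")
```

Under these assumed numbers, pretraining from scratch sees a little over four times as many tokens as distillation for the same budget, which is the throughput gap the hypothesis builds on.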

Short Bio

I'm a PhD student at JGU Mainz, advised by Katharina von der Wense. My research focuses on analyzing and developing techniques that balance efficiency and fairness in NLP models. While numerous approaches have been developed to enhance resource efficiency, their impact on model fairness remains largely unclear. Prior to this, I completed my bachelor's degree in "Mathematics in Business and Economics", and subsequently pursued a master's degree in "Data Science" with a strong emphasis on NLP. Following the completion of my master's degree, I transitioned into the industry, where I worked as a data scientist in the autonomous driving field.

Institution

UHH, BWL Faculty, Professorship of Data Science

Tags data science, natural language processing

Monday, November 4th, 2024 | 17:15

Why do I still need to grade all those exams? – Automatically scoring free-text student answers

Informatikum, Vogt-Kölln-Straße 30, Konrad-Zuse-Hörsaal (Raum B-201)

Prof. Dr. Torsten Zesch

Giving feedback on free-text answers (in the form of grades or helpful hints) is a core educational task. Despite a large body of NLP research on the topic, assisting teachers with this task remains challenging. In this talk, we outline the linguistic and external factors influencing the performance level that NLP methods may reach for a given question. However, even in settings where automatic performance rivals humans, there are various practical requirements often overlooked in research that hinder adoption in the classroom and beyond.

Torsten Zesch is a full professor of Computational Linguistics at CATALPA (Center of Advanced Technology for Assisted Learning and Predictive Analytics), FernUniversität in Hagen, Germany. He holds a doctoral degree in computer science from Technische Universität Darmstadt and was the president of the German Society for Computational Linguistics and Language Technology (GSCL) from 2017 to 2023. His main research interests are in educational natural language processing, in particular the ways in which teaching and learning processes can be supported by language technology. For this purpose, he develops methods for the automatic analysis of textual and multimodal language data, with a focus on robust and explainable models.

Institutions

UHH, FB Informatik

Tags natural language processing, language technology, multimodal language data

People

Amy Isard

Research Associate

IDGS

amy.isard@uni-hamburg.de

Institutions

Tags digital humanities, natural language processing, sign language linguistics, corpus linguistics

Anne Lauscher

Professor of Data Science

anne.lauscher@uni-hamburg.de

Tags natural language processing, ethics and ai

Debayan Banerjee

Research Associate

debayan.banerjee@uni-hamburg.de

Tags natural language processing, question answering, knowledge graphs

Dirk Hartung

Executive Director, CLTDS

dirk.hartung@law-school.de

Institutions

Center for Legal Technology and Data Science, BLS

Tags natural language processing, law, network science, complex systems, legal profession

Gregor Wiedemann

Senior Researcher Computational Social Science

Head of the Media Research Methods Lab (MRML)

g.wiedemann@leibniz-hbi.de

Institutions

Media Research Methods Lab at the Leibniz-Institute for Media Research | Hans-Bredow-Institut

Tags applied machine learning, interdisciplinary research, natural language processing, text mining, computational communication science

Marc Schulder

Research Associate

IDGS

DGS-Korpus project

marc.schulder@uni-hamburg.de

Institutions

Tags natural language processing, open science, sign language linguistics, corpus linguistics

Seid Muhie Yimam

Technical lead, HCDS

seid.muhie.yimam@uni-hamburg.de

Institutions

Tags digital transformation of research, digital humanities, natural language processing, sentiment analysis, hate speech and misinformation, social NLP, AI and applications

Stefan Bonn

Institute Director, Institute of Medical Systems Biology

Professor for Systems Biology

stefan.bonn@zmnh.uni-hamburg.de

Institutions

bAIome-Center for Biomedical AI, UKE

Tags natural language processing

Institutions

Language Technology Group, Dept. of Informatics, UHH

https://www.inf.uni-hamburg.de/en/inst/ab/lt/home.html

Research group working on all aspects of natural language processing with a focus on semantics, human-in-the-loop methods and adaptive systems

People

Tags natural language processing, adaptive machine learning

Media Research Methods Lab at the Leibniz-Institute for Media Research | Hans-Bredow-Institut

https://leibniz-hbi.de/en/research/research-programmes/media-research-methods-lab

The Media Research Methods Lab (MRML) at the HBI combines the methodological expertise of the HBI in an organisational unit that focuses on linking established social science methods with novel digital procedures.

People

Gregor Wiedemann

Tags applied machine learning, interdisciplinary research, natural language processing, computational social science, computational communication science

Referat für Digitale Forschungsdienste, State and University Library Hamburg Carl von Ossietzky

https://dh3.hypotheses.org/

Unit for the integration of digital humanities activities and services into the SUB portfolio

People

Tags digital humanities, natural language processing

Universität Hamburg
Adeline Scharfenberg