
Learning from Human-generated Data with On-the-fly Sample Reweighting

This project explores the efficacy of on-the-fly sample reweighting for training machine learning models. STORM (Self-Taught On-the-fly Rescaling via Meta loss; Heck et al., 2025) is a method that learns to adjust the relative importance of training samples while the model is being trained, and it does so without requiring clean seed data. Although STORM has shown promise on natural language processing tasks with both artificial and real label noise, its scalability and effectiveness on real-world data, where noise levels vary and can be high, remain to be explored. This study will address several open questions: which types of noise occur in human-generated data, how well STORM's effectiveness generalizes, and how confirmation bias can be mitigated, especially for under-represented classes. It will also investigate how STORM performs on extremely noisy data and on datasets that start out without labels. These questions bear directly on building reliable classifiers for practical applications.
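To make "adjusting the relative importance of training samples during training" concrete, here is a minimal Python sketch of on-the-fly loss reweighting for a toy logistic-regression model. It is not STORM: STORM learns its rescaling through a meta loss, whereas here a fixed, hypothetical heuristic (`per_sample_weights`, with weights proportional to exp(-loss / temperature)) stands in for that learned rule. The function names, the toy data, and the hyperparameters are all illustrative assumptions.

```python
import math


def per_sample_weights(losses, temperature=1.0):
    # Hypothetical stand-in for a learned rescaling rule (NOT STORM):
    # samples with a high current loss, which may be mislabeled, get
    # exponentially smaller weights.
    raw = [math.exp(-loss / temperature) for loss in losses]
    mean_raw = sum(raw) / len(raw)
    # Normalise so the weights average to 1, keeping the loss scale unchanged.
    return [r / mean_raw for r in raw]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(p, y, eps=1e-12):
    # Binary cross-entropy for one prediction p and one label y in {0, 1}.
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def train(xs, ys, steps=200, lr=0.5, temperature=1.0):
    w, b = 0.0, 0.0
    weights = [1.0] * len(xs)
    n = len(xs)
    for _ in range(steps):
        preds = [sigmoid(w * x + b) for x in xs]
        losses = [bce(p, y) for p, y in zip(preds, ys)]
        # On-the-fly: the sample weights are recomputed from the current
        # losses at every step and treated as constants in the gradient.
        weights = per_sample_weights(losses, temperature)
        grad_w = sum(wt * (p - y) * x for wt, p, y, x in zip(weights, preds, ys, xs)) / n
        grad_b = sum(wt * (p - y) for wt, p, y in zip(weights, preds, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, weights


# Toy 1-D data: negatives left of 0, positives right of 0, plus one
# deliberately mislabeled sample at x = 1.5 (index 4).
xs = [-2.0, -1.0, 1.0, 2.0, 1.5]
ys = [0, 0, 1, 1, 0]
w, b, weights = train(xs, ys)
for x, y, wt in zip(xs, ys, weights):
    print(f"x={x:+.1f} y={y} weight={wt:.3f}")
```

After training, the mislabeled sample receives the smallest weight. The same heuristic also illustrates the confirmation-bias risk mentioned above: any sample the model gets wrong early on is down-weighted, whether it is noisy or merely belongs to a rare class.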

Dr. Michael Heck

References:

Michael Heck, Christian Geishauser, Nurul Lubis, Carel van Niekerk, Shutong Feng, Hsien-Chin Lin, Benjamin Matthias Ruppik, Renato Vukovic, and Milica Gašić. March 2025. Learning from Noisy Labels via Self-Taught On-the-Fly Meta Loss Rescaling. In Proceedings of the AAAI Conference on Artificial Intelligence. Volume 39. Philadelphia, Pennsylvania, USA. AAAI Press, Washington, DC, USA. Association for the Advancement of Artificial Intelligence.