Detecting Hallucinations in LLMs through the Topology of Attention Maps

This project aims to apply Topological Data Analysis (TDA) to the detection of hallucinations in large language models (LLMs). The work has three stages: identifying and preparing a suitable dataset; reproducing the results of the foundational study (Kushnareva et al., EMNLP 2021), which used the topology of attention maps for sequence classification; and applying these methods to identify hallucinations. Students will work with language models such as BERT and the GPT family and explore alternative approaches to feature extraction, contributing to a topological approach for evaluating the behavior of generative models.
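To give a rough idea of what a topological feature of an attention map can look like, the sketch below treats a single attention matrix as a weighted graph over tokens, keeps only the edges whose weight reaches a threshold, and counts the connected components (the zeroth Betti number) at several thresholds. It is a minimal illustration under stated assumptions, not the feature set of the referenced study: the symmetrization by elementwise maximum, the threshold values, the toy matrix, and the function names `count_components` and `betti0_features` are all hypothetical choices made for this example.

```python
import numpy as np


def count_components(attention, threshold):
    """Count connected components of the thresholded attention graph."""
    n = attention.shape[0]
    # Assumption: make the graph undirected by taking the larger of the
    # two attention weights between each pair of tokens.
    weights = np.maximum(attention, attention.T)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if weights[i, j] >= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j
                    components -= 1
    return components


def betti0_features(attention, thresholds=(0.025, 0.1, 0.25, 0.5)):
    """Return one component count per threshold as a small feature vector."""
    return [count_components(attention, t) for t in thresholds]


# Made-up 4-token attention matrix (each row sums to 1), for illustration only.
toy_attention = np.array([
    [0.70, 0.30, 0.00, 0.00],
    [0.40, 0.60, 0.00, 0.00],
    [0.00, 0.00, 0.80, 0.20],
    [0.00, 0.05, 0.15, 0.80],
])

features = betti0_features(toy_attention)
print(features)  # [1, 2, 3, 4]
```

As the threshold rises, weak links between tokens are dropped and the graph breaks into more components. In a real pipeline the matrix would come from one attention head of a model such as BERT, and vectors like this, collected over heads and layers, could serve as input to a classifier.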

Dr Benjamin Ruppik

References:

Kushnareva, L., Cherniavskii, D., Mikhailov, V., Artemova, E., Barannikov, S., Bernstein, A., … & Burnaev, E. (2021, November). Artificial Text Detection via Examining the Topology of Attention Maps. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 635-649).