Research
Current research projects
Clustering methods divide data into different groups of similar elements, making the groups themselves dissimilar. In contrast, the goal of anticlustering is to divide data into groups that are as similar as possible. This approach is particularly important in fields such as experimental psychology or the analysis of medical data.
In collaboration with Dr. Martin Papenberg from the Institute of Experimental Psychology, we develop and implement new anticlustering methods and apply them to specific research questions, for example the planning of high-throughput sequencing experiments in the context of the chronic disease endometriosis.
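To make the idea concrete, the following minimal sketch shows one simple way to form anticlusters: it keeps group sizes fixed and swaps pairs of elements between groups as long as the total within-group dissimilarity increases, which makes the groups more alike. The objective, the function names, and the toy data are assumptions chosen for illustration; this is not the exact or hybrid approach from the publications listed below.

```python
import itertools
import random

def diversity(points, labels):
    """Sum of squared Euclidean distances over all pairs in the same group."""
    total = 0.0
    for i, j in itertools.combinations(range(len(points)), 2):
        if labels[i] == labels[j]:
            total += sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    return total

def anticluster(points, k, seed=0):
    """Pairwise-exchange heuristic: swap two elements of different groups
    whenever the swap increases the within-group diversity."""
    rng = random.Random(seed)
    n = len(points)
    labels = [i % k for i in range(n)]  # balanced group sizes
    rng.shuffle(labels)
    best = diversity(points, labels)
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(n), 2):
            if labels[i] == labels[j]:
                continue
            labels[i], labels[j] = labels[j], labels[i]
            score = diversity(points, labels)
            if score > best + 1e-12:
                best = score
                improved = True
            else:
                # Undo the swap if it did not help.
                labels[i], labels[j] = labels[j], labels[i]
    return labels, best

# Toy data (made up): two low and two high values on one feature.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels, score = anticluster(points, k=2)
```

On this toy input, each group ends up with one low and one high value, whereas a clustering method would put the two low values in one group and the two high values in the other.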
Links
- HHU press release from 18.08.2025
Publications
- Papenberg, Martin, Martin Breuer, Max Diekhoff, Nguyen K. Tran, and Gunnar W. Klau. “Extending the Bicriterion Approach for Anticlustering: Exact and Hybrid Approaches.” Psychometrika, October 7, 2025, 1–20. https://doi.org/10.1017/psy.2025.10052
- Papenberg, Martin, Cheng Wang, Maïgane Diop, et al. “Anticlustering for Sample Allocation to Minimize Batch Effects.” Cell Reports Methods 5, no. 8 (2025): 101137. https://doi.org/10.1016/j.crmeth.2025.101137
- Papenberg, Martin, and Gunnar W. Klau. “Using Anticlustering to Partition Data Sets into Equivalent Parts.” Psychological Methods (US) 26, no. 2 (2021): 161–74. https://doi.org/10.1037/met0000301
Contact
- Nguyen Khoa Tran or Gunnar Klau
Page under construction. Content will be created in the near future.
BiTE therapy for the individualized treatment of cancer uses bispecific antibodies that target specific surface markers of cancer cells. Live-cell imaging systems allow cell cultures to be observed at regular intervals over extended periods, which helps to prepare and explore this therapeutic approach. To evaluate the effectiveness of such therapies, living and dead cells have to be identified and counted in these image sequences. For densely clustered cells in particular, commonly used machine learning models often perform poorly.
We are developing an explainable mathematical model for instance segmentation of these live-cell images. Because it does not depend on training data, the model can produce reliable results even when segmenting densely clustered cells. Additional machine learning approaches can then be applied to the resulting image crops of individual cells to determine the state and type of each segmented cell.
Collaborations
- Analysis of time-series images from experiments on the dose-time-response of cancer drugs: Dr. Nan Qin – Clinic for Hematology, Oncology, and Clinical Immunology, University Hospital Düsseldorf
Contact
- Martin Jürgens
During a hospital stay, a large amount of clinical data is collected digitally. Manually identifying relevant correlations and patterns in this complex, high-dimensional data is difficult. Clinical decision support systems (CDSS) based on machine learning algorithms can evaluate this data and allow patient-specific risks to be better predicted and assessed.
Our goal is to consider the entire longitudinal history of a patient instead of just a snapshot by incorporating time series features into our models. In contrast to classic clinical risk scores, which are created once upon admission, our methods provide a frequently updated, adaptive prognosis that can support physicians in their treatment decisions.
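As a rough illustration of what "incorporating time series features" can mean, the sketch below summarizes all measurements of one laboratory value available up to a given day as a few numbers (last value, mean, minimum, linear trend). Recomputing these features whenever a new measurement arrives is what allows a prognosis to be updated over time. The feature set, the function name, and the example values are assumptions for illustration, not the features used in the studies listed below.

```python
def longitudinal_features(measurements, now):
    """Summarize one lab value over time.

    measurements: list of (day, value) pairs; only pairs with day <= now
    are used, so no information from the future leaks into the features.
    """
    history = sorted((d, v) for d, v in measurements if d <= now)
    if not history:
        return None
    days = [d for d, _ in history]
    values = [v for _, v in history]
    n = len(history)
    mean_d = sum(days) / n
    mean_v = sum(values) / n
    var_d = sum((d - mean_d) ** 2 for d in days)
    # Least-squares slope of value over time; 0.0 if only one time point.
    if var_d > 0:
        slope = sum((d - mean_d) * (v - mean_v) for d, v in history) / var_d
    else:
        slope = 0.0
    return {"last": values[-1], "mean": mean_v, "min": min(values),
            "slope": slope, "n": n}

# Made-up example series: (day since admission, measured value).
series = [(0, 12.0), (10, 11.0), (20, 10.0), (30, 8.0)]
at_admission = longitudinal_features(series, now=0)
at_day_20 = longitudinal_features(series, now=20)
```

At admission only a single snapshot is available, so the trend is zero. By day 20 the features already reflect a steady decline that a one-time score would miss.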
Collaborations
- Study on dynamic mortality risks in myelodysplastic syndromes: Prof. Ulrich Germing, Prof. Sascha Dietrich – Clinic for Hematology, Oncology, and Clinical Immunology, University Hospital Düsseldorf
- Study on mortality risk after stem cell transplantation: Prof. Rainer Haas, Prof. Klaus Pfeffer, Prof. Guido Kobbe – University Hospital Düsseldorf
- Study on outcome after aneurysmal subarachnoid hemorrhage: Prof. Sajjad Muhammad, Vascular Neurosurgery Working Group, Department of Neurosurgery, University Hospital Düsseldorf
Publications
- Bobak J, Spohr P, Richter S, Streuer A, Schulz F, Strupp C, Gerhards C, Schmitt N, Luft T, Dietrich S, Germing U, and Klau GW. “Dynamic Mortality Risk Prediction in Myelodysplastic Syndromes Using Longitudinal Clinical Data” Accepted JCO Clinical Cancer, 2025. Preprint: https://doi.org/10.1101/2025.07.21.25331775
- Spohr P, Fröhlich RC, Scharf S, Rommerskirchen A, Bobak J, Schweier S, Jäger P, Kobbe G, Dietrich S, Dilthey AT, Henrich B, Pfeffer K, Haas R, and Klau GW. “Dynamic Prediction of Mortality Risk Following Allogeneic Hematopoietic Stem Cell Transplantation” Machine Learning Health, 2025. https://doi.org/10.1088/3049-477X/adf74e
Contact
- Jonathan Bobak or Philipp Spohr or Gunnar Klau
Many biological processes consist of a temporal sequence of events. These include cell differentiation or tumor development, during which cells accumulate somatic mutations over time. Such processes often cannot be measured continuously. Instead, an attempt is made to reconstruct a process from a single measurement. For example, phylogenetic trees are constructed for this purpose.
In cancer research, there is particular interest in studying the evolution of a tumor, which can be represented by such a phylogenetic tree. Although tumors develop individually and heterogeneously in each patient, researchers have identified mutations that occur in the same temporal order. Since these can have clinical implications, for example for the effectiveness of medications, it is of interest to identify such so-called recurrent trajectories.
Naturally, a single measurement does not necessarily allow the entire past to be reconstructed accurately. As a result, reconstructed trees often contain uncertainties and errors, which make it difficult to identify recurrent trajectories.
Here, we model temporal sequences of events as incomplete partial orders and address common sources of uncertainty and error in reconstructed trees. Using an integer linear program, we identify the largest recurrent trajectories within a subset of the input data. We apply our tool, POsets for Temporal Trajectory Resolution (POTTR), to data on acute myeloid leukemia and non-small cell lung cancer from the TRACERx cohort. In doing so, we identify trajectories that can serve as candidates for studies investigating their clinical relevance.
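The following sketch illustrates the underlying representation on a toy scale. Each tumor tree is turned into a partial order, stored as the set of pairs "mutation X occurred before mutation Y", and the pairs shared by at least a given number of patients are reported. This is a deliberately simplified stand-in: it only counts pairwise orderings and does not solve the integer linear program that POTTR uses to find the largest recurrent trajectories. The mutation names A, B, C and all function names are hypothetical.

```python
from collections import Counter

def ancestor_pairs(parent):
    """Return all (earlier, later) pairs implied by a tree.

    parent maps each mutation to its parent mutation (None for the root).
    """
    pairs = set()
    for node in parent:
        anc = parent[node]
        while anc is not None:
            pairs.add((anc, node))
            anc = parent[anc]
    return pairs

def recurrent_pairs(trees, min_support):
    """Ordered pairs that appear in at least min_support trees."""
    counts = Counter()
    for parent in trees:
        counts.update(ancestor_pairs(parent))
    return {pair for pair, c in counts.items() if c >= min_support}

# Three made-up patients with mutations A, B, C.
trees = [
    {"A": None, "B": "A", "C": "B"},  # chain A -> B -> C
    {"A": None, "B": "A", "C": "A"},  # A branches into B and C
    {"B": None, "A": "B", "C": "A"},  # chain B -> A -> C
]
shared = recurrent_pairs(trees, min_support=2)
```

Here the ordering "A before C" is supported by all three patients, while "B before A" appears only once and is discarded.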
Paper
- Sara C. Käufler, Henri Schmidt, Martin Jürgens, Gunnar W. Klau, Palash Sashittal, Benjamin J. Raphael. “POTTR: Identifying Recurrent Trajectories in Evolutionary and Developmental Processes using Posets.” bioRxiv 2026. https://doi.org/10.64898/2026.02.25.707960 Accepted to RECOMB 2026.
Contact
- Sara Käufler or Gunnar Klau
Vaccines are essential to prevent the infection and spread of viral diseases. This has become especially clear during the COVID-19 pandemic. However, further research is important to enable rapid responses to viral mutations and to develop better vaccines against current and emerging diseases. Epitope vaccines are an attractive alternative to conventional vaccines. They consist of small viral protein fragments with immunizing effects, so-called epitopes.
Here, we present a novel approach to epitope vaccines that exploits overlaps between epitope sequences. This allows us to include a large number of epitopes in our vaccine candidate, leading to a high fraction of immunized individuals within a target population. Using SARS-CoV-2 data, we demonstrate a theoretical efficacy of our vaccines in over 98% of the world population, with high numbers of immunizing epitopes per individual. However, laboratory experiments are required to validate the functionality of overlapping epitope vaccines.
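To show why overlaps save space, the sketch below merges overlapping sequences with a simple greedy heuristic: it repeatedly joins the two strings with the longest suffix-prefix overlap. This is only an illustration of the overlap idea under simplifying assumptions; it ignores population coverage and is not the optimization method used in HOGVAX. The peptide strings are made up and the function names are hypothetical.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_superstring(peptides):
    """Greedily merge strings by largest overlap into one sequence."""
    unique = set(peptides)
    # Drop peptides that are already contained in another peptide.
    items = sorted(p for p in unique
                   if not any(p != q and p in q for q in unique))
    if not items:
        return ""
    while len(items) > 1:
        best = (0, 0, 1)  # (overlap length, index of left, index of right)
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                if i != j and overlap(a, b) > best[0]:
                    best = (overlap(a, b), i, j)
        k, i, j = best
        merged = items[i] + items[j][k:]
        items = [p for idx, p in enumerate(items) if idx not in (i, j)]
        items.append(merged)
    return items[0]

# Made-up peptide strings, not real epitopes.
epitopes = ["ACDEF", "DEFGH", "FGHIK"]
candidate = greedy_superstring(epitopes)
```

In this toy case, three peptides of length 5 fit into a single sequence of length 9 instead of 15, and every input peptide still occurs in it as a substring.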
Links
- News at HHU 20.12.2023
Paper
- Sara C. Schulte, Alexander T. Dilthey, Gunnar W. Klau. “HOGVAX: Exploiting Epitope Overlaps to Maximize Population Coverage in Vaccine Design with Application to SARS-CoV-2.” Cell Systems 14, 1-9 December 20, 2023. https://doi.org/10.1016/j.cels.2023.11.001
Contact
- Sara Käufler or Gunnar Klau
Completed research projects
The genetic information of complex organisms such as humans, animals, and plants is stored in long molecules called chromosomes. Humans and most animals inherit genetic information from their two biological parents, resulting in two slightly different copies of each chromosome. The process of separating these copies and reconstructing the underlying sequences is called (haplotype) phasing. Our group studies the phasing problem for polyploid organisms such as many plants, which carry more than two copies of each chromosome. While phasing two copies (diploid phasing) is well studied, including by our group [1], existing methods cannot be generalized to the polyploid case in a canonical way. In this project, we developed new methods for both read-based and genetic phasing that account for the properties of polyploid organisms [2,3,4].
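As a small illustration of the read-based setting, the sketch below takes four candidate haplotypes of a tetraploid region and assigns each sequencing read to the haplotype it disagrees with least, counting the remaining mismatches. In a real phasing problem the haplotypes are unknown and must be inferred. This sketch only shows how a candidate solution could be scored and is not the method from the publications below. The data encoding and function names are assumptions.

```python
def mismatches(read, haplotype):
    """Number of covered positions where the read disagrees with the haplotype.

    read: dict mapping variant position -> observed allele ("0" or "1").
    haplotype: string of alleles, one character per variant position.
    """
    return sum(1 for pos, allele in read.items() if haplotype[pos] != allele)

def assign_reads(reads, haplotypes):
    """Assign each read to its closest haplotype; return assignment and cost."""
    assignment = []
    total = 0
    for read in reads:
        costs = [mismatches(read, h) for h in haplotypes]
        best = min(range(len(haplotypes)), key=costs.__getitem__)
        assignment.append(best)
        total += costs[best]
    return assignment, total

# Made-up tetraploid example with four variant positions.
haplotypes = ["0000", "0101", "1100", "1111"]
reads = [
    {0: "0", 1: "1", 2: "0"},
    {1: "1", 2: "0", 3: "0"},
    {2: "1", 3: "1"},
    {0: "0", 1: "0", 3: "1"},  # fits no haplotype exactly
]
assignment, cost = assign_reads(reads, haplotypes)
```

The last read matches no haplotype perfectly, which mirrors the sequencing errors that phasing methods have to tolerate. With more than two haplotypes, such ambiguous assignments become more frequent.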
Funding
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 395192176 and 391137747 – as well as under Germany’s Excellence Strategy – EXC 2048/1 – 390686111.
Collaborations
- Marschall lab, University Hospital Düsseldorf
- Richard Finkers, U Wageningen, now at GenNovation
- Usadel lab, Cluster of Excellence on Plant Sciences, HHU
Publications
- [1] Patterson, Murray, Tobias Marschall, Nadia Pisanti, et al. “WhatsHap: Weighted Haplotype Assembly for Future-Generation Sequencing Reads.” Journal of Computational Biology 22, no. 6 (2015): 498–509. https://doi.org/10.1089/cmb.2014.0157
- [2] Schrinner, Sven D., Rebecca Serra Mari, Jana Ebler, et al. “Haplotype Threading: Accurate Polyploid Phasing from Long Reads.” Genome Biology 21, no. 1 (2020): 252. https://doi.org/10.1186/s13059-020-02158-1
- [3] Schrinner, Sven, Rebecca Serra Mari, Richard Finkers, et al. “Genetic Polyploid Phasing from Low-Depth Progeny Samples.” iScience, Elsevier, 2022, 104461.
- [4] Serra Mari, Rebecca, Sven Schrinner, Richard Finkers, et al. “Haplotype-Resolved Assembly of a Tetraploid Potato Genome Using Long Reads and Low-Depth Offspring Data.” Genome Biology 25, no. 1 (2024): 26. https://doi.org/10.1186/s13059-023-03160-z