📊 Project

KI-FOR 5363 DeSBi (Fusing Deep Learning and Statistics towards Understanding Structured Biomedical Data)

Humboldt-Universität zu Berlin

Institution: Humboldt-Universität zu Berlin
Category: Project
Website: https://desbi.de/

Short Description

The service provides methods for testing conditional independence in structured data such as images and tabular data, particularly in multimodal biomedical datasets. It is aimed at biomedical researchers who want to run inference-based analyses on complex data. Its main benefit is validated tests that control the Type-I error while achieving high statistical power. Universities benefit from stronger methodological foundations for analyzing high-dimensional biomedical data.

General Description

-


Thematic Classification

Subject Areas

Computer Science
Medicine
Natural Sciences
Statistics
Biology
Machine Learning
Artificial Intelligence
Biomedical Informatics
Genomics
Neuroinformatics
Image Processing
Mathematics
Systems Biology
Computational Biology
Pharmacology
Epidemiology

Research Fields

  • Deep Learning
  • Statistics
  • Medical Imaging
  • Genomics
  • Causal Inference
  • Time Series Analysis
  • Relevance Analysis of Neural Networks (Explainable AI)
  • Structured Biomedical Data Analysis
  • Image Processing (Image Segmentation, Classification)
  • Genome-Wide Association Studies (GWAS)
  • Uncertainty Quantification in Machine Learning
  • Transfer Learning
  • Disentanglement of Features
  • Concept-Based Explanations
  • Conditional Independence Tests
  • Biomedical Applications of AI
  • Computational Biology
  • RNA and Protein Sequence Analysis

Specializations

  • Development of conditional independence tests (CITs) for structured data such as images
  • Use of deep learning to embed data for statistical tests
  • Application of statistical tests as a key instrument for multimodal datasets in the biomedical domain
  • Ensuring Type-I error control over large subsets of the null hypothesis of conditional independence
  • Increasing statistical power through transfer learning and optimally learned embeddings
  • Development of efficient algorithms and user-friendly software for application in large biomedical datasets such as the UK Biobank
  • Integration into other projects (P2, P4, P7) for visual explanation and analysis of multimodal data
  • Provision of CIT-based tools for designing experiments around specific scientific questions, such as sample size and power calculations
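The testing idea in the specializations above has two steps. First, embed the structured data (e.g., an image) with a model. Then apply a conditional independence test (CIT) to the embedding. The sketch below illustrates this under explicit assumptions and is not the DeSBi/DNCIT implementation.

- `embed` is a hypothetical stand-in for a learned deep embedding. It only averages pixels into one scalar.
- The CIT is the generalised covariance measure (GCM) of Shah and Peters, using linear regressions in place of flexible learners.
- `simulate` produces hypothetical toy data with a confounder Z.

```python
import math
import random


def embed(image):
    """HYPOTHETICAL stand-in for a learned deep embedding: reduces a
    flattened image (list of pixel values) to one scalar feature,
    here simply the mean intensity."""
    return sum(image) / len(image)


def ols_residuals(target, predictor):
    """Residuals of the least-squares fit target ~ a + b * predictor."""
    n = len(target)
    mx = sum(predictor) / n
    my = sum(target) / n
    sxx = sum((p - mx) ** 2 for p in predictor)
    sxy = sum((p - mx) * (t - my) for p, t in zip(predictor, target))
    b = sxy / sxx
    a = my - b * mx
    return [t - (a + b * p) for p, t in zip(predictor, target)]


def gcm_test(x, y, z):
    """Scalar GCM test of H0: X independent of Y given Z.

    Regress X on Z and Y on Z (linear fits: an illustrative simplification),
    multiply the residuals, and compare the normalised mean of the products
    with a standard normal. Returns (statistic, two-sided p-value).
    """
    prods = [rx * ry for rx, ry in zip(ols_residuals(x, z), ols_residuals(y, z))]
    n = len(prods)
    mean = sum(prods) / n
    var = sum((r - mean) ** 2 for r in prods) / n
    stat = math.sqrt(n) * mean / math.sqrt(var)
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # P(|N(0,1)| > |stat|)
    return stat, p_value


def simulate(n, effect, seed):
    """HYPOTHETICAL toy data: confounder Z, outcome Y, and 4x4 'images' whose
    intensity depends on Z and, if effect != 0, also on Y (effect = 0 is H0)."""
    rng = random.Random(seed)
    images, ys, zs = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        y = 0.8 * z + rng.gauss(0, 1)
        signal = 0.5 * z + effect * y
        images.append([signal + rng.gauss(0, 1) for _ in range(16)])
        ys.append(y)
        zs.append(z)
    return images, ys, zs


if __name__ == "__main__":
    imgs, ys, zs = simulate(500, effect=0.5, seed=1)
    stat, p = gcm_test([embed(im) for im in imgs], ys, zs)
    print(f"GCM statistic = {stat:.2f}, p-value = {p:.3g}")
```

A real image embedding is multivariate, so a practical test would need a multivariate CIT rather than this scalar version. Repeating `simulate` with `effect=0` over many seeds is a simple way to check empirically that the rejection rate stays near the nominal level.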

Keywords

  • P1: Deep conditional independence tests
  • Conditional independence testing
  • Multimodal datasets
  • Deep learning embeddings
  • Statistical inference
  • Imaging genetics
  • Nonparametric tests
  • Transfer learning
  • Type I error control
  • UK Biobank applications

Funding

Funding Provider: -
Funding Program: KI-FOR 5363
Funding Reference: KI-FOR 5363
Funding Period: 2023 - 2027
Project Volume: -


Team & Partners

Project Leadership

  • Prof. Dr. Sonja Greven (Humboldt-Universität zu Berlin)
  • Prof. Dr. Christoph Lippert (University of Potsdam / Hasso Plattner Institute)

Involved Persons

  • Marco Simnacher (PhD Candidate)
  • Hani Park (PhD Candidate)
  • Xiangnan Xu (Postdoc)
  • Clara Hoffmann (PhD student)
  • Dilyara Bareeva (PhD student)
  • Jim Berend (PhD student)
  • Lorenz Hufe (PhD student)
  • Sahar Iravani (Postdoc)
  • Masoumeh Javanbakhat (Postdoc)
  • Georg Keilbar (Postdoc)
  • Piotr Komorowski (PhD student)
  • Wei-Cheng Lai (PhD Candidate)
  • Gabriel Nobis (PhD Candidate)
  • Roshan Rane (PhD Candidate)
  • Moritz Seiler (PhD Candidate)
  • Manuel Pfeuffer (PhD Candidate)
  • Paulo Yanez Sarmiento (PhD Candidate)
  • Hadya Yassin (PhD Candidate)
  • Claudia Winklmayr (PhD Candidate)
  • Maximilian Dreyer (PhD student)
  • Eshant English (PhD student)
  • Maarten Jung (PhD student)
  • Marta Lemanczyk (PhD student)
  • Alexander Rakowski (PhD student)
  • Sepideh Saran (PhD student)
  • Juliana Schneider (PhD student)
  • Ekkehard Schnoor (Postdoc)

Affiliated Institutions

-

External Partners

Humboldt-Universität zu Berlin, University of Potsdam, Hasso Plattner Institute (HPI), Max Delbrück Center for Molecular Medicine (MDC), Karlsruher Institut für Technologie (KIT), Charité – Universitätsmedizin Berlin, Fraunhofer Heinrich Hertz Institute (Fraunhofer HHI), Technische Universität Berlin


Project Contents

Goals

  • Development of conditional independence tests (CITs) for structured data such as images through the use of deep learning for data embedding
  • Ensuring Type-I error control in statistical tests for multimodal datasets in the biomedical domain
  • Improvement of statistical power through transfer learning, optimally learned embeddings, and tailored CITs for these embeddings
  • Provision of efficient algorithms and user-friendly software for application to large biomedical datasets such as the UK Biobank
  • Application of the tests in further projects (P2, P4, P7) for visual explanation and analysis of multimodal data

Work Packages

  • P1: Deep conditional independence tests with an application to imaging genetics
  • P2: Visual explanations for statistical tests
  • P3: Explainable AI for microscopy image analysis
  • P4: Deep learning for functional genomics
  • P5: Sparse and robust explanations for structured data
  • P6: Uncertainty quantification in biomedical deep learning
  • P7: Causal inference with multimodal data

Methods

  • Deep nonparametric conditional independence tests (DNCITs)
  • Embedding maps for feature representation extraction
  • Layer-wise relevance propagation (LRP) with pruning for sparse explanations
  • Transfer learning for optimal embeddings
  • Nonparametric conditional independence tests (CITs)
  • Adversarially learned penalty for feature subspace independence
  • Metadata-guided feature disentanglement (MFD)
  • Procedurally generated dataset (Arctique) for uncertainty quantification
  • Online visualization tool (DeepRepViz) for latent representation inspection
  • Con-score (concept encoding score) for quantifying confounder influence
  • Virtual inspection layers for time series data interpretation
  • Reactive model correction via conditional bias suppression (R-ClArC)
  • Gradient penalization in latent space for bias unlearning
  • Pattern-based Concept Activation Vectors (CAVs) to overcome directional divergence
  • DualView for post-hoc data attribution using surrogate modeling
  • Regression in quotient metric spaces (e.g., square-root-velocity framework)
  • Splines for modeling smooth conditional mean curves
  • Concept-based explanations using prototypes (Understanding the (Extra-)Ordinary)
  • Reveal to Revise (R2R) framework for iterative bias correction
  • Model guidance via explanations to turn classifiers into segmentation models
  • PURE method for turning polysemantic neurons into pure features via circuit identification
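Several of the listed methods, such as the pruned LRP variant, build on layer-wise relevance propagation. The sketch below illustrates only the basic mechanism, under stated assumptions. It applies the standard LRP-ε rule, R_i = Σ_j a_i·w_ij / (z_j + ε·sign(z_j)) · R_j, to a hypothetical two-layer ReLU network with hand-picked weights. It does not include the pruning, concept, or time-series extensions named above.

```python
def _stabilise(z, eps):
    """Add eps with the sign of z so the LRP-epsilon denominator is never 0."""
    return z + eps if z >= 0 else z - eps


def lrp_epsilon(x, w1, b1, w2, b2, eps=1e-6):
    """LRP-epsilon for a toy network f(x) = sum_j w2[j] * relu(z1[j]) + b2,
    where z1[j] = sum_i x[i] * w1[i][j] + b1[j] (single output).

    Returns (f(x), relevances of the input features). Biases and eps absorb
    part of the relevance, so the input relevances sum to f(x) only
    approximately, and in general only when the biases are zero.
    """
    n_in, n_hid = len(x), len(b1)
    # Forward pass: keep pre-activations z1 and ReLU activations a1.
    z1 = [sum(x[i] * w1[i][j] for i in range(n_in)) + b1[j] for j in range(n_hid)]
    a1 = [max(0.0, z) for z in z1]
    out = sum(a1[j] * w2[j] for j in range(n_hid)) + b2
    # Output layer: start from R_out = f(x) and share it among hidden units
    # in proportion to their contributions a1[j] * w2[j].
    r_hid = [a1[j] * w2[j] / _stabilise(out, eps) * out for j in range(n_hid)]
    # Hidden layer: R_i = sum_j x_i * w1_ij / (z1_j +/- eps) * R_j.
    r_in = [
        sum(x[i] * w1[i][j] / _stabilise(z1[j], eps) * r_hid[j] for j in range(n_hid))
        for i in range(n_in)
    ]
    return out, r_in


if __name__ == "__main__":
    # HYPOTHETICAL weights, chosen only for illustration.
    w1 = [[0.5, -1.0], [0.25, 0.5], [-0.5, 0.5]]
    out, rel = lrp_epsilon([1.0, 2.0, -1.0], w1, [0.0, 0.0], [1.0, 2.0], 0.0)
    print(out, rel)
```

In this example the second hidden unit is inactive (negative pre-activation), so it passes no relevance. The output 1.5 is split evenly, about 0.5 per input feature.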

Expected Outcomes

  • Development of conditional independence tests (CITs) for structured data such as images, leveraging deep learning for data embedding
  • Ensuring Type-I error control over large subsets of the null hypothesis of conditional independence
  • Increasing statistical power through transfer learning, optimally learned embeddings, and powerful CITs specifically adapted to these embeddings
  • Provision of efficient algorithms and user-friendly software for application in large-scale biomedical datasets such as the UK Biobank
  • Application of the tests in projects P2, P4, and P7, particularly as input for visual explanation tests
  • Provision of sample size and power calculations for CITs to enable researchers to plan experiments based on scientific questions
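The planned sample size and power calculations could take the following form. This is a hedged sketch, not the project's method. It assumes the CIT statistic is approximately N(√n·δ, 1) under the alternative, as holds for GCM-type statistics, with a standardized effect δ that is itself an assumption the user must supply. Under that assumption, a two-sided level-α test needs roughly n = ⌈((z_{1−α/2} + z_{1−β}) / δ)²⌉ observations for power 1−β.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def approx_power(n, delta, alpha=0.05):
    """Approximate power of a two-sided level-alpha test whose statistic is
    assumed to be N(sqrt(n) * delta, 1) under the alternative."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = math.sqrt(n) * delta
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def required_n(delta, alpha=0.05, power=0.8):
    """Smallest n reaching roughly the target power, using
    n = ((z_{1-alpha/2} + z_{power}) / delta)^2 and ignoring the negligible
    opposite-tail term of approx_power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_pow = _STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_crit + z_pow) / delta) ** 2)
```

For example, `required_n(0.1)` returns 785, and `approx_power(785, 0.1)` is just above 0.8. With δ = 0, `approx_power` returns α, which means the test rejects at its nominal Type-I error rate.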

Contact

Contact Person: Eliza Mandieva, Project Coordinator
Email: eliza.mandieva@hu-berlin.de
Project Website: https://desbi.de/


Recorded: 2026-01-14
Source: https://desbi.de/