
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey

Yunkai Dang*, Kaichen Huang*, Jiahao Huo*, Yibo Yan, Sirui Huang, Dongrui Liu, Mengxi Gao, Jie Zhang, Chen Qian, Kun Wang, Yong Liu, Jing Shao, Hui Xiong, Xuming Hu

* Equal contribution. Corresponding author: xuminghu@hkust-gz.edu.cn.

arXiv 2024

Structured Map of MLLM Explainability

The survey organizes methods, benchmarks, and applications into a coherent landscape instead of a loose paper list.

Data, Model, and Inference Perspectives

It frames interpretability from the viewpoints of data, model internals, and training or inference behavior.

Evaluation Standards Still Lag Behind

The paper highlights that faithful, useful, and robust explanation evaluation remains a major open problem.

Overview

This survey addresses a problem that becomes more urgent as multimodal large language models become more capable: their decisions are harder to inspect, debug, and justify. MLLMs can already solve a wide range of tasks across image-text generation, visual question answering, retrieval, and multimodal reasoning, but the mechanisms behind those outputs often remain opaque. The survey is motivated by the view that performance alone is not enough; interpretability and explainability are necessary if these systems are to be trusted in high-stakes settings.

Rather than listing papers loosely, the survey builds a structured map of the field. It covers explainability methods, benchmark design, evaluation protocols, and open challenges across model architectures, training procedures, and inference strategies.

Survey Scope

The paper organizes the literature from three main perspectives: data, model, and training or inference. It also examines interpretability at multiple granularities, from token-level interactions to embedding-level representations and higher-level module behavior. In addition to analysis tools, the survey covers architecture design choices, alignment methods, hallucination-oriented explanation work, robustness benchmarks, and application-specific interpretability studies.

Figure (survey overview): The survey organizes MLLM explainability research along the data, model, and training or inference dimensions, and also covers benchmarks and applications.

Explainability Challenge: as multimodal large language models become more capable, their decisions become harder to inspect, debug, and justify.

Three-Perspective Framework: the survey organizes the literature from the viewpoints of data, model, and training or inference.

Evaluation Gap: the paper emphasizes that the field still lacks strong standards for measuring whether an explanation is faithful, useful, and robust.

Survey Value: this work acts as infrastructure for the area by giving researchers a structured map of methods, benchmarks, and open problems.

Paper Resource: the full survey is available on arXiv.

This is not a paper list dressed up as a survey. Its main value is that it gives the field a common explanatory framework for understanding where interpretability tools work, where they fail, and where evaluation is still too weak.

Main Takeaways

  • The explainability landscape is broader than attention visualization alone. The survey covers attribution methods, representation analysis, architecture-level interventions, reward and alignment strategies, and hallucination-oriented diagnosis.
  • Benchmarking remains a major bottleneck. The paper emphasizes that explanation quality needs better evaluation protocols, especially when explanations are used for trust, debugging, or alignment rather than only for qualitative inspection.
  • Token-level and representation-level analysis are both important. The survey highlights that MLLM behavior cannot be understood from one level alone because multimodal reasoning mixes visual grounding, language priors, and cross-modal fusion.
  • The field still lacks widely accepted standards for measuring whether an explanation is faithful, useful, and robust under real deployment constraints.

Why This Survey Matters

This paper's value lies less in any single method than in its role as infrastructure for the research area. It gives researchers a common vocabulary for discussing explainability in MLLMs and helps separate what is already mature from what is still poorly understood.

For practitioners, the survey is useful because it makes clear that transparency is not one technique but a stack of decisions: dataset design, representation learning, model architecture, alignment, decoding, and post-hoc analysis all interact. If the goal is more accountable multimodal AI, this survey is a strong starting point.

BibTeX Citation


@article{dang2024explainable,
  title={Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey},
  author={Dang, Yunkai and Huang, Kaichen and Huo, Jiahao and Yan, Yibo and Huang, Sirui and Liu, Dongrui and Gao, Mengxi and Zhang, Jie and Qian, Chen and Wang, Kun and Liu, Yong and Shao, Jing and Xiong, Hui and Hu, Xuming},
  journal={arXiv preprint arXiv:2412.02104},
  year={2024}
}