ICML 2026

IDLM: Inverse-distilled Diffusion Language Models

1 Mohamed Bin Zayed University of AI, 2 Applied AI Institute, 3 AXXX, 4 EPITA. * Equal contribution

IDLM distills pretrained Diffusion Language Models into few-step generators, reducing inference steps by 4x-64x while preserving generation quality.

Overview diagram comparing standard Diffusion Language Models with Inverse-distilled Diffusion Language Models.
Standard DLMs produce high-quality samples but require many inference steps. IDLM trains a distilled generator that preserves the teacher's behavior while needing far fewer inference steps.

Key Innovations

  1. Inverse distillation for tokens. IDLM adapts inverse distillation from continuous diffusion models to discrete language generation.
  2. A valid optimization target. The method addresses non-uniqueness in the inverse objective by establishing a uniqueness guarantee for the discrete setting.
  3. Stable training relaxations. Gradient-stable relaxations make backpropagation through discrete token spaces practical.
  4. Fast few-step sampling. Experiments across multiple DLM teachers report a 4x-64x reduction in inference steps while maintaining entropy and generative perplexity.
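
The two quality metrics named in the last item can be illustrated with a short sketch. This page does not define how either metric is computed, so the definitions below are assumptions: entropy is taken as the Shannon entropy of the empirical token distribution of one generated sample, and generative perplexity as the perplexity of generated text under a separate evaluator model that supplies per-token log-probabilities. The function names are hypothetical.

```python
import math
from collections import Counter


def unigram_entropy(tokens):
    """Shannon entropy (in nats) of the empirical token distribution.

    Assumed definition: the page only says "entropy" without specifying it.
    Very low values usually indicate repetitive, collapsed samples.
    """
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    Assumed setup: the log-probabilities come from some evaluator language
    model scoring the generated text; no specific evaluator is implied.
    """
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    sample = ["the", "cat", "sat", "on", "the", "mat"]
    print("entropy (nats):", unigram_entropy(sample))
    # Hypothetical evaluator scores: every token assigned probability 0.25.
    print("perplexity:", perplexity([math.log(0.25)] * len(sample)))
```

Reporting both together is a common safeguard: a generator can reach low perplexity by repeating a few likely tokens, which the entropy value would expose.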

Abstract

Diffusion Language Models have become a promising route for text generation, but their iterative sampling process can make inference slow. IDLM extends inverse distillation to this discrete setting, using a pretrained DLM as the teacher and training a student generator that samples in substantially fewer steps.

The paper tackles the theoretical issue of non-unique inverse objectives and the practical difficulty of unstable discrete-space gradients. Its solution combines a uniqueness result with differentiable training relaxations, yielding a post-training framework that accelerates DLM inference while preserving core quality metrics.
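
To make the idea of a differentiable relaxation concrete, here is a minimal sketch of one widely used option, the Gumbel-softmax relaxation. This is an assumption chosen for illustration: the page does not say which relaxations IDLM uses, and the paper's own construction may differ. The sketch replaces a hard token choice (an argmax, which has no useful gradient) with a smooth probability vector over the vocabulary whose sharpness is controlled by a temperature.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax_sample(logits, temperature, rng):
    """Draw a relaxed (soft) one-hot vector over the vocabulary.

    Gumbel(0, 1) noise is added to each logit, then a tempered softmax is
    applied. As the temperature approaches 0 the output approaches a hard
    one-hot sample; larger temperatures give smoother vectors.
    """
    noisy = []
    for x in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        noisy.append(x - math.log(-math.log(u)))
    return softmax(noisy, temperature)


if __name__ == "__main__":
    rng = random.Random(0)
    toy_logits = [2.0, 0.5, -1.0]  # hypothetical scores for a 3-token vocabulary
    for tau in (1.0, 0.1):
        print(tau, gumbel_softmax_sample(toy_logits, tau, rng))
```

In an autodiff framework the same computation lets gradients flow from a downstream loss back to the logits, which is the property a hard token sample lacks.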

Text Generation: DLM vs. IDLM

Animated comparison where a diffusion language model and IDLM perform the same generation task, but IDLM completes it in far fewer model calls.
The DLM side uses 64 sampling steps, while the IDLM side produces the completed text after only 4 distilled model calls, a 16x reduction in steps.

IDLM keeps the discrete diffusion teacher's quality target, but compresses sampling into a few-step generator for substantially faster inference.

BibTeX

@article{li2026idlm,
  title={IDLM: Inverse-distilled Diffusion Language Models},
  author={Li, David and Gushchin, Nikita and Abulkhanov, Dmitry and Moulines, Eric and Oseledets, Ivan and Panov, Maxim and Korotin, Alexander},
  journal={arXiv preprint arXiv:2602.19066},
  year={2026}
}