Definition

Distillation

Knowledge Distillation is a compression technique in which a smaller model (the student) is trained to replicate the behavior and output probability distributions of a much larger model (the teacher). This transfers much of the teacher's capability, including its reasoning ability, into a model with a smaller memory and compute footprint.
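Below is a minimal sketch of the training objective. It assumes the common soft-target formulation popularized by Hinton et al. (2015). In that formulation, the student minimizes a weighted sum of two terms:

- the KL divergence between its own temperature-softened output distribution and the teacher's;
- ordinary cross-entropy on the ground-truth (hard) label.

The function names, the temperature of 2.0, and the weight `alpha` are illustrative assumptions, not a prescribed setting. The pure-Python lists stand in for the batched tensors a real framework would use.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    hard_label: int,
    temperature: float = 2.0,  # assumed value; tuned per task in practice
    alpha: float = 0.5,        # assumed weight between the soft and hard terms
) -> float:
    # Soft-target term: pull the student's softened distribution toward the teacher's.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Scaling by T^2 keeps this term's gradients comparable in size to the hard term.
    soft_term = kl_divergence(p_teacher, p_student) * temperature ** 2
    # Hard-target term: ordinary cross-entropy against the ground-truth label (T = 1).
    hard_term = -math.log(softmax(student_logits)[hard_label])
    return alpha * soft_term + (1.0 - alpha) * hard_term


teacher = [4.0, 1.5, -1.0]  # hypothetical teacher logits for 3 classes
student = [2.0, 1.0, 0.0]   # hypothetical student logits
print(distillation_loss(student, teacher, hard_label=0))
```

With `alpha=1.0` only the soft term remains. That term is zero when the student reproduces the teacher's logits exactly. In practice the same loss would be computed over batches inside an autodiff framework, so that it can be minimized by gradient descent.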

Frequently Asked Questions

Why can a distilled model outperform a model of the same size trained from scratch?

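Because the student learns from the teacher's full probability distributions (soft labels), not only from the single correct answers (hard labels) in a raw training dataset. Soft labels also encode which wrong answers the teacher considers plausible, a nuance that hard labels lack.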
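To illustrate what soft labels add, the sketch below scores two hypothetical students on a hypothetical three-class problem. All logits, the class names, and the temperature of 4.0 are made-up assumptions chosen for illustration. The point is only how the two kinds of target score the same pair of students.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted):
    """Cross-entropy H(target, predicted) between two discrete distributions."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


# Hypothetical 3-class example: classes are ["cat", "dog", "car"] and the input is a cat.
teacher_logits = [5.0, 3.0, -2.0]
hard_target = [1.0, 0.0, 0.0]  # one-hot label: only "cat" is correct
# Soft labels: the teacher also says "dog" is far more plausible than "car".
soft_target = softmax(teacher_logits, temperature=4.0)

# Two hypothetical students: same score for "cat", opposite views of the wrong classes.
student_a = [4.0, 2.0, -1.0]  # ranks "dog" above "car", like the teacher
student_b = [4.0, -1.0, 2.0]  # ranks "car" above "dog"

hard_a = cross_entropy(hard_target, softmax(student_a))
hard_b = cross_entropy(hard_target, softmax(student_b))
soft_a = cross_entropy(soft_target, softmax(student_a, temperature=4.0))
soft_b = cross_entropy(soft_target, softmax(student_b, temperature=4.0))

print(f"hard-label loss: a={hard_a:.4f}, b={hard_b:.4f}")
print(f"soft-label loss: a={soft_a:.4f}, b={soft_b:.4f}")
```

The two hard-label losses are equal (up to floating-point rounding), because both students assign the same probability to "cat". The soft-label loss is lower for `student_a`, whose ranking of the wrong classes matches the teacher's. That extra signal is what a student gains from training on soft labels.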

Give an example of a distilled model.

DistilBERT, which is 40% smaller than BERT-base yet retains 97% of its language understanding performance.

Quick Facts

  • Category: Model Training
  • Key Application: Edge model creation, mobile-device AI deployment, and inference cost reduction



Distillation Media Coverage & Intelligence

arXiv AI · Jun 18, 2026

Skill-Guided Continuation Distillation for GUI Agents

Improving GUI agents typically relies on behavior cloning on expert trajectories. However, as the current policy deviates from the expert policy, it inevitably