Definition
Distillation
Knowledge distillation is a model compression technique in which a smaller model (the student) is trained to reproduce the behavior of a much larger model (the teacher), typically by matching the teacher's output probabilities. The aim is to transfer much of the teacher's capability, including its reasoning ability, into a model that is cheaper to store and run.
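As an illustration of what "matching the teacher's output probabilities" can look like, here is a minimal sketch of one common formulation: a KL-divergence term between temperature-softened teacher and student distributions, blended with ordinary cross-entropy on the true label. The function names, the `temperature` and `alpha` parameters, and their default values are assumptions chosen for this example, not a description of any specific system.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical helper: blend a soft-label term with a hard-label term.

    alpha weights the teacher-matching term; (1 - alpha) weights the
    cross-entropy against the ground-truth label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) over the softened distributions
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Standard cross-entropy of the student against the true label (T = 1)
    hard = -math.log(softmax(student_logits)[true_label])
    # The T^2 factor is a common convention to keep the soft term's scale stable
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard

# Made-up logits for a three-class problem
teacher = [4.0, 2.5, -1.0]
close_student = [3.5, 2.0, -0.5]
far_student = [-1.0, 0.0, 4.0]
loss_close = distillation_loss(close_student, teacher, true_label=0)
loss_far = distillation_loss(far_student, teacher, true_label=0)
```

In this sketch, a student whose logits track the teacher's receives a lower loss than one that disagrees with it, which is the training signal distillation relies on.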
Frequently Asked Questions
Why is a distilled model better than a model trained from scratch?
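Because the student learns from the teacher's full probability distributions (soft labels) rather than from a single correct answer per example. Soft labels show how plausible the teacher finds each alternative, a signal that raw text datasets and hard labels do not provide.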
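To make the difference between hard and soft labels concrete, the following sketch compares a one-hot label with a teacher's softened distribution. The class names and logit values are invented for illustration, and the temperature of 4.0 is an arbitrary choice.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["cat", "dog", "truck"]    # hypothetical classes
teacher_logits = [6.0, 4.0, -2.0]    # invented teacher scores for a cat image

hard_label = [1.0, 0.0, 0.0]         # says only "cat"; dog and truck look equally wrong
soft_t1 = softmax(teacher_logits, temperature=1.0)
soft_t4 = softmax(teacher_logits, temperature=4.0)

for name, h, s1, s4 in zip(classes, hard_label, soft_t1, soft_t4):
    print(f"{name:>5}: hard={h:.2f}  soft(T=1)={s1:.4f}  soft(T=4)={s4:.4f}")
```

The hard label treats "dog" and "truck" as equally wrong, while the soft labels rank "dog" well above "truck". Raising the temperature makes that ranking more visible to the student by moving probability mass away from the top class.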
Give an example of a distilled model.
DistilBERT, which its authors report to be 40% smaller than BERT while retaining 97% of its language understanding performance.
Distillation Media Coverage & Intelligence
arXiv AI, Jun 18, 2026
Skill-Guided Continuation Distillation for GUI Agents
Improving GUI agents typically relies on behavior cloning on expert trajectories. However, as the current policy deviates from the expert policy, it inevitably