Definition

Quantization

Quantization is the process of compressing a neural network by reducing the numerical precision of its weights (e.g. converting 16-bit floating-point values to 4-bit integers), which lowers VRAM requirements and can accelerate inference.
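As a rough illustration of the definition above, the following minimal sketch applies symmetric round-to-nearest ("absmax") quantization to a handful of weights. The function names `quantize_absmax` and `dequantize` and the sample weights are hypothetical, chosen for illustration only. Production methods such as GPTQ or AWQ are considerably more sophisticated than this.

```python
def quantize_absmax(weights, bits=4):
    """Map floats to signed integers of the given bit width (illustrative sketch)."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for signed 4-bit values
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step and clamp to [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from integers and the shared scale."""
    return [v * scale for v in q]


weights = [0.7, -0.33, 0.12, 0.0]              # hypothetical example weights
q, scale = quantize_absmax(weights)
restored = dequantize(q, scale)

print(q)                                       # integer codes stored instead of floats
print(restored)                                # approximate reconstruction of the weights
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`), which is the source of the small accuracy loss that quantization can introduce.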

Frequently Asked Questions

Does quantization degrade model performance?

It can cause a minor loss of accuracy, but modern quantization algorithms (such as GPTQ or AWQ) minimize this loss while reducing model size by up to 75% (for example, when going from 16-bit to 4-bit weights).
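The "up to 75%" figure follows directly from the bit widths, as the short calculation below shows. The 7-billion-parameter model is a hypothetical example, and the helper `weight_size_gb` is an illustrative name. The estimate counts weights only and ignores the per-group scales and other metadata that real quantized formats store, so actual savings are usually somewhat smaller.

```python
def weight_size_gb(num_params, bits_per_weight):
    """Approximate storage for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


num_params = 7e9                               # hypothetical 7B-parameter model
fp16_gb = weight_size_gb(num_params, 16)       # 16-bit floating-point weights
int4_gb = weight_size_gb(num_params, 4)        # 4-bit integer weights
reduction = 1 - int4_gb / fp16_gb              # fraction of storage saved

print(f"FP16: {fp16_gb} GB, INT4: {int4_gb} GB, saved: {reduction:.0%}")
```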

What are GGUF and EXL2?

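GGUF and EXL2 are quantized model file formats for running LLMs locally. GGUF, the format used by llama.cpp, is optimized for CPUs and Macs, while EXL2, the format used by ExLlamaV2, is optimized for GPUs.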

Quick Facts

Category: Model Operations
Key Application: Mobile AI execution, local model hosting (GGUF, AWQ), and consumer GPU model loading

Related AI Terms

AI Model, GPU, LLM

Quantization Media Coverage & Intelligence

arXiv AI, Jun 6, 2026

Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models

Post-training quantization (PTQ) is critical for the efficient deployment of large language models (LLMs). Recen
