AI Glossary: Letter "D"
Explore definitions and dynamic coverage analytics for the core concepts shaping artificial intelligence.
D
Data Augmentation
Data Augmentation is the practice of artificially increasing the size and diversity of a training dataset by applying transformations (like cropping, rotating, flipping, or paraphrasing) to existing data points.
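As a rough illustration of the transformations mentioned above, the following sketch augments a tiny image stored as a list of rows. The helper names (horizontal_flip, rotate_90_clockwise, augment) and the 50% application probability are assumptions made for this example, not part of any particular library.

```python
import random

def horizontal_flip(image):
    """Mirror each row of a 2D image given as a list of rows."""
    return [list(reversed(row)) for row in image]

def rotate_90_clockwise(image):
    """Rotate a 2D image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def augment(image, rng):
    """Return a randomly transformed copy; the original is left unchanged."""
    out = [list(row) for row in image]
    if rng.random() < 0.5:  # hypothetical probability, chosen for illustration
        out = horizontal_flip(out)
    if rng.random() < 0.5:
        out = rotate_90_clockwise(out)
    return out

image = [[1, 2, 3],
         [4, 5, 6]]
rng = random.Random(0)  # seeded so the run is repeatable
augmented = [augment(image, rng) for _ in range(4)]
```

Each call produces a new variant while the label attached to the original image would stay the same, which is what lets augmentation grow a labeled dataset without new annotation work.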
Data Labeling
Data Labeling is the process of identifying raw data points (such as images, text, or audio files) and appending target category tags (labels) to them to create a labeled dataset for supervised learning.
Data Leakage
Data Leakage is a training error that occurs when information that would not be available at prediction time, such as validation or test examples or values derived from the target, influences how a model is trained. This leads to overly optimistic performance scores during validation but poor generalization to truly unseen data.
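One common way leakage creeps in is through preprocessing statistics. The sketch below, which assumes a simple standardization step and made-up numbers, contrasts a scaler fitted on training and test data together (leaky) with one fitted on training data only. The names fit_scaler and transform are hypothetical helpers for this example.

```python
import statistics

def fit_scaler(values):
    """Compute the mean and standard deviation used for standardization."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # avoid dividing by zero
    return mean, std

def transform(values, scaler):
    """Standardize values with previously fitted statistics."""
    mean, std = scaler
    return [(v - mean) / std for v in values]

train = [1.0, 2.0, 3.0, 4.0]
test = [100.0, 200.0]  # illustrative held-out values

# Leaky: the statistics have "seen" the test set.
leaky_scaler = fit_scaler(train + test)

# Safer: fit on training data only, then reuse the same statistics on test data.
clean_scaler = fit_scaler(train)
train_scaled = transform(train, clean_scaler)
test_scaled = transform(test, clean_scaler)
```

The same principle applies to any fitted step (imputation, feature selection, vocabulary building): fit it on the training split and only apply it to the other splits.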
Data Preprocessing
Data Preprocessing is the initial phase of cleaning, transforming, and formatting raw input data to prepare it for machine learning algorithms.
Dataset
A Dataset is a structured collection of data points, features, and target values used to train, validate, and evaluate machine learning models.
Dataset Curation
Dataset Curation is the process of collecting, cleaning, labeling, filtering, and organizing data to create a high-quality dataset for training or benchmarking machine learning models.
Decision Tree
A Decision Tree is a non-parametric supervised learning method used for both classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
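To make "simple decision rules" concrete, here is a minimal sketch of a depth-one tree (often called a decision stump) on a single numeric feature. It picks the threshold with the fewest misclassifications; real implementations usually use criteria such as Gini impurity or entropy and recurse to build deeper trees. The function names and toy data are assumptions for this example, and the sketch expects at least two distinct feature values.

```python
def fit_stump(xs, ys):
    """Find the threshold on one feature that minimizes misclassifications.

    Returns (threshold, left_label, right_label).
    """
    best = None
    sorted_x = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(sorted_x, sorted_x[1:])]
    for t in candidates:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = max(set(left), key=left.count)     # majority class
        right_label = max(set(right), key=right.count)  # majority class
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, threshold, left_label, right_label = best
    return threshold, left_label, right_label

def predict(stump, x):
    """Apply the learned rule: go left if x <= threshold, else right."""
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["small", "small", "small", "large", "large", "large"]
stump = fit_stump(xs, ys)
```

On this toy data the learned rule is "x <= 6.5 means small, otherwise large", which is the kind of human-readable rule that makes decision trees easy to inspect.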
Deep Learning
Deep Learning is a subset of machine learning based on artificial neural networks with multiple layers (hence "deep"). These layers extract high-level features progressively from raw input, enabling automated feature learning without manual engineering.
Deep Reinforcement Learning
Deep Reinforcement Learning (DRL) is a subfield of machine learning that combines reinforcement learning principles (agents, actions, rewards) with deep neural networks to learn decision-making policies for high-dimensional state spaces.
Deepfake
A Deepfake is synthetic media (images, video, or audio) in which a person's face, voice, or body is digitally altered or replaced using deep generative models, typically autoencoders or Generative Adversarial Networks (GANs), to depict them saying or doing things they did not.
DeepSeek
DeepSeek is a prominent artificial intelligence research company specializing in developing high-performance open-source models, including reasoning, coder, and Mixture of Experts (MoE) architectures, which compete directly with leading proprietary systems.
Denoising
Denoising is the process of removing noise from a signal (like a digital image or audio track). In generative AI, denoising autoencoders and diffusion networks are trained to reconstruct clean inputs from intentionally corrupted variants.
Dense Model
A Dense Model is a neural network architecture in which all of the model's parameters participate in the computation for every token processed, in contrast to sparse designs such as Mixture of Experts (MoE) that activate only a subset. It represents the traditional design of deep neural networks.
Diffusion Model
A Diffusion Model is a class of generative AI models that generate data by learning to reverse a process of gradual noise addition. By starting with random noise and iteratively removing it, the model can generate high-resolution images, video, or audio.
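The "gradual noise addition" half of this definition can be sketched without any trained network. The example below assumes a DDPM-style linear variance schedule and the closed-form noising step x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise; the schedule endpoints and function names are illustrative assumptions. The reverse (denoising) direction requires a trained model and is not shown.

```python
import math
import random

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bars = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def add_noise(x0, alpha_bar, rng):
    """Sample a noised version of x0 at the step with the given alpha_bar."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * n
          for x, n in zip(x0, noise)]
    return xt, noise

rng = random.Random(0)
alpha_bars = alpha_bar_schedule(1000)
x0 = [0.5, -0.2, 0.9]                               # a toy "image" of three pixels
x_early, _ = add_noise(x0, alpha_bars[10], rng)     # still close to x0
x_late, _ = add_noise(x0, alpha_bars[-1], rng)      # almost pure noise
```

During training, a network is typically asked to predict the noise that was added at a randomly chosen step; generation then runs the learned denoising step repeatedly, starting from random noise.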
Dimensionality Reduction
Dimensionality Reduction is the process of reducing the number of input variables (features) in a dataset while retaining as much relevant information as possible. It is used to simplify models and visualize high-dimensional datasets.
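One widely used technique is Principal Component Analysis (PCA). The sketch below is a minimal, stdlib-only illustration that finds the first principal component by power iteration on the covariance matrix and projects 2D points onto it. It assumes the data is not all identical and that the starting vector is not orthogonal to the leading direction; production code would normally rely on a numerical library instead.

```python
import math

def first_principal_component(points, iterations=100):
    """Return (mean, unit direction) of the leading principal component."""
    n = len(points)
    dim = len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    centered = [[p[j] - mean[j] for j in range(dim)] for p in points]
    # Covariance matrix of the centered data.
    cov = [[sum(row[a] * row[b] for row in centered) / n for b in range(dim)]
           for a in range(dim)]
    v = [1.0] * dim  # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(points, mean, direction):
    """Reduce each point to a single coordinate along the direction."""
    return [sum((p[j] - mean[j]) * direction[j] for j in range(len(direction)))
            for p in points]

points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # lies on y = 2x
mean, direction = first_principal_component(points)
coords = project(points, mean, direction)  # one number per point instead of two
```

Because the toy points lie exactly on a line, a single coordinate preserves all of their variation; on real data, some information is lost and the goal is to keep as much variance as possible.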
Discriminator
A Discriminator is a neural network component within a Generative Adversarial Network (GAN) architecture. Its role is to evaluate inputs and classify them as either "real" (originating from the true training dataset) or "fake" (produced by the generator network).
Distillation
Knowledge Distillation is a compression technique where a smaller model (the student) is trained to replicate the behavior and output probabilities of a much larger model (the teacher). This transfers much of the teacher's capability into a model that is cheaper to store and run.
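A common way to match "output probabilities" is a soft-target loss: both models' logits are softened with a temperature, and the student is penalized by the KL divergence from the teacher's distribution. The sketch below assumes that formulation, including the conventional scaling by the squared temperature; the logits are made-up values, and in practice this term is usually combined with an ordinary loss on the true labels.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T squared."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

teacher_logits = [4.0, 1.0, 0.5]   # illustrative values
student_logits = [2.0, 1.5, 1.0]
loss = distillation_loss(student_logits, teacher_logits)
```

A higher temperature flattens the teacher's distribution, exposing how it ranks the incorrect classes, which is information a hard label does not carry.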
Distributed Training
Distributed Training is the practice of partitioning machine learning workloads (data or parameters) across multiple compute processors (GPUs/TPUs) to accelerate training times for large neural networks.
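The data-parallel variant can be simulated in a single process: each "worker" computes a gradient on its own shard, the gradients are averaged (the role an all-reduce plays on real hardware), and every worker applies the same update. The sketch below assumes a one-parameter linear model, equal-sized shards, and hypothetical helper names; it does not use any real distributed framework.

```python
def gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for an all-reduce: average the per-worker gradients."""
    return sum(values) / len(values)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # generated by y = 2x
num_workers = 2
shards = [data[i::num_workers] for i in range(num_workers)]  # equal-sized shards

w = 0.0
learning_rate = 0.01
for step in range(200):
    local_grads = [gradient(w, shard) for shard in shards]  # computed in parallel in practice
    w -= learning_rate * all_reduce_mean(local_grads)       # identical update on every worker
```

With equal-sized shards, the averaged gradient equals the gradient over the full batch, so the result matches single-device training while the per-step work is split across workers.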
Document Store
A Document Store is a database designed to store, retrieve, and manage document-oriented information, typically serialized as JSON or XML records. In RAG architectures, it holds the raw text (for example, passages extracted from PDFs or web pages) associated with vector embeddings.
Dot Product Similarity
Dot Product Similarity is a metric that measures the alignment of two vectors in a high-dimensional space by multiplying corresponding elements and summing the products. Unlike Cosine Similarity, it is sensitive to vector magnitude.
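The sensitivity to magnitude is easy to see on a small example. In the sketch below (vectors chosen arbitrarily for illustration), scaling a document vector by 10 multiplies its dot product with the query by 10, while the cosine similarity stays the same.

```python
import math

def dot(a, b):
    """Multiply corresponding elements and sum the products."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Dot product of the two vectors after normalizing by their lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [1.0, 2.0]
doc = [2.0, 1.0]
doc_scaled = [20.0, 10.0]  # same direction as doc, ten times longer

dot_small = dot(query, doc)          # 4.0
dot_large = dot(query, doc_scaled)   # 40.0
cos_small = cosine(query, doc)       # approximately 0.8
cos_large = cosine(query, doc_scaled)  # approximately 0.8
```

When embeddings are normalized to unit length, the two metrics produce the same ranking, which is why some vector databases treat them interchangeably in that case.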
Double Quantization
Double Quantization is a memory-saving process introduced in QLoRA that quantizes the quantization constants themselves, further reducing the memory footprint of fine-tuning runs with little to no reported loss in accuracy.
DPO
Direct Preference Optimization (DPO) is a model alignment technique that bypasses the complex reward-model training phase of RLHF. DPO optimizes the policy directly on preference datasets (chosen vs. rejected responses) using a simple binary cross-entropy loss.
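For a single preference pair, the loss can be sketched as the negative log-sigmoid of a scaled margin between the chosen and rejected responses, where each response is scored by how much more likely the policy makes it than a frozen reference model. The function below assumes summed sequence log-probabilities as inputs and an illustrative beta of 0.1; the argument names are hypothetical.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses."""
    # Implicit rewards: log-probability ratios against the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin).
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# Illustrative log-probabilities, not real model outputs.
neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)   # policy equals reference
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)   # policy prefers the chosen response
worsened = dpo_loss(-12.0, -8.0, -10.0, -10.0)   # policy prefers the rejected response
```

The loss falls as the policy shifts probability toward the chosen response relative to the reference, and beta controls how strongly the policy is allowed to drift from that reference.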
Dropout
Dropout is a regularization technique used in neural networks in which, during training, a fraction of the network's units is randomly deactivated (dropped out) in each forward pass, which discourages co-adaptation of features. At inference time, dropout is disabled and all units are used.
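A minimal sketch of the mechanism, assuming the commonly used "inverted dropout" variant in which surviving activations are scaled up during training so that no rescaling is needed at inference. The function name and arguments are hypothetical, and the rate is expected to be below 1.

```python
import random

def dropout(activations, rate, training, rng):
    """Zero each activation with probability `rate` during training.

    Kept activations are divided by (1 - rate) so the expected value
    of each output matches the input.
    """
    if not training or rate == 0.0:
        return list(activations)  # inference: pass values through unchanged
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # seeded so the run is repeatable
activations = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(activations, rate=0.5, training=True, rng=rng)
eval_out = dropout(activations, rate=0.5, training=False, rng=rng)
```

Because a different random subset is dropped on every forward pass, no single unit can rely on a specific partner always being present.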