AI Glossary: Letter "D"

Explore definitions and dynamic coverage analytics for the core concepts shaping artificial intelligence.

D

Data Augmentation

Data Augmentation is the practice of artificially increasing the size and diversity of a training dataset by applying transformations (like cropping, rotating, flipping, or paraphrasing) to existing data points.
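As an illustration of the transformations mentioned above, the following minimal sketch flips and rotates a tiny image stored as a nested list. The helper names (hflip, rot90, augment) are hypothetical and chosen for this example only; real pipelines usually rely on an image library.

```python
def hflip(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img):
    """Return the original image plus two transformed copies."""
    return [img, hflip(img), rot90(img)]


image = [[1, 2],
         [3, 4]]
variants = augment(image)  # one labeled example becomes three
```

Each variant keeps the label of the original image, which is what lets the training set grow without new annotation work.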

Model Training

Data Labeling

Data Labeling is the process of attaching target tags (labels) to raw data points (such as images, text, or audio files) to create a labeled dataset for supervised learning.

Model Training

Data Leakage

Data Leakage is a training error that occurs when information that would not be available at prediction time, such as test-set statistics or a proxy for the target, is used to train a model. This leads to overly optimistic performance scores during validation but poor generalization on truly unseen data.
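One common source of leakage is fitting preprocessing statistics on the full dataset before splitting it. The sketch below, with hypothetical helper names and made-up numbers, contrasts a scaler fitted on training data only with one that has also seen the test data.

```python
import statistics


def fit_scaler(values):
    """Compute the mean and population standard deviation of a sample."""
    return statistics.fmean(values), statistics.pstdev(values)


def transform(values, mean, std):
    """Standardize values with previously fitted statistics."""
    return [(v - mean) / std for v in values]


train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 12.0]

# Leaky: the scaler has seen the test values it will later be evaluated on.
leaky_mean, leaky_std = fit_scaler(train + test)

# Clean: statistics come from the training split only.
clean_mean, clean_std = fit_scaler(train)
test_scaled = transform(test, clean_mean, clean_std)
```

The same rule applies to any fitted step (imputation, feature selection, vocabulary building): fit on the training split, then apply to validation and test data.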

Model Training

Data Preprocessing

Data Preprocessing is the initial phase of cleaning, transforming, and formatting raw input data to prepare it for machine learning algorithms.

Model Training

Dataset

A Dataset is a structured collection of data points, features, and target values used to train, validate, and evaluate machine learning models.

Foundational AI

Dataset Curation

Dataset Curation is the process of collecting, cleaning, labeling, filtering, and organizing data to create a high-quality dataset for training or benchmarking machine learning models.

Model Training

Decision Tree

A Decision Tree is a non-parametric supervised learning method used for both classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
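To make "simple decision rules" concrete, here is a sketch of a decision stump, the smallest possible tree: a single threshold on one feature for binary labels. The functions fit_stump and predict are hypothetical names for this example, and a full tree would apply the same search recursively with an impurity measure such as Gini.

```python
def fit_stump(xs, ys):
    """Find the threshold and left-side label with the fewest training errors."""
    candidates = sorted(set(xs))
    if len(candidates) < 2:
        raise ValueError("need at least two distinct feature values")
    best = None
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2  # midpoint between neighboring values
        for left_label in (0, 1):
            errors = sum(
                (left_label if x <= threshold else 1 - left_label) != y
                for x, y in zip(xs, ys)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    return best[1], best[2]


def predict(stump, x):
    """Apply the learned rule: 'if x <= threshold then left_label else the other'."""
    threshold, left_label = stump
    return left_label if x <= threshold else 1 - left_label


stump = fit_stump([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
```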

Foundational AI

Deep Learning

Deep Learning is a subset of machine learning based on artificial neural networks with multiple layers (hence "deep"). These layers extract high-level features progressively from raw input, enabling automated feature learning without manual engineering.

Foundational AI

Deep Reinforcement Learning

Deep Reinforcement Learning (DRL) is a subfield of machine learning that combines reinforcement learning principles (agents, actions, rewards) with deep neural networks to learn decision-making policies for high-dimensional state spaces.

Model Training

Deepfake

A Deepfake is synthetic media (images, video, or audio) in which a person's face, voice, or body is digitally altered or replaced using deep generative models, typically autoencoders or Generative Adversarial Networks (GANs), to depict them saying or doing things they did not.

Model Limitations

DeepSeek

DeepSeek is a prominent artificial intelligence research company specializing in developing high-performance open-source models, including reasoning, coder, and Mixture of Experts (MoE) architectures, which compete directly with leading proprietary systems.

Foundational AI

Denoising

Denoising is the process of removing noise from a signal (like a digital image or audio track). In generative AI, denoising autoencoders and diffusion networks are trained to reconstruct clean inputs from intentionally corrupted variants.

Computer Vision

Dense Model

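A Dense Model is a neural network architecture in which all of the model's parameters participate in the computation for every token processed. This is the traditional design of deep neural networks, in contrast to sparse designs such as Mixture of Experts (MoE), which activate only a subset of parameters per token.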

Neural Architectures

Diffusion Model

A Diffusion Model is a generative model that produces data by learning to reverse a process of gradual noise addition. By starting with random noise and iteratively removing it, the model can generate high-resolution images, video, or audio.
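The "gradual noise addition" half of the process can be sketched in a few lines. The example below assumes the common Gaussian formulation, in which a sample after t steps is a weighted mix of the clean input and standard normal noise; the linear beta schedule and the function names are assumptions made for illustration. The learned reverse (denoising) network is not shown.

```python
import math
import random


def alpha_bar(betas, t):
    """Cumulative product of (1 - beta) over the first t noising steps."""
    product = 1.0
    for beta in betas[:t]:
        product *= 1.0 - beta
    return product


def add_noise(x0, t, betas, rng):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise."""
    ab = alpha_bar(betas, t)
    return [
        math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
        for x in x0
    ]


betas = [0.01 * (i + 1) for i in range(10)]  # hypothetical linear schedule
rng = random.Random(0)
clean = [1.0, -1.0, 0.5]
slightly_noisy = add_noise(clean, 1, betas, rng)
very_noisy = add_noise(clean, 10, betas, rng)
```

As t grows, alpha_bar shrinks toward zero, so the signal fades and the sample approaches pure noise; training teaches a network to undo that corruption step by step.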

Generative AI

Dimensionality Reduction

Dimensionality Reduction is the process of reducing the number of input variables (features) in a dataset while retaining as much relevant information as possible. It is used to simplify models and visualize high-dimensional datasets.
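Principal component analysis (PCA) is one widely used way to do this. The sketch below is an assumption-laden toy version: it finds only the first principal component by power iteration and projects each point onto it, with a hypothetical function name and no handling of degenerate inputs such as identical points.

```python
import math


def top_principal_component(points, iterations=100):
    """Return the leading principal direction and the 1-D projections."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    # Covariance matrix of the centered data.
    cov = [
        [sum(r[i] * r[j] for r in centered) / n for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by cov and renormalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    projected = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, projected


points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
direction, reduced = top_principal_component(points)
```

Here four 2-D points lying on a line are reduced to four scalars with no loss of information, since all of their variance lies along one direction.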

Mathematical Foundations

Discriminator

A Discriminator is a neural network component within a Generative Adversarial Network (GAN) architecture. Its role is to evaluate inputs and classify them as either "real" (originating from the true training dataset) or "fake" (produced by the generator network).

Generative AI

Distillation

Knowledge Distillation is a compression technique in which a smaller model (the student) is trained to replicate the behavior and output probabilities of a much larger model (the teacher). This transfers much of the teacher's capability into a model with a smaller memory and compute footprint.
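Matching "output probabilities" is often done with a temperature-softened softmax and a KL-divergence loss. The following sketch assumes that classic formulation; the function names and the default temperature are illustrative, and practical setups usually combine this term with an ordinary loss on the true labels.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature * temperature * kl


teacher = [4.0, 1.0, 0.5]
good_student = [4.0, 1.0, 0.5]
poor_student = [0.5, 1.0, 4.0]
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge.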

Model Training

Distributed Training

Distributed Training is the practice of partitioning machine learning workloads (data or parameters) across multiple processors (GPUs/TPUs) to reduce training time and to fit neural networks that are too large for a single device.
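The data-partitioning case can be sketched without any hardware. The toy example below assumes a one-parameter linear model with a mean-squared-error loss and simulates workers in a single process; the function names are hypothetical. It shows that averaging the gradients of equally sized shards reproduces the full-batch gradient, which is the idea behind data parallelism.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_grad(w, batch, num_workers):
    """Split the batch into equal shards, compute per-shard gradients, average them."""
    if len(batch) % num_workers != 0:
        raise ValueError("this sketch assumes equally sized shards")
    size = len(batch) // num_workers
    shards = [batch[i * size:(i + 1) * size] for i in range(num_workers)]
    shard_grads = [grad_mse(w, shard) for shard in shards]  # one per simulated worker
    return sum(shard_grads) / num_workers  # stands in for an all-reduce average


batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
full = grad_mse(0.0, batch)
parallel = data_parallel_grad(0.0, batch, num_workers=2)
```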

Hardware & Infrastructure

Document Store

A Document Store is a database designed to store, retrieve, and manage document-oriented information, typically serialized as JSON, BSON, or XML. In RAG architectures, it holds the raw text associated with vector embeddings.

Data Infrastructure

Dot Product Similarity

Dot Product Similarity is a metric that measures the alignment of two vectors in a high-dimensional space by multiplying corresponding elements and summing the products. Unlike Cosine Similarity, it is sensitive to vector magnitude.
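The sensitivity to magnitude is easy to see side by side. In this minimal sketch (function names chosen for illustration), doubling one vector doubles the dot product but leaves the cosine similarity unchanged.

```python
import math


def dot(a, b):
    """Multiply corresponding elements and sum the products."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Dot product of the two vectors after normalizing each to unit length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [1.0, 2.0]
doc = [2.0, 4.0]
doc_scaled = [4.0, 8.0]  # same direction, twice the magnitude

dot_scores = (dot(query, doc), dot(query, doc_scaled))
cos_scores = (cosine(query, doc), cosine(query, doc_scaled))
```

When embeddings are normalized to unit length, the two metrics give the same ranking.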

Mathematical Foundations

Double Quantization

Double Quantization is a memory-saving technique introduced in QLoRA that quantizes the quantization constants themselves, reducing the memory footprint of fine-tuning runs with little to no reported accuracy loss.
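The saving comes from the per-block constants rather than the weights. The arithmetic sketch below uses the settings described in the QLoRA paper (64-weight blocks with 32-bit constants, then 8-bit constants grouped in blocks of 256 with one 32-bit constant each); treat these defaults and the function names as assumptions for illustration.

```python
def constant_overhead_bits(block_size=64, const_bits=32):
    """Extra bits per weight when each block stores one full-precision constant."""
    return const_bits / block_size


def double_quant_overhead_bits(block_size=64, second_block_size=256,
                               quant_const_bits=8, second_const_bits=32):
    """Extra bits per weight after the constants themselves are quantized."""
    first_level = quant_const_bits / block_size
    second_level = second_const_bits / (block_size * second_block_size)
    return first_level + second_level


before = constant_overhead_bits()
after = double_quant_overhead_bits()
saving = before - after  # bits saved per model parameter
```

Under these assumptions the overhead falls from 0.5 to roughly 0.127 bits per parameter, which adds up to gigabytes for models with tens of billions of parameters.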

Model Operations

DPO

Direct Preference Optimization (DPO) is a model alignment technique that bypasses the complex reward-model training phase of RLHF. DPO optimizes the policy directly on preference datasets (chosen vs. rejected responses) using a simple binary cross-entropy loss.
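For a single preference pair, the loss can be sketched as follows. The inputs are assumed to be summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model; the function name and the default beta are illustrative choices, and the formula is not guarded against overflow for extreme margins.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Negative log-sigmoid of the scaled preference margin."""
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = chosen_gain - rejected_gain
    # log(1 + exp(-x)) equals -log(sigmoid(x))
    return math.log1p(math.exp(-beta * margin))


# Policy identical to the reference: no preference learned yet.
untrained = dpo_loss(-1.0, -2.0, -1.0, -2.0)

# Policy raises the chosen response and lowers the rejected one.
improved = dpo_loss(-0.5, -3.0, -1.0, -2.0)
```

The loss starts at log 2 when the policy matches the reference and decreases as the policy favors chosen responses more strongly than the reference does.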

Model Training

Dropout

Dropout is a regularization technique for neural networks in which a random fraction of units is deactivated (dropped out) on each training forward pass, which discourages co-adaptation of features. At inference time all units remain active.
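A minimal sketch of the "inverted dropout" variant follows, in which surviving activations are scaled up during training so that no rescaling is needed at inference. The function signature is an assumption for this example and operates on a plain list rather than a tensor.

```python
import random


def dropout(activations, p=0.5, training=True, rng=random):
    """Zero each activation with probability p; scale survivors by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)  # inference: every unit stays active
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
hidden = [1.0] * 1000
train_out = dropout(hidden, p=0.5, rng=rng)
eval_out = dropout(hidden, p=0.5, training=False)
```

Because a different random mask is drawn on every forward pass, no unit can rely on a specific partner always being present.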

Model Training