AI Glossary: A to Z Technical Directory

Explore definitions and dynamic coverage analytics for the core concepts shaping artificial intelligence.

A

Accuracy

Accuracy is a classification metric that measures the fraction of predictions a model got correct: the number of correct predictions divided by the total number of predictions.

Mathematical Foundations
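The accuracy calculation can be sketched in a few lines of Python. This is a minimal illustration; the function name `accuracy` and the sample labels are assumptions made for the example, not part of any particular library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty list")
    # Count positions where the prediction equals the true label.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

labels = [1, 0, 1, 1, 0]        # made-up ground truth
predictions = [1, 0, 0, 1, 1]   # made-up model outputs
print(accuracy(labels, predictions))  # 3 of 5 correct -> 0.6
```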

Activation Function

An Activation Function is a mathematical function applied to the output of a neural network node to determine how strongly that node's signal is passed on to the next layer. It introduces non-linearity into the network, allowing it to learn complex patterns instead of only linear transformations.

Mathematical Foundations
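Two widely used activation functions, ReLU and sigmoid, are sketched below as an illustration. The entry above does not name specific functions, so the choice of these two is an assumption made for the example.

```python
import math

def relu(x):
    """Rectified Linear Unit: passes positive values, zeroes out the rest."""
    return max(0.0, x)

def sigmoid(x):
    """Squashes any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Apply both functions to a few sample node outputs.
for value in (-2.0, 0.0, 3.0):
    print(value, relu(value), round(sigmoid(value), 4))
```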

Active Learning

Active Learning is a semi-supervised learning framework where a machine learning algorithm queries a human annotator to label only the most informative or uncertain data points, minimizing labeling cost.

Model Training

Adam Optimizer

Adam (Adaptive Moment Estimation) is an optimization algorithm used for training deep learning models. It combines the advantages of RMSProp and Momentum by calculating adaptive learning rates for each parameter based on estimates of the first and second moments of the gradients.

Mathematical Foundations
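A single Adam update for one scalar parameter might look like the sketch below. The function name `adam_step`, the toy objective f(x) = (x - 3)^2, and the learning rate of 0.05 are assumptions chosen for illustration; the beta and epsilon defaults are the commonly cited ones.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (momentum-like)
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment (RMSProp-like)
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Toy problem: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (x - 3)
    x, m, v = adam_step(x, grad, m, v, t, lr=0.05)
print(round(x, 3))  # should end up close to 3
```

Because the step is divided by the square root of the second-moment estimate, the very first update has a magnitude of roughly `lr` regardless of how large the gradient is.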

Adversarial Attack

An Adversarial Attack is a technique that feeds a machine learning model intentionally designed inputs (adversarial examples) to cause it to make a mistake, fail, or hallucinate. In image models, this often involves introducing imperceptible pixel noise that completely alters the classification.

Model Limitations

Agentic AI

Agentic AI refers to artificial intelligence systems designed to act autonomously, make decisions, plan workflows, and execute tasks without constant human intervention. Unlike traditional models that only respond to queries, agentic systems use an agentic loop to perceive environments, reason over goals, use tools, and iterate to achieve outcomes.

Agentic Systems

AGI

Artificial General Intelligence (AGI) represents a theoretical form of AI that possesses the ability to understand, learn, and apply knowledge across any intellectual task at a level equal to or surpassing human capabilities. Unlike narrow AI, AGI is characterized by general reasoning and autonomous adaptability.

Theoretical AI

AI Agent

An AI Agent is an autonomous entity that perceives its environment through sensors (or inputs) and acts upon that environment using actuators (or tools) to achieve specific goals. An agent relies on a reasoning brain (typically an LLM) to plan and execute multi-step processes.

Agentic Systems

AI Compute

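AI Compute refers to the computational resources required to train and run inference on large-scale neural networks and machine learning models. Total training work is commonly measured in floating-point operations (FLOPs), while hardware throughput is measured in floating-point operations per second (FLOP/s).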

Hardware & Infrastructure

AI Copilot

An AI Copilot is an interactive assistant integrated directly into workspaces and applications, using Large Language Models to help users write code, draft emails, summarize documents, or execute tasks through natural language commands.

Agentic Systems

AI Ethics

AI Ethics is a multidisciplinary field of study and governance that addresses the moral concerns, social impacts, and legal dilemmas associated with the development and deployment of artificial intelligence systems.

Alignment & Safety

AI Governance

AI Governance refers to the systemic framework of policies, procedures, compliance standards, and organizational structures established to supervise, monitor, and regulate an organization's AI deployment.

Alignment & Safety

AI Model

An AI Model is a mathematical algorithm trained on a dataset to perform specific tasks like classification, prediction, or text generation. It represents the saved states of a neural network (the weights and biases) after training, which can be deployed to run inference on new, unseen data.

Foundational AI

AI Orchestration

AI Orchestration is the process of coordinating and managing multiple AI models, autonomous agents, data retrieval pipelines, and database updates to execute complex, end-to-end enterprise workflows.

Agentic Systems

AI Safety

AI Safety is a field of research focused on ensuring that artificial intelligence systems behave predictably, avoid causing harm, and remain aligned with human interests. It spans technical alignment, risk mitigation, and the study of existential risk from advanced systems.

Alignment & Safety

AI Search Engine

An AI Search Engine is an information retrieval system that utilizes generative models to synthesize direct answers, summaries, and source citations to queries, rather than just returning a list of links (e.g., Perplexity, Google AI Overviews).

Information Retrieval

Algorithm

An Algorithm is a step-by-step procedure or set of mathematical rules designed to solve a specific problem or perform a calculation. In AI, algorithms determine how a model processes inputs and updates its parameters during learning.

Foundational AI

Algorithmic Bias

Algorithmic Bias (or AI Bias) occurs when a machine learning model generates systematic and repeatable errors that create unfair outcomes, typically due to prejudices or imbalances present in the training datasets.

Alignment & Safety

Alignment

Alignment refers to the process of guiding an AI model's behaviors, responses, and values to match human intents, safety principles, and ethical standards. Poorly aligned models might generate toxic text, assist in harmful activities, or refuse harmless requests.

Alignment & Safety

Answer Engine Optimization

Answer Engine Optimization (AEO) is the process of optimizing web content to be retrieved and displayed as the primary, direct answer by search engine featured snippets and voice assistants (like Siri, Alexa, and Google Assistant).

Search Engine Optimization

Anthropic

Anthropic is an AI safety and research company, creators of the Claude LLM family, founded by former OpenAI researchers to build steerable, reliable, and constitutional AI systems.

Foundational AI

Artificial Intelligence

Artificial Intelligence (AI) is a broad field of computer science dedicated to building systems capable of performing tasks that typically require human cognitive function, such as visual perception, speech recognition, decision-making, and translation.

Foundational AI

Attention Mechanism

An Attention Mechanism is a technique in neural networks that mimics cognitive attention, allowing the model to focus on specific parts of the input data when generating an output. It enables models to calculate the contextual relationships between distant elements in a sequence.

Neural Architectures

Auto-GPT

Auto-GPT is an open-source autonomous agent application that showcases the capabilities of Large Language Models (specifically GPT-4) to run independently to achieve a user-defined goal by chaining thoughts and actions in a continuous loop.

Agentic Systems

Autoencoder

An Autoencoder is a type of unsupervised neural network designed to learn efficient data codings (representations) by training the network to ignore signal noise. It consists of an encoder that compresses the input data, and a decoder that reconstructs the input from the compressed representation.

Neural Architectures

Autoencoding

Autoencoding is an unsupervised learning approach where a neural network is trained to reconstruct its input values through a lower-dimensional bottleneck, learning efficient representations of the data.

Foundational AI

Autonomous Agent

An Autonomous Agent is an AI system designed to operate independently to achieve specific, high-level objectives. It constructs its own sub-tasks, plans sequences of actions, invokes external tools, inspects intermediate results, and corrects mistakes without user guidance.

Agentic Systems

Autoregressive Model

An Autoregressive Model is an AI model that predicts future values in a sequence based on past values. In LLMs, autoregressive generation works by taking the prompt, predicting the next token, appending that token to the input, and repeating the process until a stop condition is reached.

Foundational AI
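The predict, append, repeat loop of autoregressive generation can be sketched with a toy stand-in for the model. The lookup table `NEXT_TOKEN`, the `<eos>` marker, and the function name `generate` are hypothetical; a real LLM would produce a probability distribution over its vocabulary from the whole preceding sequence, not just the last token.

```python
# Hypothetical toy "model": maps the last token to a single next token.
NEXT_TOKEN = {"the": "cat", "cat": "sat", "sat": "down", "down": "<eos>"}

def generate(prompt_tokens, max_new_tokens=10):
    """Repeatedly predict one token and append it to the running sequence."""
    if not prompt_tokens:
        raise ValueError("prompt must contain at least one token")
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = NEXT_TOKEN.get(tokens[-1], "<eos>")
        if next_token == "<eos>":   # stop when the model signals end of sequence
            break
        tokens.append(next_token)   # the output becomes part of the next input
    return tokens

print(generate(["the"]))  # ['the', 'cat', 'sat', 'down']
```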

B

Backpropagation

Backpropagation is the primary algorithm used to train neural networks. It computes the gradient of the loss function with respect to each weight by propagating the error backward through the layers using the chain rule; an optimizer such as gradient descent then uses those gradients to update the parameters.

Mathematical Foundations

Bag of Words

Bag of Words (BoW) is a simplified representation model used in natural language processing and information retrieval. In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding grammar and word order but keeping multiplicity.

Natural Language Processing
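A bag-of-words representation maps naturally onto Python's `collections.Counter`, which is itself a multiset. The sketch below assumes simple lowercase, whitespace-based tokenization, which is a simplification chosen for the example.

```python
from collections import Counter

def bag_of_words(text):
    """Count each word's occurrences, ignoring order and grammar."""
    return Counter(text.lower().split())

bow = bag_of_words("the dog chased the cat")
print(bow["the"])  # 2
# Word order is discarded, so these two sentences get identical bags.
print(bag_of_words("dog bites man") == bag_of_words("man bites dog"))  # True
```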

Batch Normalization

Batch Normalization is a technique that normalizes the inputs of each layer within a mini-batch during training, stabilizing the learning process and accelerating convergence.

Neural Architectures

Batch Size

Batch Size is a model training hyperparameter defining the number of training examples processed in a single forward and backward pass before the model's internal parameter weights are updated.

Model Training

Bayesian Optimization

Bayesian Optimization is a sequential design strategy for global optimization of black-box functions. It is widely used in machine learning to tune hyperparameters, particularly when evaluating the target function is computationally expensive.

Mathematical Foundations

BERT

BERT (Bidirectional Encoder Representations from Transformers) is a language model developed by Google in 2018. Unlike autoregressive models, BERT is bidirectional, looking at the words before and after a target word to understand its context.

Natural Language Processing

Bi-Encoder

A Bi-Encoder is a neural network architecture that embeds the query and the candidate document separately into a shared vector space, allowing fast similarity comparisons using mathematical operations like cosine similarity or dot product.

Information Retrieval

Bias-Variance Tradeoff

The Bias-Variance Tradeoff is a core machine learning concept describing the conflict between a model's ability to minimize bias (errors from simple assumptions) and variance (errors from sensitivity to training data fluctuations). Balancing them is key to avoiding overfitting or underfitting.

Model Training

Bidirectional LSTM

A Bidirectional LSTM (BiLSTM) is a sequence processing architecture that consists of two LSTMs: one taking the input in a forward direction, and the other taking it in a backward direction. This allows the network to capture both past and future context at any point in the sequence.

Neural Architectures

Black Box

A Black Box model is an AI or machine learning system whose internal workings, parameters, and decision-making logic are hidden or too complex for humans to interpret or understand (such as deep neural networks with billions of weights).

Model Limitations

Blackwell

Blackwell is NVIDIA's high-performance GPU architecture designed specifically to accelerate trillion-parameter large language models, offering massive throughput improvements for AI training and inference workloads.

Hardware & Infrastructure

BM25

Okapi BM25 is a ranking function used by search engines to estimate the relevance of documents to a given search query. It is based on the probabilistic retrieval framework and improves upon TF-IDF by adding term frequency saturation and document length normalization.

Information Retrieval
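The two refinements over TF-IDF, term frequency saturation and document length normalization, are visible in a compact BM25 scorer. This is a sketch under stated assumptions: whitespace tokenization, the commonly used defaults `k1=1.5` and `b=0.75`, and the IDF variant that adds 1 inside the logarithm to keep scores non-negative. Production search engines may differ in all three.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in docs against query with Okapi BM25."""
    if not docs:
        raise ValueError("docs must not be empty")
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs   # average document length
    df = Counter()                                    # document frequency per term
    for d in tokenized:
        df.update(set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # k1 caps the gain from repeated terms; b scales by relative length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = ["the cat sat on the mat", "dogs chase cats", "the cat the cat the cat"]
print(bm25_scores("cat", docs))
```

In this made-up corpus the third document mentions "cat" three times but scores less than three times the first document, which is the saturation effect at work.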

C

Cascading Agent Failure

Cascading Agent Failure is a critical failure mode in multi-agent systems where an error, hallucination, or logical exception in an upstream worker agent propagates downstream, causing consecutive errors and a total system collapse.

Agentic Systems

Causal Language Model

A Causal Language Model is an autoregressive model trained to predict the next token in a sequence given only the preceding tokens. It uses attention masking to prevent the model from looking at future tokens during training.

Natural Language Processing
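The attention mask used by a causal language model is a lower-triangular matrix: position i may attend to position j only when j is not later than i. The sketch below builds that mask with plain lists; the function name `causal_mask` is an assumption made for the example.

```python
def causal_mask(seq_len):
    """mask[i][j] is True when token i is allowed to attend to token j."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

for row in causal_mask(3):
    print(row)
# [True, False, False]
# [True, True, False]
# [True, True, True]
```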

Chain of Thought

Chain of Thought (CoT) prompting is a technique that instructs Large Language Models to write down their step-by-step reasoning process before outputting the final answer. This improves performance on complex reasoning, math, and logic tasks.

Prompt Engineering

Chatbot

A Chatbot is a software application designed to simulate human-like conversations with users, either through text dialogues or voice interfaces, historically powered by rule-based patterns and now by Large Language Models.

Natural Language Processing

ChatGPT

ChatGPT is a conversational artificial intelligence chatbot developed by OpenAI, built on their family of GPT Large Language Models, which pioneered the generative AI consumer wave by providing fluid, human-like dialogue.

Generative AI

Chunking

Chunking is the process of breaking down a large, continuous document into smaller, manageable, and semantically cohesive text fragments (chunks) before indexing them in a vector database.

Information Retrieval
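The simplest form of chunking is a fixed-size sliding window over words with a small overlap, sketched below. Note that this is a deliberately naive strategy: it does not guarantee semantically cohesive chunks, which in practice usually requires splitting on sentence or section boundaries. The function name and parameters are assumptions made for the example.

```python
def chunk_words(text, chunk_size=5, overlap=1):
    """Split text into word windows of chunk_size that share overlap words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):   # last window reached the end
            break
    return chunks

print(chunk_words("a b c d e f g h i", chunk_size=4, overlap=1))
# ['a b c d', 'd e f g', 'g h i']
```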

Claude

Claude is a family of state-of-the-art Large Language Models developed by Anthropic. Highly regarded for its reasoning, coding capabilities, and context window size, Claude models are trained using a methodology called Constitutional AI.

Foundational AI

CLIP

CLIP (Contrastive Language-Image Pre-training) is a neural network developed by OpenAI that learns visual concepts from natural language supervision. It is trained on millions of image-text pairs to match corresponding images and captions in a joint embedding space.

Multimodal AI

CNN

A Convolutional Neural Network (CNN) is a class of deep neural network most commonly applied to analyzing visual imagery. CNNs use mathematical convolution operations to extract hierarchical features from grid-like structures, making them ideal for image classification, object detection, and computer vision.

Computer Vision

Codegen

Codegen (Code Generation) refers to the capability of generative AI models to synthesize executable software code, scripts, or markups from natural language descriptions or existing code contexts.

Generative AI

Cognitive Architecture

Cognitive Architecture is the design blueprint for structuring an autonomous AI Agent. It defines how memory, planning steps, reflection mechanisms, and external tools interact with the core LLM brain to create a persistent agentic loop.

Agentic Systems

Cold Start Problem

The Cold Start Problem is a challenge in recommender systems and search where the system struggles to make recommendations because it has no prior history, ratings, or interaction logs for a new user or a new item.

Data Infrastructure

Collaborative Filtering

Collaborative Filtering is a technique used by recommendation engines to filter or predict a user's interests by collecting preferences from many users. It assumes that if user A agrees with user B on an issue, user A is more likely to share B's opinion on a different issue.

Data Infrastructure

Computer Vision

Computer Vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. Using digital images from cameras and videos, models can accurately identify and classify objects, and react to what they "see."

Computer Vision

Concept Drift

Concept Drift is the phenomenon where the statistical properties of the target variable that a model is predicting change over time in an unforeseen way, causing the model to become inaccurate.

Model Operations

Conditional Random Fields

Conditional Random Fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning for structured prediction. CRFs take context into account when predicting labels for sequence elements.

Natural Language Processing

Constitutional AI

Constitutional AI is an alignment training methodology developed by Anthropic to train helpful and harmless models without human-labeled feedback for safety. The model is given a written list of principles (a constitution) and recursively critiques its own outputs to align with those principles.

Alignment & Safety

Context Engineering

Context Engineering is the practice of designing, structuring, and optimizing the prompt context window to maximize the accuracy and efficiency of Large Language Models. It focuses on how raw data, historical messages, and systemic rules are retrieved, formatted, and pruned before being sent to the model.

Prompt Engineering

Context Window

The Context Window is the maximum volume of text (measured in tokens) that a Large Language Model can process and consider at any single moment. It contains the prompt instructions, user query, system settings, and memory history.

Neural Architectures

Contrastive Learning

Contrastive Learning is a self-supervised training technique where a model learns to group similar inputs (positive pairs) close together in embedding space while pushing dissimilar inputs (negative pairs) far apart.

Model Training

Cosine Similarity

Cosine Similarity is a mathematical metric used to measure the similarity between two vectors in high-dimensional space by calculating the cosine of the angle between them. It is independent of vector magnitude, focusing purely on direction.

Mathematical Foundations
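Cosine similarity is the dot product of two vectors divided by the product of their lengths. The pure-Python sketch below illustrates the magnitude independence described above; the function name and the sample vectors are assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Same direction, different magnitude: similarity is (numerically) 1.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
# Perpendicular vectors: similarity is 0.
print(cosine_similarity([1, 0], [0, 1]))  # 0.0
```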

Cost Function

A Cost Function is a mathematical formula that measures the performance of a machine learning model on the entire dataset. It represents the average of the loss function values computed across all training examples.

Mathematical Foundations

CrewAI

CrewAI is an open-source framework designed for orchestrating role-playing autonomous AI agents. It enables developers to structure groups of agents that work together, share memories, delegate tasks, and execute collaborative workflows.

Agentic Systems

Cross-Encoder

A Cross-Encoder is a neural network architecture used in information retrieval that processes the query and the candidate document together as a single input sequence, computing attention across both to produce a highly accurate relevance score.

Information Retrieval

Cross-Validation

Cross-Validation is a statistical resampling technique used to evaluate a machine learning model's generalization performance. The dataset is partitioned into multiple folds, and the model is trained and validated repeatedly so that each fold serves once as the validation set; the results are then averaged.

Model Training
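The fold partitioning behind k-fold cross-validation can be sketched as index bookkeeping. This example assumes contiguous, unshuffled folds for readability; real pipelines usually shuffle (or stratify) first. The function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, validation_indices) pairs covering all samples."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds

for train, validation in k_fold_indices(6, 3):
    print(train, validation)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```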

D

Data Augmentation

Data Augmentation is the practice of artificially increasing the size and diversity of a training dataset by applying transformations (like cropping, rotating, flipping, or paraphrasing) to existing data points.

Model Training

Data Labeling

Data Labeling is the process of identifying raw data points (such as images, text, or audio files) and appending target category tags (labels) to them to create a labeled dataset for supervised learning.

Model Training

Data Leakage

Data Leakage is a training error that occurs when information that would not be available at prediction time, such as test-set examples or data derived from the target, is used to train a model. This leads to overly optimistic performance scores during validation but poor generalization on truly unseen data.

Model Training

Data Preprocessing

Data Preprocessing is the initial phase of cleaning, transforming, and formatting raw datasets to prepare them for machine learning algorithms.

Model Training

Dataset

A Dataset is a structured collection of data points, features, and target values used to train, validate, and evaluate machine learning models.

Foundational AI

Dataset Curation

Dataset Curation is the process of collecting, cleaning, labeling, filtering, and organizing data to create a high-quality dataset for training or benchmarking machine learning models.

Model Training

Decision Tree

A Decision Tree is a non-parametric supervised learning method used for both classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.

Foundational AI

Deep Learning

Deep Learning is a subset of machine learning based on artificial neural networks with multiple layers (hence "deep"). These layers extract high-level features progressively from raw input, enabling automated feature learning without manual engineering.

Foundational AI

Deep Reinforcement Learning

Deep Reinforcement Learning (DRL) is a subfield of machine learning that combines reinforcement learning principles (agents, actions, rewards) with deep neural networks to learn decision-making policies for high-dimensional state spaces.

Model Training

Deepfake

A Deepfake is synthetic media (images, video, or audio) in which a person's face, voice, or body is digitally altered or replaced using deep generative models, typically autoencoders or Generative Adversarial Networks (GANs), to depict them saying or doing things they did not.

Model Limitations

Denoising

Denoising is the process of removing noise from a signal (like a digital image or audio track). In generative AI, denoising autoencoders and diffusion networks are trained to reconstruct clean inputs from intentionally corrupted variants.

Computer Vision

Dense Model

A Dense Model is a neural network architecture in which all of the model's parameters are used in the computation for every token processed, representing the traditional design of deep neural networks.

Neural Architectures

Diffusion Model

A Diffusion Model is a class of generative AI models that generate data by learning to reverse a process of gradual noise addition. By starting with random noise and iteratively removing it, the model can generate high-resolution images, video, or audio.

Generative AI

Dimensionality Reduction

Dimensionality Reduction is the process of reducing the number of input variables (features) in a dataset while retaining as much relevant information as possible. It is used to simplify models and visualize high-dimensional datasets.

Mathematical Foundations

Discriminator

A Discriminator is a neural network component within a Generative Adversarial Network (GAN) architecture. Its role is to evaluate inputs and classify them as either "real" (originating from the true training dataset) or "fake" (produced by the generator network).

Generative AI

Distillation

Knowledge Distillation is a compression technique where a smaller model (the student) is trained to replicate the behavior and output probabilities of a much larger model (the teacher). This transfers reasoning capabilities into smaller footprints.

Model Training

Distributed Training

Distributed Training is the practice of partitioning machine learning workloads (data or parameters) across multiple compute processors (GPUs/TPUs) to accelerate training times for large neural networks.

Hardware & Infrastructure

Document Store

A Document Store is a database designed to store, retrieve, and manage document-oriented information, typically encoded as JSON, BSON, or XML. In RAG architectures, it holds the raw text associated with vector embeddings.

Data Infrastructure

Dot Product Similarity

Dot Product Similarity is a metric that measures the alignment of two vectors in a high-dimensional space by multiplying corresponding elements and summing the products. Unlike Cosine Similarity, it is sensitive to vector magnitude.

Mathematical Foundations

Double Quantization

Double Quantization is a memory-saving process introduced in QLoRA that quantizes the quantization constants themselves, reducing the memory footprint of fine-tuning runs with little to no reported loss in accuracy.

Model Operations

DPO

Direct Preference Optimization (DPO) is a model alignment technique that bypasses the complex reward-model training phase of RLHF. DPO optimizes the policy directly on preference datasets (chosen vs. rejected responses) using a simple binary cross-entropy loss.

Model Training
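For a single preference pair, the DPO loss is a logistic (binary cross-entropy) loss on the difference between how much the policy and a frozen reference model favor the chosen response over the rejected one. The sketch below assumes the four inputs are summed log-probabilities of each response; the function name, argument names, and `beta=0.1` are illustrative assumptions.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses."""
    # How much more the policy prefers "chosen" than the reference model does.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(beta * margin)), written as log(1 + exp(-beta * margin)).
    return math.log1p(math.exp(-beta * margin))

# Policy agrees with the preference label: lower loss.
print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
# Policy prefers the rejected response: higher loss.
print(dpo_loss(-5.0, -1.0, -2.0, -2.0))
```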

Dropout

Dropout is a regularization technique used in neural networks during training where a fraction of network nodes are randomly deactivated (dropped out) in each forward pass, preventing co-adaptation of features.

Model Training
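Dropout on a list of activations can be sketched as follows. This example uses the "inverted dropout" variant, an assumption not stated in the entry above: surviving activations are scaled by 1 / (1 - p) during training so that no rescaling is needed at inference time. The function name and the `rng` parameter are hypothetical.

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Zero each activation with probability p; rescale the survivors."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    if not training or p == 0:
        return list(activations)   # dropout is disabled at inference time
    keep = 1 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)             # seeded so the run is repeatable
print(dropout([1.0, 1.0, 1.0, 1.0], p=0.5, rng=rng))
print(dropout([1.0, 2.0], training=False))  # [1.0, 2.0]
```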

E

Early Stopping

Early Stopping is a regularization technique that halts a model's training process when its performance on a separate validation dataset stops improving, even if the training loss continues to decrease.

Model Training
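Early stopping is often implemented with a "patience" counter: training halts once the validation loss has failed to improve for a set number of consecutive epochs. The sketch below works on a precomputed list of validation losses; the patience mechanism, the function name, and the sample losses are assumptions made for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 0-based epoch index at which training would stop."""
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss            # new best validation loss: reset the counter
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch       # no improvement for `patience` epochs
    return len(val_losses) - 1     # ran to the end without triggering

losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61]
print(early_stopping_epoch(losses, patience=2))  # 4
```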

Edge AI

Edge AI is the practice of running machine learning models and processing data directly on physical devices at the "edge" of the network (like smartphones, laptops, or IoT devices), rather than relying on centralized cloud servers.

Hardware & Infrastructure

Embedding

An Embedding is a representation of real-world data (words, sentences, images, user profiles) as high-dimensional vectors of real numbers. Embeddings place semantically similar concepts close to each other in vector space.

Foundational AI

Embedding Dimension

Embedding Dimension is the coordinate length of the vector used to represent data items in a latent space (e.g. OpenAI's text-embedding-3-small uses 1536 dimensions). It determines the detail capacity of the semantic space.

Mathematical Foundations

Ensemble Methods

Ensemble Methods are machine learning techniques that combine predictions from multiple individual models to create a single, more robust prediction. Examples include bagging, boosting, and stacking.

Foundational AI

Epoch

An Epoch is a single complete pass of the entire training dataset through a machine learning model. Training typically consists of many epochs to allow the network to refine weights and biases based on multiple passes over the data.

Model Training

Euclidean Distance

Euclidean Distance is a mathematical metric measuring the straight-line distance between two points in Euclidean space. In vector search, it is used to compare two embedding vectors: the smaller the distance, the more similar the vectors.

Mathematical Foundations

Explainable AI

Explainable AI (XAI) is a suite of processes and methods that allow human users to comprehend and trust the results and outputs generated by machine learning algorithms. It aims to demystify the "black box" of deep neural networks.

Alignment & Safety

Exploding Gradient Problem

The Exploding Gradient Problem is an error during backpropagation training where gradients accumulate, resulting in unstable, massive parameter updates that prevent model weights from converging.

Model Training

F

F1 Score

The F1 Score is a statistical metric used to evaluate a classification model's performance. It is calculated as the harmonic mean of precision (exactness) and recall (completeness), which makes it more informative than accuracy on datasets with imbalanced classes.

Mathematical Foundations
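The harmonic mean behind the F1 score can be computed directly from counts of true positives, false positives, and false negatives. In the sketch below, returning 0.0 when precision or recall is undefined is a convention assumed for the example, and the counts are made up.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # how many flagged were right
    recall = tp / (tp + fn) if tp + fn else 0.0      # how many positives were found
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# precision = 8/10 = 0.8, recall = 8/16 = 0.5
print(round(f1_score(tp=8, fp=2, fn=8), 4))  # 0.6154
```

The harmonic mean sits closer to the lower of the two values, so a model cannot reach a high F1 by maximizing precision alone or recall alone.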

Feature

A Feature is an individual, measurable property or input variable used by a machine learning model to make predictions. In tabular datasets, features correspond to columns (e.g. square footage, age of home).

Foundational AI

Feature Engineering

Feature Engineering is the process of using domain knowledge to select, transform, combine, and manipulate raw variables into highly predictive input features for machine learning algorithms.

Model Training

Federated Learning

Federated Learning is a decentralized training technique that trains machine learning models across multiple remote edge devices holding local data samples, without exchanging the data itself.

Model Training

Few-Shot Learning

Few-Shot Learning is a machine learning paradigm where a model is trained or prompted to perform a task using only a small number of training examples. In LLMs, this is achieved by including a few demonstration inputs and outputs directly in the prompt context window.

Prompt Engineering

Fine-Tuning

Fine-Tuning is the process of taking a pre-trained model and training it further on a smaller, specific dataset to adapt it for a particular task or domain. Fine-tuning alters the internal weights of the network, specializing its behavior and tone.

Model Training

FlashAttention

FlashAttention is a memory-efficient, exact self-attention algorithm that speeds up Transformer training and inference by tiling computations in fast on-chip GPU SRAM and minimizing reads and writes to slower high-bandwidth memory (HBM).

Neural Architectures

Foundation Model

A Foundation Model is a large-scale AI model trained on massive, broad datasets (typically through self-supervised learning) that serves as the baseline starting point for multiple downstream tasks. Examples include GPT-4, LLaMA, and Stable Diffusion.

Foundational AI

Fully Connected Layer

A Fully Connected Layer (Dense Layer) is a layer in an artificial neural network where every neuron is connected to all neurons in the previous layer, mapping linear combinations of inputs to outputs.

Neural Architectures

Function Calling

Function Calling is an LLM capability where the model outputs a structured JSON object containing argument parameters to invoke specific external functions or APIs, enabling LLMs to act as dynamic interfaces for databases and systems.

Agentic Systems
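On the application side, function calling usually comes down to parsing the model's JSON output and dispatching to a registered function. The sketch below is illustrative only: the `{"name": ..., "arguments": ...}` shape, the `get_weather` stub, and the `TOOLS` registry are hypothetical and do not reproduce any specific vendor's schema.

```python
import json

def get_weather(city):
    """Hypothetical tool; a real one would query a weather service."""
    return f"Sunny in {city}"

# Registry of functions the model is allowed to invoke.
TOOLS = {"get_weather": get_weather}

def dispatch(model_output):
    """Parse the model's JSON tool call and run the matching function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])

# Example of what a model might emit (made up for this sketch).
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(dispatch(model_output))  # Sunny in Paris
```

Checking the requested name against an explicit registry keeps the model from invoking arbitrary code.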

G

GAN

A Generative Adversarial Network (GAN) is a generative AI architecture consisting of two neural networks: a Generator (which creates fake data) and a Discriminator (which evaluates if the data is real or fake). The networks train in competition, forcing the generator to produce high-fidelity data.

Generative AI

Gating Mechanism

A Gating Mechanism is a structural design in neural networks that controls the flow of information through internal pathways using learned multipliers, typically sigmoid-activated values between 0 and 1 applied element-wise (as in LSTM and GRU gates).

Neural Architectures

GELU

GELU (Gaussian Error Linear Unit) is a smooth activation function that scales input values by the cumulative distribution function of the standard normal distribution, commonly used in BERT and modern Transformers.

Mathematical Foundations
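GELU multiplies the input x by the standard normal CDF evaluated at x. The exact form can be written with `math.erf` from the Python standard library, as sketched below; many frameworks also offer a faster tanh-based approximation, which is not shown here.

```python
import math

def gelu(x):
    """Exact GELU: x times the standard normal CDF at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

for value in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(value, round(gelu(value), 4))
```

Unlike ReLU, GELU is smooth around zero and lets small negative inputs through with a small negative output.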

Gemini

Gemini is a family of highly capable, natively multimodal AI models developed by Google. It is designed from the ground up to process and combine different modalities of information (including text, code, audio, image, and video) seamlessly.

Foundational AI

Generalization

Generalization is a machine learning model's ability to make accurate predictions on new, unseen test data that was not present in the dataset used to train the network.

Model Training

Generative AI

Generative AI refers to algorithms and models designed to generate new, original content, including text, images, music, code, or video. Popular architectures like Transformers, GANs, and Diffusion models serve as the engines powering generative AI platforms.

Foundational AI

Generative Engine Optimization

Generative Engine Optimization (GEO) is the modern marketing and search optimization practice of structuring website content so it is successfully retrieved, cited, and recommended by AI search engines and LLM answer systems.

Generative AI

Generative Pre-training

Generative Pre-training is the initial phase of training a Large Language Model on massive, unlabeled text datasets where the model learns token relationships by predicting the next word in sequence.

Model TrainingRead Term

GGUF

GGUF (GPT-Generated Unified Format) is a file format designed for storing models for inference with llama.cpp. It is optimized to support fast on-device loading and quantization.

Model OperationsRead Term

GPT

GPT (Generative Pre-trained Transformer) is a decoder-only autoregressive transformer architecture developed by OpenAI. GPT models are pre-trained on massive text datasets to predict the next token, and they helped launch the modern conversational AI era.

Neural ArchitecturesRead Term

GPT-4

GPT-4 is a state-of-the-art multimodal Large Language Model developed by OpenAI, trained on both text and visual inputs to perform complex reasoning, coding, and logical operations.

Foundational AIRead Term

GPT-ese

GPT-ese (or AI-speak) is a colloquial term for the specific stylistic, overly polite, repetitive, or cliché-ridden writing style characteristic of outputs generated by early Large Language Models without custom alignment.

Model LimitationsRead Term

GPU

A Graphics Processing Unit (GPU) is a specialized processor originally designed to accelerate image rendering by running many simple operations in parallel. Because training neural networks involves massive matrix multiplication, the parallel processing power of GPUs is critical for modern AI workloads.

Hardware & InfrastructureRead Term

Gradient Descent

Gradient Descent is an optimization algorithm used to minimize a model's loss function during training. It iteratively calculates the slope (gradient) of the error surface and updates model parameters (weights) in the direction of the steepest descent.
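
The update rule is w ← w − η · ∇L(w), where η is the learning rate. A minimal sketch on a one-parameter toy loss, L(w) = (w − 3)², which is an assumption made purely for illustration:

```python
def gradient_descent(grad, w0: float, lr: float = 0.1, steps: int = 100) -> float:
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)  # move in the direction of steepest descent
    return w

# Toy loss L(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
```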

Mathematical FoundationsRead Term

Graph Neural Network

A Graph Neural Network (GNN) is a class of artificial neural network designed to process data represented as graphs (consisting of nodes and edges). It extracts features by passing messages between neighboring nodes and aggregating them at each node.

Neural ArchitecturesRead Term

Graph RAG

Graph RAG (Graph Retrieval-Augmented Generation) is an advanced retrieval technique that couples vector similarity search with structured Knowledge Graphs. It extracts entities and relationships from documents, building a network to answer complex, global queries.

Agentic SystemsRead Term

Greedy Decoding

Greedy Decoding is a sequence generation method where the model always selects the single token with the highest predicted probability at each step during output text generation.
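
A minimal sketch of the selection loop. The lookup table `NEXT_TOKEN_PROBS` is a hypothetical stand-in for a real model's next-token distribution:

```python
# Toy "model": maps the previous token to a next-token probability table.
NEXT_TOKEN_PROBS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "sat": {"</s>": 0.9, "the": 0.1},
}

def greedy_decode(start: str = "<s>", end: str = "</s>", max_len: int = 10) -> list:
    """Always append the single most probable next token."""
    tokens = [start]
    while tokens[-1] != end and len(tokens) < max_len:
        probs = NEXT_TOKEN_PROBS[tokens[-1]]
        tokens.append(max(probs, key=probs.get))
    return tokens

decoded = greedy_decode()
```

Because the choice at each step is deterministic, the same prompt always yields the same output, at the cost of never exploring lower-probability continuations.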

Model OperationsRead Term

Grounding

Grounding is the process of anchoring an AI model's generated outputs to verifiable real-world facts, external files, or structured databases. It keeps model outputs factual and makes each claim traceable to a source.

Information RetrievalRead Term

Grouped-Query Attention

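Grouped-Query Attention (GQA) is an attention layout that divides query heads into groups, with each group sharing a single Key and Value head. It sits between Multi-Head Attention (one Key/Value head per query head) and Multi-Query Attention (one Key/Value head for all query heads), reducing the memory footprint of the KV cache.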

Neural ArchitecturesRead Term

Guardrails

Guardrails are validation layers placed around AI models to intercept inputs (prompts) and outputs (completions). They enforce safety policies and output schemas, and help block toxic content, data leakage, and jailbreaks.

Alignment & SafetyRead Term

H

Hallucination

Hallucination is a phenomenon where a Large Language Model (LLM) generates outputs that are factually incorrect, nonsensical, or ungrounded in real-world data. It occurs because LLMs predict word probabilities rather than referencing a direct database of facts.

Model LimitationsRead Term

Hidden Layer

A Hidden Layer is a layer of neurons located between the input layer and the output layer in an artificial neural network, responsible for extracting and learning abstract features from input data.

Neural ArchitecturesRead Term

Human-in-the-Loop

Human-in-the-Loop (HITL) is a design pattern in autonomous systems and machine learning workflows that requires human intervention or approval at key checkpoints before executing critical, irreversible, or high-risk actions.

Alignment & SafetyRead Term

Hyperautomation

Hyperautomation is an enterprise operational strategy focused on identifying, vetting, and automating as many business and IT processes as possible using AI, Robotic Process Automation (RPA), and low-code orchestrations.

Agentic SystemsRead Term

Hyperparameter

A Hyperparameter is a configuration variable whose value is set before the machine learning training process begins. Unlike standard parameters (weights and biases) which are learned automatically during training, hyperparameters control the learning behavior itself.

Model TrainingRead Term

Hyperplane

A Hyperplane is a subspace whose dimension is one less than that of its ambient space. In machine learning, it acts as a decision boundary to separate different data classes.

Mathematical FoundationsRead Term

I

Image Segmentation

Image Segmentation is a computer vision process of partitioning a digital image into multiple segments (sets of pixels), assigning a label to every pixel to outline exact boundaries of objects.

Computer VisionRead Term

Imbalanced Datasets

An Imbalanced Dataset is a training dataset where one class (or category) is significantly overrepresented compared to other classes, causing models to favor the majority class.

Model TrainingRead Term

In-Context Learning

In-Context Learning (ICL) is the emergent ability of pre-trained Large Language Models to learn new tasks from examples provided in the prompt context without gradient updates.

Prompt EngineeringRead Term

Indirect Prompt Injection

Indirect Prompt Injection is a security exploit where an attacker embeds malicious instructions inside untrusted third-party data (like web pages, uploaded PDFs, or emails) that an AI agent is instructed to read. When the agent processes the document, the hidden prompt overrides the system instructions and hijacks the agent.

Model LimitationsRead Term

Inductive Bias

Inductive Bias refers to the assumptions a machine learning algorithm uses to predict outputs for unseen inputs. It prioritizes specific solutions based on the structural design of the model.

Mathematical FoundationsRead Term

Inference

Inference is the process of using a trained AI model to make predictions or generate text based on new inputs. During inference, data flows forward through the neural network to produce an output, without modifying the model's weights.

Model OperationsRead Term

Intelligent Agent

An Intelligent Agent is an autonomous entity that perceives its environment through inputs, makes rational decisions based on goals, and executes actions using tools to achieve outcomes.

Agentic SystemsRead Term

J

Jailbreaking

Jailbreaking is a subset of adversarial prompt attacks where users structure prompt commands to bypass the safety alignment, moral rules, and system filters of Large Language Models.

Model LimitationsRead Term

K

K-Means Clustering

K-Means Clustering is an unsupervised machine learning algorithm that partitions a dataset into K distinct, non-overlapping clusters by assigning each data point to its nearest centroid.
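
The algorithm alternates between assigning points to the nearest centroid and moving each centroid to the mean of its assigned points. A minimal sketch, assuming fixed starting centroids and a fixed iteration count (real implementations usually randomize initialization and stop on convergence):

```python
def kmeans(points, centroids, iterations: int = 10):
    """Lloyd's algorithm on tuples of floats with squared Euclidean distance."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```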

Foundational AIRead Term

K-Nearest Neighbors

K-Nearest Neighbors (KNN) is a simple, non-parametric supervised learning algorithm used for classification and regression. It predicts a target by finding the K training points closest to the query, then taking a majority vote of their labels (classification) or the average of their values (regression).
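
A minimal sketch of the classification case, assuming Euclidean distance and a small hand-made training set:

```python
import math
from collections import Counter

def knn_classify(train, query, k: int = 3):
    """Return the most common label among the k nearest training points."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Each training item is (coordinates, label).
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 7.0), "B"),
]
near_a = knn_classify(train, (1.2, 1.1))
near_b = knn_classify(train, (6.5, 6.5))
```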

Foundational AIRead Term

Knowledge Graph

A Knowledge Graph is a structured database representing a network of real-world entities (nodes) and their semantic relations (edges), allowing systems to query logical context.

Data InfrastructureRead Term

KV Cache

A KV Cache (Key-Value Cache) is an inference-time optimization storing the computed Key and Value attention tensors of past tokens to prevent redundant recalculations in autoregressive decoding.

Model OperationsRead Term

L

Label

A Label is the target output or correct outcome variable associated with a training example in supervised learning (e.g. labeling a picture as a "dog" or marking an email as "spam").

Foundational AIRead Term

LangChain

LangChain is an open-source framework designed to simplify the creation of applications using Large Language Models, providing abstractions for chains, prompt templates, memory, and tools.

Agentic SystemsRead Term

Latent Space

Latent Space is a multi-dimensional space where raw, complex data (such as images or text) is compressed into mathematical vector representations. In latent space, items that share similar abstract concepts or semantic meanings are mapped closer together.

Mathematical FoundationsRead Term

Layer Normalization

Layer Normalization is a technique that normalizes the activations of a neural network layer across all features for each single training example, stabilizing gradient updates in sequential models.
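
A minimal sketch of the normalization for one example's feature vector. The learnable scale and shift parameters used in practice are omitted here for brevity, and `eps` is a small constant assumed for numerical stability:

```python
import math

def layer_norm(x, eps: float = 1e-5):
    """Normalize one example's features to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

normalized = layer_norm([1.0, 2.0, 3.0, 4.0])
```

The statistics are computed per example rather than per batch, so the result does not depend on batch size.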

Neural ArchitecturesRead Term

Learning Rate

Learning Rate is a fundamental hyperparameter in gradient descent optimizers that sets the step size of each parameter update as the optimizer moves toward a minimum of the loss function during model training.

Mathematical FoundationsRead Term

Learning Rate Decay

Learning Rate Decay is a training hyperparameter setting that gradually decreases the optimizer's learning rate over epochs, allowing the model to make large updates early and fine adjustments later.
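
One common schedule is exponential decay, lr_t = lr_0 · γ^t. A minimal sketch, with the starting rate and decay factor chosen only for illustration:

```python
def exponential_decay(initial_lr: float, decay_rate: float, epoch: int) -> float:
    """Learning rate after `epoch` epochs of multiplicative decay."""
    return initial_lr * decay_rate ** epoch

# Halve the learning rate every epoch, starting from 0.1.
schedule = [exponential_decay(0.1, 0.5, epoch) for epoch in range(4)]
```

Step decay, cosine annealing, and linear warmup followed by decay are other widely used schedules.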

Model TrainingRead Term

Linear Attention

Linear Attention is a class of attention mechanisms designed to approximate the standard self-attention operation in linear time complexity relative to sequence length, bypassing the quadratic memory scaling limits of standard Transformers.

Neural ArchitecturesRead Term

Linear Regression

Linear Regression is a foundational statistical method and supervised learning algorithm used to model the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data.

Foundational AIRead Term

LLaMA

LLaMA (Large Language Model Meta AI) is a family of state-of-the-art open-weights foundation models released by Meta. LLaMA catalyzed the open-source AI developer ecosystem by offering models that could run locally with high efficiency.

Foundational AIRead Term

LLM

A Large Language Model (LLM) is a type of artificial intelligence model trained on vast amounts of text data to understand, generate, and manipulate natural language. Built on the Transformer architecture, LLMs use billions of parameters to recognize semantic patterns and reasoning relationships.

Foundational AIRead Term

LLM Evaluation

LLM Evaluation (LLM Eval) is the process of measuring the accuracy, reasoning quality, safety compliance, and formatting correctness of Large Language Model outputs using benchmarks or judge models.

Model OperationsRead Term

Logistic Regression

Logistic Regression is a foundational classification algorithm used to predict the probability of a binary target variable by mapping linear inputs to a sigmoid probability curve.

Foundational AIRead Term

Loop Engineering

Loop Engineering is the practice of designing, optimizing, and securing autonomous agent execution loops. In agentic AI, this involves structuring the iteration cycle (prompt loops, self-correction runs, and human-in-the-loop triggers) to prevent runaway loops and maximize successful task execution.

Agentic SystemsRead Term

LoRA

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning (PEFT) technique that freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, reducing training VRAM requirements.
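
The saving comes from replacing a full update of a d_out × d_in weight matrix with two thin matrices of shapes d_out × r and r × d_in. A rough parameter-count sketch, assuming a single 4096 × 4096 matrix and rank 8 (illustrative values, not taken from any specific model):

```python
def lora_param_counts(d_out: int, d_in: int, rank: int):
    """Trainable parameters: full fine-tuning vs. a LoRA adapter for one matrix."""
    full = d_out * d_in            # every entry of W is trainable
    lora = rank * (d_out + d_in)   # B is d_out x r, A is r x d_in
    return full, lora

full, lora = lora_param_counts(4096, 4096, 8)
fraction = lora / full
```

Under these assumptions the adapter trains well under 1% of the parameters of the original matrix.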

Model TrainingRead Term

Loss Function

A Loss Function is a mathematical function that measures the discrepancy between a model's predicted output and the true target value during training. The goal of training is to minimize this loss by adjusting weights based on gradients computed from it.

Mathematical FoundationsRead Term

LSTM

LSTM (Long Short-Term Memory) is a specialized recurrent neural network (RNN) architecture. It uses gating mechanisms (input, output, and forget gates) to manage memory state, which mitigates the vanishing gradient problem on long sequences.

Neural ArchitecturesRead Term

LSTM Memory Cell

An LSTM Memory Cell is the core building block of a Long Short-Term Memory network, containing a cell state that acts as a conveyor belt to carry historical information across sequences.

Neural ArchitecturesRead Term

M

Machine Learning

Machine Learning is a branch of artificial intelligence focused on building systems that learn from data, identify patterns, and make decisions with minimal human intervention. It is the broader field that includes deep learning, and it draws heavily on classical statistics.

Foundational AIRead Term

Machine Translation

Machine Translation (MT) is a subfield of computational linguistics focused on using artificial intelligence models to automatically translate text or speech from one human language to another.

Natural Language ProcessingRead Term

Masked Language Modeling

Masked Language Modeling (MLM) is a self-supervised training task where a model learns token context by predicting hidden (masked) words in a sentence using surrounding left and right text tokens.

Natural Language ProcessingRead Term

Max Pooling

Max Pooling is a downsampling operation in CNNs. It divides the input feature map into sub-regions and outputs the maximum value from each sub-region, reducing the spatial dimensions.
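
A minimal sketch, assuming a 2 × 2 window with stride 2 on a single-channel feature map stored as a list of rows:

```python
def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2 x 2 block."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]

feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]
pooled = max_pool_2x2(feature_map)
```

A 4 × 4 input becomes a 2 × 2 output, halving each spatial dimension.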

Neural ArchitecturesRead Term

MCP Client

A Model Context Protocol Client (MCP Client) is an application (such as an IDE, chatbot, or developer platform) that implements the MCP protocol to discover, connect to, and invoke tools and data sources exposed by MCP servers.

Agentic SystemsRead Term

MCP Server

A Model Context Protocol Server (MCP Server) is a lightweight service that exposes databases, file systems, specific APIs, or local command runtimes to MCP clients using a standardized protocol built on JSON-RPC.

Agentic SystemsRead Term

Mean Absolute Error

Mean Absolute Error (MAE) is a mathematical loss metric used in regression models that calculates the average absolute differences between predicted values and actual target values.
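
In formula form, MAE = (1/n) · Σ |y_i − ŷ_i|. A minimal sketch with made-up values:

```python
def mean_absolute_error(y_true, y_pred) -> float:
    """Average of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Absolute errors are 0.5, 0.0, and 2.0.
mae = mean_absolute_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0])
```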

Mathematical FoundationsRead Term

Mixture of Depths

Mixture of Depths (MoD) is a compute optimization technique where models dynamically route and process only a fraction of tokens through specific layers, skipping computation for simpler tokens.

Neural ArchitecturesRead Term

Mixture of Experts

Mixture of Experts (MoE) is a neural network design that scales model parameters without a proportional increase in compute cost. Instead of activating the entire network for every token, MoE routes each input to a few specialized sub-networks ("experts") using a gating router.

Neural ArchitecturesRead Term

MLOps

MLOps (Machine Learning Operations) is a set of practices, culture, and tools focused on automating and unifying the lifecycle of machine learning models, spanning data collection, training, testing, deployment, and monitoring.

Model OperationsRead Term

Model Collapse

Model Collapse is a degenerative process affecting generative AI models trained recursively on synthetic data generated by previous generations of AI models. Over iterations, the model loses diversity, starts repeating patterns, and eventually outputs garbage.

Model LimitationsRead Term

Model Context Protocol

Model Context Protocol (MCP) is an open-source standard created by Anthropic that enables AI applications and agents to connect securely to local or remote data sources, developer tools, and API services via a standardized protocol.

Agentic SystemsRead Term

Model Drift

Model Drift (or model decay) is the degradation of an AI model's predictive performance in production over time, caused by changes in the statistical properties of real-world input data relative to the training data.

Model OperationsRead Term

Model Merging

Model Merging is the process of combining two or more fine-tuned models into a single model without retraining or compute-heavy tuning. It averages or otherwise mathematically blends the models' weights.

Model OperationsRead Term

Model Registry

A Model Registry is a centralized repository for managing the lifecycle of machine learning models. It stores model weights, parameter logs, version details, and deployment states.

Model OperationsRead Term

Multi-Agent Orchestration

Multi-Agent Orchestration is the protocol framework that defines how multiple specialized AI agents communicate, delegate sub-tasks, exchange context, and collaborate sequentially or hierarchically to achieve a collective goal.

Agentic SystemsRead Term

Multi-Agent System

A Multi-Agent System (MAS) is a computerized system composed of multiple interacting intelligent agents. These agents coordinate, communicate, and collaborate (or compete) with each other to solve complex problems that are beyond the individual capabilities of any single agent.

Agentic SystemsRead Term

Multi-Head Attention

Multi-Head Attention is an attention layout in Transformers that projects query, key, and value vectors into multiple subspaces (heads) and runs attention in each one in parallel, allowing the model to attend to information from different representation subspaces simultaneously.

Neural ArchitecturesRead Term

Multi-Query Attention

Multi-Query Attention (MQA) is an attention architecture where all query heads share a single Key and Value head to minimize KV cache storage.

Neural ArchitecturesRead Term

Multimodal AI

Multimodal AI refers to systems capable of processing, understanding, and generating multiple types of input and output data modalities simultaneously, such as text, images, audio, video, and code. This mirrors human-like perception across sensory channels.

Foundational AIRead Term

N

Named Entity Recognition

Named Entity Recognition (NER) is an NLP task that identifies and classifies key elements in text documents into predefined categories (such as names of people, organizations, locations, dates, or product codes).

Natural Language ProcessingRead Term

Neural Architecture Search

Neural Architecture Search (NAS) is an automated process for designing artificial neural networks. By defining a search space, search strategy, and performance metric, NAS algorithms automatically discover optimal layer configurations.

Model OperationsRead Term

Neural Network

An Artificial Neural Network (ANN) is a computing system inspired by the biological neural networks that constitute animal brains, structured as layers of interconnected nodes that process inputs to produce outputs.

Neural ArchitecturesRead Term

NLP

Natural Language Processing (NLP) is a subfield of computer science and AI concerned with the interactions between computers and human language. It involves training computers to process, analyze, and synthesize large amounts of natural language data.

Natural Language ProcessingRead Term

NormalFloat4

NormalFloat4 (NF4) is a quantile quantization data type that is information-theoretically optimal for normally distributed data, designed to compress neural network weights to 4-bit precision with minimal loss of accuracy.

Model OperationsRead Term

NPU

A Neural Processing Unit (NPU) is a specialized processor designed to accelerate the execution of machine learning workloads, commonly integrated into mobile SoCs and edge hardware.

Hardware & InfrastructureRead Term

NVIDIA

NVIDIA is a pioneer of GPU computing, dominating the hardware market for AI acceleration, training, and inference with its high-performance Hopper and Blackwell architectures.

Hardware & InfrastructureRead Term

O

Object Detection

Object Detection is a computer vision task that combines image classification and localization, identifying what objects are in an image and outputting bounding boxes around their coordinates.

Computer VisionRead Term

Observability

Observability in AI refers to the ability to measure, trace, and audit the internal states, reasoning paths, tool execution parameters, and model outputs of an AI system. It enables developers to debug complex reasoning steps and optimize agent behaviors.

Model OperationsRead Term

One-Hot Encoding

One-Hot Encoding is a data preprocessing technique that converts categorical variables (like "dog", "cat") into binary vector representations where only a single element is 1 (hot) and the rest are 0.
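
A minimal sketch, assuming the column order follows the alphabetically sorted category names:

```python
def one_hot(values):
    """Encode each categorical value as a binary vector with a single 1."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [
        [1 if index[value] == i else 0 for i in range(len(categories))]
        for value in values
    ]

# Columns are ["cat", "dog"].
encoded = one_hot(["dog", "cat", "dog"])
```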

Mathematical FoundationsRead Term

One-Shot Learning

One-Shot Learning is a machine learning setup where a model is trained or prompted to perform a task or classify inputs after being shown only a single demonstration example.

Prompt EngineeringRead Term

OpenAI

OpenAI is an artificial intelligence research and deployment company behind ChatGPT, GPT-4, and Sora, dedicated to building safe and beneficial artificial general intelligence (AGI).

Foundational AIRead Term

Orchestration Layer

An Orchestration Layer is the control center of an agentic system that manages the execution loop, schedules task transitions, calls external tools, updates state databases, and routes inputs/outputs between the user, tools, and the LLM brain.

Agentic SystemsRead Term

Out-of-Distribution

Out-of-Distribution (OOD) data refers to inputs that originate from a different probability distribution than the dataset used to train the machine learning model, often causing models to make confident mistakes.

Model LimitationsRead Term

Overfitting

Overfitting is a common training error where a model learns the details and noise in the training dataset to the extent that it negatively impacts its performance on new, unseen test data. The model performs exceptionally well on training data but fails to generalize.

Model TrainingRead Term

P

Parameters

Parameters are the internal configuration variables of an AI model that are learned automatically from training data. In a neural network, parameters consist of weights (which determine connection strength) and biases (which offset activation curves).

Foundational AIRead Term

PEFT

Parameter-Efficient Fine-Tuning (PEFT) is a collection of training methods designed to fine-tune large foundation models by adapting only a tiny subset of additional parameters, while freezing the base model's weights.

Model TrainingRead Term

Perplexity

Perplexity is a core evaluation metric in natural language processing that measures how well a language model predicts a sample of text. It is the exponential of the average negative log-likelihood per token, so lower values indicate better predictions.
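
A minimal sketch, assuming the input is the list of probabilities the model assigned to each actual token in the sample:

```python
import math

def perplexity(token_probs) -> float:
    """Exponential of the average negative log-probability per token."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# A model that gives every token probability 0.25 is as uncertain as a
# uniform choice among four options.
ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```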

Mathematical FoundationsRead Term

Pre-training

Pre-training is the initial phase of training an AI model on a massive general-purpose dataset (unsupervised or self-supervised), teaching the model basic syntax, grammar, and features before fine-tuning.

Model TrainingRead Term

Precision

Precision (Positive Predictive Value) is a classification evaluation metric measuring the fraction of predicted positive examples that are actually correct, calculated as true positives divided by all predicted positives.
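
In formula form, Precision = TP / (TP + FP). A minimal sketch for binary labels, where 1 marks the positive class and the sample labels are made up:

```python
def precision(y_true, y_pred) -> float:
    """True positives divided by all predicted positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0

# Three predicted positives, two of them correct.
score = precision([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])
```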

Mathematical FoundationsRead Term

Preference Alignment

Preference Alignment refers to the training process of tuning a Large Language Model's conversational behavior to match human preferences regarding helpfulness, safety guidelines, and formatting style.

Alignment & SafetyRead Term

Prompt

A Prompt is the textual, visual, or binary input submitted to a generative AI model to initiate and guide the generation of a specific response or action.

Prompt EngineeringRead Term

Prompt Engineering

Prompt Engineering is the practice of designing, structuring, and refining inputs (prompts) to get optimal, predictable outputs from generative AI models. It involves technique selection like chain-of-thought, few-shot prompting, and system routing.

Prompt EngineeringRead Term

Prompt Injection

Prompt Injection is a security vulnerability where a malicious user provides input that overrides the pre-configured system instructions or safety alignment filters of a Large Language Model, hijacking its control flow.

Model LimitationsRead Term

PyTorch

PyTorch is an open-source machine learning framework originally developed by Meta AI and now governed by the PyTorch Foundation. It is widely used for building, training, and deploying deep learning models.

Foundational AIRead Term

Q

QLoRA

Quantized Low-Rank Adaptation (QLoRA) is an advanced parameter-efficient fine-tuning (PEFT) technique that runs LoRA over a base model quantized to 4-bit precision. It uses special formats like NormalFloat4 to maintain model accuracy while drastically reducing VRAM overhead.

Model TrainingRead Term

Quantization

Quantization is the process of compressing neural network parameters by reducing the numerical precision of its weights (e.g. converting 16-bit floating points to 4-bit integers), lowering VRAM requirements and accelerating inference.
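
A minimal sketch of one simple scheme, symmetric absmax quantization to 8-bit integers. This is chosen for readability and is not the 4-bit formats mentioned above; the function names are illustrative.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.4, -1.0, 0.2, 0.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
```

Each restored value differs from the original by at most half a quantization step, which is the precision traded for the smaller storage footprint.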

Model OperationsRead Term

R

RAG

Retrieval-Augmented Generation (RAG) is a methodology that improves the output of a Large Language Model (LLM) by retrieving relevant passages from an authoritative external knowledge base or Vector Database and supplying them to the model before it generates a response. RAG helps models access up-to-date information and can substantially reduce hallucination.

Information RetrievalRead Term

Random Forest

Random Forest is an ensemble supervised learning algorithm composed of many individual Decision Trees that work together. It trains each tree on a random subset of the data and features, then combines the trees' predictions by majority vote (classification) or averaging (regression).

Foundational AIRead Term

Reasoning Model

A Reasoning Model (or o1-style model) is an artificial intelligence model trained, typically with reinforcement learning, to work through chain-of-thought steps internally before returning an answer. This allows the model to deliberate, correct mistakes, and evaluate strategies.

Foundational AIRead Term

Recall

Recall (Sensitivity or True Positive Rate) is a classification evaluation metric measuring the fraction of actual positive examples that the model correctly identified, calculated as true positives divided by all actual positives.
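
In formula form, Recall = TP / (TP + FN). A minimal sketch for binary labels, where 1 marks the positive class and the sample labels are made up:

```python
def recall(y_true, y_pred) -> float:
    """True positives divided by all actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Four actual positives, two of them found.
score = recall([1, 1, 1, 1, 0], [1, 0, 1, 0, 0])
```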

Mathematical FoundationsRead Term

Recurrent Neural Network

A Recurrent Neural Network (RNN) is a class of artificial neural network where connections between nodes form a directed graph along a temporal sequence, allowing it to exhibit temporal dynamic behavior and process variable-length inputs.

Neural ArchitecturesRead Term

Reinforcement Learning

Reinforcement Learning (RL) is a machine learning training paradigm where an agent learns to make decisions by performing actions in an environment to maximize cumulative rewards. The agent learns through trial-and-error feedback.

Model TrainingRead Term

Repository Intelligence

Repository Intelligence is a capability in AI developer tooling that allows models to index, analyze, and reason over an entire software codebase structure, rather than just reading active, isolated files.

Agentic SystemsRead Term

Reranking

Reranking is a secondary step in RAG (Retrieval-Augmented Generation) pipelines where a highly accurate model evaluates and re-orders the candidate documents fetched during initial vector search, ensuring the most relevant context is placed at the top.

Information RetrievalRead Term

Residual Connection

A Residual Connection (or skip connection) is an architectural feature in deep neural networks that adds the input of a block directly to the block's output, letting the signal bypass one or more intermediate layers.

Neural ArchitecturesRead Term

Responsible AI

Responsible AI is a business governance framework that guides how an organization designs, develops, and deploys artificial intelligence systems ethically, ensuring transparency, fairness, privacy, safety, and accountability.

Alignment & SafetyRead Term

Retrieval Precision

Retrieval Precision is an evaluation metric in RAG systems measuring the fraction of retrieved document chunks that are actually relevant to answering the user query. High retrieval precision prevents prompt clutter and distraction.

Information RetrievalRead Term

Reward Function

A Reward Function is a mathematical formula that defines the goal in reinforcement learning by assigning a numerical score to the states and actions of an agent based on their desirability.

Model TrainingRead Term

Reward Model

A Reward Model is a neural network trained to score responses generated by an LLM based on human preferences (e.g., helpfulness, safety, format correctness). It is used as the scoring engine in reinforcement learning loops like RLHF.

Model TrainingRead Term

RLAIF

Reinforcement Learning from AI Feedback (RLAIF) is a model alignment technique where human evaluators are replaced by an AI model (the judge) to generate preference labels for training, lowering alignment training costs.

Model TrainingRead Term

RLHF

Reinforcement Learning from Human Feedback (RLHF) is a training methodology used to align LLMs with human values and preferences. It uses human evaluations to train a reward model, which then guides the LLM to generate helpful, harmless, and honest outputs.

Model TrainingRead Term

Rotary Position Embedding

Rotary Position Embedding (RoPE) is an advanced position encoding method applying rotation matrices to token vectors, naturally capturing relative distance between tokens.

Mathematical FoundationsRead Term

S

Scaling Laws

Scaling Laws are empirical power-law relationships that describe how an AI model's loss falls predictably as compute budget, training dataset size, and parameter count increase.

Model TrainingRead Term

Search Grounding

Search Grounding is a verification technique where a generative AI model is connected to a live web search engine or structured document database. Before generating a response, the model queries the search engine to ground its response in real-time factual data.

Information RetrievalRead Term

Self-Attention

Self-Attention is an attention mechanism that relates different positions of a single sequence to compute a representation of the same sequence, allowing the model to calculate context dynamically. In Transformers it is usually implemented as scaled dot-product attention.
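
A minimal sketch of scaled dot-product self-attention, softmax(QKᵀ / √d) · V. To stay short it assumes identity projections, so queries, keys, and values are all the raw input vectors; a real layer applies learned projection matrices first.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Each output row is an attention-weighted average of all input rows."""
    d = len(x[0])
    output = []
    for query in x:
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in x]
        weights = softmax(scores)
        output.append([sum(w * value[j] for w, value in zip(weights, x)) for j in range(d)])
    return output

attended = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```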

Neural ArchitecturesRead Term

Self-Correction

Self-Correction is an agentic design pattern where an AI agent executes a task, validates the intermediate output (against unit tests, syntax linters, or criteria checklists), and recursively loops to edit and resolve mistakes when errors are found.

Agentic SystemsRead Term

Self-Supervised Learning

Self-Supervised Learning is a training paradigm where the model generates its own labels directly from the input data (e.g. masking words and predicting them), allowing training on massive unlabeled datasets without human labeling.

Model TrainingRead Term

Semantic Search

Semantic Search is an information retrieval technique that seeks to understand the searcher's intent and contextual meaning of terms, rather than just matching keywords. It leverages vector embeddings to find semantically relevant documents.

Information RetrievalRead Term

Sentiment Analysis

Sentiment Analysis is an NLP task that uses classification models to identify and extract subjective information (positive, negative, or neutral tones) from text datasets.

Natural Language ProcessingRead Term

SGD with Momentum

SGD with Momentum is an extension of Stochastic Gradient Descent that accelerates weight updates in the relevant direction by adding a fraction of the previous update vector to the current step.
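
One common form of the update is v ← μ·v − η·∇L(w), followed by w ← w + v. A minimal sketch on a one-parameter toy loss, using the full gradient rather than mini-batches to keep the example deterministic:

```python
def sgd_momentum(grad, w0: float, lr: float = 0.1, momentum: float = 0.9, steps: int = 500) -> float:
    """Gradient steps that carry over a fraction of the previous update."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(w)
        w += velocity
    return w

# Toy loss L(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_min = sgd_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
```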

Mathematical FoundationsRead Term

Sigmoid

Sigmoid is a mathematical activation function that maps any real-valued number into a value between 0 and 1, producing an S-shaped curve.

Mathematical FoundationsRead Term

Slop

Slop is a colloquial internet slang term for low-quality, hollow, or unverified AI-generated content (including text, images, or search summaries) posted online to attract clicks, often cluttering feeds without providing real human value.

Model LimitationsRead Term

Small Language Model

A Small Language Model (SLM) is a lightweight language model with fewer parameters (typically under 10 billion) trained on highly curated, high-quality datasets. SLMs are designed to run efficiently on local edge devices with low power requirements.

Foundational AIRead Term

Softmax

Softmax is an activation function that takes a vector of raw real numbers (logits) and normalizes them into a probability distribution where each value lies between 0 and 1, and all values sum to 1.
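
A short sketch of the computation: each output is e^(z_i) divided by the sum of e^(z_j) over all logits. Subtracting the maximum logit first is a common stability trick and does not change the result.

```python
import math

def softmax(logits):
    m = max(logits)
    # Shifting by the max keeps exp() from overflowing on large logits.
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```

Larger logits receive larger probabilities, and the ordering of the inputs is preserved.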

Mathematical FoundationsRead Term

Sora

Sora is an advanced text-to-video diffusion model developed by OpenAI, capable of generating high-fidelity, photorealistic video clips up to 60 seconds long from written text prompts.

Generative AIRead Term

Sovereign AI

Sovereign AI refers to a nation or organization's strategy to design, train, and deploy artificial intelligence models and infrastructure locally using their own data, computational hardware, and cultural values to maintain digital sovereignty and security.

Theoretical AIRead Term

SpaceX

SpaceX (Space Exploration Technologies Corp.) is an aerospace manufacturer and satellite communications company that integrates advanced autonomous control systems and AI telemetry software, and recently acquired the AI-coding platform Cursor (Anysphere) to accelerate software automation.

Foundational AIRead Term

Sparse Model

A Sparse Model is a neural network architecture that activates only a specific subset of its total parameters for any given token or input, utilizing routing mechanisms to achieve massive parameter scale without proportional compute costs.

Neural ArchitecturesRead Term

Speculative Decoding

Speculative Decoding is a latency optimization technique that accelerates LLM generation. A smaller, faster drafting model proposes multiple candidate tokens, which are then validated in parallel by the larger target model in a single forward pass.
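
The draft-then-verify loop can be sketched with toy stand-ins for the two models. Everything here is hypothetical: `target_next` and `draft_next` are deterministic functions rather than neural networks, verification runs as a plain loop instead of one batched forward pass, and only the greedy variant is shown (sampling-based variants use a probabilistic acceptance rule instead of exact matching).

```python
def target_next(prefix):
    # Hypothetical "large" model: next token is a fixed function of the prefix.
    return (sum(prefix) * 7 + len(prefix)) % 10

def draft_next(prefix):
    # Hypothetical "small" model: agrees with the target unless the last token is 3.
    guess = target_next(prefix)
    return (guess + 1) % 10 if prefix[-1] == 3 else guess

def speculative_decode(prompt, n_new, k=4):
    seq = list(prompt)
    while len(seq) - len(prompt) < n_new:
        # 1. The draft model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # 2. The target model checks each drafted position in order.
        accepted = []
        for tok in draft:
            expected = target_next(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # correct the first mismatch, then stop
                break
        else:
            # Every draft token matched: the target contributes one extra token.
            accepted.append(target_next(seq + accepted))
        seq.extend(accepted)
    return seq[:len(prompt) + n_new]

def greedy_decode(prompt, n_new):
    # Baseline: the target model generates one token per step.
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target_next(seq))
    return seq

fast = speculative_decode([1, 2], 12)
slow = greedy_decode([1, 2], 12)
```

In this greedy setting `fast` and `slow` are identical: the draft model changes how many target steps are needed, not which tokens come out.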

Model TrainingRead Term

Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is an optimization algorithm that updates a model's weights using the gradient calculated from a single randomly chosen training sample (or a small batch) rather than the entire dataset.
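
A minimal sketch, assuming a one-parameter model y = w * x with squared-error loss and one randomly chosen sample per step. The data, learning rate, and function name `sgd_fit` are made up for illustration.

```python
import random

def sgd_fit(data, lr=0.01, steps=2000, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        # Pick a single training sample at random instead of using the full dataset.
        x, y = rng.choice(data)
        # Gradient of (w * x - y)^2 with respect to w.
        grad = 2 * (w * x - y) * x
        w -= lr * grad
    return w

# Noise-free toy data generated from y = 2x, so the ideal weight is 2.
data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0, 5.0)]
w = sgd_fit(data)
```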

Mathematical FoundationsRead Term

Structured Outputs

Structured Outputs is an LLM generation feature that guarantees model completions adhere strictly to a developer-specified schema (such as JSON Schema or Pydantic models), eliminating syntax parsing errors.

Model OperationsRead Term

Supervised Instruction Tuning

Supervised Instruction Tuning, commonly called Supervised Fine-Tuning (SFT), is a training phase where a pre-trained base model is fine-tuned on a curated dataset of instruction-response pairs. This teaches the model to follow prompts, adopt an assistant persona, and output responses in a structured format.

Model TrainingRead Term

Supervised Learning

Supervised Learning is the most common machine learning category, where a model is trained on a labeled dataset: each training input is paired with its correct output label, allowing the model to learn the mapping from inputs to outputs.

Foundational AIRead Term

SwiGLU

SwiGLU is an activation function combining the Gated Linear Unit with Swish activation, used in feed-forward networks of modern Transformer blocks.
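
In formula form, SwiGLU(x) = Swish(W_gate x) * (W_up x), with the product taken element-wise. The sketch below uses plain lists; the matrix names `w_gate` and `w_up` and their values are illustrative, and the output projection that usually follows in a feed-forward block is left out.

```python
import math

def swish(z):
    # Swish (also called SiLU): z * sigmoid(z).
    return z / (1.0 + math.exp(-z))

def matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def swiglu(x, w_gate, w_up):
    gate = [swish(z) for z in matvec(w_gate, x)]  # gating branch
    up = matvec(w_up, x)                          # linear branch
    return [g * u for g, u in zip(gate, up)]      # element-wise product

x = [1.0, -2.0]
w_gate = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
hidden = swiglu(x, w_gate, w_up)
```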

Neural ArchitecturesRead Term

Synthetic Data

Synthetic Data is information that is artificially generated by algorithms or computer simulations, rather than being obtained from real-world measurements, often used to train AI models when real data is scarce or sensitive.

Model TrainingRead Term

System Prompt

A System Prompt (or system instructions) is a set of core instructions provided to an AI model before the user conversation begins, defining the model's persona, boundaries, task rules, and formatting constraints.

Prompt EngineeringRead Term

T

Technological Singularity

The Technological Singularity is a hypothetical future point in time when technological growth becomes uncontrollable and irreversible, driven by self-improving artificial intelligence systems surpassing human intelligence, resulting in unfathomable changes to human civilization.

Theoretical AIRead Term

Temperature

Temperature is a parameter that controls the randomness and creativity of text generated by an autoregressive language model during inference. Higher values increase randomness, while lower values make outputs more deterministic.
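
One common way to apply temperature is to divide the logits by it before the softmax. The sketch below assumes that formulation; the function name and the example logits are illustrative.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Dividing by a small temperature widens the gaps between logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
sharp = softmax_with_temperature(logits, 0.5)  # more deterministic
flat = softmax_with_temperature(logits, 2.0)   # more random
```

The top token gets more probability mass in `sharp` than in `flat`, so sampling from `sharp` picks it more consistently.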

Model OperationsRead Term

Tensor

A Tensor is a multi-dimensional mathematical array of numbers that serves as the fundamental data structure for representing inputs, weights, and activations in deep learning frameworks like TensorFlow and PyTorch.

Mathematical FoundationsRead Term

Token

A Token is the basic unit of text that a language model reads or generates. For English text, one token averages roughly 3/4 of a word. Text is split into tokens and mapped to token IDs before passing into neural layers.

Natural Language ProcessingRead Term

Tokenization

Tokenization is the process of breaking down a text string into smaller pieces called tokens (which can be characters, subwords, or full words). Tokenization converts text into numbers that a neural network can process.

Natural Language ProcessingRead Term

Tokenizer

A Tokenizer is a pre-processing component that breaks down raw text strings into discrete units called tokens (words, subwords, or characters) and maps them to numerical integer IDs that can be processed by a neural network.
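
A toy word-level tokenizer illustrates the two steps, splitting and ID mapping. It is only a sketch: production tokenizers usually learn a subword vocabulary, while the `ToyTokenizer` class here (a made-up name) builds its vocabulary from a single sample string and maps unknown words to a reserved `<unk>` ID.

```python
import re

class ToyTokenizer:
    def __init__(self, corpus):
        # ID 0 is reserved for tokens that never appeared in the corpus.
        self.vocab = {"<unk>": 0}
        for token in self._split(corpus):
            self.vocab.setdefault(token, len(self.vocab))
        self.inverse = {i: t for t, i in self.vocab.items()}

    @staticmethod
    def _split(text):
        # Lowercase, then split into words and individual punctuation marks.
        return re.findall(r"\w+|[^\w\s]", text.lower())

    def encode(self, text):
        return [self.vocab.get(t, 0) for t in self._split(text)]

    def decode(self, ids):
        return " ".join(self.inverse[i] for i in ids)

tok = ToyTokenizer("the cat sat on the mat.")
ids = tok.encode("The cat sat.")
unknown = tok.encode("dog")
```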

Natural Language ProcessingRead Term

TPU

A Tensor Processing Unit (TPU) is an application-specific integrated circuit (ASIC) developed by Google to accelerate machine learning workloads, specialized for high-throughput matrix operations.

Hardware & InfrastructureRead Term

Training Data

Training Data is the dataset a machine learning model learns from: the model processes its inputs, measures the error of its predictions, and adjusts its weights to reduce that error.

Model TrainingRead Term

Transfer Learning

Transfer Learning is a machine learning technique where a model developed for one task is reused as the starting point for a model on a second, related task, significantly reducing the amount of labeled data and compute needed.

Model TrainingRead Term

Transformer

A Transformer is a deep learning architecture introduced by Google researchers in the 2017 paper "Attention Is All You Need". It replaces recurrence and convolution with attention mechanisms, processes all positions of a sequence in parallel, captures long-range dependencies, and underlies most modern LLMs.

Neural ArchitecturesRead Term

Turing Test

The Turing Test, originally proposed by Alan Turing in 1950, is a test of a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human through text conversation.

Theoretical AIRead Term

U

Underfitting

Underfitting is a training error that occurs when a machine learning model is too simple to capture the underlying structure and patterns in the training dataset, resulting in poor performance on both training and validation data.

Model LimitationsRead Term

Unsupervised Learning

Unsupervised Learning is a machine learning category where a model is trained on an unlabeled dataset. The algorithm attempts to discover hidden structures, groupings, or distributions within the input data without external guidance.

Foundational AIRead Term

V

Validation Data

Validation Data is a subset of the dataset held out from training, used to monitor model progress, tune hyperparameters, and detect overfitting.

Model TrainingRead Term

Vanishing Gradient Problem

The Vanishing Gradient Problem is a training difficulty in deep neural networks where the gradients of the loss function shrink exponentially as they propagate backward to the early layers, preventing the model weights from updating and learning.
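
A back-of-the-envelope sketch shows the exponential shrinkage. It assumes a chain of sigmoid layers with all weights equal to 1 and every pre-activation at 0, where the sigmoid derivative takes its maximum value of 0.25; real networks also multiply by weight terms, which this ignores.

```python
import math

def sigmoid_grad(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def backprop_scale(depth, pre_activation=0.0):
    # The chain rule multiplies one local derivative per layer.
    scale = 1.0
    for _ in range(depth):
        scale *= sigmoid_grad(pre_activation)
    return scale

one_layer = backprop_scale(1)    # 0.25
ten_layers = backprop_scale(10)  # 0.25 ** 10, under one millionth
```

Even in this best case for sigmoid, ten layers shrink the gradient by a factor of about a million.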

Model TrainingRead Term

Vector Database

A Vector Database is a specialized storage engine designed to store, index, and query high-dimensional vector embeddings efficiently. It enables fast semantic search and similarity matching using algorithms like HNSW or IVF.

Data InfrastructureRead Term

Vision Transformer

A Vision Transformer (ViT) is a neural network architecture that adapts the Transformer attention mechanism for computer vision tasks. By splitting images into grid patches and treating them like tokens in a sentence, ViT learns long-range visual relations.

Neural ArchitecturesRead Term

VLM

A Vision-Language Model (VLM) is a multimodal AI model trained on both images and text, enabling it to answer questions about visual content, describe images, or extract structured data from documents.

Neural ArchitecturesRead Term

W

Weights and Biases

Weights and Biases are the fundamental learnable parameters within a neural network. Weights determine the strength of connection between nodes, while biases offset the activation output, allowing the network to shift activation curves.
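
For a single node, the pre-activation value is the weighted sum of the inputs plus the bias. A small sketch with made-up numbers:

```python
def neuron(inputs, weights, bias):
    # Weighted sum of inputs, shifted by the bias.
    return sum(w * x for w, x in zip(weights, inputs)) + bias

inputs = [1.0, 2.0]
weights = [0.5, -0.25]
z = neuron(inputs, weights, bias=0.1)
z_shifted = neuron(inputs, weights, bias=1.1)
```

Changing only the bias shifts the result by the same amount without altering how strongly each input contributes.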

Mathematical FoundationsRead Term

Z

Zero-Shot Learning

Zero-Shot Learning is a machine learning task setup where a model is asked to perform a task or classify inputs without being shown any prior demonstration examples in the prompt context window, relying entirely on its general pre-trained knowledge.

Prompt EngineeringRead Term