AI Glossary: Letter "S"
Explore definitions and dynamic coverage analytics for the core concepts shaping artificial intelligence.
S
Scaling Laws
Scaling Laws are empirical power-law relationships that predict how an AI model's performance (typically measured as test loss) improves as compute budget, training dataset size, and parameter count increase.
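As a rough illustration, a single-variable power law can be written as L(n) = (n_c / n) ** alpha. The sketch below uses that form; the constants `N_C` and `ALPHA` are hypothetical values picked for illustration, not fitted results.

```python
def power_law_loss(n: float, n_c: float, alpha: float) -> float:
    """Loss predicted by a single-variable power law L(n) = (n_c / n) ** alpha."""
    return (n_c / n) ** alpha

# Hypothetical constants, chosen only to show the shape of the curve.
N_C, ALPHA = 1e13, 0.08

for n in (1e8, 1e9, 1e10):
    print(f"{n:.0e} parameters -> predicted loss {power_law_loss(n, N_C, ALPHA):.3f}")
```

Under this form, every 10x increase in `n` multiplies the predicted loss by the same factor, `10 ** -alpha`, which is what makes the trend a straight line on a log-log plot.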
Search Grounding
Search Grounding is a verification technique where a generative AI model is connected to a live web search engine or structured document database. Before generating a response, the model queries the search engine to ground its response in real-time factual data.
Self-Attention
Self-Attention is an attention mechanism that relates different positions of a single sequence to compute a representation of that same sequence, so each token's representation is built dynamically from its context. In Transformers it is most commonly implemented as scaled dot-product attention.
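A minimal pure-Python sketch of scaled dot-product self-attention follows. It assumes the queries, keys, and values are all the raw input vectors; real models first apply learned projection matrices, which are omitted here.

```python
import math

def softmax(xs):
    """Normalize scores into weights that sum to 1 (shifted for stability)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over one sequence of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this position to every position, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy sequence of three 2-d tokens
print(self_attention(x, x, x))
```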
Self-Consistency Prompting
Self-Consistency Prompting is a reasoning strategy where a model generates multiple independent thinking paths for a prompt, and the system selects the most common final answer using majority voting.
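The voting step can be sketched in a few lines. The list `sampled` stands in for final answers already extracted from several sampled reasoning paths; how those paths are generated is outside this sketch.

```python
from collections import Counter

def self_consistent_answer(final_answers):
    """Return the most common final answer and its share of the votes."""
    counts = Counter(final_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(final_answers)

# Hypothetical final answers from five independently sampled reasoning paths.
sampled = ["42", "42", "41", "42", "40"]
print(self_consistent_answer(sampled))
```

The vote share can double as a crude confidence signal: a narrow majority suggests the prompt is worth re-sampling or escalating.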
Self-Correction
Self-Correction is an agentic design pattern where an AI agent executes a task, validates the intermediate output (against unit tests, syntax linters, or criteria checklists), and loops back to revise the output whenever errors are found, repeating until the checks pass.
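The generate-validate-retry loop might look like the sketch below. `toy_generate` and `toy_validate` are hypothetical stand-ins: the "agent" returns Python source and the validator is a plain syntax check, where a real system would call a model and run tests or linters.

```python
def self_correct(generate, validate, max_attempts=3):
    """Generate, validate, and retry with feedback until validation passes."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)
        ok, feedback = validate(candidate)
        if ok:
            return candidate, attempt
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {feedback}")

def toy_generate(feedback):
    # First draft is broken; once feedback arrives, return a fixed version.
    return "total = 1 + 2" if feedback else "total = 1 +"

def toy_validate(source):
    # Validation here is only a syntax check via the built-in compile().
    try:
        compile(source, "<candidate>", "exec")
        return True, None
    except SyntaxError as err:
        return False, f"SyntaxError: {err.msg}"

print(self_correct(toy_generate, toy_validate))
```

Capping the number of attempts keeps a persistently failing task from looping forever.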
Self-Supervised Learning
Self-Supervised Learning is a training paradigm where the model generates its own labels directly from the input data (e.g. masking words and predicting them), allowing training on massive unlabeled datasets without human labeling.
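To make the masking example concrete, the sketch below turns raw tokens into training pairs with no human labels. The function name, the `[MASK]` token, and the default mask rate are illustrative assumptions.

```python
import random

def make_masked_examples(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Turn raw tokens into (masked input, labels) without human annotation."""
    rng = random.Random(seed)
    masked, labels = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(mask_token)
            labels[i] = tok  # the hidden token becomes the prediction target
        else:
            masked.append(tok)
    return masked, labels

tokens = "the cat sat on the mat".split()
masked, labels = make_masked_examples(tokens, mask_rate=0.3)
print(masked, labels)
```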
Semantic Chunking
Semantic Chunking is the process of dividing a long document into smaller, meaningful passages based on semantic changes rather than fixed character counts. This preserves sentence context and improves embedding accuracy for RAG pipelines.
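One simple way to sketch this is to start a new chunk whenever two adjacent sentences stop looking alike. The `embed` function below is a toy bag-of-words stand-in for a real embedding model, and the threshold is an arbitrary illustrative value.

```python
import math
from collections import Counter

def embed(sentence):
    """Toy bag-of-words 'embedding'; a real pipeline would call an embedding model."""
    return Counter(sentence.lower().replace(".", "").split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def semantic_chunks(sentences, threshold=0.2):
    """Start a new chunk whenever adjacent sentences fall below the similarity threshold."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append([cur])
        else:
            chunks[-1].append(cur)
    return [" ".join(c) for c in chunks]

doc = [
    "Cats sleep most of the day.",
    "Cats also sleep at night.",
    "Interest rates rose again.",
    "Higher rates slow borrowing.",
]
for chunk in semantic_chunks(doc):
    print(chunk)
```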
Semantic Search
Semantic Search is an information retrieval technique that seeks to understand the searcher's intent and contextual meaning of terms, rather than just matching keywords. It leverages vector embeddings to find semantically relevant documents.
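A minimal sketch of embedding-based retrieval ranks documents by cosine similarity to the query vector. The three-dimensional vectors below are made-up values standing in for real embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0

def semantic_search(query_vec, index, top_k=2):
    """Return the ids of the top_k documents most similar to the query vector."""
    ranked = sorted(index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]

# Hypothetical embeddings; a real system would get these from an embedding model.
index = {
    "how to reset a password": [0.9, 0.1, 0.0],
    "recover a locked account": [0.8, 0.3, 0.1],
    "quarterly revenue report": [0.0, 0.2, 0.9],
}
query = [0.85, 0.2, 0.05]  # imagined embedding of "I forgot my login"
print(semantic_search(query, index))
```

The query shares no keywords with the two account-related documents, yet they rank highest because their vectors point in a similar direction.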
Sentiment Analysis
Sentiment Analysis is an NLP task that uses classification models to identify and extract subjective information (positive, negative, or neutral tones) from text datasets.
SGD with Momentum
SGD with Momentum is an extension of Stochastic Gradient Descent that accelerates weight updates in the relevant direction by adding a fraction of the previous update vector to the current step.
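The update rule can be sketched as follows, using one common formulation (v <- momentum * v - lr * g, then w <- w + v). The learning rate, momentum value, and the toy objective f(w) = w^2 are illustrative assumptions.

```python
def momentum_step(weights, grads, velocity, lr=0.1, momentum=0.9):
    """One SGD-with-momentum update: v <- momentum * v - lr * g; w <- w + v."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

# Minimize f(w) = w^2, whose gradient is 2w.
w, v = [5.0], [0.0]
for _ in range(200):
    w, v = momentum_step(w, [2 * w[0]], v)
print(w)
```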
Sigmoid
Sigmoid is a mathematical activation function that maps any real-valued number into a value between 0 and 1, producing an S-shaped curve.
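A short sketch of the function, 1 / (1 + e^-x), is shown below. The two-branch form is one common way to avoid overflow for large negative inputs.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid: 1 / (1 + e^-x)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)

for x in (-4.0, 0.0, 4.0):
    print(x, sigmoid(x))
```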
Slop
Slop is an informal internet term for low-quality, hollow, or unverified AI-generated content (including text, images, or search summaries) posted online to attract clicks, often cluttering feeds without providing real value to human readers.
Small Language Model
A Small Language Model (SLM) is a lightweight language model with fewer parameters (typically under 10 billion) trained on highly curated, high-quality datasets. SLMs are designed to run efficiently on local edge devices with low power requirements.
Softmax
Softmax is an activation function that takes a vector of raw real numbers (logits) and normalizes them into a probability distribution where each value lies between 0 and 1, and all values sum to 1.
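A minimal sketch of the normalization follows. Subtracting the largest logit before exponentiating is a standard stability trick and does not change the result.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```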
Sora
Sora is an advanced text-to-video diffusion model developed by OpenAI, capable of generating high-fidelity, photorealistic video clips up to 60 seconds long from written text prompts.
Sovereign AI
Sovereign AI refers to a nation or organization's strategy to design, train, and deploy artificial intelligence models and infrastructure locally using their own data, computational hardware, and cultural values to maintain digital sovereignty and security.
SpaceX
SpaceX (Space Exploration Technologies Corp.) is an aerospace manufacturer and satellite communications company that integrates advanced autonomous control systems and AI telemetry software, and recently acquired the AI-coding platform Cursor (Anysphere) to accelerate software automation.
Sparse Model
A Sparse Model is a neural network architecture that activates only a specific subset of its total parameters for any given token or input, utilizing routing mechanisms to achieve massive parameter scale without proportional compute costs.
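The routing idea can be sketched with top-k gating, one common approach in mixture-of-experts layers. The four lambda "experts" and the router logits are toy placeholders; in a real model both the experts and the router are learned networks.

```python
import math

def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights with softmax."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    shift = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - shift) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def sparse_layer(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    gates = top_k_route(router_logits, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())

# Toy experts: only k of the four are evaluated for any given input.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x ** 2]
print(sparse_layer(3.0, experts, [0.1, 2.0, -1.0, 2.0]))
```

Adding more experts grows the total parameter count, while the per-input compute stays tied to `k`.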
Speculative Decoding
Speculative Decoding is a latency optimization technique that accelerates LLM generation. A smaller, faster drafting model proposes multiple candidate tokens, which are then validated in parallel by the larger target model in a single forward pass.
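The draft-then-verify loop can be sketched for the greedy case, where a drafted token is accepted only if it matches the target model's own greedy choice. `target_next` and `draft_next` are hypothetical toy models over digits, and the per-position verification loop stands in for what a real system does in one batched forward pass. Sampling-based variants use a probabilistic accept/reject rule instead.

```python
def speculative_decode(target_next, draft_next, prompt, num_draft=4, max_new=8):
    """Greedy speculative decoding: output matches plain greedy decoding with the target."""
    tokens = list(prompt)
    limit = len(prompt) + max_new
    while len(tokens) < limit:
        # 1. The draft model proposes a short run of tokens autoregressively.
        draft = []
        for _ in range(num_draft):
            draft.append(draft_next(tokens + draft))
        # 2. The target checks each proposed position (batched in a real system).
        accepted = []
        for proposed in draft:
            expected = target_next(tokens + accepted)
            if proposed != expected:
                accepted.append(expected)  # replace the first mismatch and stop
                break
            accepted.append(proposed)
        else:
            accepted.append(target_next(tokens + accepted))  # bonus token when all match
        tokens.extend(accepted)
    return tokens[:limit]

def target_next(ctx):
    return (ctx[-1] + 1) % 10  # toy target: count upward modulo 10

def draft_next(ctx):
    return 0 if ctx[-1] == 2 else (ctx[-1] + 1) % 10  # toy draft: wrong after a 2

print(speculative_decode(target_next, draft_next, [0]))
```

Every round emits at least one token from the target, so a poor draft model costs speed but never changes the output.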
State Space Model
A State Space Model (SSM) is a mathematical framework used in deep learning to model sequence data through a hidden state that is updated at each time step. Unlike Transformers, whose self-attention cost scales quadratically with context length, modern SSMs scale linearly, making them highly efficient for long sequences.
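The core recurrence can be sketched with a scalar hidden state: h_t = a * h_{t-1} + b * x_t and y_t = c * h_t. The parameter names and values are illustrative; practical SSM layers use vector-valued states and learned, often input-dependent, parameters.

```python
def ssm_scan(inputs, a, b, c, h0=0.0):
    """Discrete linear SSM with a scalar state: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h, outputs = h0, []
    for x in inputs:  # one constant-cost update per step, so cost is linear in length
        h = a * h + b * x
        outputs.append(c * h)
    return outputs

# An impulse at the first step decays geometrically through the hidden state.
print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))
```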
Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is an optimization algorithm that updates a model's weights using the gradient calculated from a single randomly chosen training sample (or a small batch) rather than the entire dataset.
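A minimal sketch of single-sample SGD on a one-variable linear model is shown below. The dataset, learning rate, and epoch count are illustrative assumptions.

```python
import random

def sgd_linear_fit(data, lr=0.05, epochs=200, seed=0):
    """Fit y = w*x + b by updating on one randomly ordered sample at a time."""
    rng = random.Random(seed)
    samples = list(data)  # copy so the caller's list is not reordered
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            err = (w * x + b) - y
            w -= lr * err * x  # gradient of 0.5 * err**2 with respect to w
            b -= lr * err      # gradient of 0.5 * err**2 with respect to b
    return w, b

data = [(x, 2 * x + 1) for x in (-2, -1, 0, 1, 2)]  # noise-free points on y = 2x + 1
print(sgd_linear_fit(data))
```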
Structured Outputs
Structured Outputs is an LLM generation feature that guarantees model completions adhere strictly to a developer-specified schema (such as JSON Schema or Pydantic models), eliminating syntax parsing errors.
Supervised Instruction Tuning
Supervised Instruction Tuning, more commonly called supervised fine-tuning (SFT), is a training phase where a pre-trained base model is fine-tuned on a curated dataset of instruction-response pairs. This teaches the model to follow prompts, adopt an assistant persona, and output responses in a structured format.
Supervised Learning
Supervised Learning is the most common machine learning category, where a model is trained on a labeled dataset. Each training input is paired with its correct output label, allowing the model to learn the mapping from inputs to outputs.
SwiGLU
SwiGLU is an activation function combining the Gated Linear Unit with Swish activation, used in feed-forward networks of modern Transformer blocks.
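In code, the gating can be sketched as Swish(x W_gate) multiplied elementwise by (x W_value). The weight names `w_gate` and `w_value` and the tiny dimensions are illustrative assumptions, and the output projection that usually follows in a feed-forward block is omitted.

```python
import math

def swish(z):
    """Swish / SiLU activation: z * sigmoid(z). Not hardened for very negative z."""
    return z / (1.0 + math.exp(-z))

def matvec(x, W):
    """Multiply row vector x (length d_in) by matrix W (d_in x d_out)."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) for j in range(len(W[0]))]

def swiglu(x, w_gate, w_value):
    """SwiGLU: Swish(x @ w_gate) multiplied elementwise by (x @ w_value)."""
    gate = [swish(z) for z in matvec(x, w_gate)]
    value = matvec(x, w_value)
    return [g * v for g, v in zip(gate, value)]

x = [1.0, -1.0]
w_gate = [[0.5, 1.0, -0.5], [0.2, 0.0, 0.3]]   # 2 x 3, toy values
w_value = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]   # 2 x 3, toy values
print(swiglu(x, w_gate, w_value))
```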
Synthetic Data
Synthetic Data is information that is artificially generated by algorithms or computer simulations, rather than being obtained from real-world measurements, often used to train AI models when real data is scarce or sensitive.
Synthetic In-Context Learning
Synthetic In-Context Learning is a training paradigm where models are pre-trained or aligned using synthetic context examples generated by other models. The goal is a model that can pick up complex tasks from examples placed in its context window at inference time, without any further weight updates.
System Prompt
A System Prompt (or system instructions) is a set of core instructions provided to an AI model before the user conversation begins, defining the model's persona, boundaries, task rules, and formatting constraints.