CLIP
CLIP (Contrastive Language-Image Pre-training) is a neural network developed by OpenAI that learns visual concepts from natural language supervision. It was trained on about 400 million image-text pairs collected from the internet, using a contrastive objective that places matching images and captions close together in a joint embedding space.
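To make the idea of matching images and captions in a joint space more concrete, here is a minimal sketch of a symmetric contrastive loss over a small batch. It is not OpenAI's training code. The function names, the toy embedding vectors, and the fixed temperature of 0.07 are assumptions chosen for illustration; in a real system the embeddings would come from trained image and text encoders, and the temperature is typically a learned parameter.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy(logits, target):
    """Negative log-softmax of the target entry (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss: pair i of images matches pair i of texts.

    temperature=0.07 is an illustrative default (an assumption of this sketch).
    """
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    n = len(images)
    # logits[i][j]: cosine similarity of image i and text j, divided by temperature
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    # Image-to-text: each row should pick out its own caption
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image: each column should pick out its own image
    loss_t2i = sum(
        cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (loss_i2t + loss_t2i) / 2

# Toy 2-D embeddings (hypothetical): aligned pairs vs. swapped captions
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

The loss is small when each image is most similar to its own caption and large when captions are mismatched, which is the signal that pulls matching pairs together during training.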
Frequently Asked Questions
How does CLIP achieve zero-shot classification?
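By embedding each candidate class name as a text prompt (e.g. "a photo of a [class]") and choosing the class whose text embedding has the highest cosine similarity to the input image embedding. No task-specific training is needed, because the class list is supplied as text at inference time.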
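The procedure described in this answer can be sketched in a few lines. This is an illustrative example only: the three-dimensional vectors below are made-up stand-ins for the outputs of CLIP's image and text encoders, and `zero_shot_classify` is a hypothetical helper name, not part of any CLIP library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(image_emb, class_text_embs):
    """Return the prompt most similar to the image, plus all scores."""
    scores = {
        prompt: cosine_similarity(image_emb, emb)
        for prompt, emb in class_text_embs.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical text embeddings, one per class prompt
class_text_embs = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
# Hypothetical embedding of the input image
image_emb = [0.8, 0.3, 0.1]

label, scores = zero_shot_classify(image_emb, class_text_embs)
print(label)  # a photo of a cat
```

Changing the set of classes only means changing the dictionary of prompts; nothing is retrained.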
Why is CLIP important for diffusion models?
Because its text encoder maps prompts into embeddings aligned with visual content, models like Stable Diffusion can condition image generation on the user's prompt.
Quick Facts
- Category: Multimodal AI
- Key Applications: Zero-shot image classification, text-to-image search, and guidance for generative image models.