Transformer
A Transformer is a deep learning neural network architecture introduced in 2017 by Google researchers, based entirely on self-attention mechanisms. It processes sequential inputs in parallel, capturing long-range dependencies and serving as the foundational engine for all modern LLMs.
Frequently Asked Questions
Why did Transformers replace recurrent architectures like LSTMs?▼
Recurrent models process tokens sequentially, which makes them slow and difficult to parallelize on GPUs. Transformers process entire sequences simultaneously, allowing them to train on massive web-scale datasets.
What is the role of positional encoding in Transformers?▼
Since Transformers process all sequence tokens simultaneously, they have no built-in sense of order. Positional encoding adds coordinate markers to the token vectors to convey the order of words.
Quick Facts
- CategoryNeural Architectures
- Key ApplicationLarge Language Models (GPT, Gemini, Claude), neural machine translation, and text-to-image foundation models.
Coverage Trend12 Weeks
Related AI Terms
Transformer Media Coverage & Intelligence
ITNet: A Learnable Integral Transform That Subsumes Convolution, Attention, and Recurrence
Convolutional networks, recurrent networks, and transformers each encode different inductive biases -- locality, sequential memory, and content-dependent pairwi
Architect Labs nabs $24M to speed up chip design projects with AI
Chip design startup Architect Labs Inc. launched today with $24 million in funding from a group of prominent investors. Kindred Ventures led the seed round. It was joined by Perplexity AI Inc. Chief Executive Officer Aravind Srinivas, Transformer co-inventor Lukasz Kaiser, former OpenAI Group PBC ex
OpenAI is bringing on some big guns in the lead-up to its IPO
OpenAI is bulking up before its IPO, landing Transformer co-inventor Noam Shazeer from Google DeepMind and former Trump AI policy official Dean Ball in the same