Core Architecture

Embeddings

How models represent meaning internally: dense vectors that capture semantic relationships

What it is

Embeddings are how transformers represent tokens internally as numbers. Each token ID is mapped to a dense vector of floating-point numbers; the vector's length is the embedding dimension, typically 768 to 12,288 for modern models. These vectors are learned during training.
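At this stage the mapping is just a table lookup: the embedding layer is a matrix with one row per vocabulary entry. A minimal sketch with NumPy (the sizes here are illustrative; real vocabularies run to 50,000+ tokens, and the weights come from training, not random initialization):

```python
import numpy as np

# Illustrative sizes: a 1,000-token vocabulary, 768-dim embeddings.
vocab_size, d_model = 1_000, 768

rng = np.random.default_rng(0)
# The embedding table is just a (vocab_size, d_model) matrix of learned weights.
embedding_table = rng.standard_normal((vocab_size, d_model)).astype(np.float32)

token_ids = [42, 137, 42]             # token 42 appears twice
vectors = embedding_table[token_ids]  # "embedding lookup" = row indexing

print(vectors.shape)  # (3, 768)
# The same token ID always yields the same (context-free) vector at this stage.
print(np.array_equal(vectors[0], vectors[2]))  # True
```

The lookup is purely positional in the table: identical token IDs produce identical vectors, which is why these raw embeddings encode identity rather than context.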

Initially, embeddings encode only identity: token 42 maps to a fixed vector regardless of where it appears. But as these vectors pass through the transformer's layers and attention mechanism, they become "enriched" with contextual meaning. The embedding for "bank" in "river bank" ends up different from "bank" in "investment bank" because attention has incorporated surrounding context.

Embeddings are also the foundation of semantic search and RAG systems, where text is converted to vectors and stored in vector databases for similarity lookup.
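Similarity lookup usually means cosine similarity: vectors pointing in similar directions represent similar meanings. A sketch with hand-made 3-dimensional "embeddings" so the result is easy to verify; real systems use model-produced vectors with hundreds of dimensions:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-made document "embeddings" (hypothetical; normally produced by a model).
docs = {
    "refund policy":  [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "privacy terms":  [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # a query vector pointing nearest to "refund policy"

best = max(docs, key=lambda name: cosine_sim(query, docs[name]))
print(best)  # refund policy
```

A vector database does essentially this comparison at scale, with index structures that avoid scoring every stored vector.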

Why it matters

Embeddings are why LLMs understand meaning rather than just syntax. They're also the technical foundation for one of the most practically useful AI techniques, Retrieval-Augmented Generation (RAG), where you embed your documents and retrieve relevant chunks based on vector similarity. If your team is building any AI product that needs to work with custom data, embeddings are almost certainly involved.
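The embed-store-retrieve shape of a RAG pipeline can be sketched end to end. The `embed` function below is a toy bag-of-words stand-in (real systems call an embedding model API), and the chunk texts are invented; the structure, not the embedding quality, is the point:

```python
from collections import Counter
from math import sqrt

def embed(text):
    """Toy 'embedding': normalized word counts. Real systems call a model."""
    counts = Counter(text.lower().replace("?", "").replace(".", "").split())
    norm = sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()}

def similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    return sum(a[w] * b.get(w, 0.0) for w in a)

# 1. Embed documents once and store the vectors (the "vector database").
chunks = [
    "Refunds are issued within 14 days of purchase.",
    "Standard shipping takes 3 to 5 business days.",
    "We never sell customer data to third parties.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. At query time, embed the question and retrieve the nearest chunk.
query = "how long do refunds take?"
q_vec = embed(query)
best_chunk = max(index, key=lambda pair: similarity(q_vec, pair[1]))[0]

# 3. Feed the retrieved chunk to the LLM as context.
prompt = f"Answer using this context:\n{best_chunk}\n\nQuestion: {query}"
print(best_chunk)  # the refunds chunk is retrieved
```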

Resources

But what is a GPT? Visual intro to Transformers (Embeddings section)
youtube.com· Beautiful visualization of how words become vectors in high-dimensional space and how directions in embedding space encode meaning. The "king − man + woman = queen" example made visual.
10 min
Word Embedding and Word2Vec, Clearly Explained
youtube.com· Explains word embeddings from the ground up, including skip-gram, CBOW, and negative sampling. Clear step-by-step style.
15 min
How might LLMs store facts (Deep Learning Chapter 7)
youtube.com· Goes deeper into how embedding representations evolve through transformer layers. Covers superposition and how networks store knowledge in high-dimensional space.
23 min
How Transformer LLMs Work (free short course)
deeplearning.ai· Free course from DeepLearning.AI that covers the evolution from Bag-of-Words to Word2Vec to transformer embeddings. Beautifully illustrated. Authors of "Hands-On Large Language Models."
60 min
NLP Illustrated, Part 2: Word Embeddings
towardsdatascience.com· Uses a clever movie analogy (rating movies on romance/mystery/action axes) to build intuition for embeddings before applying the concept to words. Very beginner-friendly.
8 min
Word Embeddings Explained
towardsdatascience.com· Traces the history of word representations from n-grams through Word2Vec to transformer-era contextual embeddings. Good big-picture article.
12 min
Deep Dive into LLMs like ChatGPT (Neural network I/O section)
youtube.com· Shows how tokens become embedding vectors and flow through the transformer. Part of the general-audience track, minimal math.
6 min