Core Architecture

Embeddings

How models represent meaning internally: dense vectors that capture semantic relationships

What it is

Embeddings are how transformers represent tokens internally as numbers. Each token ID is mapped to a dense vector of floating-point numbers; the vector's length is the embedding dimension, typically 768 to 12,288 for modern models. These vectors are learned during training.
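At this stage the mapping is just a table lookup: the embedding layer is a matrix with one row per vocabulary entry. A minimal sketch with NumPy (the sizes here are illustrative; real vocabularies run to 50,000+ tokens, and the weights come from training, not random initialization):

```python
import numpy as np

# Illustrative sizes: a 1,000-token vocabulary, 768-dim embeddings.
vocab_size, d_model = 1_000, 768

rng = np.random.default_rng(0)
# The embedding table is just a (vocab_size, d_model) matrix of learned weights.
embedding_table = rng.standard_normal((vocab_size, d_model)).astype(np.float32)

token_ids = [42, 137, 42]             # token 42 appears twice
vectors = embedding_table[token_ids]  # "embedding lookup" = row indexing

print(vectors.shape)  # (3, 768)
# The same token ID always yields the same (context-free) vector at this stage.
print(np.array_equal(vectors[0], vectors[2]))  # True
```

The lookup is purely positional in the table: identical token IDs produce identical vectors, which is why these raw embeddings encode identity rather than context.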

Initially, embeddings encode only identity: token 42 maps to a fixed vector regardless of where it appears. But as these vectors pass through the transformer's layers and attention mechanism, they become "enriched" with contextual meaning. The embedding for "bank" in "river bank" ends up different from "bank" in "investment bank" because attention has incorporated surrounding context.

Embeddings are also the foundation of semantic search and RAG systems, where text is converted to vectors and stored in vector databases for similarity lookup.
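Similarity lookup usually means cosine similarity: vectors pointing in similar directions represent similar meanings. A sketch with hand-made 3-dimensional "embeddings" so the result is easy to verify; real systems use model-produced vectors with hundreds of dimensions:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-made document "embeddings" (hypothetical; normally produced by a model).
docs = {
    "refund policy":  [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "privacy terms":  [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # a query vector pointing nearest to "refund policy"

best = max(docs, key=lambda name: cosine_sim(query, docs[name]))
print(best)  # refund policy
```

A vector database does essentially this comparison at scale, with index structures that avoid scoring every stored vector.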

Why it matters

Embeddings are why LLMs understand meaning rather than just syntax. They're also the technical foundation for one of the most practically useful AI techniques, Retrieval-Augmented Generation (RAG), where you embed your documents and retrieve relevant chunks based on vector similarity. If your team is building any AI product that needs to work with custom data, embeddings are almost certainly involved.
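The embed-store-retrieve shape of a RAG pipeline can be sketched end to end. The `embed` function below is a toy bag-of-words stand-in (real systems call an embedding model API), and the chunk texts are invented; the structure, not the embedding quality, is the point:

```python
from collections import Counter
from math import sqrt

def embed(text):
    """Toy 'embedding': normalized word counts. Real systems call a model."""
    counts = Counter(text.lower().replace("?", "").replace(".", "").split())
    norm = sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()}

def similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    return sum(a[w] * b.get(w, 0.0) for w in a)

# 1. Embed documents once and store the vectors (the "vector database").
chunks = [
    "Refunds are issued within 14 days of purchase.",
    "Standard shipping takes 3 to 5 business days.",
    "We never sell customer data to third parties.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. At query time, embed the question and retrieve the nearest chunk.
query = "how long do refunds take?"
q_vec = embed(query)
best_chunk = max(index, key=lambda pair: similarity(q_vec, pair[1]))[0]

# 3. Feed the retrieved chunk to the LLM as context.
prompt = f"Answer using this context:\n{best_chunk}\n\nQuestion: {query}"
print(best_chunk)  # the refunds chunk is retrieved
```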

Resources

But what is a GPT? Visual intro to Transformers (Embeddings section)
youtube.com· Beautiful visualization of how words become vectors in high-dimensional space and how directions in embedding space encode meaning. The "king − man + woman = queen" example made visual.
10 min
Word Embedding and Word2Vec, Clearly Explained
youtube.com· Explains word embeddings from the ground up, including skip-gram, CBOW, and negative sampling. Clear step-by-step style.
15 min
How might LLMs store facts (Deep Learning Chapter 7)
youtube.com· Goes deeper into how embedding representations evolve through transformer layers. Covers superposition and how networks store knowledge in high-dimensional space.
23 min
How Transformer LLMs Work (free short course)
deeplearning.ai· Free course from DeepLearning.AI that covers the evolution from Bag-of-Words to Word2Vec to transformer embeddings. Beautifully illustrated. Authors of "Hands-On Large Language Models."
60 min
NLP Illustrated, Part 2: Word Embeddings
towardsdatascience.com· Uses a clever movie analogy (rating movies on romance/mystery/action axes) to build intuition for embeddings before applying the concept to words. Very beginner-friendly.
8 min
Word Embeddings Explained
towardsdatascience.com· Traces the history of word representations from n-grams through Word2Vec to transformer-era contextual embeddings. Good big-picture article.
12 min
Deep Dive into LLMs like ChatGPT (Neural network I/O section)
youtube.com· Shows how tokens become embedding vectors and flow through the transformer. Part of the general-audience track, minimal math.
6 min