Parameters
The learned weights that define a model, and why size matters
What it is
Parameters (also called weights) are the numerical values inside a neural network that are learned during training. For a transformer, these include the attention weight matrices, feed-forward layer weights, and the embedding table. The total count of these values is what people mean by "model size."
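To make the counting concrete, here is a rough back-of-envelope sketch of where a decoder-only transformer's parameters live. The dimensions used are illustrative (GPT-2-small-like values); real implementations also include biases, layer norms, and often weight tying, which shift the total slightly:

```python
def transformer_param_count(vocab_size, d_model, n_layers, d_ff, context_len):
    """Rough parameter count for a decoder-only transformer.

    Counts only the big matrices: token and position embeddings,
    the four attention projections (Q, K, V, output), and the two
    feed-forward layers. Biases and layer norms are ignored since
    they contribute comparatively little.
    """
    embeddings = vocab_size * d_model + context_len * d_model
    attention_per_layer = 4 * d_model * d_model   # W_Q, W_K, W_V, W_O
    ffn_per_layer = 2 * d_model * d_ff            # up- and down-projection
    return embeddings + n_layers * (attention_per_layer + ffn_per_layer)

# GPT-2-small-like dimensions land near the model's reported ~124M:
print(transformer_param_count(50257, 768, 12, 3072, 1024))  # → 124318464
```

Note that the feed-forward layers, not attention, account for the majority of per-layer parameters here, which holds for most standard transformer configurations.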
A model with 7 billion parameters has 7 billion floating-point numbers that collectively encode everything it learned. Larger parameter counts generally mean more capacity to store knowledge and learn complex patterns, but also more compute required for both training and inference.
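The compute and memory cost follows directly from the count: each parameter is one number stored at some precision. A minimal sketch of the weight-memory calculation (GB here means 10^9 bytes; quantized formats simply lower the bytes-per-parameter figure):

```python
def model_memory_gb(n_params, bytes_per_param):
    """Memory needed just to hold the weights, in gigabytes (10^9 bytes).

    bytes_per_param: 4 for float32, 2 for float16/bfloat16, 1 for int8.
    Excludes activations, KV cache, and optimizer state, which can
    dominate during training.
    """
    return n_params * bytes_per_param / 1e9

# A 7-billion-parameter model:
print(model_memory_gb(7e9, 2))  # → 14.0  (float16)
print(model_memory_gb(7e9, 4))  # → 28.0  (float32)
```

This is why a 7B model in half precision needs roughly 14 GB just to load, before any memory is spent on actually running it; training costs considerably more, since optimizers like Adam keep additional state per parameter.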
Model sizes range from small (around 1 billion parameters, able to run on a phone) to massive (a reported estimate of ~1.8 trillion for GPT-4's rumored mixture-of-experts architecture).