Practical Skills
API Basics
How to actually call an LLM from code: the mechanics every builder needs
What it is
LLM APIs expose model capabilities over HTTP. The standard pattern: send a POST request with your messages array (system + user turns), a model identifier, and parameters (temperature, max_tokens), then receive a JSON response containing the generated text and token usage counts.
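A minimal sketch of that request/response cycle, assuming an OpenAI-style chat-completions payload; the endpoint URL and model name here are placeholders, not a real provider's values:

```python
import json
import urllib.request

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint

def build_request(user_prompt: str, model: str = "example-model") -> dict:
    """Assemble the standard payload: messages array + model + parameters."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.7,
        "max_tokens": 256,
    }

def call_llm(payload: dict, api_key: str) -> dict:
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_request("Summarize HTTP in one sentence.")
```

The same shape (roles, model, sampling parameters) recurs across providers even when field names differ, so building the payload in one place keeps provider differences isolated.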
Key parameters: temperature (0 ≈ greedy, near-deterministic decoding; 1 and above = increasingly creative/random), max_tokens (caps response length), top_p (nucleus sampling). All major APIs also support streaming, which returns tokens as they're generated and improves perceived latency.
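Streamed responses typically arrive as server-sent events: one `data: {json}` line per chunk, ending with a `data: [DONE]` sentinel. A sketch of reassembling the text, assuming the OpenAI-style `choices[0].delta.content` chunk shape:

```python
import json

def parse_sse_stream(lines):
    """Reassemble generated text from server-sent-event chunk lines.

    Each event looks like: data: {"choices":[{"delta":{"content":"Hi"}}]}
    and the stream ends with: data: [DONE]
    """
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        body = line[len("data: "):]
        if body == "[DONE]":
            break
        delta = json.loads(body)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))  # some chunks carry no text
    return "".join(parts)

chunks = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]
# parse_sse_stream(chunks) reassembles "Hello!"
```

In a real client you would feed this from the HTTP response body line by line and print each chunk as it arrives; that incremental display is where the latency win comes from.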
Cost is billed per token, with input and output tokens often priced at different rates (output is usually more expensive). At scale, token efficiency becomes a real engineering concern.
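The billing math is simple but worth making explicit. A sketch with illustrative per-million-token rates; real prices vary by model and provider:

```python
# Illustrative rates only; check your provider's current pricing.
INPUT_RATE = 3.00    # $ per 1M input tokens
OUTPUT_RATE = 15.00  # $ per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: input and output billed at separate rates."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# A 2,000-token prompt with a 500-token reply:
cost = request_cost(2_000, 500)  # 0.006 + 0.0075 = $0.0135
```

At one request this is trivial; at a million requests a day it is $13,500/day, which is why trimming prompts and capping output length pays off quickly.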