{"slug":"llm-operator-fundamentals","kind":"essay","title":"LLM Operator Fundamentals: Temperature, Attention, Caching","summary":"The three knobs that actually change what an LLM system ships — temperature, attention design, and prompt caching — explained at the operator level, not the research level.","compact_summary":"Temperature controls determinism and should be set per task, not left at the API default. Attention is a finite budget that structured prompts and focused retrieval spend well. Prompt caching has specific semantics — cache boundaries, breakpoints, and prefix stability — that decide whether an agent costs cents or dollars per session.","key_claims":["Temperature should be set deliberately per task. Any agent that acts — tool calls, code, structured output — should default to 0 and only be raised for a concrete reason.","Attention is a finite budget. Structured system prompts, focused retrieval, and deliberate instruction placement concentrate attention where it belongs; filler dilutes every real instruction.","Prompt caching is a design decision, not a default. Cache boundaries, multiple breakpoints, and prefix stability determine whether an agent costs cents or dollars per session.","Temperature, attention design, and cache layout are three views of the same problem — spending the model's finite determinism, focus, and budget deliberately."],"section_map":["Temperature — The Wrong Knob To Leave At Default","Attention — What Your Prompt Is Actually Competing For","Prompt Caching — The Lever Nobody Talks About Until The Bill Arrives","Why These Three Together"],"confidence":"high","intended_use":["Use this page to set deliberate defaults before building an agent, RAG pipeline, or tool-calling system.","Use it as a checklist when debugging unreliable tool-call arguments, lost-in-the-middle retrieval, or surprising inference bills."],"do_not_use_for":["Do not treat cache pricing specifics as authoritative over time — provider economics shift and should be verified against current SDK docs.","Do not skip empirical evaluation; these knobs are design starting points, not replacements for measurement on your own workload."],"updated_at":"2026-04-19T00:00:00.000Z","verified_at":"2026-04-19T00:00:00.000Z","version":"0.1.0","estimated_tokens":1643,"word_count":1217,"content_hash":"f7744ac7ec2f0cdce7558e8f3d4a2b922d922213772241f63ccc8ba0687f6b1f","change_summary":"Initial public version.","requires_human_judgment":false,"tags":["llm","agents","prompt-engineering","attention","prompt-caching","temperature","operator"],"_links":{"self":"/api/v1/content/llm-operator-fundamentals","compact":"/api/v1/content/llm-operator-fundamentals/compact","meta":"/api/v1/content/llm-operator-fundamentals/meta","raw":"/api/v1/content/llm-operator-fundamentals/raw","versions":"/api/v1/content/llm-operator-fundamentals/versions","related":["/api/v1/content/llm-inference-mental-model/compact","/api/v1/content/context-lifecycle-for-ai-systems/compact","/api/v1/content/trustworthy-co-thinker-vs-eager-executor/compact"],"canonical_human":"/p/llm-operator-fundamentals","capabilities":"/api/v1/capabilities"}}