{"query":"LLM Inference: A Working Mental Model","count":10,"results":[{"slug":"llm-inference-mental-model","title":"LLM Inference: A Working Mental Model","kind":"essay","summary":"A compact mental model for how LLM inference actually runs — two phases, a growing memory system, and a scheduler deciding who gets GPU time next — so that cost, latency, and hardware decisions stop being magic.","compact_summary":"Inference is a loop with two phases. Prefill is compute-bound, decode is memory-bandwidth-bound. KV cache removes redundant recomputation but grows with context and users. Continuous batching, PagedAttention, and composed parallelism are what made production serving economical. Agent workloads are decode-dominated and gain most from prefix caching and batching.","confidence":"high","updated_at":"2026-04-19T00:00:00.000Z","score":291,"match_fields":["title","summary","compact_summary","tags","key_claims","section_map","body"]},{"slug":"llm-operator-fundamentals","title":"LLM Operator Fundamentals: Temperature, Attention, Caching","kind":"essay","summary":"The three knobs that actually change what an LLM system ships — temperature, attention design, and prompt caching — explained at the operator level, not the research level.","compact_summary":"Temperature controls determinism and should be set per task, not left at the API default. Attention is a finite budget that structured prompts and focused retrieval spend well. Prompt caching has specific semantics — cache boundaries, breakpoints, and prefix stability — that decide whether an agent costs cents or dollars per session.","confidence":"high","updated_at":"2026-04-19T00:00:00.000Z","score":143,"match_fields":["title","summary","compact_summary","tags","key_claims","section_map","body"]},{"slug":"agent-seo-and-discovery","title":"Agent SEO and Discovery","kind":"essay","summary":"The next discovery problem is not only how humans find a page, but how agents discover, trust, and route to it.","compact_summary":"Agent SEO is about more than indexing: it is the practice of making pages discoverable, inspectable, trustworthy, and easy to route through for machine readers using discovery files, search endpoints, compact summaries, and explicit metadata.","confidence":"high","updated_at":"2026-04-10T00:00:00.000Z","score":98,"match_fields":["title","summary","compact_summary","tags","key_claims","section_map","body"]},{"slug":"the-agent-economy","title":"The Agent Economy: Wallets, Payments, and Bots as Purchasers","kind":"essay","summary":"Why the next economic layer of the web involves agents that can discover, evaluate, and pay for services — and what that means for how we build public surfaces.","compact_summary":"Agent commerce is not science fiction. Agentic wallets, 402 payment protocols, and universal commerce APIs already exist. The question is not whether bots will transact, but what the trust and pricing layer should look like when they do.","confidence":"medium","updated_at":"2026-04-11T00:00:00.000Z","score":88,"match_fields":["title","summary","compact_summary","tags","key_claims","section_map","body"]},{"slug":"context-lifecycle-for-ai-systems","title":"Context Lifecycle for AI Systems","kind":"essay","summary":"Why good AI systems should not treat context as one giant blob, and why summary, consolidation, and drill-down layers matter.","compact_summary":"Context should behave more like a lifecycle than a dump: short-term working state, compact summaries, longer-term memory, and periodic consolidation each serve different jobs and should not be collapsed into one context window.","confidence":"high","updated_at":"2026-04-10T00:00:00.000Z","score":82,"match_fields":["title","summary","compact_summary","tags","key_claims","section_map","body"]},{"slug":"trustworthy-co-thinker-vs-eager-executor","title":"Trustworthy Co-Thinker vs Eager Executor","kind":"essay","summary":"A product and safety stance for agents: useful systems should think clearly, expose uncertainty, and escalate action instead of racing toward execution.","compact_summary":"The safest default for many agent systems is to behave like a co-thinker rather than an eager executor: help frame decisions, expose uncertainty, and keep the human in authority where real risk exists.","confidence":"high","updated_at":"2026-04-10T00:00:00.000Z","score":81,"match_fields":["title","summary","compact_summary","tags","key_claims","section_map","body"]},{"slug":"about","title":"About civ.build","kind":"guide","summary":"The person behind this project, the thinking that shaped it, and what it is trying to become.","compact_summary":"civ.build is a public knowledge surface built by Hasan Kilickaya — a place to publish serious ideas in a form that works for both humans and autonomous agents.","confidence":"high","updated_at":"2026-04-13T00:00:00.000Z","score":77,"match_fields":["title","summary","compact_summary","tags","key_claims","section_map","body"]},{"slug":"index","title":"civ.build","kind":"manifesto","summary":"A public knowledge contract where serious pages are written once, rendered for humans, and exposed to agents through explicit compact and full retrieval layers.","compact_summary":"civ.build is not a generic AI site. It is a markdown-first publishing surface where pages expose summaries, trust signals, provenance, and queryable endpoints for both human and agent readers.","confidence":"high","updated_at":"2026-04-10T00:00:00.000Z","score":77,"match_fields":["summary","compact_summary","tags","key_claims","section_map","body"]},{"slug":"data-accumulation-as-an-asset","title":"Data Accumulation as an Asset","kind":"essay","summary":"Why accumulating data over time is valuable, why we still do not have information markets, and why the person or system that owns the accumulated graph holds the real leverage.","compact_summary":"Data accumulation is not just hoarding. It is a compounding asset: the more you capture, synthesize, and consolidate, the more valuable the graph becomes — for personal use, for agent consumption, and potentially as a product pattern.","confidence":"medium","updated_at":"2026-04-11T00:00:00.000Z","score":74,"match_fields":["title","summary","compact_summary","tags","key_claims","section_map","body"]},{"slug":"memory-consolidation-and-sleep-loops","title":"Memory Consolidation and Sleep Loops for AI","kind":"essay","summary":"Why AI memory systems should consolidate, compress, and strengthen knowledge over time instead of storing everything statically, and how periodic sleep loops make that practical.","compact_summary":"Static retrieval is not memory. Real memory consolidates: it reviews what changed, promotes what matters, compresses what does not need detail, and emits a record of what changed and why. AI systems should aspire to that lifecycle.","confidence":"medium","updated_at":"2026-04-11T00:00:00.000Z","score":70,"match_fields":["title","summary","compact_summary","tags","key_claims","section_map","body"]}]}