Master Claude, Chapter 3: Understanding Entropy and Prompting Fundamentals — Why Your Prompts Fail and How to Fix Them
This is the third post in a chapter-by-chapter series on Master Claude Chat, Cowork and Code: From Prompting to Operational AI. The previous post was Chapter 2: The Three Pillars of Claude, where we covered how Chat, Cowork, and Code each serve a distinct role and the decision framework for choosing the right one.
Chapter 3 is the chapter that changed how I write every prompt. Before writing this book, I already knew that vague prompts produced vague outputs. What I did not understand was why — what was happening inside the model that made "write code" produce garbage and "write a Python function that takes a list of integers and returns the sum, raising ValueError if the list is empty" produce something useful. The answer is entropy, and once you see it, you cannot unsee it.
This chapter is the bridge between the theory of Chapter 1 and the practical techniques you will use every day. It takes the concepts of probability distributions and entropy and turns them into a concrete methodology for writing prompts that reliably produce what you want.
Ambiguity is the enemy
The chapter opens with a side-by-side comparison that I think every AI user should see. Two prompts, same task:
Prompt 1: "Write code"
Prompt 2: "Write a Python function that takes a list of numbers and returns the sum"
In the first prompt, the model faces very high entropy. "Code" could mean JavaScript, Python, Rust, SQL, HTML, or a hundred other languages. The function could do almost anything. The output is essentially unpredictable. In the second prompt, the entropy is dramatically lower. The model knows: this is Python, it is a function, it takes one argument, it returns one thing. The set of reasonable continuations has shrunk from "any code in any language" to a narrow band of Python function definitions.
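That narrow band is concrete enough to write down. Here is a sketch of the kind of output the low-entropy prompt pins the model to, matching the fuller variant quoted in the introduction (the function name `sum_of_integers` is my own choice, not from the book):

```python
def sum_of_integers(numbers: list[int]) -> int:
    """Return the sum of a list of integers.

    Raises:
        ValueError: if the list is empty.
    """
    if not numbers:
        raise ValueError("list must not be empty")
    return sum(numbers)

print(sum_of_integers([1, 2, 3]))  # 6
```

Nearly every reasonable completion of that prompt looks like minor variations of this function, which is exactly the point.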
This is the core insight: your goal as a prompt writer is to reduce the entropy of the probability distribution until it is peaked narrowly on the output you actually want. Every detail you include — language, format, constraints, examples — reduces entropy by ruling out implausible options.
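The intuition can be made quantitative with Shannon entropy, H = -Σ p·log₂(p). A minimal sketch, where the two distributions are illustrative stand-ins rather than real model probabilities:

```python
import math

def shannon_entropy(probs: list[float]) -> float:
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# "Write code": probability mass spread over many plausible continuations.
broad = [1 / 100] * 100            # 100 equally likely options

# "Write a Python function that...": mass concentrated on a few variants.
narrow = [0.90, 0.05, 0.03, 0.02]  # one dominant continuation

print(shannon_entropy(broad))   # ~6.64 bits
print(shannon_entropy(narrow))  # well under 1 bit
```

Every constraint you add moves the distribution from the first shape toward the second.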
XML-structured prompting: why it works
The section on XML tags is probably the most immediately useful part of the entire book. Instead of describing what you want in English prose, you wrap your instructions in XML tags: <instructions>, <context>, <examples>, <constraints>, <output_format>.
Why XML specifically? Because large language models are trained on vast amounts of text from the internet, and XML is common in structured data. The model has learned to treat content within XML tags as semantically meaningful. The tags create clear boundaries — the model knows when the instruction ends and the context begins, when the examples stop and the constraints start. This is not a trick. It is leveraging the model's training data to reduce ambiguity.
The chapter presents a standard prompt template that I now use for almost everything:
<instructions> [What do you want?] </instructions>
<context> [Why are you asking? What is the use case?] </context>
<examples> [Show concrete examples of input and output] </examples>
<constraints> [What are the limitations or requirements?] </constraints>
<output_format> [How should the response be formatted?] </output_format>
Not every prompt needs every section. But when you are stuck getting the results you want, adding missing sections almost always helps. Wrong format? Add <output_format>. Missing behavior? Add <examples>. Wrong assumptions? Add <context>.
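Because the template is just ordered, tagged sections, it is easy to assemble programmatically. A minimal sketch, assuming the book's five section names; the `build_prompt` helper itself is my own, not from the book:

```python
SECTION_ORDER = ["instructions", "context", "examples", "constraints", "output_format"]

def build_prompt(**sections: str) -> str:
    """Assemble an XML-structured prompt, skipping sections not provided."""
    parts = []
    for name in SECTION_ORDER:
        body = sections.get(name)
        if body:
            parts.append(f"<{name}>\n{body.strip()}\n</{name}>")
    return "\n\n".join(parts)

prompt = build_prompt(
    instructions="Summarize the support ticket in one sentence.",
    constraints="Keep your response under 100 words.",
)
print(prompt)
```

A helper like this makes the "add the missing section" debugging move a one-line change.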
Chain-of-thought: breaking hard problems into easy ones
Many difficult problems become tractable if you break them into smaller steps. This is chain-of-thought prompting, and the chapter explains both the technique and — more importantly — the entropy-based reason it works.
When you ask a model to solve a hard problem in one step, the probability distribution over possible next tokens is very broad. Many different approaches are plausible. But when you ask the model to "first, identify what information is relevant, then analyze that information, then derive a conclusion," you are constraining each step. Each step is easier than the full problem because each step is more specific. The model is good at small, localized reasoning. It is worse at large, global reasoning.
The chapter walks through a concrete example — a classic train-meeting-time problem — showing how the single-step approach produces errors while the step-by-step version gets it right. The difference is not magic. It is entropy reduction at each decision point.
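The step-by-step structure can be mirrored in code, with each decision point made explicit the way a chain-of-thought prompt forces the model to do. The numbers below are hypothetical; the book uses its own:

```python
# Two trains leave stations 300 km apart at the same time, heading toward
# each other at 60 km/h and 90 km/h. When and where do they meet?

# Step 1: identify the relevant information.
distance_km = 300.0
speed_a_kmh = 60.0
speed_b_kmh = 90.0

# Step 2: analyze it. The gap closes at the combined speed.
closing_speed_kmh = speed_a_kmh + speed_b_kmh  # 150 km/h

# Step 3: derive the conclusion.
time_h = distance_km / closing_speed_kmh       # 2.0 hours
meeting_point_km = speed_a_kmh * time_h        # 120 km from train A's station

print(time_h, meeting_point_km)  # 2.0 120.0
```

Each step is a small, local computation with low entropy; the single-step version asks the model to land on 120 km in one jump.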
Claude also offers extended thinking mode, where the model spends significant computational resources reasoning before responding. The chapter is clear about the trade-off: use it when the problem is genuinely complex and the cost of a mistake is high — architectural decisions, security analysis, research synthesis. Do not use it as a default. It is computationally expensive and adds latency. For routine tasks, it is wasted resources.
Multishot prompting: show, do not tell
The most practical entropy-reduction technique in the chapter is multishot prompting: showing multiple examples of what you want. Each example constrains the model's behavior more tightly than any amount of prose description.
The book is specific about how many examples you need. One example suggests a pattern; two usually establish it clearly; a third is rarely necessary unless the pattern is truly complex. More examples do not always help: if you have ten examples showing the same pattern, the tenth adds almost no new information.
What matters more than quantity is variation. Your examples should vary in the ways that matter to your task. If you are asking the model to extract information from customer emails, show examples of different writing styles (formal versus casual), different types of information (complaints versus inquiries), and different edge cases (missing information, unclear intent). This teaches the model the generalization: extract the relevant information regardless of style or format.
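Two varied examples fit naturally into the <examples> section of the template. A sketch for the customer-email extraction task described above; the example emails, JSON schema, and `multishot_block` helper are all illustrative assumptions, not the book's:

```python
EXAMPLES = [
    # Vary style (formal vs. casual) and type (complaint vs. inquiry).
    ("Dear team, my invoice #4482 charges me twice for March. Please advise.",
     '{"type": "complaint", "topic": "billing", "reference": "4482"}'),
    ("hey quick q - do u ship to Canada??",
     '{"type": "inquiry", "topic": "shipping", "reference": null}'),
]

def multishot_block(pairs: list[tuple[str, str]]) -> str:
    """Render (input, output) pairs as an <examples> section."""
    rendered = [
        f"<input>{email}</input>\n<output>{extraction}</output>"
        for email, extraction in pairs
    ]
    return "<examples>\n" + "\n\n".join(rendered) + "\n</examples>"

print(multishot_block(EXAMPLES))
```

Note the deliberate variation: one formal complaint with a reference number, one casual inquiry without, so the model learns the extraction rule rather than the surface style.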
I will not reproduce the full example sequences from the book — those are the pages you will want to reference again and again — but I will say that once you start providing two well-chosen examples instead of writing three paragraphs of instructions, you will wonder why you ever did it any other way.
Debugging prompts when they fail
The chapter closes with a debugging framework that I taped next to my monitor. When outputs do not match what you want, the issue is almost always entropy: the probability distribution is too broad. The fix is systematic:
Wrong format? → Add an <output_format> section.
Missing pieces? → Add examples showing the complete output.
Inconsistent results? → Add constraints specifying consistency requirements.
Too verbose? → Add a constraint: "Keep your response under 100 words."
Wrong tone? → Provide an example of the right tone.
Misunderstands the task? → Add context explaining why the task matters.
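The checklist above is mechanical enough to encode as a lookup table. A sketch, where the symptom keys are my own shorthand for the book's six failure modes:

```python
PROMPT_FIXES = {
    "wrong_format": "Add an <output_format> section.",
    "missing_pieces": "Add examples showing the complete output.",
    "inconsistent": "Add constraints specifying consistency requirements.",
    "too_verbose": 'Add a constraint: "Keep your response under 100 words."',
    "wrong_tone": "Provide an example of the right tone.",
    "misunderstands": "Add context explaining why the task matters.",
}

def suggest_fix(symptom: str) -> str:
    """Map an observed failure mode to the entropy-reducing section to add."""
    return PROMPT_FIXES.get(symptom, "Lower entropy: add the missing section.")

print(suggest_fix("too_verbose"))
```

Every entry is the same move in disguise: add the section that rules out the outputs you are getting but do not want.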
The chapter also covers context window trade-offs. Modern Claude has 100,000+ tokens of context, which is enormous. But longer prompts have costs: latency, token expense, and signal dilution. The rule of thumb is clean: include everything that is relevant, exclude everything that is not. If you are refactoring a specific function, include the function and its dependencies. You probably do not need the entire codebase.
What Chapter 3 sets up
By the end of this chapter, you will have a working methodology for writing prompts that reliably produce what you want. You will understand why XML-structured prompting reduces errors, how chain-of-thought and extended thinking relate to entropy at each decision point, why two well-chosen examples outperform three paragraphs of instructions, and how to systematically debug prompts that are not working.
These techniques apply across all three Claude interfaces. In the chapters that follow — starting with Part II on mastering Claude Chat — we will see how to apply them in each specific context, combined with persistent Projects, Artifacts, and operational execution.
Chapter 3 is the last chapter in Part I. It closes the foundational section. Everything from Chapter 4 onward is applied practice built on what you learned here.
Next in this series: Chapter 4 — Context Persistence with Claude Projects. We tackle the AI amnesia problem and show how Projects create persistent workspaces that remember your codebase, conventions, and constraints across every conversation.
📖 Get the complete book
All twenty chapters, the full prompt template, entropy-reduction frameworks, hands-on workflows for Claude Chat, Cowork, and Code, plus the CLI reference, CLAUDE.md templates, MCP examples, and security checklist.
Sho Shimoda
I share and organize what I’ve learned and experienced.