Master Claude, Chapter 3: Understanding Entropy and Prompting Fundamentals — Why Your Prompts Fail and How to Fix Them
This is the third post in a chapter-by-chapter series on Master Claude Chat, Cowork and Code: From Prompting to Operational AI. The previous post was Chapter 2: The Three Pillars of Claude, where we covered how Chat, Cowork, and Code each serve a distinct role and the decision framework for choosing the right one.
Chapter 3 is the chapter that changed how I write every prompt. Before writing this book, I already knew that vague prompts produced vague outputs. What I did not understand was why — what was happening inside the model that made "write code" produce garbage and "write a Python function that takes a list of integers and returns the sum, raising ValueError if the list is empty" produce something useful. The answer is entropy, and once you see it, you cannot unsee it.
This chapter is the bridge between the theory of Chapter 1 and the practical techniques you will use every day. It takes the concepts of probability distributions and entropy and turns them into a concrete methodology for writing prompts that reliably produce what you want.
Ambiguity is the enemy
The chapter opens with a side-by-side comparison that I think every AI user should see. Two prompts, same task:
Prompt 1: "Write code"
Prompt 2: "Write a Python function that takes a list of numbers and returns the sum"
In the first prompt, the model faces very high entropy. "Code" could mean JavaScript, Python, Rust, SQL, HTML, or a hundred other languages. The function could do almost anything. The output is essentially unpredictable. In the second prompt, the entropy is dramatically lower. The model knows: this is Python, it is a function, it takes one argument, it returns one thing. The set of reasonable continuations has shrunk from "any code in any language" to a narrow band of Python function definitions.
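That narrow band is concrete enough to write down. Here is a sketch of the kind of output the low-entropy prompt pins the model to, matching the fuller variant quoted in the introduction (the function name `sum_of_integers` is my own choice, not from the book):

```python
def sum_of_integers(numbers: list[int]) -> int:
    """Return the sum of a list of integers.

    Raises:
        ValueError: if the list is empty.
    """
    if not numbers:
        raise ValueError("list must not be empty")
    return sum(numbers)

print(sum_of_integers([1, 2, 3]))  # 6
```

Nearly every reasonable completion of that prompt looks like minor variations of this function, which is exactly the point.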
This is the core insight: your goal as a prompt writer is to reduce the entropy of the probability distribution until it is peaked narrowly on the output you actually want. Every detail you include — language, format, constraints, examples — reduces entropy by ruling out implausible options.
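The intuition can be made quantitative with Shannon entropy, H = -Σ p·log₂(p). A minimal sketch, where the two distributions are illustrative stand-ins rather than real model probabilities:

```python
import math

def shannon_entropy(probs: list[float]) -> float:
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# "Write code": probability mass spread over many plausible continuations.
broad = [1 / 100] * 100            # 100 equally likely options

# "Write a Python function that...": mass concentrated on a few variants.
narrow = [0.90, 0.05, 0.03, 0.02]  # one dominant continuation

print(shannon_entropy(broad))   # ~6.64 bits
print(shannon_entropy(narrow))  # well under 1 bit
```

Every constraint you add moves the distribution from the first shape toward the second.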
XML-structured prompting: why it works
The section on XML tags is probably the most immediately useful part of the entire book. Instead of describing what you want in English prose, you wrap your instructions in XML tags: <instructions>, <context>, <examples>, <constraints>, <output_format>.
Why XML specifically? Because large language models are trained on vast amounts of text from the internet, and XML is common in structured data. The model has learned to treat content within XML tags as semantically meaningful. The tags create clear boundaries — the model knows when the instruction ends and the context begins, when the examples stop and the constraints start. This is not a trick. It is leveraging the model's training data to reduce ambiguity.
The chapter presents a standard prompt template that I now use for almost everything:
<instructions> [What do you want?] </instructions>
<context> [Why are you asking? What is the use case?] </context>
<examples> [Show concrete examples of input and output] </examples>
<constraints> [What are the limitations or requirements?] </constraints>
<output_format> [How should the response be formatted?] </output_format>
Not every prompt needs every section. But when you are stuck getting the results you want, adding missing sections almost always helps. Wrong format? Add <output_format>. Missing behavior? Add <examples>. Wrong assumptions? Add <context>.
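Because the template is just ordered, tagged sections, it is easy to assemble programmatically. A minimal sketch, assuming the book's five section names; the `build_prompt` helper itself is my own, not from the book:

```python
SECTION_ORDER = ["instructions", "context", "examples", "constraints", "output_format"]

def build_prompt(**sections: str) -> str:
    """Assemble an XML-structured prompt, skipping sections not provided."""
    parts = []
    for name in SECTION_ORDER:
        body = sections.get(name)
        if body:
            parts.append(f"<{name}>\n{body.strip()}\n</{name}>")
    return "\n\n".join(parts)

prompt = build_prompt(
    instructions="Summarize the support ticket in one sentence.",
    constraints="Keep your response under 100 words.",
)
print(prompt)
```

A helper like this makes the "add the missing section" debugging move a one-line change.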
Chain-of-thought: breaking hard problems into easy ones
Many difficult problems become tractable if you break them into smaller steps. This is chain-of-thought prompting, and the chapter explains both the technique and — more importantly — the entropy-based reason it works.
When you ask a model to solve a hard problem in one step, the probability distribution over possible next tokens is very broad. Many different approaches are plausible. But when you ask the model to "first, identify what information is relevant, then analyze that information, then derive a conclusion," you are constraining each step. Each step is easier than the full problem because each step is more specific. The model is good at small, localized reasoning. It is worse at large, global reasoning.
The chapter walks through a concrete example — a classic train-meeting-time problem — showing how the single-step approach produces errors while the step-by-step version gets it right. The difference is not magic. It is entropy reduction at each decision point.
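The step-by-step structure can be mirrored in code, with each decision point made explicit the way a chain-of-thought prompt forces the model to do. The numbers below are hypothetical; the book uses its own:

```python
# Two trains leave stations 300 km apart at the same time, heading toward
# each other at 60 km/h and 90 km/h. When and where do they meet?

# Step 1: identify the relevant information.
distance_km = 300.0
speed_a_kmh = 60.0
speed_b_kmh = 90.0

# Step 2: analyze it. The gap closes at the combined speed.
closing_speed_kmh = speed_a_kmh + speed_b_kmh  # 150 km/h

# Step 3: derive the conclusion.
time_h = distance_km / closing_speed_kmh       # 2.0 hours
meeting_point_km = speed_a_kmh * time_h        # 120 km from train A's station

print(time_h, meeting_point_km)  # 2.0 120.0
```

Each step is a small, local computation with low entropy; the single-step version asks the model to land on 120 km in one jump.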
Claude also offers extended thinking mode, where the model spends significant computational resources reasoning before responding. The chapter is clear about the trade-off: use it when the problem is genuinely complex and the cost of a mistake is high — architectural decisions, security analysis, research synthesis. Do not use it as a default. It is computationally expensive and adds latency. For routine tasks, it is wasted resources.
Multishot prompting: show, do not tell
The most practical entropy-reduction technique in the chapter is multishot prompting: showing multiple examples of what you want. Each example constrains the model's behavior more tightly than any amount of prose description.
The book is specific about how many examples you need. One example suggests a pattern; two usually establish it clearly; a third is rarely necessary unless the pattern is truly complex. More examples do not always help: if you have ten examples showing the same pattern, the tenth adds almost no new information.
What matters more than quantity is variation. Your examples should vary in the ways that matter to your task. If you are asking the model to extract information from customer emails, show examples of different writing styles (formal versus casual), different types of information (complaints versus inquiries), and different edge cases (missing information, unclear intent). This teaches the model the generalization: extract the relevant information regardless of style or format.
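Two varied examples fit naturally into the <examples> section of the template. A sketch for the customer-email extraction task described above; the example emails, JSON schema, and `multishot_block` helper are all illustrative assumptions, not the book's:

```python
EXAMPLES = [
    # Vary style (formal vs. casual) and type (complaint vs. inquiry).
    ("Dear team, my invoice #4482 charges me twice for March. Please advise.",
     '{"type": "complaint", "topic": "billing", "reference": "4482"}'),
    ("hey quick q - do u ship to Canada??",
     '{"type": "inquiry", "topic": "shipping", "reference": null}'),
]

def multishot_block(pairs: list[tuple[str, str]]) -> str:
    """Render (input, output) pairs as an <examples> section."""
    rendered = [
        f"<input>{email}</input>\n<output>{extraction}</output>"
        for email, extraction in pairs
    ]
    return "<examples>\n" + "\n\n".join(rendered) + "\n</examples>"

print(multishot_block(EXAMPLES))
```

Note the deliberate variation: one formal complaint with a reference number, one casual inquiry without, so the model learns the extraction rule rather than the surface style.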
I will not reproduce the full example sequences from the book — those are the pages you will want to reference again and again — but I will say that once you start providing two well-chosen examples instead of writing three paragraphs of instructions, you will wonder why you ever did it any other way.
Debugging prompts when they fail
The chapter closes with a debugging framework that I taped next to my monitor. When outputs do not match what you want, the issue is almost always entropy: the probability distribution is too broad. The fix is systematic:
Wrong format? → Add an <output_format> section.
Missing pieces? → Add examples showing the complete output.
Inconsistent results? → Add constraints specifying consistency requirements.
Too verbose? → Add a constraint: "Keep your response under 100 words."
Wrong tone? → Provide an example of the right tone.
Misunderstands the task? → Add context explaining why the task matters.
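The checklist above is mechanical enough to encode as a lookup table. A sketch, where the symptom keys are my own shorthand for the book's six failure modes:

```python
PROMPT_FIXES = {
    "wrong_format": "Add an <output_format> section.",
    "missing_pieces": "Add examples showing the complete output.",
    "inconsistent": "Add constraints specifying consistency requirements.",
    "too_verbose": 'Add a constraint: "Keep your response under 100 words."',
    "wrong_tone": "Provide an example of the right tone.",
    "misunderstands": "Add context explaining why the task matters.",
}

def suggest_fix(symptom: str) -> str:
    """Map an observed failure mode to the entropy-reducing section to add."""
    return PROMPT_FIXES.get(symptom, "Lower entropy: add the missing section.")

print(suggest_fix("too_verbose"))
```

Every entry is the same move in disguise: add the section that rules out the outputs you are getting but do not want.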
The chapter also covers context window trade-offs. Modern Claude has 100,000+ tokens of context, which is enormous. But longer prompts have costs: latency, token expense, and signal dilution. The rule of thumb is clean: include everything that is relevant, exclude everything that is not. If you are refactoring a specific function, include the function and its dependencies. You probably do not need the entire codebase.
What Chapter 3 sets up
By the end of this chapter, you will have a working methodology for writing prompts that reliably produce what you want. You will understand why XML-structured prompting reduces errors, how chain-of-thought and extended thinking relate to entropy at each decision point, why two well-chosen examples outperform three paragraphs of instructions, and how to systematically debug prompts that are not working.
These techniques apply across all three Claude interfaces. In the chapters that follow — starting with Part II on mastering Claude Chat — we will see how to apply them in each specific context, combined with persistent Projects, Artifacts, and operational execution.
Chapter 3 is the last chapter in Part I. It closes the foundational section. Everything from Chapter 4 onward is applied practice built on what you learned here.
Next in this series: Chapter 4 — Context Persistence with Claude Projects. We tackle the AI amnesia problem and show how Projects create persistent workspaces that remember your codebase, conventions, and constraints across every conversation.
📖 Get the complete book
All twenty chapters, the full prompt template, entropy-reduction frameworks, hands-on workflows for Claude Chat, Cowork, and Code, plus the CLI reference, CLAUDE.md templates, MCP examples, and security checklist.
Sho Shimoda
I share and organize what I’ve learned and experienced.