
Azure OpenAI Prompt Engineering

Microsoft's enterprise-focused guide for Azure OpenAI. Five prompt components, recency bias exploitation, grounding with citations, and the 'double down' technique.

Official Microsoft docs →
Content sourced from official Microsoft documentation
1. How GPT actually works

Understanding this changes how you write prompts. GPT models predict the most likely next words given the previous text. There is no separate 'Q&A mode': when you ask a question, an answer appears because answers typically follow questions in the training data. This is why completion-style prompts (starting the output yourself) and cues (leading words) work so well. You're not commanding the model; you're setting up a pattern for it to continue.

💡This is why starting your output format works: if you write 'Analysis: 1.' the model continues with a numbered list. You're giving it a pattern to complete, not an instruction to follow.
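The pattern-completion idea can be sketched in a few lines. `completion_prompt` is a hypothetical helper, not part of any SDK; the point is that the cue becomes the start of the model's own output:

```python
def completion_prompt(task: str, cue: str) -> str:
    """Build a completion-style prompt: the trailing cue starts the
    output so the model continues the pattern instead of free-forming."""
    return f"{task}\n\n{cue}"

# 'Analysis: 1.' primes a numbered-list continuation.
prompt = completion_prompt(
    "Review the incident report below and list the root causes.\n\n"
    "Report: the deploy script overwrote the production config...",
    cue="Analysis: 1.",
)
```

The model sees its "own" output already begun and completes the numbered list rather than opening with a preamble.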
2. Five prompt components

Microsoft breaks every prompt into five components, and understanding which ones you're using (or missing) is the key to debugging bad outputs: 1) Instructions (what to do), 2) Primary content (the text being processed), 3) Examples (input/output pairs for few-shot learning), 4) Cues (output prefixes that prime the format), 5) Supporting content (context like current date, user preferences). Most weak prompts are missing components 4 and 5.

💡Cues are the most underused component. Instead of asking for JSON, start the output with '{'. Instead of asking for a list, start with '1.'. This alone fixes most formatting issues.
Using all five components
[Instructions] Classify the support ticket and suggest a response.

[Primary content] Ticket: 'My account was charged twice for the same order #4521.'

[Examples]
Ticket: 'I can't log in' → Category: Account Access, Priority: Medium
Ticket: 'Wrong item received' → Category: Order Issue, Priority: High

[Supporting content] Today is Feb 14, 2026. Customer is on Premium plan.

[Cue] Category:
The cue 'Category:' primes the model to start with the classification. Without it, you often get a preamble before the actual answer.
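The assembly above can be made mechanical. A minimal sketch, assuming a hypothetical `build_prompt` helper that concatenates the five components in the same order as the example:

```python
def build_prompt(instructions, primary, examples=(), supporting=None, cue=None):
    """Assemble the five prompt components in order:
    instructions, primary content, examples, supporting content, cue."""
    parts = [instructions, primary]
    if examples:
        parts.append("\n".join(examples))
    if supporting:
        parts.append(supporting)
    text = "\n\n".join(parts)
    if cue:
        text += f"\n\n{cue}"
    return text

prompt = build_prompt(
    instructions="Classify the support ticket and suggest a response.",
    primary="Ticket: 'My account was charged twice for the same order #4521.'",
    examples=[
        "Ticket: 'I can't log in' -> Category: Account Access, Priority: Medium",
        "Ticket: 'Wrong item received' -> Category: Order Issue, Priority: High",
    ],
    supporting="Today is Feb 14, 2026. Customer is on Premium plan.",
    cue="Category:",
)
```

Because the cue is appended last, the assembled prompt always ends with `Category:`, which is exactly where the model's output begins.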
3. Few-shot learning

Include input/output examples to condition the model for this specific inference. This isn't permanent learning. With Chat Completions, add examples as user/assistant turns after the system message. Few-shot is dramatically more effective than verbose instructions for teaching format and classification patterns.

💡For classification tasks, 3 examples covering different categories is usually enough. For complex formatting, show 2 examples of the exact structure.
Classification with few-shot
[System] Classify customer inquiries.

[User] I can't log into my account.
[Assistant] Category: Account

[User] My invoice shows the wrong amount.
[Assistant] Category: Billing

[User] The app crashes when I export data.
[Assistant] Category: Technical

[User] {{NEW_INQUIRY}}
An instruction-only version often adds explanations and varies the format. The few-shot version produces clean, consistent one-line classifications every time.
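With Chat Completions, the few-shot transcript above is just a message list: each example becomes an alternating user/assistant pair after the system message. A sketch (`few_shot_messages` is illustrative, not an SDK function):

```python
def few_shot_messages(system, shots, new_input):
    """shots: list of (user_text, assistant_text) example pairs,
    inserted as alternating turns after the system message."""
    messages = [{"role": "system", "content": system}]
    for user_text, assistant_text in shots:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": new_input})
    return messages

messages = few_shot_messages(
    system="Classify customer inquiries.",
    shots=[
        ("I can't log into my account.", "Category: Account"),
        ("My invoice shows the wrong amount.", "Category: Billing"),
        ("The app crashes when I export data.", "Category: Technical"),
    ],
    new_input="I was double-billed last month.",
)
# This list is what you would pass as `messages` to the Chat Completions API.
```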
4. The 'double down' technique

Models are susceptible to recency bias. Information at the end of the prompt has more influence than information at the beginning. Microsoft's recommendation: repeat your most important instructions both before AND after the primary content. This 'double down' approach ensures critical constraints survive even with long context in the middle.

💡This is especially important for enterprise prompts with long documents. Place your key instruction at the top, the document in the middle, and repeat the instruction at the bottom.
Instruction repetition
Summarize the following document in exactly 3 bullet points. Focus only on financial impact.

---
{{LONG_DOCUMENT}}
---

Remember: exactly 3 bullet points, financial impact only.
Without repeating at the end, the model often ignores the '3 bullet points' constraint with long documents. Repeating it exploits recency bias to enforce compliance.
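The repetition can be automated so every long-document prompt gets the double-down treatment. A sketch with a hypothetical `double_down` helper:

```python
def double_down(instruction: str, document: str) -> str:
    """Place the key instruction before AND after the content so the
    trailing copy benefits from recency bias."""
    return (
        f"{instruction}\n\n"
        f"---\n{document}\n---\n\n"
        f"Remember: {instruction}"
    )

prompt = double_down(
    "Summarize in exactly 3 bullet points. Focus only on financial impact.",
    "<long quarterly report text goes here>",
)
```

The instruction appears verbatim at both ends, so even if the middle of a long context dilutes attention, the final tokens restate the constraint.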
5. Syntax and task decomposition

Use punctuation, markdown headers, section markers (--- separators), and UPPERCASE for variables. Both Markdown and XML work well since models were trained on web content in these formats. For complex tasks, decompose: first extract factual claims, then generate search queries to verify them, then compile results. Each step reduces the error surface.

💡Use --- separators between sections. The model recognizes these as boundaries and is less likely to confuse instructions with content.
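A sketch of the decomposition as three separate prompts, each using --- separators between instructions and content (the step wording is illustrative):

```python
# Hypothetical three-step decomposition for fact-checking a draft.
STEPS = [
    "List every factual claim in the content below, one per line.",
    "For each claim below, write one search query that could verify it.",
    "Using the claims and search results below, compile a verification report.",
]

def step_prompt(instruction: str, content: str) -> str:
    # --- separators mark the instruction/content boundary so the model
    # is less likely to treat the content as instructions.
    return f"{instruction}\n\n---\n{content}\n---"

first = step_prompt(STEPS[0], "{{DRAFT_ARTICLE}}")
```

Each step's output feeds the next step's content, so an error in one stage can be caught before it compounds.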
6. Chain of thought prompting

Instruct the model to proceed step-by-step and present all steps before the final answer. This reduces inaccuracy and makes it easy to verify reasoning. Microsoft's recommended phrasing: 'Take a step-by-step approach, cite your sources, and give your reasoning before sharing your final answer.'

💡Note from Microsoft: this technique is NOT applicable to reasoning models like the o-series. Those models already think internally. Adding CoT instructions to o-series models can actually degrade performance.
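A sketch that applies the phrasing only to non-reasoning models, per the warning above. The model-name check is a naive assumption; adapt it to your actual deployment names:

```python
COT_SUFFIX = (
    "Take a step-by-step approach, cite your sources, and give your "
    "reasoning before sharing your final answer."
)

def maybe_add_cot(question: str, model: str) -> str:
    # Naive check: treat names like 'o1', 'o3', 'o4-mini' as reasoning models.
    is_reasoning = model.startswith("o") and model[1:2].isdigit()
    if is_reasoning:
        return question  # o-series models already reason internally
    return f"{question}\n\n{COT_SUFFIX}"
```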
7. Grounding with citations

Provide source material for reliable answers. The closer your reference text is to the final answer format, the less work the model does and the fewer errors it makes. The key insight: asking for inline citations makes hallucination harder because the model would need to make TWO errors (fabricate a claim AND fabricate a citation). Inline citations are more effective than citations listed at the end.

💡Give the model an 'out': include 'respond with not found if the answer isn't present in the source material.' This prevents confident-sounding hallucination when the answer genuinely isn't there.
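A grounding template sketch that numbers sources for inline citation and bakes in the 'out' (the helper and wording are illustrative):

```python
def grounded_prompt(question, sources):
    """sources: list of source-text strings, numbered for inline citation."""
    source_block = "\n".join(f"[{i}] {text}" for i, text in enumerate(sources, 1))
    return (
        "Answer using ONLY the sources below, citing inline like [1]. "
        "Respond with 'not found' if the answer isn't present in the "
        "source material.\n\n"
        f"{source_block}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = grounded_prompt(
    "What was the refund policy change?",
    ["Policy doc: refunds extended from 14 to 30 days effective March 1."],
)
```

Ending with the `Answer:` cue combines two techniques from this guide: grounding and output priming.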
8. Enterprise best practices

Microsoft's summary for production deployments: Be Specific (leave nothing to interpretation). Be Descriptive (use analogies). Double Down (repeat instructions before and after content). Order Matters (exploit recency bias). Give an Out (fallback instructions prevent hallucination). Use Tables (space-efficient structured data). Minimize Whitespace (consecutive spaces are separate tokens, wasting context).

💡The whitespace tip is easy to miss but matters at scale. Extra blank lines and indentation consume tokens that could be used for actual content.
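Whitespace minimization is easy to apply as a preprocessing pass. A sketch (the regexes are an assumption; be careful with prompts that contain code, where indentation is meaningful):

```python
import re

def minimize_whitespace(prompt: str) -> str:
    """Collapse runs of spaces/tabs and extra blank lines, since
    consecutive spaces tokenize as separate tokens and waste context."""
    prompt = re.sub(r"[ \t]+", " ", prompt)     # runs of spaces -> one space
    prompt = re.sub(r"\n{3,}", "\n\n", prompt)  # many blank lines -> one
    return prompt.strip()
```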

Key topics covered

Azure OpenAI
System messages
Few-shot learning
Chain of thought
Grounding
Enterprise patterns