Azure OpenAI Prompt Engineering
Microsoft's enterprise-focused guide for Azure OpenAI. Five prompt components, recency bias exploitation, grounding with citations, and the 'double down' technique.
Official Microsoft docs →

How GPT actually works
Understanding this changes how you write prompts. GPT models predict the most likely next words given the previous text. There's no separate 'Q&A mode': when you ask a question, an answer appears because answers typically follow questions in the training data. This is why completion-style prompts (starting the output yourself) and cues (leading words) work so well. You're not commanding the model; you're setting up a pattern for it to continue.
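A minimal sketch of the completion-style idea: instead of ending with a question, the prompt ends with the start of the answer (a cue), so the model continues the pattern. The sentiment task and the helper name are illustrative, not from the Microsoft docs.

```python
def build_completion_prompt(review: str) -> str:
    """Build a sentiment-classification prompt that ends with a cue.

    The trailing 'Sentiment:' cue primes the model to continue with a
    one-word label rather than free-form prose.
    """
    return (
        "Classify the sentiment of the product review below as "
        "Positive, Negative, or Neutral.\n\n"
        f"Review: {review}\n"
        "Sentiment:"  # cue: we start the answer so the model completes it
    )

print(build_completion_prompt("The battery died after two days."))
```

The same text sent as a bare question would still usually get an answer, but the cue makes the expected output format part of the pattern being continued.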
Five prompt components
Microsoft breaks every prompt into five components, and understanding which ones you're using (or missing) is the key to debugging bad outputs: 1) Instructions (what to do), 2) Primary content (the text being processed), 3) Examples (input/output pairs for few-shot learning), 4) Cues (output prefixes that prime the format), 5) Supporting content (context like current date, user preferences). Most weak prompts are missing components 4 and 5.
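One way all five components can land in a single Chat Completions request, sketched below. The support-ticket task, the helper name, and the exact wording are illustrative assumptions; only the five-component breakdown comes from the docs.

```python
def build_prompt(ticket_text: str, today: str) -> list[dict]:
    """Assemble a messages list that exercises all five components."""
    system = (
        # 1) Instructions: what to do
        "Summarize the customer support ticket in one sentence.\n"
        # 5) Supporting content: context like the current date
        f"Today's date is {today}. Mention the date if the ticket is urgent."
    )
    return [
        {"role": "system", "content": system},
        # 3) Examples: an input/output pair for few-shot conditioning
        {"role": "user", "content": "Ticket: My invoice shows the wrong amount."},
        {"role": "assistant", "content": "Billing discrepancy on an invoice."},
        # 2) Primary content, ending with 4) a cue that primes the format
        {"role": "user", "content": f"Ticket: {ticket_text}\nSummary:"},
    ]
```

When an output looks wrong, checking which of the five slots is empty (often the cue or the supporting content) is usually faster than rewriting the instructions.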
Few-shot learning
Include input/output examples to condition the model for this specific inference. This isn't permanent learning. With Chat Completions, add examples as user/assistant turns after the system message. Few-shot is dramatically more effective than verbose instructions for teaching format and classification patterns.
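A sketch of few-shot conditioning via Chat Completions: each example becomes a user/assistant turn pair after the system message. The headline-classification task and the categories are illustrative; a real request would pass this list to an AzureOpenAI client with your endpoint and deployment name.

```python
# Illustrative few-shot examples: (input, expected output) pairs.
FEW_SHOT = [
    ("Stock prices slide as rate fears return", "Finance"),
    ("Quarterback traded in surprise off-season deal", "Sports"),
]

def build_messages(headline: str) -> list[dict]:
    """Turn few-shot pairs into user/assistant turns, then append the task."""
    messages = [
        {"role": "system",
         "content": "Classify the news headline into one category."}
    ]
    for example_input, example_output in FEW_SHOT:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": headline})
    return messages
```

Two or three well-chosen pairs often teach a label set or output format more reliably than a paragraph of instructions describing it.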
The 'double down' technique
Models are susceptible to recency bias. Information at the end of the prompt has more influence than information at the beginning. Microsoft's recommendation: repeat your most important instructions both before AND after the primary content. This 'double down' approach ensures critical constraints survive even with long context in the middle.
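The 'double down' pattern reduces to a simple wrapper: the critical instruction appears before and after the primary content, so recency bias works in its favor however long the middle gets. The separator style and 'Reminder:' wording are illustrative choices.

```python
def double_down(instruction: str, primary_content: str) -> str:
    """Place the critical instruction both before and after the content."""
    return (
        f"{instruction}\n\n"
        "---\n"
        f"{primary_content}\n"
        "---\n\n"
        f"Reminder: {instruction}"  # repeated at the end, where influence is highest
    )

print(double_down("Answer only in French.", "<long document here>"))
```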
Syntax and task decomposition
Use punctuation, markdown headers, section markers (--- separators), and UPPERCASE for variables. Both Markdown and XML work well since models were trained on web content in these formats. For complex tasks, decompose: first extract factual claims, then generate search queries to verify them, then compile results. Each step reduces the error surface.
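The extract/query/compile decomposition can be sketched as three chained calls. Here `ask` is a stand-in for a single Chat Completions request, injected as a parameter so the pipeline structure is visible without an API key; the prompt wording is illustrative.

```python
def fact_check_pipeline(article: str, ask) -> str:
    """Decompose fact-checking into three smaller model calls.

    `ask` is any callable taking a prompt string and returning the
    model's reply. Each step consumes the previous step's output,
    so errors are caught at a narrower stage instead of compounding.
    """
    # Step 1: extract the factual claims from the article
    claims = ask(f"List the factual claims in this article:\n{article}")
    # Step 2: turn each claim into a search query for verification
    queries = ask(f"Write one search query per claim:\n{claims}")
    # Step 3: compile the results into a verification plan
    return ask(
        "Summarize what to verify, pairing each claim with its query.\n"
        f"Claims: {claims}\nQueries: {queries}"
    )
```

Each stage has one job and a small input, which is the point: a wrong claim extraction fails visibly at step 1 instead of silently poisoning the final answer.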
Chain of thought prompting
Instruct the model to proceed step-by-step and present all steps before the final answer. This reduces inaccuracy and makes it easy to verify reasoning. Microsoft's recommended phrasing: 'Take a step-by-step approach, cite your sources, and give your reasoning before sharing your final answer.'
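Applying this is a one-line suffix; the phrasing below is Microsoft's recommended wording from the text above, while the helper name is an illustrative assumption.

```python
COT_SUFFIX = (
    "Take a step-by-step approach, cite your sources, and give your "
    "reasoning before sharing your final answer."
)

def with_chain_of_thought(question: str) -> str:
    """Append the step-by-step instruction to any question."""
    return f"{question}\n\n{COT_SUFFIX}"
```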
Grounding with citations
Provide source material for reliable answers. The closer your reference text is to the final answer format, the less work the model does and the fewer errors it makes. The key insight: asking for inline citations makes hallucination harder because the model would need to make TWO errors (fabricate a claim AND fabricate a citation). Inline citations are more effective than citations listed at the end.
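A sketch of a grounded prompt: each source passage gets an ID, and the instruction demands inline citations by ID, so a fabricated claim would also need a fabricated citation. The `[doc#]` bracket style and the fallback wording are illustrative choices, not mandated by the docs.

```python
def grounded_prompt(question: str, passages: list[str]) -> str:
    """Build a prompt that forces inline, per-claim citations."""
    # Number each passage so the model has a concrete ID to cite.
    sources = "\n".join(f"[doc{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the sources below. Cite each claim inline with "
        "its source ID, e.g. [doc1]. If the sources do not contain the "
        "answer, say so.\n\n"
        f"{sources}\n\n"
        f"Question: {question}"
    )
```

Note the 'give an out' clause at the end of the instruction: without it, the model is pushed toward inventing an answer when the sources fall short.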
Enterprise best practices
Microsoft's summary for production deployments: Be Specific (leave nothing to interpretation). Be Descriptive (use analogies). Double Down (repeat instructions before and after content). Order Matters (exploit recency bias). Give an Out (fallback instructions prevent hallucination). Use Tables (space-efficient structured data). Minimize Whitespace (consecutive spaces are separate tokens, wasting context).
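Two of these practices are mechanical enough to automate, sketched below: collapsing runs of spaces and tabs before they waste tokens, and keeping a standard 'out' clause to append to prompts. The regex choice and fallback wording are illustrative assumptions.

```python
import re

def tidy(primary_content: str) -> str:
    """Collapse runs of spaces/tabs to a single space and trim the ends.

    Consecutive spaces tokenize wastefully, so cleaning pasted content
    (tables, logs, indented text) before prompting saves context budget.
    """
    return re.sub(r"[ \t]{2,}", " ", primary_content).strip()

# 'Give an Out': a reusable fallback clause that prevents invented answers.
FALLBACK = "If the text does not contain the answer, respond 'not found'."
```

Newlines are left alone on purpose here, since line breaks often carry structure (list items, table rows) the model should see.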