GPT-4.1 Prompting Guide

Prompting techniques specific to GPT-4.1. Covers agentic workflows, 1M token context, literal instruction following, and the planning patterns that boosted SWE-bench by 4%.

Content sourced from official OpenAI documentation
1. The big shift: literal instruction following

GPT-4.1 follows instructions more literally than any previous GPT model. This is a fundamental change: vague prompts that 'kinda worked' before will underperform now. The upside is that when you're specific, it nails it; the downside is that you can't rely on the model to infer what you meant. If you're migrating from GPT-4 or GPT-4 Turbo, audit your prompts for anything ambiguous.

💡If GPT-4.1 isn't doing what you expect, the fix is almost always more explicit instructions, not a different approach.
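As a concrete illustration of what "more explicit" means in practice, compare a vague instruction with an explicit rewrite. Both prompts below are invented for illustration, not taken from the guide:

```python
# Illustrative only: the same task phrased vaguely vs. explicitly.
# GPT-4.1 will follow the explicit version literally; the vague version
# leaves length, format, and grounding entirely up to the model.
vague_prompt = "Summarize this document."

explicit_prompt = (
    "Summarize the document below in exactly 3 bullet points. "
    "Each bullet is one sentence of at most 20 words. "
    "Use only information stated in the document; if something is "
    "missing, say 'not specified' rather than guessing."
)
```

The explicit version pins down length, format, and grounding, which is exactly the kind of ambiguity audit the migration advice above calls for.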
2. Building agentic workflows

Three components should appear in every agent prompt: persistence ('keep going until the user's query is completely resolved'), tool-calling ('do NOT guess or make up an answer without using your available tools'), and planning (explicit reasoning between function calls). Use the API's tools field rather than manually injecting tool descriptions into the prompt, and give tools clear names with detailed descriptions.

💡This persistence + tool-calling + planning pattern increased OpenAI's internal SWE-bench scores by approximately 2%. Small prompt changes, big results.
Agent system prompt
You are a helpful coding assistant. You have access to tools for reading files, writing files, and running commands.

Rules:
- ALWAYS use your tools to investigate before answering. Do NOT guess.
- Keep going until the user's query is completely resolved. Do not stop at partial solutions.
- After each tool call, reflect on what you learned and plan your next step.
- If something fails, debug it. Don't give up after one attempt.
Without persistence instructions, agents stop at the first obstacle. Without planning instructions, they chain tool calls without thinking.
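The tools-field advice above can be sketched as a request payload. This is a minimal sketch: the payload shape follows the standard OpenAI function-tool schema, but the read_file tool, its parameters, and the user message are invented for illustration:

```python
# Sketch of an agent request that declares tools via the API's `tools`
# field instead of pasting tool descriptions into the system prompt.
# The `read_file` tool and its schema are illustrative, not from the guide.
SYSTEM_PROMPT = (
    "You are a helpful coding assistant.\n"
    "- ALWAYS use your tools to investigate before answering. Do NOT guess.\n"
    "- Keep going until the user's query is completely resolved.\n"
    "- After each tool call, reflect on what you learned and plan your next step."
)

request = {
    "model": "gpt-4.1",
    "messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Why does test_parse fail?"},
    ],
    # Declaring tools here (rather than describing them in prose) lets the
    # model emit structured tool calls that your runtime can dispatch on.
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "read_file",
                "description": "Read the contents of a file at the given path.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "path": {"type": "string", "description": "Path to read."}
                    },
                    "required": ["path"],
                },
            },
        }
    ],
}
# Pass `request` as keyword arguments to the Chat Completions endpoint,
# e.g. client.chat.completions.create(**request).
```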
3. Prompting-induced planning

Even though GPT-4.1 isn't a 'reasoning model', you can induce explicit chain-of-thought through prompts. This improved SWE-bench Verified pass rates by 4%. The pattern: investigate thoroughly before implementing, make incremental changes with frequent testing, debug extensively and handle edge cases, then validate and reflect on the result.

💡Planning prompts work best when they match the actual workflow. Don't prescribe generic steps. Describe how an expert would approach this specific type of task.
Planning for code changes
Before making any changes:
1. Read and understand the relevant code files thoroughly
2. Identify the root cause, not just the symptom
3. Plan your fix before writing any code
4. Make the smallest possible change that fixes the issue
5. Test your changes
6. If tests fail, debug and iterate. Do not give up.
7. Verify the fix doesn't break anything else
This structured planning approach makes GPT-4.1 significantly more reliable on complex coding tasks than raw instruction following.
4. 1M token context handling

GPT-4.1 handles its full 1M token context with strong needle-in-haystack performance. For strict context-only responses, use: 'Only use the documents in the provided External Context.' Place instructions at both the beginning and end of long context for best results. If using a single instruction location, above the context performs better than below.

💡Avoid JSON for document collections in long context. It performed poorly in OpenAI's testing. Use Markdown headers or XML tags instead.
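The sandwich placement above can be sketched as a small prompt assembler. The build_long_context_prompt helper and its Markdown-header document layout are my assumptions, not an API; only the instruction-placement and Markdown-over-JSON advice come from the guide:

```python
def build_long_context_prompt(instructions: str, documents: dict) -> str:
    """Place instructions both before and after the long context.

    Documents are rendered with Markdown headers, since the guide
    recommends Markdown or XML over JSON for document collections.
    """
    doc_blocks = "\n\n".join(
        f"# {title}\n{body}" for title, body in documents.items()
    )
    return (
        f"{instructions}\n\n"
        f"External Context:\n\n{doc_blocks}\n\n"
        f"Reminder of your instructions:\n{instructions}"
    )

prompt = build_long_context_prompt(
    "Only use the documents in the provided External Context.",
    {
        "Refund policy": "Refunds are issued within 30 days.",
        "Shipping policy": "Orders ship within 2 business days.",
    },
)
```

Note that the instruction string appears twice, once above and once below the documents, matching the both-ends placement the guide recommends.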
5. Instruction structure

Recommended prompt structure, in order:
1. Role and Objective
2. Instructions
3. Sub-categories for detailed rules
4. Reasoning Steps
5. Output Format
6. Examples
7. Context
8. Final instructions
When instructions conflict, GPT-4.1 follows the instruction closer to the end of the prompt. Development workflow: start with high-level 'Response Rules' bullet points, add category-specific sections, include ordered steps, debug by checking for conflicting instructions, and add examples.

💡Markdown works great for most prompts. XML works best for precise wrapping and nesting. Avoid JSON for document collections.
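One way to keep that section ordering honest is to assemble the prompt from named parts. The SECTION_ORDER list mirrors the recommended structure from the guide; the assemble_prompt helper and the sample section contents are hypothetical:

```python
# Recommended section order from the guide; assemble_prompt is a
# hypothetical helper, not part of any OpenAI SDK.
SECTION_ORDER = [
    "Role and Objective",
    "Instructions",
    "Reasoning Steps",
    "Output Format",
    "Examples",
    "Context",
    "Final instructions",
]

def assemble_prompt(sections: dict) -> str:
    """Render the given sections as Markdown, in the recommended order.

    Because GPT-4.1 favors instructions nearer the end of the prompt,
    anything that must win a conflict belongs in 'Final instructions'.
    """
    parts = [
        f"# {name}\n{sections[name]}"
        for name in SECTION_ORDER
        if name in sections
    ]
    return "\n\n".join(parts)

prompt = assemble_prompt({
    "Role and Objective": "You are a support agent. Resolve the ticket.",
    "Output Format": "Reply in plain text, under 100 words.",
    "Final instructions": "Never promise a refund without checking policy.",
})
```

Building prompts this way also makes the conflict-debugging step mechanical: you always know which section a given rule lives in and which one will win.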
6. Code diffs and file editing

The guide includes a custom apply_patch tool for structured code modifications. Use context lines (surrounding code) to uniquely identify where changes go, rather than line numbers, which break easily. Multiple @@ statements let you target deeply nested code. This is how OpenAI's own coding agents handle file editing reliably in agentic workflows.

💡Context-based diffs are more robust than line-number-based diffs because they survive when other parts of the file change.
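For a sense of the shape, a patch in this style looks roughly like the sketch below. The markers follow the cookbook's diff format as I recall it, and the file path and code lines are invented, so treat the whole fragment as illustrative rather than exact:

```
*** Begin Patch
*** Update File: src/parser.py
@@ class Parser
@@     def parse(self, raw):
-        return json.loads(raw)
+        data = json.loads(raw)
+        return self._validate(data)
*** End Patch
```

The stacked @@ lines narrow the target first to the class, then to the method, so the edit lands correctly even if the method has moved since the model last read the file.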

Key topics covered

Agentic behavior
Instruction following
Long context
Chain of thought
Code generation
Structured output