Prompt Engineering

psychology

Prompting Techniques

Zero-Shot Prompting

Directly ask the model to perform a task without providing examples. Relies on the model's pre-trained knowledge and instruction-following capabilities. Works best for straightforward tasks where the model has strong prior knowledge.

Key Features
  • No examples needed in prompt
  • Fastest to implement
  • Lower token usage
  • Works for common tasks
  • May lack precision for complex tasks
Similar Technologies
Few-ShotChain-of-ThoughtDirect Instruction
Few-Shot Prompting

Provide 2-5 examples of input-output pairs before the actual query. Helps the model understand the expected format, style, and reasoning pattern. Critical for tasks requiring specific output formats or domain knowledge.

Key Features
  • Examples guide model behavior
  • Improves output consistency
  • Demonstrates expected format
  • Better for complex/novel tasks
  • Higher token usage
Similar Technologies
Zero-ShotMany-ShotDynamic Few-Shot
Chain-of-Thought (CoT)

Encourage step-by-step reasoning by asking the model to 'think through' the problem or by showing reasoning examples. Dramatically improves performance on math, logic, and multi-step reasoning tasks.

Key Features
  • Explicit reasoning steps
  • Better for math/logic problems
  • Reduces errors in complex tasks
  • Can be zero-shot ('Let's think step by step')
  • Higher latency due to longer outputs
Similar Technologies
Zero-Shot CoTSelf-ConsistencyTree-of-Thought
Self-Consistency

Generate multiple reasoning paths and select the most common answer. Samples diverse Chain-of-Thought responses and uses majority voting. Significantly improves accuracy at the cost of multiple API calls.

Key Features
  • Multiple reasoning samples
  • Majority voting for answer
  • Higher accuracy on reasoning tasks
  • Temperature > 0 for diversity
  • Higher cost (multiple calls)
Similar Technologies
Chain-of-ThoughtTree-of-ThoughtBest-of-N
Tree-of-Thought (ToT)

Explore multiple reasoning branches systematically, evaluating and pruning paths. Combines deliberate search with LLM reasoning. Best for complex problems requiring exploration like puzzles, planning, or creative tasks.

Key Features
  • Branching reasoning paths
  • Self-evaluation of branches
  • Backtracking capability
  • BFS or DFS search strategies
  • High token usage, best for hard problems
Similar Technologies
Chain-of-ThoughtGraph-of-ThoughtAlgorithm-of-Thought
ReAct (Reason + Act)

Interleave reasoning traces with actions (tool calls). Model thinks about what to do, executes an action, observes the result, and continues. Foundation for most modern AI agents and tool-using systems.

Key Features
  • Thought-Action-Observation loop
  • Integrates with external tools
  • Transparent reasoning process
  • Handles multi-step tasks
  • Core pattern for AI agents
Similar Technologies
Chain-of-ThoughtPlan-and-ExecuteReflexion
warning

When Chain-of-Thought Isn't What It Seems

CoT improves performance on many tasks, but interpretability research from Anthropic found that the relationship between written reasoning and actual internal computation isn't always faithful.

Faithful CoT

On tractable problems, the written reasoning trace genuinely reflects internal computation. Asked to compute the square root of 0.64, Claude's internal features represented the intermediate step of computing √64 — the explanation matched the process.

Trace matches processReliable for verification

Post-Hoc Reconstruction

On harder problems, the model can generate a plausible-looking derivation after the fact, without any corresponding internal calculation. The chain-of-thought is a performance — constructed to look like reasoning rather than recording it. When given a hint about an expected answer, models engage in motivated reasoning: working backwards from the target to construct justifying steps.

Trace doesn't match processUnreliable for verification
info

When you ask a model to "show its work," you may be getting a plausible reconstruction rather than a faithful record. CoT-based evaluation is most reliable for tasks where the reasoning trace can be independently verified — and least reliable for hard problems where the model might not actually know the answer.

pattern

Prompt Patterns

PatternDescriptionExample UseBest For
PersonaAssign a role or character to the model"You are an expert Python developer..."Domain expertise, tone control
TemplateStructured format with placeholders"Given {context}, answer {question}"Consistent outputs, automation
Structured OutputRequest specific format (JSON, XML, Markdown)"Respond in valid JSON with keys: name, description"API integration, parsing
System PromptPersistent instructions for conversation contextSetting behavior, constraints, guardrailsChatbots, assistants
DelimiterUse markers to separate prompt sections###, ```, <context></context>Long prompts, multi-part inputs
Output PrimerStart the response to guide format"The answer is: {"Forcing specific formats
auto_awesome

Advanced Techniques

Meta-Prompting

Use an LLM to generate or optimize prompts for another task. Have the model analyze, critique, and improve prompts iteratively. Enables automated prompt engineering at scale.

Key Features
  • LLM generates prompts
  • Automated optimization
  • Prompt critique and refinement
  • A/B testing at scale
  • Self-improving systems
Similar Technologies
Manual PromptingDSPyAutomatic Prompt Engineer
Prompt Chaining

Break complex tasks into sequential prompts where each output feeds into the next. Enables sophisticated workflows, error handling between steps, and specialized prompts per stage.

Key Features
  • Sequential prompt execution
  • Output becomes next input
  • Error handling per step
  • Specialized prompts per stage
  • Complex workflow orchestration
Similar Technologies
Single Complex PromptParallel PromptsLangChain LCEL
Constitutional AI

Define principles (a 'constitution') the model should follow, then have it self-critique and revise responses. Used by Anthropic for Claude's safety training. Can be applied in prompts for safer outputs.

Key Features
  • Define behavioral principles
  • Self-critique against rules
  • Iterative revision
  • Harmlessness training
  • Values alignment
Similar Technologies
RLHFSystem PromptsGuardrails
Directional Stimulus

Provide hints or keywords that guide the model toward a desired direction without fully specifying the answer. Useful for creative tasks where you want influence without over-constraining.

Key Features
  • Keyword hints
  • Directional guidance
  • Maintains creativity
  • Subtle steering
  • Good for generation tasks
Similar Technologies
Few-ShotConstraintsStyle Transfer
Automatic Prompt Engineer (APE)

Algorithmically search for optimal prompts using techniques like evolutionary search or gradient-based optimization. Tools like DSPy enable programmatic prompt optimization with evaluation metrics.

Key Features
  • Automated prompt search
  • Evolutionary optimization
  • Metric-driven selection
  • DSPy framework
  • Requires evaluation dataset
Similar Technologies
Manual EngineeringMeta-PromptingPrompt Tuning
Least-to-Most Prompting

Decompose complex problems into simpler subproblems, solve them in order from easiest to hardest, with each solution informing the next. Effective for compositional reasoning.

Key Features
  • Problem decomposition
  • Easiest to hardest ordering
  • Progressive complexity
  • Compositional reasoning
  • Better than CoT for some tasks
Similar Technologies
Chain-of-ThoughtPlan-and-ExecuteDecomposed Prompting
tune

Prompt Optimization

Evaluation & Metrics

Systematically measure prompt quality using automated metrics and human evaluation. Track accuracy, relevance, format compliance, latency, and cost. Build evaluation datasets and run regression tests.

Key Features
  • Accuracy/correctness measurement
  • Format compliance checking
  • Latency and cost tracking
  • Human evaluation workflows
  • Regression test suites
Similar Technologies
RAGASTruLensPromptfooLangSmith Evaluation
A/B Testing & Iteration

Compare prompt variants systematically in production. Track key metrics, statistical significance, and user feedback. Iterate based on data rather than intuition.

Key Features
  • Variant comparison
  • Statistical significance testing
  • Production traffic splitting
  • User feedback integration
  • Continuous improvement
Similar Technologies
PromptfooLangSmithWeights & Biases
Prompt Versioning

Version control prompts like code. Track changes, enable rollbacks, maintain audit trails, and manage deployment across environments. Critical for production prompt management.

Key Features
  • Git-like version control
  • Change tracking and diffs
  • Rollback capability
  • Environment management
  • Audit trails
Similar Technologies
PromptLayerLangSmith HubHumanloop
Token Optimization

Minimize token usage while maintaining quality. Compress verbose prompts, remove redundancy, use efficient encodings, and cache common prompt prefixes.

Key Features
  • Prompt compression
  • Redundancy removal
  • Efficient instruction writing
  • Prompt caching (Anthropic)
  • Cost reduction
Similar Technologies
LLMLinguaPrompt CachingShorter Models
security

Prompt Security

ThreatDescriptionMitigationTools
Prompt InjectionUser input manipulates system behaviorInput sanitization, delimiters, instruction hierarchyGuardrails AI, Rebuff, LLM Guard
JailbreakingBypassing safety guidelinesMulti-layer filtering, output validationOpenAI Moderation, Perspective API
Data LeakageExtracting training data or system promptsDon't include secrets in prompts, output filteringPresidio, custom regex filters
Indirect InjectionMalicious instructions in retrieved contentContent sanitization, source verificationInput validation, content scanning
build

Prompt Management Tools

LangSmith

LangChain's platform for prompt management, tracing, evaluation, and monitoring. Hub for sharing prompts, datasets for testing, and production observability for LLM applications.

Key Features
  • Prompt hub and versioning
  • Trace visualization
  • Evaluation datasets
  • Production monitoring
  • LangChain integration
Similar Technologies
PromptfooHumanloopPromptLayer
Promptfoo

Open-source tool for testing and evaluating prompts. Define test cases in YAML, run against multiple providers, compare outputs, and catch regressions. CI/CD integration for prompt testing.

Key Features
  • YAML test definitions
  • Multi-provider testing
  • Assertion-based evaluation
  • CI/CD integration
  • Open source
Similar Technologies
LangSmithDeepEvalCustom scripts
Humanloop

Enterprise platform for prompt management with collaboration features. Version control, A/B testing, fine-tuning management, and analytics. Built for teams managing prompts at scale.

Key Features
  • Collaborative editing
  • Version control
  • A/B testing
  • Fine-tuning integration
  • Enterprise analytics
Similar Technologies
LangSmithPromptLayerWeights & Biases
PromptLayer

Middleware for logging and managing prompts. Wraps API calls to track all requests, responses, and latency. Template management, versioning, and analytics dashboard.

Key Features
  • Request/response logging
  • Template management
  • Latency tracking
  • Analytics dashboard
  • Easy integration
Similar Technologies
LangSmithHeliconeCustom logging
DSPy

Stanford framework for programmatic prompt optimization. Define modules and signatures, then compile to optimized prompts using training data. Enables systematic prompt engineering with code.

Key Features
  • Programmatic prompts
  • Signature-based modules
  • Automatic optimization
  • Training data compilation
  • Reproducible pipelines
Similar Technologies
Manual EngineeringLangChainGuidance
Guidance

Microsoft's library for constrained generation. Define output structure with templates, enforce JSON schemas, control generation token-by-token. More reliable structured outputs.

Key Features
  • Template-based generation
  • Schema enforcement
  • Token-level control
  • Interleaved generation
  • Reliable JSON output
Similar Technologies
InstructorOutlinesLMQL