Will Percey — Portfolio

Prompt Engineering

> > Updated Mar 2026

psychology

Prompting Techniques

Zero-Shot Prompting

Directly ask the model to perform a task without providing examples. Relies on the model's pre-trained knowledge and instruction-following capabilities. Works best for straightforward tasks where the model has strong prior knowledge.

Key Features

No examples needed in prompt
Fastest to implement
Lower token usage
Works for common tasks
May lack precision for complex tasks

Similar Technologies

Few-ShotChain-of-ThoughtDirect Instruction

Few-Shot Prompting

Provide 2-5 examples of input-output pairs before the actual query. Helps the model understand the expected format, style, and reasoning pattern. Critical for tasks requiring specific output formats or domain knowledge.

Key Features

Examples guide model behavior
Improves output consistency
Demonstrates expected format
Better for complex/novel tasks
Higher token usage

Similar Technologies

Zero-ShotMany-ShotDynamic Few-Shot

Chain-of-Thought (CoT)

Encourage step-by-step reasoning by asking the model to 'think through' the problem or by showing reasoning examples. Dramatically improves performance on math, logic, and multi-step reasoning tasks.

Key Features

Explicit reasoning steps
Better for math/logic problems
Reduces errors in complex tasks
Can be zero-shot ('Let's think step by step')
Higher latency due to longer outputs

Similar Technologies

Zero-Shot CoTSelf-ConsistencyTree-of-Thought

Self-Consistency

Generate multiple reasoning paths and select the most common answer. Samples diverse Chain-of-Thought responses and uses majority voting. Significantly improves accuracy at the cost of multiple API calls.

Key Features

Multiple reasoning samples
Majority voting for answer
Higher accuracy on reasoning tasks
Temperature > 0 for diversity
Higher cost (multiple calls)

Similar Technologies

Chain-of-ThoughtTree-of-ThoughtBest-of-N

Tree-of-Thought (ToT)

Explore multiple reasoning branches systematically, evaluating and pruning paths. Combines deliberate search with LLM reasoning. Best for complex problems requiring exploration like puzzles, planning, or creative tasks.

Key Features

Branching reasoning paths
Self-evaluation of branches
Backtracking capability
BFS or DFS search strategies
High token usage, best for hard problems

Similar Technologies

Chain-of-ThoughtGraph-of-ThoughtAlgorithm-of-Thought

ReAct (Reason + Act)

Interleave reasoning traces with actions (tool calls). Model thinks about what to do, executes an action, observes the result, and continues. Foundation for most modern AI agents and tool-using systems.

Key Features

Thought-Action-Observation loop
Integrates with external tools
Transparent reasoning process
Handles multi-step tasks
Core pattern for AI agents

Similar Technologies

Chain-of-ThoughtPlan-and-ExecuteReflexion

warning

When Chain-of-Thought Isn't What It Seems

CoT improves performance on many tasks, but interpretability research from Anthropic found that the relationship between written reasoning and actual internal computation isn't always faithful.

Faithful CoT

On tractable problems, the written reasoning trace genuinely reflects internal computation. Asked to compute the square root of 0.64, Claude's internal features represented the intermediate step of computing √64 — the explanation matched the process.

Trace matches processReliable for verification

Post-Hoc Reconstruction

On harder problems, the model can generate a plausible-looking derivation after the fact, without any corresponding internal calculation. The chain-of-thought is a performance — constructed to look like reasoning rather than recording it. When given a hint about an expected answer, models engage in motivated reasoning: working backwards from the target to construct justifying steps.

Trace doesn't match processUnreliable for verification

info

When you ask a model to "show its work," you may be getting a plausible reconstruction rather than a faithful record. CoT-based evaluation is most reliable for tasks where the reasoning trace can be independently verified — and least reliable for hard problems where the model might not actually know the answer.

pattern

Prompt Patterns

Pattern	Description	Example Use	Best For
Persona	Assign a role or character to the model	"You are an expert Python developer..."	Domain expertise, tone control
Template	Structured format with placeholders	"Given {context}, answer {question}"	Consistent outputs, automation
Structured Output	Request specific format (JSON, XML, Markdown)	"Respond in valid JSON with keys: name, description"	API integration, parsing
System Prompt	Persistent instructions for conversation context	Setting behavior, constraints, guardrails	Chatbots, assistants
Delimiter	Use markers to separate prompt sections	###, ```, <context></context>	Long prompts, multi-part inputs
Output Primer	Start the response to guide format	"The answer is: {"	Forcing specific formats

auto_awesome

Advanced Techniques

Meta-Prompting

Use an LLM to generate or optimize prompts for another task. Have the model analyze, critique, and improve prompts iteratively. Enables automated prompt engineering at scale.

Key Features

LLM generates prompts
Automated optimization
Prompt critique and refinement
A/B testing at scale
Self-improving systems

Similar Technologies

Manual PromptingDSPyAutomatic Prompt Engineer

Prompt Chaining

Break complex tasks into sequential prompts where each output feeds into the next. Enables sophisticated workflows, error handling between steps, and specialized prompts per stage.

Key Features

Sequential prompt execution
Output becomes next input
Error handling per step
Specialized prompts per stage
Complex workflow orchestration

Similar Technologies

Single Complex PromptParallel PromptsLangChain LCEL

Constitutional AI

Define principles (a 'constitution') the model should follow, then have it self-critique and revise responses. Used by Anthropic for Claude's safety training. Can be applied in prompts for safer outputs.

Key Features

Define behavioral principles
Self-critique against rules
Iterative revision
Harmlessness training
Values alignment

Similar Technologies

RLHFSystem PromptsGuardrails

Directional Stimulus

Provide hints or keywords that guide the model toward a desired direction without fully specifying the answer. Useful for creative tasks where you want influence without over-constraining.

Key Features

Keyword hints
Directional guidance
Maintains creativity
Subtle steering
Good for generation tasks

Similar Technologies

Few-ShotConstraintsStyle Transfer

Automatic Prompt Engineer (APE)

Algorithmically search for optimal prompts using techniques like evolutionary search or gradient-based optimization. Tools like DSPy enable programmatic prompt optimization with evaluation metrics.

Key Features

Automated prompt search
Evolutionary optimization
Metric-driven selection
DSPy framework
Requires evaluation dataset

Similar Technologies

Manual EngineeringMeta-PromptingPrompt Tuning

Least-to-Most Prompting

Decompose complex problems into simpler subproblems, solve them in order from easiest to hardest, with each solution informing the next. Effective for compositional reasoning.

Key Features

Problem decomposition
Easiest to hardest ordering
Progressive complexity
Compositional reasoning
Better than CoT for some tasks

Similar Technologies

Chain-of-ThoughtPlan-and-ExecuteDecomposed Prompting

tune

Prompt Optimization

Evaluation & Metrics

Systematically measure prompt quality using automated metrics and human evaluation. Track accuracy, relevance, format compliance, latency, and cost. Build evaluation datasets and run regression tests.

Key Features

Accuracy/correctness measurement
Format compliance checking
Latency and cost tracking
Human evaluation workflows
Regression test suites

Similar Technologies

RAGASTruLensPromptfooLangSmith Evaluation

A/B Testing & Iteration

Compare prompt variants systematically in production. Track key metrics, statistical significance, and user feedback. Iterate based on data rather than intuition.

Key Features

Variant comparison
Statistical significance testing
Production traffic splitting
User feedback integration
Continuous improvement

Similar Technologies

PromptfooLangSmithWeights & Biases

Prompt Versioning

Version control prompts like code. Track changes, enable rollbacks, maintain audit trails, and manage deployment across environments. Critical for production prompt management.

Key Features

Git-like version control
Change tracking and diffs
Rollback capability
Environment management
Audit trails

Similar Technologies

PromptLayerLangSmith HubHumanloop

Token Optimization

Minimize token usage while maintaining quality. Compress verbose prompts, remove redundancy, use efficient encodings, and cache common prompt prefixes.

Key Features

Prompt compression
Redundancy removal
Efficient instruction writing
Prompt caching (Anthropic)
Cost reduction

Similar Technologies

LLMLinguaPrompt CachingShorter Models

security

Prompt Security

Threat	Description	Mitigation	Tools
Prompt Injection	User input manipulates system behavior	Input sanitization, delimiters, instruction hierarchy	Guardrails AI, Rebuff, LLM Guard
Jailbreaking	Bypassing safety guidelines	Multi-layer filtering, output validation	OpenAI Moderation, Perspective API
Data Leakage	Extracting training data or system prompts	Don't include secrets in prompts, output filtering	Presidio, custom regex filters
Indirect Injection	Malicious instructions in retrieved content	Content sanitization, source verification	Input validation, content scanning

build

Prompt Management Tools

LangSmith

LangChain's platform for prompt management, tracing, evaluation, and monitoring. Hub for sharing prompts, datasets for testing, and production observability for LLM applications.

Key Features

Prompt hub and versioning
Trace visualization
Evaluation datasets
Production monitoring
LangChain integration

Similar Technologies

PromptfooHumanloopPromptLayer

Promptfoo

Open-source tool for testing and evaluating prompts. Define test cases in YAML, run against multiple providers, compare outputs, and catch regressions. CI/CD integration for prompt testing.

Key Features

YAML test definitions
Multi-provider testing
Assertion-based evaluation
CI/CD integration
Open source

Similar Technologies

LangSmithDeepEvalCustom scripts

Humanloop

Enterprise platform for prompt management with collaboration features. Version control, A/B testing, fine-tuning management, and analytics. Built for teams managing prompts at scale.

Key Features

Collaborative editing
Version control
A/B testing
Fine-tuning integration
Enterprise analytics

Similar Technologies

LangSmithPromptLayerWeights & Biases

PromptLayer

Middleware for logging and managing prompts. Wraps API calls to track all requests, responses, and latency. Template management, versioning, and analytics dashboard.

Key Features

Request/response logging
Template management
Latency tracking
Analytics dashboard
Easy integration

Similar Technologies

LangSmithHeliconeCustom logging

DSPy

Stanford framework for programmatic prompt optimization. Define modules and signatures, then compile to optimized prompts using training data. Enables systematic prompt engineering with code.

Key Features

Programmatic prompts
Signature-based modules
Automatic optimization
Training data compilation
Reproducible pipelines

Similar Technologies

Manual EngineeringLangChainGuidance

Guidance

Microsoft's library for constrained generation. Define output structure with templates, enforce JSON schemas, control generation token-by-token. More reliable structured outputs.

Key Features

Template-based generation
Schema enforcement
Token-level control
Interleaved generation
Reliable JSON output

Similar Technologies

InstructorOutlinesLMQL