AI Security

OWASP LLM Top 10 Vulnerabilities (2025)

RankVulnerabilityDescriptionMitigation
LLM01Prompt InjectionUser prompts alter the LLM's behavior by overriding or manipulating system instructionsInput validation, prompt isolation, output filtering
LLM02Sensitive Information DisclosureLLM reveals training data, PII, or confidential information affecting both model and applicationData filtering, PII detection, output monitoring, redaction
LLM03Supply ChainLLM supply chains susceptible to vulnerabilities from compromised models, datasets, or dependenciesModel verification, dependency scanning, provenance tracking
LLM04Data and Model PoisoningManipulation of pre-training, fine-tuning, or embedding data to introduce vulnerabilities or biasesData provenance, validation, anomaly detection, sanitization
LLM05Improper Output HandlingInsufficient validation, sanitization, and handling of LLM outputs before downstream useSanitize outputs, validate before execution, sandbox environments
LLM06Excessive AgencyLLM-based systems granted excessive autonomy or permissions beyond intended scopePrinciple of least privilege, human-in-loop, action guardrails
LLM07System Prompt LeakageExposure of system prompts that reveal internal instructions, configurations, or sensitive logicPrompt protection, input filtering, access controls, monitoring
LLM08Vector and Embedding WeaknessesVulnerabilities in vector databases and embeddings present security risks in RAG systemsVector validation, embedding security, access controls, monitoring
LLM09MisinformationLLMs generate false, misleading, or fabricated information (hallucinations) in responsesHuman review, fact-checking, confidence scores, source citations
LLM10Unbounded ConsumptionLLM processes consume excessive resources leading to DoS, cost overruns, or service degradationRate limiting, input length limits, cost monitoring, auto-scaling

AI-Specific Attack Vectors

Adversarial Attacks

Definition: Crafted inputs designed to fool ML models

Types:

  • Evasion: Bypass detection (e.g., spam filters)
  • Poisoning: Corrupt training data
  • Model Inversion: Extract training data
  • Model Extraction: Steal model via queries

Defense: Adversarial training, input validation, robustness testing

Prompt Injection

Definition: Manipulate LLM behavior via malicious prompts

Types:

  • Direct: User input overrides system prompt
  • Indirect: Malicious content in retrieved data
  • Jailbreaking: Bypass safety guardrails
  • Prompt Leaking: Expose system prompts

Defense: Prompt isolation, input sanitization, output filtering

Data Poisoning

Definition: Inject malicious data into training set

Impact:

  • Backdoors: Trigger specific behaviors
  • Bias Injection: Introduce systematic bias
  • Performance Degradation: Reduce accuracy
  • Targeted Attacks: Misclassify specific inputs

Defense: Data provenance, anomaly detection, sanitization

Model Extraction

Definition: Steal model by querying it repeatedly

Techniques:

  • Equation-solving: Reconstruct decision boundaries
  • Path-finding: Discover model structure
  • Knowledge distillation: Train copy via queries
  • API abuse: Extract via excessive queries

Defense: Query monitoring, rate limiting, watermarking

AI Security Best Practices

Input Security

  • Validate and sanitize all inputs
  • Enforce input length limits
  • Detect and block malicious patterns
  • Use allowlists over blocklists
  • Implement rate limiting per user/IP
  • Log suspicious inputs for analysis

Output Security

  • Sanitize outputs before use
  • Validate before code execution
  • Filter sensitive information (PII, secrets)
  • Implement output guardrails
  • Monitor for data leakage
  • Use confidence thresholds

Model Security

  • Verify model provenance
  • Scan for embedded malware
  • Use model signing/checksums
  • Restrict model access (RBAC)
  • Encrypt models at rest and in transit
  • Monitor for model theft attempts

Data Security

  • Encrypt training data
  • Implement data access controls
  • Validate data provenance
  • Detect poisoned data
  • Use differential privacy
  • Anonymize/pseudonymize PII

Infrastructure Security

  • Isolate ML workloads (VPC, containers)
  • Use least privilege access
  • Enable audit logging
  • Implement network segmentation
  • Regular security patching
  • DDoS protection and WAF

Monitoring & Response

  • Monitor for anomalous queries
  • Track model behavior changes
  • Alert on security events
  • Implement incident response plan
  • Regular security audits
  • Threat modeling and pen testing

AI Security Tools & Frameworks

Guardrails Frameworks

  • NVIDIA NeMo Guardrails: LLM safety rails, content filtering
  • Guardrails AI: Validate LLM outputs against specs
  • AWS Bedrock Guardrails: Content filtering, PII detection
  • LangKit: LLM observability and security
  • Rebuff: Prompt injection detection

Security Scanning

  • Garak: LLM vulnerability scanner
  • PyRIT: Microsoft's AI red team tool
  • PromptInject: Prompt injection tester
  • Adversarial Robustness Toolbox (ART): IBM's defense toolkit
  • CleverHans: Adversarial example library

Model Security

  • ModelScan: Malware scanning for ML models
  • Modelsafe: Model integrity verification
  • Hugging Face Security Scanner: Model card validation
  • ONNX Model Zoo Security: Verified models

Privacy Tools

  • Opacus: PyTorch differential privacy
  • TensorFlow Privacy: DP-SGD implementation
  • PySyft: Federated learning and encrypted ML
  • Microsoft Presidio: PII detection and anonymization