Background wallpaper
← BackDEVELOPMENT

LLM Security Testing & Research Toolkit

Test LLM prompts against 219 research attacks. Build DSPy defenses (targeting 95-99% based on AegisLLM paper)

Updated October 16, 20251 min read
Tech Stack:
PythonDSPyMLflowGroq

Overview

Research toolkit for LLM security testing with 219 curated jailbreak attacks validated against production systems. Completed Phase 1: Attack Testing (16.5% baseline success, multi-turn 1.48× stronger, model comparison). Phase 2: Defense validation in progress, targeting 95-99% blocking based on AegisLLM 2024 paper.

Built on 5 major 2024 research papers (Crescendo, AegisLLM, DefensiveTokens) with findings validated through extensive testing. 20+ GitHub stars from security researchers and AI practitioners.

Attack Testing Complete (Phase 1):

  • 219 research attacks validated: 16.5% baseline success rate
  • Multi-turn architecture: 1.48× improvement over single-turn
  • DSPy attack generation tested: 7-11% (underperforms curated attacks)
  • Model comparison: Kimi K2 57% better than Llama 3.3

Defense Development Next (Phase 2):

  • Pattern-Based: ✅ Validated - 70-80% blocking, ~1ms latency, FREE
  • DSPy-Optimized: ⚠️ In development - targeting 95-99% based on AegisLLM paper (99.76% reported)
  • Scripts created, large-scale validation pending

Links: