Bidirectional security framework for human/LLM interfaces implementing defense-in-depth architecture with multiple validation layers.
Version: 2.5.0 | Python: >=3.12 | License: MIT | Status: Production
A bidirectional security layer for LLM-based systems. Validates input, output, and agent state transitions.
- Input, output, memory, and state validation
- Supports agent frameworks, tool-using models, and API gateways
- Detection: prompt injection, jailbreaks, obfuscation, evasive encodings, unauthorized tool actions, cognitive leak risks, child-safety failures, malformed JSON, memory poisoning
- Local execution: no telemetry, no external calls
- API: guard.check_input(text) / guard.check_output(text)
from llm_firewall import guard
# Example: Input validation. The v2.4.1 update reduced false positives for
# benign educational queries matching the 'explain how...' pattern.
user_prompt = "Explain how rain forms."
decision = guard.check_input(user_prompt)
print(f"Blocked: {not decision.allowed}, Reason: {decision.reason}")
if decision.allowed:
    response = llm(user_prompt)  # LLM backend call (replace with your model client)
    out = guard.check_output(response)
    if not out.allowed:
        print(f"Response sanitized: {out.reason}")
    else:
        print(f"LLM Output: {out.cleaned_text}")

Lightweight baseline with ONNX-only inference:
pip install llm-security-firewall
# OR
pip install -r requirements-core.txt

This provides:
- Pattern matching and basic validation
- ONNX-based semantic guard (CUDA-enabled)
- Memory footprint: ~54 MB (96% reduction from original 1.3 GB)
For advanced validators (TruthPreservationValidator, TopicFence):
pip install llm-security-firewall[full]
# OR
pip install -r requirements.txt

Heavy components (PyTorch, transformers) are loaded on-demand only - they don't affect the baseline.
For development (local installation):
pip install -e .

For development dependencies:
pip install -e .[dev]

Optional extras:
pip install llm-security-firewall[langchain] # LangChain integration
pip install llm-security-firewall[dev] # Development tools
pip install llm-security-firewall[monitoring] # Monitoring tools

- Bidirectional validation: Input, output, and memory integrity validation
- Sequential validation layers: UnicodeSanitizer, NormalizationLayer, RegexGate, Input Analysis, Tool Inspection, Output Validation (see the pipeline sketch after this list)
- Statistical methods: CUSUM for drift detection, Dempster-Shafer theory for evidence fusion, fail-closed risk gating
- Multilingual detection: Polyglot attack detection across 12+ languages including low-resource languages (Basque, Maltese tested)
- Unicode normalization: Zero-width character removal, bidirectional override detection, homoglyph normalization, encoding anomaly detection
- Session state tracking: Session state management, drift detection, cumulative risk tracking
- Tool call validation: HEPHAESTUS protocol for tool call validation and killchain detection
- Published metrics: False Positive Rate (FPR), P99 latency, and memory usage documented in /docs/
- Hexagonal architecture: Protocol-based adapters for framework independence
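A minimal sketch of the sequential layer pipeline referenced in the list above. The Layer protocol, its check method, and the two toy layers are illustrative assumptions, not the classes shipped in the package:

import re
from typing import Protocol

class Layer(Protocol):
    # Hypothetical layer interface: return (possibly transformed) text, or raise to block.
    def check(self, text: str) -> str: ...

class UnicodeSanitizer:
    def check(self, text: str) -> str:
        # Strip a small illustrative subset of zero-width characters.
        return "".join(ch for ch in text if ch not in "\u200b\u200c\u200d\u2060")

class RegexGate:
    def check(self, text: str) -> str:
        if re.search(r"ignore (all )?previous instructions", text, re.IGNORECASE):
            raise ValueError("blocked: prompt-injection pattern")
        return text

def run_pipeline(text: str, layers: list[Layer]) -> str:
    # Layers run strictly in order; any layer can block by raising.
    for layer in layers:
        text = layer.check(text)
    return text

print(run_pipeline("Explain how rain forms.", [UnicodeSanitizer(), RegexGate()]))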
The system implements a stateful, bidirectional containment mechanism for large language models. Requests are processed through sequential validation layers with mathematical constraints and stateful tracking.
Architectural principles:
- Bidirectional validation: All data paths validated (input, output, in-memory state transitions)
- Hexagonal architecture: Protocol-based Port/Adapter interfaces with dependency injection
- Domain separation: Core business logic separated from infrastructure concerns
- Framework independence: Domain layer uses Protocol-based adapters (DecisionCachePort, DecoderPort, ValidatorPort); see the port sketch after this list
- Deterministic normalization: Multi-pass Unicode normalization, homoglyph resolution, Base64/Hex/URL decoding, JSON hardening
- Statistical methods: CUSUM detectors for oscillation detection, Dempster-Shafer uncertainty modeling for evidence fusion, fail-closed risk gating
- Stateful protection: Policies operate on text, agent state, tool sequences, and memory mutation events
- Local execution: No telemetry, no external API calls, no data exfiltration
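A minimal sketch of how a Protocol-based port such as DecisionCachePort can decouple the domain layer from infrastructure. The method names and the in-memory adapter are assumptions for illustration; only the port name comes from the list above:

from typing import Protocol

class DecisionCachePort(Protocol):
    # Hypothetical port surface: the real interface may differ.
    def get(self, key: str) -> str | None: ...
    def set(self, key: str, decision: str, ttl_seconds: int) -> None: ...

class InMemoryDecisionCache:
    # Illustrative adapter satisfying the port without Redis.
    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def get(self, key: str) -> str | None:
        return self._store.get(key)

    def set(self, key: str, decision: str, ttl_seconds: int) -> None:
        self._store[key] = decision  # TTL ignored in this sketch

def lookup(cache: DecisionCachePort, key: str) -> str | None:
    # The domain layer depends only on the Protocol, never on a concrete backend.
    return cache.get(key)

print(lookup(InMemoryDecisionCache(), "missing-key"))  # None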
The system operates in three directions:
- Human → LLM (Input Protection)
  - Normalization and sanitization
  - Pattern matching and evasion detection
  - Risk scoring and policy evaluation
  - Session state tracking
- LLM → Human (Output Protection)
  - Evidence validation
  - Tool call validation
  - Output sanitization
  - Truth preservation checks
- Memory Integrity
  - Session state management
  - Drift detection
  - Influence tracking
Firewall Engine (src/llm_firewall/core/firewall_engine_v2.py)
- Main decision engine
- Risk score aggregation
- Policy application
- Unicode security analysis
Normalization Layer (src/hak_gal/layers/inbound/normalization_layer.py)
- Recursive URL/percent decoding
- Unicode normalization (NFKC)
- Zero-width character removal
- Directional override character removal
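A minimal sketch of the steps above (recursive percent-decoding, NFKC normalization, removal of zero-width and directional-override characters); this is a simplification, not the code in normalization_layer.py:

import unicodedata
from urllib.parse import unquote

ZERO_WIDTH_AND_BIDI = {
    "\u200b", "\u200c", "\u200d", "\u2060",            # zero-width characters
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # directional embeddings/overrides
    "\u2066", "\u2067", "\u2068", "\u2069",            # directional isolates
}

def normalize(text: str, max_passes: int = 5) -> str:
    # Recursively undo percent-encoding until the text stops changing.
    for _ in range(max_passes):
        decoded = unquote(text)
        if decoded == text:
            break
        text = decoded
    # Canonicalize compatibility characters (e.g. fullwidth letters) via NFKC.
    text = unicodedata.normalize("NFKC", text)
    # Drop invisible and direction-controlling characters.
    return "".join(ch for ch in text if ch not in ZERO_WIDTH_AND_BIDI)

print(normalize("%2569gnore\u200b previous"))  # nested encoding plus zero-width char -> 'ignore previous'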
Pattern Matching (src/llm_firewall/rules/patterns.py)
- Regex-based pattern detection
- Concatenation-aware matching
- Evasion pattern detection
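A minimal sketch of concatenation-aware matching: separators are collapsed before the pattern is applied, so fragmented keywords still match. The pattern and helper are illustrative assumptions, not the shipped rule set:

import re

INJECTION = re.compile(r"ignorepreviousinstructions", re.IGNORECASE)

def matches_concatenated(text: str) -> bool:
    # Remove whitespace and common separators so fragmented keywords re-join.
    collapsed = re.sub(r"[\s\-_.\u200b]+", "", text)
    return bool(INJECTION.search(collapsed))

print(matches_concatenated("Please ig-nore previous in struct ions"))  # True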
Risk Scoring (src/llm_firewall/core/risk_scorer.py)
- Multi-factor risk calculation
- Cumulative risk tracking
- Threshold-based decisions
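A minimal sketch of multi-factor scoring with cumulative tracking and a threshold decision; the weights, factor names, decay factor, and threshold are illustrative assumptions:

WEIGHTS = {"pattern_hit": 0.5, "encoding_anomaly": 0.3, "language_switch": 0.2}
BLOCK_THRESHOLD = 0.6

class SessionRisk:
    def __init__(self) -> None:
        self.cumulative = 0.0

    def score_turn(self, factors: dict[str, bool]) -> float:
        # Weighted sum over the factors triggered in this turn.
        turn_risk = sum(w for name, w in WEIGHTS.items() if factors.get(name))
        # Carry part of the risk across turns so slow, distributed attacks accumulate.
        self.cumulative = 0.7 * self.cumulative + turn_risk
        return self.cumulative

    def should_block(self, factors: dict[str, bool]) -> bool:
        return self.score_turn(factors) >= BLOCK_THRESHOLD

session = SessionRisk()
print(session.should_block({"encoding_anomaly": True}))                      # False (0.30)
print(session.should_block({"pattern_hit": True, "language_switch": True}))  # True  (0.91)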
Cache System (src/llm_firewall/cache/decision_cache.py)
- Exact match caching (Redis)
- Semantic caching (LangCache)
- Hybrid mode support
- Circuit breaker pattern
- Fail-safe behavior (blocks on cache failure, prevents security bypass)
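A minimal sketch of the fail-safe behavior above: if the cache backend errors, the lookup reports a block instead of silently skipping the cache, so an outage cannot become a bypass. The key scheme and return convention are assumptions; the redis-py calls are standard:

import hashlib
import redis  # pip install redis

def cached_decision(client: redis.Redis, text: str) -> str:
    """Return 'allow', 'block', or 'miss'; return 'block' if Redis is unavailable (fail closed)."""
    key = "fw:decision:" + hashlib.sha256(text.encode()).hexdigest()
    try:
        value = client.get(key)
    except redis.RedisError:
        # Fail closed: a cache outage must not let unchecked input through.
        return "block"
    return value.decode() if value else "miss"

client = redis.Redis.from_url("redis://localhost:6379/0")
print(cached_decision(client, "Explain how rain forms."))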
Adapter Health (src/llm_firewall/core/adapter_health.py)
- Circuit breaker implementation
- Health metrics tracking
- Failure threshold management
- Recovery timeout handling
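A minimal sketch of a circuit breaker with a failure threshold and recovery timeout, as described above; the state handling and default values are illustrative assumptions:

import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.opened_at: float | None = None

    def allow_call(self) -> bool:
        # Open circuit: reject calls until the recovery timeout elapses, then half-open.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.recovery_timeout:
                return False
            self.opened_at = None                      # half-open: allow one probe call
            self.failures = self.failure_threshold - 1  # a single failure re-opens the circuit
        return True

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

breaker = CircuitBreaker()
for _ in range(5):
    breaker.record_failure()
print(breaker.allow_call())  # False: circuit is open after 5 consecutive failures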
Developer Adoption API (src/llm_firewall/guard.py)
- API: guard.check_input(text), guard.check_output(text)
- Backward compatible with existing API
- Integration guide: QUICKSTART.md
LangChain Integration (src/llm_firewall/integrations/langchain/callbacks.py)
- FirewallCallbackHandler for LangChain chains
- Automatic input/output validation
- See examples/langchain_integration.py for usage
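A hedged usage sketch, assuming the handler can be constructed without arguments and attached through LangChain's standard callbacks config; consult examples/langchain_integration.py for the actual signature:

from langchain_openai import ChatOpenAI  # assumption: any LangChain-compatible chat model works
from llm_firewall.integrations.langchain.callbacks import FirewallCallbackHandler

# Assumption: constructor arguments may differ; see the shipped example for the real interface.
handler = FirewallCallbackHandler()
model = ChatOpenAI(model="gpt-4o-mini")

# Callbacks passed at invocation time are applied to every step of the run.
response = model.invoke("Explain how rain forms.", config={"callbacks": [handler]})
print(response.content)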
Configure the cache via the CACHE_MODE environment variable (a selection sketch follows the Redis settings below):
- exact (default): Redis exact-match cache
- semantic: LangCache semantic search
- hybrid: Both caches in sequence
export REDIS_URL=redis://:password@host:6379/0
export REDIS_TTL=3600 # Optional: Cache TTL in seconds

For Redis Cloud:
export REDIS_CLOUD_HOST=host
export REDIS_CLOUD_PORT=port
export REDIS_CLOUD_USERNAME=username
export REDIS_CLOUD_PASSWORD=password
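A minimal sketch, assuming a reachable Redis at REDIS_URL, of how these variables might drive cache selection; the mode-dispatch logic is an assumption, while redis.Redis.from_url and setex are standard redis-py calls:

import os
import redis

cache_mode = os.environ.get("CACHE_MODE", "exact")
redis_url = os.environ.get("REDIS_URL", "redis://localhost:6379/0")
ttl = int(os.environ.get("REDIS_TTL", "3600"))

if cache_mode in ("exact", "hybrid"):
    client = redis.Redis.from_url(redis_url)
    # Exact-match entries expire after REDIS_TTL seconds.
    client.setex("fw:decision:example", ttl, "allow")
if cache_mode in ("semantic", "hybrid"):
    print("Semantic cache (LangCache) would be initialised here.")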
Integration examples in the examples/ directory:
- quickstart.py - Basic integration using the guard.py API
- langchain_integration.py - LangChain integration with FirewallCallbackHandler
- minimal_fastapi.py - FastAPI middleware integration
- quickstart_fastapi.py - FastAPI example with input/output validation
Run examples:
python examples/quickstart.py
python examples/langchain_integration.py
python examples/minimal_fastapi.py

Basic usage:
from llm_firewall import guard
decision = guard.check_input("user input text")
if decision.allowed:
    # Process request
    pass

The test suite includes unit tests, integration tests, and adversarial test cases.
pytest tests/ -v

With coverage:
pytest tests/ -v --cov=src/llm_firewall --cov-report=term

Core (Required for basic functionality):
- numpy>=1.24.0
- scipy>=1.11.0
- scikit-learn>=1.3.0
- pyyaml>=6.0
- blake3>=0.3.0
- requests>=2.31.0
- psycopg[binary]>=3.1.0
- redis>=5.0.0
- pydantic>=2.0.0
- psutil>=5.9.0
- cryptography>=41.0.0
Machine Learning (Optional, for advanced features):
- sentence-transformers>=2.2.0 (SemanticVectorCheck, embedding-based detection)
- torch>=2.0.0 (ML model inference)
- transformers>=4.30.0 (Transformer-based detectors)
- onnx>=1.14.0 (ONNX model support)
- onnxruntime>=1.16.0 (ONNX runtime)
Note: Core functionality (Unicode normalization, pattern matching, risk scoring, basic validation) operates without ML dependencies. Semantic similarity detection and Kids Policy Engine require optional ML dependencies.
System Requirements:
- Python >=3.12 (by design, no legacy support)
- RAM: ~300MB for core functionality, ~1.3GB for adversarial inputs with full ML features
- GPU: Optional, only required for certain ML-based detectors
- Redis: Optional but recommended for caching (local or cloud)
- False Positive Rate: Kids Policy false positive rate is 0.00% on validation dataset (target: ≤5.0%, met in v2.4.1)
- Memory Usage: Current memory usage exceeds 300MB cap for adversarial inputs (measured: ~1.3GB)
- Unicode Normalization: Some edge cases in mathematical alphanumeric symbol handling
- Python Version: Requires Python >=3.12 (by design, no legacy support for 3.10/3.11)
- Dependencies: Core functionality requires numpy, scipy, scikit-learn; full ML features require torch, transformers, sentence-transformers (see Dependencies section)
This library reduces risk but does not guarantee complete protection.
Required additional security controls:
- Authentication and authorization
- Network isolation
- Logging and monitoring
- Rate limiting
- Sandboxing of tool environments
The maintainers assume no liability for misuse.
Use only in compliance with local law and data-protection regulations.
- Multi-Tenant Isolation
  - Session hashing via HMAC-SHA256(tenant_id + user_id + DAILY_SALT)
  - Redis key isolation via ACLs and prefixes
- Oscillation Defense (see the CUSUM sketch after this list)
  - CUSUM (Cumulative Sum Control Chart) algorithm
  - Cumulative risk tracking across session turns
- Parser Differential Protection
  - StrictJSONDecoder with duplicate key detection
  - Immediate exception on key duplication
- Unicode Security
  - Zero-width character detection and removal
  - Directional override character detection
  - Homoglyph normalization
- Multilingual Attack Detection
  - Polyglot attack detection across 12+ languages
  - Low-resource language hardening (Basque, Maltese tested)
  - Language switching detection
  - Multilingual keyword detection (Chinese, Japanese, Russian, Arabic, Hindi, Korean, and others)
- Pattern Evasion Detection
  - Concatenation-aware pattern matching
  - Encoding anomaly detection
  - Obfuscation pattern recognition
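The oscillation defense above accumulates evidence with a one-sided CUSUM statistic: S_t = max(0, S_{t-1} + x_t - k), where x_t is the per-turn risk score and k is a reference (slack) value; an alarm fires once S_t exceeds a decision threshold h. A minimal sketch (the k and h values are illustrative assumptions):

def cusum_alarm(risk_scores: list[float], k: float = 0.2, h: float = 1.0) -> bool:
    """One-sided CUSUM: alarm when the accumulated excess risk over the slack k exceeds h."""
    s = 0.0
    for x in risk_scores:
        s = max(0.0, s + (x - k))  # only upward drift accumulates
        if s > h:
            return True
    return False

# Each turn stays below an obvious per-turn threshold, but the drift accumulates.
print(cusum_alarm([0.3, 0.4, 0.35, 0.45, 0.55]))  # True
print(cusum_alarm([0.1, 0.05, 0.1, 0.1]))         # False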
- P99 Latency: <200ms for standard inputs (measured)
- Cache Hit Rate: 30-50% (exact), 70-90% (hybrid)
- Cache Latency: <100ms (Redis Cloud), <1ms (local Redis)
MCP monitoring tools available for health checks and metrics:
- firewall_health_check: Redis/Session health inspection
- firewall_deployment_status: Traffic percentage and rollout phase
- firewall_metrics: Real-time block rates and CUSUM scores
- firewall_check_alerts: Critical P0 alerts
- firewall_redis_status: ACL and connection pool health
P0 Items (Critical):
- Circuit breaker pattern: Implemented
- False positive tracking: Implemented (rate: ~5% as of v2.4.1)
- P99 latency metrics: Implemented (<200ms verified)
- Cache mode switching: Implemented
- Adversarial bypass detection: Implemented (0/50 bypasses in test suite)
P1 Items (High Priority):
- Shadow-allow mechanism: Configuration-only
- Cache invalidation strategy: TTL-based
- Bloom filter parameters: Configurable
P2 Items (Medium Priority):
- Concurrency model: Single-threaded
- Progressive decoding: Not implemented
- Forensic capabilities: Basic logging
- STRIDE threat model: Partial
The Phase 2 evaluation pipeline provides self-contained, standard-library-only tools for evaluating AnswerPolicy effectiveness:
- ASR/FPR Metrics: Attack Success Rate and False Positive Rate computation (see the sketch after this list)
- Multi-Policy Comparison: Compare baseline, default, kids, and internal_debug policies
- Latency Measurement: Optional per-request latency tracking
- Bootstrap Confidence Intervals: Optional non-parametric CIs for ASR/FPR
- Dataset Validation: Schema compliance, ASCII-only checks, statistics
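A minimal sketch of the ASR/FPR computation referenced above, using only the standard library; the item schema (label and blocked fields) is an assumption about the dataset format, not the actual Phase 2 schema:

def asr_fpr(items: list[dict]) -> tuple[float, float]:
    """ASR = attacks NOT blocked / all attacks; FPR = benign items blocked / all benign items."""
    attacks = [i for i in items if i["label"] == "attack"]
    benign = [i for i in items if i["label"] == "benign"]
    asr = sum(1 for i in attacks if not i["blocked"]) / len(attacks)
    fpr = sum(1 for i in benign if i["blocked"]) / len(benign)
    return asr, fpr

items = [
    {"label": "attack", "blocked": True},
    {"label": "attack", "blocked": False},
    {"label": "benign", "blocked": False},
    {"label": "benign", "blocked": False},
]
print(asr_fpr(items))  # (0.5, 0.0)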
Quick Start:
python scripts/run_phase2_suite.py --config smoke_test_core

Documentation:
- AnswerPolicy Phase 2 Evaluation (v2.4.1) – Technical Handover – Complete technical documentation
- AnswerPolicy Evaluation User Workflow – User guide
Evaluation Scope & Limitations:
- Current evaluation uses small sample sizes (20-200 items) suitable for local smoke tests
- The p_correct estimator is uncalibrated (heuristic-based, not a probabilistic model)
- Datasets use template-based generation, not real-world distributions
- Block attribution is conservative (lower bound for AnswerPolicy contributions)
- Bootstrap CIs are approximate indicators, not publication-grade statistics
For production-grade evaluation with larger datasets and calibrated models, see Future Work in the technical handover document.
Latest Version: v2.5.0 (2025-12-05)
Kids Policy Performance:
- False Positive Rate: 0.00% (target: ≤5.0%, met in v2.4.1)
- Attack Success Rate: 40.00% (stable)
- Validation Report: VALIDATION_REPORT_v2.4.1.md
Recent Changes:
- v2.4.1: UNSAFE_TOPIC false positive reduction (whitelist filter for benign educational queries)
- UNSAFE_TOPIC false positives: 17 eliminated (100% of identified cases)
- FPR change: 22% → 0.00% (100% elimination on validation dataset), ASR unchanged
- Architecture documentation: docs/SESSION_HANDOVER_2025_12_01.md (v2.4.0rc1)
- Technical handover: docs/TECHNICAL_HANDOVER_2025_12_01.md (pre-v2.4.0rc1)
- Test results: docs/TEST_RESULTS_SUMMARY.md
- External review response: docs/EXTERNAL_REVIEW_RESPONSE.md
- PyPI release report: docs/PYPI_RELEASE_REPORT_2025_12_02.md
- AnswerPolicy Phase 2 Evaluation: docs/ANSWER_POLICY_EVALUATION_PHASE2_2_4_0.md (v2.4.1)
- Adaptive Learning Architecture: docs/ADAPTIVE_SESSION_LEARNING_ARCHITECTURE.md (Design Proposal)
MIT License
Copyright (c) 2025 Joerg Bollwahn
Joerg Bollwahn Email: sookoothaii@proton.me