Inspiration
Exfurnace started from a simple premise: traditional penetration tests are valuable but episodic, expensive, and often leave long windows of untested exposure. We saw an opportunity to use AI to automate routine offensive tasks (reconnaissance, correlation, and triage) so that human red teams can focus on high-risk, creative engagements. The goal: continuous, responsible validation of controls that produces audit-ready evidence and shortens time-to-remediate.
What it does
- Safe automated offense (scoped): Orchestrates non-destructive, authorized checks that simulate attacker behaviors to reveal misconfigurations, weak authentication, exposed services, and risky cloud posture.
- AI-assisted triage & reporting: Deduplicates findings, scores risk, and generates clear, prioritized remediation guidance and human-friendly summaries.
- Compliance mapping: Maps findings to common frameworks (CIS, NIST 800-53, ISO 27001, OWASP, HIPAA/GDPR considerations) and generates exportable evidence bundles for audits.
- Responsible-use guardrails: Enforces scope checks, rate limits, credential handling rules, consent verification, and immutable auditing to ensure tests run only where authorized.
- Extensible integrations: Plugins for SIEM, ticketing, vulnerability management, and enterprise onboarding.
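To make the compliance-mapping idea concrete, here is a minimal TypeScript sketch of a lookup table that expands each finding into one evidence row per mapped control, then groups the rows by control for auditors. The finding categories, the `CONTROL_MAP` table, and the function names are hypothetical. The control IDs (NIST SP 800-53 IA-2, IA-5, CM-7, SC-8; ISO/IEC 27001:2022 Annex A 8.5 and 8.24) are illustrative placeholders, not a reviewed mapping.

```typescript
// Hypothetical categories emitted by scoped checks.
type FindingCategory = "weak-auth" | "missing-mfa" | "exposed-service" | "plaintext-transport";
type Framework = "NIST-800-53" | "ISO-27001";

interface ControlRef {
  framework: Framework;
  controlId: string;
}

interface Finding {
  id: string;
  category: FindingCategory;
  asset: string;
}

interface EvidenceRow extends ControlRef {
  findingId: string;
  asset: string;
}

// Illustrative mapping only; a real table would be versioned and SME/legal-reviewed.
const CONTROL_MAP: Record<FindingCategory, ControlRef[]> = {
  "weak-auth": [
    { framework: "NIST-800-53", controlId: "IA-5" },
    { framework: "ISO-27001", controlId: "A.8.5" },
  ],
  "missing-mfa": [
    { framework: "NIST-800-53", controlId: "IA-2" },
    { framework: "ISO-27001", controlId: "A.8.5" },
  ],
  "exposed-service": [{ framework: "NIST-800-53", controlId: "CM-7" }],
  "plaintext-transport": [
    { framework: "NIST-800-53", controlId: "SC-8" },
    { framework: "ISO-27001", controlId: "A.8.24" },
  ],
};

// Expand each finding into one evidence row per mapped control.
function mapToControls(findings: Finding[]): EvidenceRow[] {
  return findings.flatMap((f) =>
    CONTROL_MAP[f.category].map((c) => ({ ...c, findingId: f.id, asset: f.asset })),
  );
}

// Group rows by "framework:controlId" so auditors can review one control at a time.
function groupByControl(rows: EvidenceRow[]): Map<string, EvidenceRow[]> {
  const byControl = new Map<string, EvidenceRow[]>();
  for (const row of rows) {
    const key = `${row.framework}:${row.controlId}`;
    const bucket = byControl.get(key) ?? [];
    bucket.push(row);
    byControl.set(key, bucket);
  }
  return byControl;
}
```

Keeping the mapping as data rather than code means SME or legal reviewers can update it without touching the engine, which fits the point that frameworks keep evolving.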
Ethical note: Exfurnace requires explicit written authorization (scope and Rules of Engagement) before any offensive activity. It is intended for authorized, defensive security use only.
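One way to enforce the authorization requirement in code is a gate that every check must pass before it touches a target. The sketch below assumes a hypothetical `RulesOfEngagement` record holding a reference to the signed authorization, allowed IPv4 CIDR ranges, allowed domains, and a validity window. It handles IPv4 only and is an illustration, not Exfurnace's actual implementation.

```typescript
// Hypothetical Rules-of-Engagement record captured from the signed authorization.
interface RulesOfEngagement {
  authorizationRef: string; // identifier of the written authorization; empty means none on file
  allowedCidrs: string[];   // IPv4 ranges, e.g. "10.0.0.0/24"
  allowedDomains: string[]; // exact host or any subdomain
  notBefore: Date;
  notAfter: Date;
}

type ScopeDecision = { allowed: true } | { allowed: false; reason: string };

// Parse dotted-quad IPv4 into an unsigned integer; null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let n = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p) || Number(p) > 255) return null;
    n = n * 256 + Number(p);
  }
  return n;
}

function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsText = ""] = cidr.split("/");
  if (!/^\d{1,2}$/.test(bitsText)) return false;
  const bits = Number(bitsText);
  const ipN = ipv4ToInt(ip);
  const baseN = ipv4ToInt(base);
  if (ipN === null || baseN === null || bits > 32) return false;
  // Two addresses share a /bits prefix when they fall in the same block of 2^(32 - bits).
  const blockSize = 2 ** (32 - bits);
  return Math.floor(ipN / blockSize) === Math.floor(baseN / blockSize);
}

function inDomains(host: string, domains: string[]): boolean {
  const h = host.toLowerCase().replace(/\.$/, "");
  return domains.some((d) => {
    const dom = d.toLowerCase();
    return h === dom || h.endsWith("." + dom);
  });
}

// Fail closed: anything not explicitly authorized is denied with a reason.
function checkScope(target: string, roe: RulesOfEngagement, now: Date = new Date()): ScopeDecision {
  if (!roe.authorizationRef) return { allowed: false, reason: "no written authorization on file" };
  const t = now.getTime();
  if (t < roe.notBefore.getTime() || t > roe.notAfter.getTime()) {
    return { allowed: false, reason: "outside the authorized window" };
  }
  const inScope = ipv4ToInt(target) !== null
    ? roe.allowedCidrs.some((c) => inCidr(target, c))
    : inDomains(target, roe.allowedDomains);
  return inScope ? { allowed: true } : { allowed: false, reason: "target not in scope" };
}
```

Returning a reason instead of silently skipping makes denials auditable, and denying anything not explicitly listed keeps the default conservative.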
How we built it
- Modular architecture: Controller/orchestrator, AI triage service, compliance-mapping engine, and isolated worker sandbox services (containers/ephemeral VMs) that run scoped checks.
- AI layer: Deterministic prompt templates and models for natural-language triage, report drafting, and prioritization, with human-in-the-loop gating for high-risk outputs.
- Security-first runtime: Least-privilege worker roles, ephemeral credentials, network segmentation for test runs, and conservative defaults to avoid disruption.
- Controls & mapping engine: Rule engine that links detection signatures to control IDs across CIS, NIST, ISO, OWASP, HIPAA, and GDPR; produces evidence artifacts for auditors.
- DevSecOps: Secure CI/CD, dependency scanning, static analysis, automated tests, and staged rollouts.
- Auditing & telemetry: Tamper-evident logs, RBAC, MFA for operators, and immutable evidence bundles for compliance reviewers.
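Tamper-evident logging is often built as a hash chain: each entry stores the hash of the previous one, so editing or deleting an earlier record breaks every later link. The minimal sketch below uses SHA-256 from Node's `node:crypto`. The `AuditLog` class and its field names are hypothetical, and the sketch assumes the latest hash is also anchored somewhere operators cannot rewrite (for example, write-once storage), since a chain alone cannot stop someone from rebuilding it end to end.

```typescript
import { createHash } from "node:crypto";

interface AuditEntry {
  seq: number;
  timestamp: string;
  actor: string;
  action: string;
  prevHash: string;
  hash: string;
}

// Sentinel "previous hash" for the first entry.
const GENESIS = "0".repeat(64);

// Hash the fields in a fixed order so verification recomputes identical bytes.
function entryHash(e: Omit<AuditEntry, "hash">): string {
  const payload = JSON.stringify([e.seq, e.timestamp, e.actor, e.action, e.prevHash]);
  return createHash("sha256").update(payload).digest("hex");
}

class AuditLog {
  private entries: AuditEntry[] = [];

  append(actor: string, action: string, now: Date = new Date()): AuditEntry {
    const prev = this.entries[this.entries.length - 1];
    const base = {
      seq: this.entries.length,
      timestamp: now.toISOString(),
      actor,
      action,
      prevHash: prev ? prev.hash : GENESIS,
    };
    const entry: AuditEntry = { ...base, hash: entryHash(base) };
    this.entries.push(entry);
    return entry;
  }

  all(): readonly AuditEntry[] {
    return this.entries;
  }
}

// Returns the index of the first broken link, or -1 if the chain is intact.
function verifyChain(entries: readonly AuditEntry[]): number {
  let prevHash = GENESIS;
  for (let i = 0; i < entries.length; i++) {
    const e = entries[i];
    if (e.prevHash !== prevHash || entryHash(e) !== e.hash) return i;
    prevHash = e.hash;
  }
  return -1;
}
```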
Challenges we ran into
- Safety vs. coverage tradeoffs: Aggressive techniques surface more issues but risk disrupting production. We built multi-mode operation (safe / deep / human-mediated) and conservative defaults.
- High false-positive rates: Raw scanning created noise. We invested heavily in AI-driven deduplication, confidence scoring, and enrichment with contextual data to improve the signal-to-noise ratio.
- Compliance mapping complexity: Overlapping and evolving frameworks required a flexible control model and regular SME/legal reviews to keep mappings defensible.
- Sensitive data handling: Tests and logs can contain PII/PHI. We implemented automatic redaction, strict retention policies, and opt-in data usage modes.
- Adversarial inputs to AI layer: Prompt injection and malicious inputs required hardened input validation, sanitization, and human review gates.
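The automatic redaction mentioned above can start as a pattern-based pass applied to evidence and logs before they are stored. This sketch is a simplification under stated assumptions: the three rules (email addresses, US Social Security numbers, and HTTP bearer tokens) are illustrative, and real PII/PHI coverage would need broader, locale-aware and context-aware detection.

```typescript
// Illustrative rules only; production PII/PHI detection needs far broader coverage.
const REDACTION_RULES: { label: string; pattern: RegExp }[] = [
  { label: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: "US_SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  { label: "BEARER_TOKEN", pattern: /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g },
];

interface RedactionResult {
  text: string;
  counts: Record<string, number>; // how many matches each rule replaced
}

// Replace every match with a typed placeholder and count replacements per rule,
// so reports can state what was removed without retaining the values.
function redact(input: string): RedactionResult {
  const counts: Record<string, number> = {};
  let text = input;
  for (const { label, pattern } of REDACTION_RULES) {
    text = text.replace(pattern, () => {
      counts[label] = (counts[label] ?? 0) + 1;
      return `[REDACTED:${label}]`;
    });
  }
  return { text, counts };
}
```

Keeping per-rule counts lets an evidence bundle note that redaction happened without keeping the sensitive values themselves.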
Accomplishments that we're proud of
- Audit-ready reports: Findings mapped to control IDs (CIS/NIST/ISO) with reproduction steps, evidence, and prioritized remediation.
- Zero-impact staged trials: Responsible testing mode passed internal customer acceptance tests with no production disruption.
- PII/PHI safeguards: Automated redaction pipeline, retention controls, and support for HIPAA-capable deployments (BAAs, auditing).
- Integrations that close the loop: Out-of-the-box connectors to popular vulnerability management (VM) and ticketing systems to accelerate fix workflows.
- Triage efficiency gains: In pilots, AI triage significantly reduced the time analysts spent on manual triage by grouping duplicates and surfacing high-confidence findings.
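Before any AI step, duplicate reports can be collapsed with a deterministic fingerprint. The sketch below is a swapped-in technique for illustration, not the project's AI triage. It normalizes asset names, groups findings by check, asset, and port, combines confidence from distinct sources with a noisy-OR rule (which assumes sources err independently), and keeps only groups above a threshold, ordered by severity. All type and field names are hypothetical.

```typescript
type Severity = "low" | "medium" | "high" | "critical";

interface RawFinding {
  checkId: string;    // which check produced the finding
  asset: string;      // hostname or IP as reported
  port?: number;
  source: string;     // module or scanner that reported it
  severity: Severity;
  confidence: number; // per-source estimate in [0, 1]
}

interface GroupedFinding {
  fingerprint: string;
  checkId: string;
  asset: string;
  port?: number;
  severity: Severity;
  sources: string[];
  confidence: number;
  count: number;
}

const SEVERITY_RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };

// Lowercase and drop a trailing dot so "API.example.com." and "api.example.com" collide.
function normalizeAsset(asset: string): string {
  return asset.trim().toLowerCase().replace(/\.$/, "");
}

function fingerprint(f: RawFinding): string {
  return [f.checkId, normalizeAsset(f.asset), f.port ?? "-"].join("|");
}

function triage(raw: RawFinding[], minConfidence = 0.8): GroupedFinding[] {
  const groups = new Map<string, GroupedFinding>();
  for (const f of raw) {
    const key = fingerprint(f);
    const g = groups.get(key);
    if (!g) {
      groups.set(key, {
        fingerprint: key, checkId: f.checkId, asset: normalizeAsset(f.asset), port: f.port,
        severity: f.severity, sources: [f.source], confidence: f.confidence, count: 1,
      });
      continue;
    }
    g.count += 1;
    // Keep the highest severity reported for this issue.
    if (SEVERITY_RANK[f.severity] > SEVERITY_RANK[g.severity]) g.severity = f.severity;
    // Noisy-OR: only a new, independent source raises confidence.
    if (!g.sources.includes(f.source)) {
      g.sources.push(f.source);
      g.confidence = 1 - (1 - g.confidence) * (1 - f.confidence);
    }
  }
  return [...groups.values()]
    .filter((g) => g.confidence >= minConfidence)
    .sort((a, b) => SEVERITY_RANK[b.severity] - SEVERITY_RANK[a.severity] || b.confidence - a.confidence);
}
```

Repeated reports from the same source bump `count` but not confidence, so one noisy scanner cannot push its own finding over the threshold.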
What we learned
- Governance is essential. Automation must be paired with clear policies, consent, and legal agreements to be safe for production use.
- Context matters more than raw alerts. Findings that include business impact and remediation steps gain more trust from security teams.
- Compliance is dynamic. Standards and regulations evolve; plan for continuous updates and legal review in the product lifecycle.
- Human-in-the-loop remains critical. For high-risk or novel findings, analyst review prevents harmful actions and reduces false positives.
- Sane defaults sell. Clear onboarding, defensible defaults, and simple policies increase adoption among conservative enterprise customers.
What's next for Exfurnace
- Enterprise compliance packs: Pre-built evidence collections and templates for SOC 2, HIPAA, PCI-DSS, and ISO 27001 audits.
- Continuous posture & drift detection: Safe scheduled scans, configuration-drift alerts, and SLA-based remediation tracking.
- Advanced adversarial AI testing: MITRE ATLAS–style scenarios and robustness checks for customers’ ML systems.
- Privacy-first enhancements: Stronger data isolation, enhanced redaction, and features for privacy-preserving analysis in sensitive environments.
- Marketplace & community checks: Vetted third-party modules with signing/validation pipelines to expand coverage safely.
- Audit automation: Auto-generated auditor exports and evidence bundles to minimize audit prep time.
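For the planned marketplace, a signing/validation pipeline could bind each module's name, version, and code digest into one signed message and refuse to load anything not signed by a vetted publisher key. The sketch below uses Ed25519 through Node's `node:crypto` `sign` and `verify` functions. The `SignedModule` shape, the `TrustStore` allowlist, and the function names are hypothetical.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

interface SignedModule {
  name: string;
  version: string;
  code: Buffer;           // module bundle bytes
  signature: Buffer;      // Ed25519 signature over the manifest bytes
  publisherKeyId: string;
}

// Hypothetical allowlist of vetted publisher public keys, keyed by ID.
type TrustStore = Map<string, KeyObject>;

// Canonical message binding name, version, and code digest; JSON avoids delimiter ambiguity.
function manifestBytes(name: string, version: string, code: Buffer): Buffer {
  const digest = createHash("sha256").update(code).digest("hex");
  return Buffer.from(JSON.stringify([name, version, digest]), "utf8");
}

function signModule(name: string, version: string, code: Buffer, privateKey: KeyObject, publisherKeyId: string): SignedModule {
  // Ed25519 takes no separate digest algorithm, so the first argument is null.
  const signature = sign(null, manifestBytes(name, version, code), privateKey);
  return { name, version, code, signature, publisherKeyId };
}

function verifyModule(mod: SignedModule, trust: TrustStore): boolean {
  const key = trust.get(mod.publisherKeyId);
  if (!key) return false; // unknown publisher: reject
  return verify(null, manifestBytes(mod.name, mod.version, mod.code), key, mod.signature);
}

// Demo: a vetted publisher signs a module; a tampered copy is rejected.
const demoKeys = generateKeyPairSync("ed25519");
const demoTrust: TrustStore = new Map([["publisher-1", demoKeys.publicKey]]);
const demoMod = signModule("example-check", "1.0.0", Buffer.from("module bytes"), demoKeys.privateKey, "publisher-1");
console.log(verifyModule(demoMod, demoTrust)); // true
console.log(verifyModule({ ...demoMod, code: Buffer.from("tampered") }, demoTrust)); // false
```

Signing the version along with the code digest stops a validly signed old release from being relabeled as a newer one.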
Built With
- mcp
- next.js
- react
