Inspiration

Exfurnace started from a simple premise: traditional penetration tests are valuable but episodic, expensive, and often leave long windows of untested exposure. We saw an opportunity to use AI to automate routine offensive tasks — reconnaissance, correlation, and triage — so human red teams can focus on high-risk, creative engagements. The goal: continuous, responsible validation of controls that produces audit-ready evidence and shortens time-to-remediate.

What it does

  • Safe automated offense (scoped): Orchestrates non-destructive, authorized checks that simulate attacker behaviors to reveal misconfigurations, weak authentication, exposed services, and risky cloud posture.
  • AI-assisted triage & reporting: Deduplicates findings, scores risk, and generates clear, prioritized remediation guidance and human-friendly summaries.
  • Compliance mapping: Maps findings to common frameworks and standards (CIS, NIST 800-53, ISO 27001, OWASP), flags HIPAA/GDPR considerations, and generates exportable evidence bundles for audits.
  • Responsible-use guardrails: Enforces scope checks, rate limits, credential handling rules, consent verification, and immutable auditing to ensure tests run only where authorized.
  • Extensible integrations: Plugins for SIEM, ticketing, vulnerability management, and enterprise onboarding.

Ethical note: Exfurnace requires explicit written authorization (scope and Rules of Engagement) before any offensive activity. It is intended for authorized, defensive security use only.
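As a rough illustration of the scope-check guardrail, here is a minimal sketch of a fail-closed authorization gate. The `Authorization` record, its fields, and `is_in_scope` are hypothetical names used for this example only; they are not Exfurnace's actual API, and a real check would also need to cover hostnames, cloud resource IDs, and consent verification.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Authorization:
    """Hypothetical record derived from a signed scope / Rules of Engagement."""
    networks: tuple[str, ...]  # CIDR ranges the customer authorized
    not_before: datetime       # start of the approved testing window
    not_after: datetime        # end of the approved testing window


def is_in_scope(target_ip: str, auth: Authorization, now: datetime | None = None) -> bool:
    """Return True only if the target is in an authorized range and the window is open."""
    now = now or datetime.now(timezone.utc)
    if not (auth.not_before <= now <= auth.not_after):
        return False
    try:
        addr = ipaddress.ip_address(target_ip)
    except ValueError:
        return False  # fail closed on anything that does not parse as an IP
    return any(addr in ipaddress.ip_network(cidr) for cidr in auth.networks)
```

The design choice worth noting is that every uncertain case (expired window, unparseable target, address outside all ranges) returns `False`, so a check never runs by default.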

How we built it

  • Modular architecture: Controller/orchestrator, AI triage service, compliance-mapping engine, and isolated worker sandbox services (containers/ephemeral VMs) that run scoped checks.
  • AI layer: Deterministic prompt templates and models for natural-language triage, report drafting, and prioritization, with human-in-the-loop gating for high-risk outputs.
  • Security-first runtime: Least-privilege worker roles, ephemeral credentials, network segmentation for test runs, and conservative defaults to avoid disruption.
  • Controls & mapping engine: Rule engine that links detection signatures to control IDs across CIS, NIST, ISO, OWASP, HIPAA, and GDPR; produces evidence artifacts for auditors.
  • DevSecOps: Secure CI/CD, dependency scanning, static analysis, automated tests, and staged rollouts.
  • Auditing & telemetry: Tamper-evident logs, RBAC, MFA for operators, and immutable evidence bundles for compliance reviewers.
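One common way to make logs tamper-evident is a hash chain, where each entry commits to the hash of the previous one. The sketch below assumes that approach; the writeup does not say which mechanism Exfurnace uses, and the function names and entry layout here are illustrative only.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON form of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, event: dict) -> None:
    """Append an event, linking it to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify_chain(log: list) -> bool:
    """Recompute every link; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this only detects tampering if the latest hash is also stored somewhere the log writer cannot modify, such as a separate write-once store.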

Challenges we ran into

  • Safety vs. coverage tradeoffs: Aggressive techniques surface more issues but risk disrupting production. We built multi-mode operation (safe / deep / human-mediated) and conservative defaults.
  • High false-positive rates: Raw scanning created noise. We invested heavily in AI-driven deduplication, confidence scoring, and enrichment with contextual data to raise signal-to-noise.
  • Compliance mapping complexity: Overlapping and evolving frameworks required a flexible control model and regular SME/legal reviews to keep mappings defensible.
  • Sensitive data handling: Tests and logs can contain PII/PHI. We implemented automatic redaction, strict retention policies, and opt-in data usage modes.
  • Adversarial inputs to AI layer: Prompt injection and malicious inputs required hardened input validation, sanitization, and human review gates.
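To make the deduplication idea concrete, here is a simple deterministic stand-in, not the AI-driven approach described above: findings are grouped by a fingerprint of normalized fields, and the highest-confidence copy is kept. The field names (`host`, `port`, `check_id`, `confidence`) are assumptions made for this example.

```python
import hashlib
from collections import defaultdict


def fingerprint(finding: dict) -> str:
    """Build a stable key from normalized fields that identify the same issue."""
    key = "|".join([
        finding["host"].strip().lower(),
        str(finding["port"]),
        finding["check_id"].strip().lower(),
    ])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def deduplicate(findings: list) -> list:
    """Collapse duplicates, keep the highest-confidence copy, and sort by confidence."""
    groups = defaultdict(list)
    for finding in findings:
        groups[fingerprint(finding)].append(finding)

    merged = []
    for group in groups.values():
        best = max(group, key=lambda f: f["confidence"])
        merged.append({**best, "occurrences": len(group)})
    return sorted(merged, key=lambda f: f["confidence"], reverse=True)
```

Exact-match fingerprints only catch duplicates that differ in casing or whitespace; near-duplicates with different wording would need the fuzzier, model-assisted grouping mentioned in the list above.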

Accomplishments that we're proud of

  • Audit-ready reports: Findings mapped to control IDs (CIS/NIST/ISO) with reproduction steps, evidence, and prioritized remediation.
  • Zero-impact staged trials: Responsible testing mode passed internal customer acceptance tests with no production disruption.
  • PII/PHI safeguards: Automated redaction pipeline, retention controls, and support for HIPAA-capable deployments (BAAs, auditing).
  • Integrations that close the loop: Out-of-the-box connectors to popular vulnerability management and ticketing systems to accelerate fix workflows.
  • Triage efficiency gains: In pilots, AI triage significantly reduced analysts' manual triage time by grouping duplicates and surfacing high-confidence findings.
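For readers curious what mapping a finding to control IDs might look like, here is a minimal lookup-table sketch. The category names and the `map_finding` helper are hypothetical, and the three NIST 800-53 control IDs are illustrative pairings only, not a vetted or complete mapping.

```python
# Illustrative mapping only: finding categories are hypothetical, and the
# control pairings have not been reviewed by a compliance SME.
CONTROL_MAP = {
    "weak-authentication": {"NIST 800-53": ["IA-5"]},  # Authenticator Management
    "exposed-service": {"NIST 800-53": ["SC-7"]},      # Boundary Protection
    "misconfiguration": {"NIST 800-53": ["CM-6"]},     # Configuration Settings
}


def map_finding(finding: dict) -> dict:
    """Attach control IDs to a finding; flag unmapped categories for human review."""
    controls = CONTROL_MAP.get(finding["category"], {})
    return {**finding, "controls": controls, "needs_review": not controls}
```

Flagging unmapped categories instead of guessing keeps the output defensible: an auditor sees either a reviewed mapping or an explicit gap.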

What we learned

  • Governance is essential. Automation must be paired with clear policies, consent, and legal agreements to be safe for production use.
  • Context matters more than raw alerts. Findings that include business impact and remediation steps gain more trust from security teams.
  • Compliance is dynamic. Standards and regulations evolve; plan for continuous updates and legal review in the product lifecycle.
  • Human-in-the-loop remains critical. For high-risk or novel findings, analyst review prevents harmful actions and reduces false positives.
  • Sane defaults sell. Clear onboarding, defensible defaults, and simple policies increase adoption among conservative enterprise customers.

What's next for Exfurnace

  • Enterprise compliance packs: Pre-built evidence collections and templates for SOC 2, HIPAA, PCI-DSS, and ISO 27001 audits.
  • Continuous posture & drift detection: Safe scheduled scans, configuration-drift alerts, and SLA-based remediation tracking.
  • Advanced adversarial AI testing: MITRE ATLAS–style scenarios and robustness checks for customers’ ML systems.
  • Privacy-first enhancements: Stronger data isolation, enhanced redaction, and features for privacy-preserving analysis in sensitive environments.
  • Marketplace & community checks: Vetted third-party modules with signing/validation pipelines to expand coverage safely.
  • Audit automation: Auto-generated auditor exports and evidence bundles to minimize audit prep time.
