Exfurnace | Devpost

Inspiration

Exfurnace started from a simple premise: traditional penetration tests are valuable but episodic and expensive, and they often leave long windows of untested exposure. We saw an opportunity to use AI to automate routine offensive tasks (reconnaissance, correlation, and triage) so that human red teams can focus on high-risk, creative engagements. The goal is continuous, responsible validation of controls that produces audit-ready evidence and shortens time to remediate.

What it does

Safe automated offense (scoped): Orchestrates non-destructive, authorized checks that simulate attacker behaviors to reveal misconfigurations, weak authentication, exposed services, and risky cloud posture.
AI-assisted triage & reporting: Deduplicates findings, scores risk, and generates clear, prioritized remediation guidance and human-friendly summaries.
Compliance mapping: Maps findings to common frameworks (CIS, NIST 800-53, ISO 27001, OWASP, HIPAA/GDPR considerations) and generates exportable evidence bundles for audits.
Responsible-use guardrails: Enforces scope checks, rate limits, credential handling rules, consent verification, and immutable auditing to ensure tests run only where authorized.
Extensible integrations: Plugins for SIEM, ticketing, vulnerability management, and enterprise onboarding.

Ethical note: Exfurnace requires explicit written authorization (scope and Rules of Engagement) before any offensive activity. It is intended for authorized, defensive security use only.
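As an illustration of the scope-check guardrail described above, here is a minimal sketch in Python. It is not Exfurnace's actual implementation: the `Scope` shape and the `build_scope` and `is_in_scope` names are hypothetical, and the sketch assumes the authorized scope is expressed as CIDR ranges plus an exclusion list. The key design choice is to fail closed, so anything that cannot be parsed as an IP address is treated as out of scope.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """Authorized targets taken from a signed Rules of Engagement (hypothetical shape)."""
    allowed_networks: tuple
    excluded_hosts: tuple = ()


def build_scope(allowed_cidrs, excluded_ips=()):
    """Parse CIDR strings and excluded IPs into a Scope object."""
    return Scope(
        allowed_networks=tuple(ipaddress.ip_network(c) for c in allowed_cidrs),
        excluded_hosts=tuple(ipaddress.ip_address(i) for i in excluded_ips),
    )


def is_in_scope(scope, target):
    """Return True only if target is an IP inside an allowed network and not excluded."""
    try:
        addr = ipaddress.ip_address(target)
    except ValueError:
        return False  # fail closed: hostnames and garbage are rejected
    if addr in scope.excluded_hosts:
        return False
    return any(addr in net for net in scope.allowed_networks)


# Example: one authorized /24 with a single excluded host.
scope = build_scope(["10.0.0.0/24"], excluded_ips=["10.0.0.5"])
print(is_in_scope(scope, "10.0.0.7"))
print(is_in_scope(scope, "10.0.0.5"))
```

A real guardrail would also need to resolve hostnames safely and re-check scope at execution time, not only at scheduling time; both are left out of this sketch.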

How we built it

Modular architecture: Controller/orchestrator, AI triage service, compliance-mapping engine, and isolated worker sandbox services (containers/ephemeral VMs) that run scoped checks.
AI layer: Deterministic prompt templates and models for natural-language triage, report drafting, and prioritization, with human-in-the-loop gating for high-risk outputs.
Security-first runtime: Least-privilege worker roles, ephemeral credentials, network segmentation for test runs, and conservative defaults to avoid disruption.
Controls & mapping engine: Rule engine that links detection signatures to control IDs across CIS, NIST, ISO, OWASP, HIPAA, and GDPR; produces evidence artifacts for auditors.
DevSecOps: Secure CI/CD, dependency scanning, static analysis, automated tests, and staged rollouts.
Auditing & telemetry: Tamper-evident logs, RBAC, MFA for operators, and immutable evidence bundles for compliance reviewers.
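One common way to make logs tamper-evident is a hash chain, where each entry commits to the hash of the previous one. The sketch below shows that general technique in Python; it is an assumption about how such a log could be built, not a description of Exfurnace's logging code, and the `append_entry` and `verify_chain` names and the entry fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, event):
    """Hash the previous hash together with a canonical JSON form of the event."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log, event):
    """Append an event to the log, chaining it to the last entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log):
    """Recompute every hash in order; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"actor": "operator-1", "action": "scan_started"})
append_entry(log, {"actor": "operator-1", "action": "scan_finished"})
print(verify_chain(log))
```

A chain like this only makes tampering detectable. Someone who can rewrite the whole log can also recompute every hash, so in practice the latest hash would need to be anchored somewhere the log writer cannot modify.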

Challenges we ran into

Safety vs. coverage tradeoffs: Aggressive techniques surface more issues but risk disrupting production. We built multi-mode operation (safe / deep / human-mediated) and conservative defaults.
High false-positive rates: Raw scanning created noise. We invested heavily in AI-driven deduplication, confidence scoring, and enrichment with contextual data to raise the signal-to-noise ratio.
Compliance mapping complexity: Overlapping and evolving frameworks required a flexible control model and regular SME/legal reviews to keep mappings defensible.
Sensitive data handling: Tests and logs can contain PII/PHI. We implemented automatic redaction, strict retention policies, and opt-in data usage modes.
Adversarial inputs to AI layer: Prompt injection and malicious inputs required hardened input validation, sanitization, and human review gates.
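To make the automatic redaction step from the sensitive-data item more concrete, here is a minimal pattern-based sketch in Python. The two patterns (email addresses and US Social Security number formats) and the `redact` function are illustrative assumptions, not the project's actual redaction pipeline; real PII/PHI detection needs far broader coverage than a pair of regular expressions.

```python
import re

# Illustrative patterns only; a production redactor would cover many more data types.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace each match with a typed placeholder and count what was removed."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[REDACTED:{label}]", text)
        if n:
            counts[label] = n
    return text, counts


clean, counts = redact("login alice@example.com ssn 123-45-6789 ok")
print(clean)
print(counts)
```

Returning the counts alongside the cleaned text lets the log record that redaction happened without keeping the sensitive values themselves.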

Accomplishments that we're proud of

Audit-ready reports: Findings mapped to control IDs (CIS/NIST/ISO) with reproduction steps, evidence, and prioritized remediation.
Zero-impact staged trials: Responsible testing mode passed internal customer acceptance tests with no production disruption.
PII/PHI safeguards: Automated redaction pipeline, retention controls, and support for HIPAA-capable deployments (BAAs, auditing).
Integrations that close the loop: Out-of-the-box connectors to popular vulnerability management and ticketing systems to accelerate fix workflows.
Triage efficiency gains: In pilots, AI triage significantly reduced the time analysts spent on manual triage by grouping duplicates and surfacing high-confidence findings.
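The grouping of duplicates and surfacing of high-confidence findings can be sketched without any model at all. The Python example below swaps in a plain deterministic fingerprint in place of the AI-driven triage described above, purely to show the shape of the idea. The finding fields (`host`, `port`, `check_id`, `confidence`) and the 0.8 review threshold are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def fingerprint(finding):
    """Stable key for 'the same issue on the same endpoint' (hypothetical field names)."""
    key = "|".join([finding["host"].lower(), str(finding["port"]), finding["check_id"]])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def deduplicate(findings, min_confidence=0.8):
    """Group findings by fingerprint, keep the most confident one, flag the rest for review."""
    groups = defaultdict(list)
    for f in findings:
        groups[fingerprint(f)].append(f)

    merged = []
    for fp, members in groups.items():
        best = max(members, key=lambda f: f["confidence"])
        merged.append({
            **best,
            "fingerprint": fp,
            "occurrences": len(members),
            # Low-confidence groups go to a human analyst instead of auto-ticketing.
            "needs_review": best["confidence"] < min_confidence,
        })
    merged.sort(key=lambda f: f["confidence"], reverse=True)
    return merged


findings = [
    {"host": "App01", "port": 22, "check_id": "weak-ssh-kex", "confidence": 0.6},
    {"host": "app01", "port": 22, "check_id": "weak-ssh-kex", "confidence": 0.9},
    {"host": "db01", "port": 5432, "check_id": "exposed-service", "confidence": 0.5},
]
for item in deduplicate(findings):
    print(item["host"], item["check_id"], item["occurrences"], item["needs_review"])
```

Exact-match fingerprints miss near-duplicates, such as the same issue reported by two scanners under different check names, which is the kind of gap a model-based similarity step would be expected to cover.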

What we learned

Governance is essential. Automation must be paired with clear policies, consent, and legal agreements to be safe for production use.
Context matters more than raw alerts. Findings that include business impact and remediation steps gain more trust from security teams.
Compliance is dynamic. Standards and regulations evolve; plan for continuous updates and legal review in the product lifecycle.
Human-in-the-loop remains critical. For high-risk or novel findings, analyst review prevents harmful actions and reduces false positives.
Sane defaults sell. Clear onboarding, defensible defaults, and simple policies increase adoption among conservative enterprise customers.

What's next for Exfurnace

Enterprise compliance packs: Pre-built evidence collections and templates for SOC 2, HIPAA, PCI-DSS, and ISO 27001 audits.
Continuous posture & drift detection: Safe scheduled scans, configuration-drift alerts, and SLA-based remediation tracking.
Advanced adversarial AI testing: MITRE ATLAS–style scenarios and robustness checks for customers’ ML systems.
Privacy-first enhancements: Stronger data isolation, enhanced redaction, and features for privacy-preserving analysis in sensitive environments.
Marketplace & community checks: Vetted third-party modules with signing/validation pipelines to expand coverage safely.
Audit automation: Auto-generated auditor exports and evidence bundles to minimize audit prep time.