Conversation
@Dumbris Dumbris commented Jul 8, 2025

  • Implement secure environment variable management
  • Enhance timeout management and logging in client and server components

Dumbris added 3 commits July 7, 2025 20:36
Implement secure environment variable management
Enhance timeout management and logging in client and server components
Refactor code for clarity and consistency
@Dumbris Dumbris merged commit 12a7673 into main Jul 8, 2025
31 checks passed
@Dumbris Dumbris deleted the feature/env-vars-filter branch July 14, 2025 08:53
rannow pushed a commit to rannow/mcpproxy-go that referenced this pull request Sep 23, 2025
* Implement secure environment variable management

- Added `secureenv` package for filtering environment variables based on configuration.
- Introduced `EnvConfig` struct to manage allowed system variables and custom variables.
- Updated `Config` struct to include environment configuration.
- Enhanced `DefaultConfig` to initialize secure environment settings.
- Integrated secure environment manager into upstream and client components.
- Added tests for secure environment functionality and integration scenarios.
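
The env-filtering idea described above lends itself to a small illustration. Below is a minimal sketch assuming hypothetical `EnvConfig` fields (`AllowedSystemVars`, `CustomVars`) and a `Filter` helper; the actual `secureenv` API may differ.

```go
// Sketch only: field and method names are assumptions, not the real secureenv
// API. The idea: pass through only allow-listed system variables, then inject
// configured custom variables.
package secureenv

import "strings"

// EnvConfig controls which variables may reach a spawned server process.
type EnvConfig struct {
	AllowedSystemVars []string          // e.g. "PATH", "HOME"
	CustomVars        map[string]string // always injected, e.g. keys from config
}

// Filter keeps only allowed KEY=VALUE entries from environ, then appends the
// configured custom variables.
func (c EnvConfig) Filter(environ []string) []string {
	allowed := make(map[string]bool, len(c.AllowedSystemVars))
	for _, name := range c.AllowedSystemVars {
		allowed[name] = true
	}
	var out []string
	for _, kv := range environ {
		if name, _, ok := strings.Cut(kv, "="); ok && allowed[name] {
			out = append(out, kv)
		}
	}
	for name, value := range c.CustomVars {
		out = append(out, name+"="+value)
	}
	return out
}
```

A caller would typically feed it os.Environ() before spawning an upstream stdio server.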

* Enhance timeout management and logging in client and server components

- Updated timeout duration for tool listing operations to 30 seconds for better handling of Docker container startup and API calls.
- Improved logging for various operations, including connection errors and tool calls, to provide more detailed insights during debugging.
- Added conditional logging for JSON request/response data based on debug level, enhancing traceability without cluttering logs.

* Refactor code for clarity and consistency

- Removed unnecessary blank lines in config validation and test functions.
- Simplified environment variable splitting logic in tests.
- Introduced constants for OS checks to improve readability and maintainability.
- Updated autostart functionality checks to use constants for OS types.
- Enhanced error handling in file operations for better robustness.
Dumbris added a commit that referenced this pull request Nov 2, 2025
CRITICAL: Identified the most dangerous gap in the Docker recovery implementation.

Problem: The supervisor can spawn multiple containers for the same server when
it fails to detect container liveness, leading to:
- Resource exhaustion (multiple containers per server)
- Port binding conflicts
- Orphaned containers accumulating over time
- Data corruption if containers share state

Root Causes:
1. Race condition in ForceReconnectAll() - no serialization
2. Lost container IDs when cidfile read times out (10s)
3. Random container names don't prevent duplicates
4. No pre-creation check for existing containers
5. Supervisor liveness failures (slow startup, network partition, API errors)

Comprehensive 5-Layer Solution:
- Layer 1: Idempotent container creation (pre-cleanup)
- Layer 2: Container labels for ownership tracking
- Layer 3: Distributed locks to prevent races
- Layer 4: Enhanced container health verification
- Layer 5: Graceful degradation on cidfile timeout

Impact: Prevents resource exhaustion and ensures exactly 1 container
per server at all times.

Effort: +12 hours (40 total vs 28-32 original)
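
To make Layer 1 concrete, here is a hedged sketch of idempotent pre-cleanup; the helper name and label key are illustrative (the actual implementation landed later as ensureNoExistingContainers()).

```go
// Illustrative sketch of Layer 1 (idempotent creation): before starting a new
// container for a server, force-remove any container already carrying that
// server's ownership label. Helper name and label key are assumptions.
package core

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
)

// removeExistingContainers removes every container (running or stopped) whose
// ownership label matches the given server name, so that exactly one fresh
// container can be created afterwards.
func removeExistingContainers(ctx context.Context, serverName string) error {
	out, err := exec.CommandContext(ctx, "docker", "ps", "-aq",
		"--filter", "label=com.mcpproxy.server="+serverName).Output()
	if err != nil {
		return fmt.Errorf("list containers: %w", err)
	}
	for _, id := range strings.Fields(string(out)) {
		if err := exec.CommandContext(ctx, "docker", "rm", "-f", id).Run(); err != nil {
			return fmt.Errorf("remove container %s: %w", id, err)
		}
	}
	return nil
}
```
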
Dumbris added a commit that referenced this pull request Nov 2, 2025
Implements Issues #1, #2, #4, and #11 (Layers 1, 3, 4) from docker-recovery-improvements.md

**Issue #11: Duplicate Container Spawning Prevention**
- Layer 1 (Idempotent Creation): ensureNoExistingContainers() cleans up all existing containers before creating new ones
- Layer 3 (Distributed Locks): Per-server mutex prevents concurrent container creation races
- Layer 4 (Enhanced Health Verification): verifyContainerHealthy() checks Docker container state AND responsiveness
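
A minimal sketch of the Layer 3 per-server lock (the idea behind container_lock.go); the type and method names here are assumptions, not the actual API.

```go
// Sketch of the per-server locking idea behind container_lock.go; names are
// illustrative.
package core

import "sync"

// containerLocks hands out one mutex per server name so that concurrent
// reconnect attempts for the same server serialize container creation.
type containerLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newContainerLocks() *containerLocks {
	return &containerLocks{locks: make(map[string]*sync.Mutex)}
}

// forServer returns the mutex dedicated to serverName, creating it on first use.
func (c *containerLocks) forServer(serverName string) *sync.Mutex {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.locks[serverName]; !ok {
		c.locks[serverName] = &sync.Mutex{}
	}
	return c.locks[serverName]
}
```

Holding the server's mutex across pre-cleanup and docker run is what keeps two goroutines from each creating a container for the same server.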

**Issue #1: Docker-only Filtering**
- ForceReconnectAll() now filters to only reconnect Docker-based servers
- HTTP/SSE/stdio servers are skipped, preventing unnecessary reconnections

**Issue #2: Container Health Verification**
- verifyContainerHealthy() detects frozen containers when Docker is paused
- Uses docker inspect to verify container running + responsive
- Prevents skipping reconnection for zombie containers
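
For the docker inspect half of this check, a rough sketch (the helper name and the separate responsiveness probe are assumptions):

```go
// Sketch of the inspect-based part of verifyContainerHealthy(): a container is
// only treated as healthy if its state is exactly "running" (paused or exited
// containers report a different status). Helper name is illustrative.
package upstream

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
)

func containerRunning(ctx context.Context, containerID string) (bool, error) {
	out, err := exec.CommandContext(ctx, "docker", "inspect",
		"--format", "{{.State.Status}}", containerID).Output()
	if err != nil {
		// inspect fails when the container no longer exists
		return false, fmt.Errorf("inspect %s: %w", containerID, err)
	}
	return strings.TrimSpace(string(out)) == "running", nil
}
```
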

**Issue #4: Add Recovering State**
- New StateCoreRecoveringDocker state provides UX feedback during recovery
- EventDockerRecovered triggers transition from error → recovering → launching
- Tray menu shows "Docker engine recovered - reconnecting servers..."
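
A hedged sketch of how the new state and event could slot into the tray state machine; StateCoreRecoveringDocker and EventDockerRecovered come from this commit, while the other identifiers and the transition-table shape are assumptions.

```go
// Sketch of the error -> recovering -> launching flow; only
// StateCoreRecoveringDocker and EventDockerRecovered are names from the
// commit, the rest is illustrative.
package state

type State string
type Event string

const (
	StateCoreError            State = "core_error" // assumed name
	StateCoreRecoveringDocker State = "core_recovering_docker"
	StateCoreLaunching        State = "core_launching" // assumed name

	EventDockerRecovered Event = "docker_recovered"
	EventReconnectDone   Event = "reconnect_done" // assumed name
)

// transitions captures the flow the tray follows while it shows
// "Docker engine recovered - reconnecting servers...".
var transitions = map[State]map[Event]State{
	StateCoreError:            {EventDockerRecovered: StateCoreRecoveringDocker},
	StateCoreRecoveringDocker: {EventReconnectDone: StateCoreLaunching},
}
```
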

**Files Modified:**
- internal/upstream/core/docker.go: ensureNoExistingContainers(), GetContainerID()
- internal/upstream/core/connection.go: container lock acquisition, pre-cleanup
- internal/upstream/core/container_lock.go: NEW - distributed lock implementation
- internal/upstream/core/client.go: GetContainerID() accessor
- internal/upstream/manager.go: verifyContainerHealthy(), Docker-only filtering
- internal/upstream/managed/client.go: GetContainerID() delegation
- cmd/mcpproxy-tray/internal/state/states.go: StateCoreRecoveringDocker, EventDockerRecovered
- cmd/mcpproxy-tray/main.go: state mapping, handleDockerRecovering()
- internal/tray/connection_state.go: ConnectionStateRecoveringDocker

**Testing:**
- All upstream package tests passing (./internal/upstream/...)
- Code compiles cleanly (go build ./...)
- State machine transitions validated

**Impact:**
- Prevents duplicate container spawning during concurrent reconnections
- Eliminates resource exhaustion from orphaned containers
- Better UX - users see recovery in progress instead of errors
- Only Docker servers affected by Docker recovery (not HTTP/SSE)
Dumbris added a commit that referenced this pull request Nov 2, 2025
Implements Issues #3, #9, and #11 (Layers 2, 5) from docker-recovery-improvements.md

**Issue #11 Layer 2: Container Labels for Ownership Tracking**
- Add Docker labels to all containers (instance ID, server name, PID)
- formatContainerLabels() generates ownership labels automatically
- Labels: com.mcpproxy.managed, com.mcpproxy.instance, com.mcpproxy.server
- Enables accurate cleanup of orphaned containers from crashed instances
- New file: internal/upstream/core/instance.go for instance ID management
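
A sketch of how these labels could be rendered into docker run arguments; formatContainerLabels() is the name from the commit, but its signature and the pid label key here are assumptions.

```go
// Sketch of label generation for container ownership tracking; the function
// signature and the com.mcpproxy.pid key are assumptions.
package core

import (
	"fmt"
	"os"
)

// formatContainerLabels returns --label flags identifying which mcpproxy
// instance owns a container and which upstream server it serves.
func formatContainerLabels(instanceID, serverName string) []string {
	return []string{
		"--label", "com.mcpproxy.managed=true",
		"--label", "com.mcpproxy.instance=" + instanceID,
		"--label", "com.mcpproxy.server=" + serverName,
		"--label", fmt.Sprintf("com.mcpproxy.pid=%d", os.Getpid()),
	}
}
```

Cleanup can then list candidates with docker ps -a --filter label=com.mcpproxy.managed=true and skip containers whose instance label matches the live instance ID.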

**Issue #11 Layer 5: cidfile Timeout Fallback**
- When cidfile read times out (slow image pull), fall back to name lookup
- Use 'docker ps --filter name=...' to recover container ID
- Prevents orphaned containers when cidfile fails
- Graceful degradation maintains container cleanup capability
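
A minimal sketch of the name-based fallback, assuming a hypothetical helper:

```go
// Sketch of the cidfile fallback: when the cidfile has not appeared in time
// (e.g. slow image pull), recover the container ID via the --name passed to
// docker run. Helper name is illustrative.
package core

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
)

func containerIDByName(ctx context.Context, containerName string) (string, error) {
	out, err := exec.CommandContext(ctx, "docker", "ps", "-aq",
		"--filter", "name="+containerName).Output()
	if err != nil {
		return "", fmt.Errorf("docker ps: %w", err)
	}
	ids := strings.Fields(string(out))
	if len(ids) == 0 {
		return "", fmt.Errorf("no container named %q found", containerName)
	}
	return ids[0], nil
}
```
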

**Issue #3: Exponential Backoff Polling**
- Replace fixed 5s Docker polling with exponential backoff
- Intervals: 2s → 5s → 10s → 30s → 60s (max)
- Fast recovery when Docker quickly resumes (2s vs 5s)
- Lower CPU/battery usage when Docker off for extended periods
- Logs total wait time and attempt count for observability
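
The schedule above could be expressed roughly like this; the polling loop and the dockerUp probe are assumptions, the 2s/5s/10s/30s/60s intervals are from the commit.

```go
// Sketch of the exponential backoff polling for Docker availability.
package tray

import (
	"context"
	"time"
)

var backoffSchedule = []time.Duration{
	2 * time.Second,
	5 * time.Second,
	10 * time.Second,
	30 * time.Second,
	60 * time.Second, // later attempts stay at the 60s cap
}

// waitForDocker polls until dockerUp reports true, sleeping according to the
// schedule, and returns the attempt count and total time waited for logging.
func waitForDocker(ctx context.Context, dockerUp func() bool) (attempts int, waited time.Duration) {
	for !dockerUp() {
		idx := attempts
		if idx >= len(backoffSchedule) {
			idx = len(backoffSchedule) - 1
		}
		select {
		case <-ctx.Done():
			return attempts, waited
		case <-time.After(backoffSchedule[idx]):
			waited += backoffSchedule[idx]
		}
		attempts++
	}
	return attempts, waited
}
```
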

**Issue #9: Partial Failure Handling**
- ForceReconnectAll() now returns detailed ReconnectResult struct
- Tracks: total servers, attempted, successful, failed, skipped
- Per-server skip reasons ("container healthy", "not Docker", "disabled")
- Per-server failure errors for debugging
- Runtime logs comprehensive summary of reconnection results
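
A hedged sketch of what the result struct could look like; only the counters named in the bullets are from the commit, the field layout is assumed.

```go
// Sketch of the ForceReconnectAll() reporting; exact field names and types are
// assumptions beyond what the bullets above imply.
package upstream

// ReconnectResult summarizes one reconnection pass across all servers.
type ReconnectResult struct {
	Total      int               // servers considered
	Attempted  int               // Docker-based servers we tried to reconnect
	Successful int               // reconnects that completed
	Failed     int               // reconnects that returned an error
	Skipped    int               // servers left alone
	SkipReason map[string]string // per-server reason, e.g. "container healthy", "not Docker", "disabled"
	Errors     map[string]error  // per-server failure errors for debugging
}
```
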

**Files Modified:**
- internal/upstream/core/instance.go: NEW - instance ID persistence
- internal/upstream/core/isolation.go: add container labels
- internal/upstream/core/docker.go: cidfile timeout fallback
- internal/upstream/manager.go: ReconnectResult struct, detailed tracking
- internal/runtime/lifecycle.go: log reconnection results
- cmd/mcpproxy-tray/main.go: exponential backoff polling, min() helper

**Testing:**
- All upstream tests passing
- Code compiles cleanly
- Exponential backoff logic validated

**Impact:**
- Container labels enable cleanup of orphaned containers across restarts
- cidfile fallback prevents container orphaning on slow pulls
- Exponential backoff saves resources while maintaining fast recovery
- Partial failure tracking improves debugging and observability