fix(ado): resolve az binary via shutil.which for Windows az.cmd (closes #1430)#1432
Merged
Merged
Conversation
#1430) On Windows the Azure CLI ships as az.cmd (a batch wrapper). Python's subprocess.run -> CreateProcessW does NOT honor PATHEXT for non-.exe executables, so a bare "az" token cannot be resolved and raises FileNotFoundError. shutil.which("az") DOES honor PATHEXT and finds az.cmd, so AzureCliBearerProvider.is_available() returned True while every actual subprocess call failed. The error then cascaded through the ADO --update preflight, killing Git Credential Manager along the way, and surfaced as the misleading 'az present but not logged in' diagnostic -- even when the user WAS logged in via az login. Layer 1 (root cause): resolve the az executable once in __init__ via shutil.which and store the absolute path. Use it in every subprocess call (_run_get_access_token, get_current_tenant_id). is_available() now mirrors the same construction-time resolution so it can no longer disagree with get_bearer_token()'s pre-check. Explicit absolute / separator-bearing paths from callers (tests) are trusted verbatim. Layer 2 (defense in depth): pipeline.py _preflight_auth_check now strips GIT_CONFIG_GLOBAL / GIT_CONFIG_NOSYSTEM / GIT_ASKPASS from the probe env for ADO hosts too (previously only generic). This lets Git Credential Manager answer for Entra-cached ADO credentials when az bearer acquisition is unavailable for any reason -- the #1430 PATH quirk, sandbox, proxy, etc. The actual clone path remains isolated (it builds its own clean env via GitAuthEnvBuilder.setup_environment), so the security rationale for empty global gitconfig during clone is preserved. Only the single ls-remote probe leg gains GCM access. Tests: - New TestWindowsAzCmdResolution class with the exact regression shape: shutil.which returns a Windows az.cmd path; verify subprocess receives it as argv[0] for both get_bearer_token and tenant probe. - New is_available stability test (resolution is one-shot). - Init absolute-path bypass test (preserves test injection contract). - Existing test_ado_host_retains_credential_blocking_env renamed and inverted to assert the new contract: ADO preflight no longer kills GCM. Existing ADO bearer-fallback E2E unchanged (the fake-git path has no GCM installed so behavior is unaffected). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three nits surfaced by auth-expert + supply-chain-security panel: 1. Tighten the explicit-path bypass in AzureCliBearerProvider.__init__ from 'isabs OR contains os.sep' to 'isabs only'. The os.sep heuristic accepted relative-with-separator tokens like 'subdir/az' verbatim, handing a CWD-relative path to subprocess.run without going through shutil.which. Only absolute paths are now trusted verbatim; everything else flows through resolution. New test test_init_relative_with_separator_still_resolves locks the contract. 2. Add an explicit early-return in get_current_tenant_id when _az_command is None. Previously this case relied on the broad 'except Exception' to swallow a TypeError from passing None as argv[0]. The new guard mirrors get_bearer_token's is_available() pre-check. New test test_get_current_tenant_id_returns_none_without_subprocess_when_az_missing asserts subprocess.run is never invoked in that path. 3. Add an explicit FileNotFoundError regression-trap test (test_bare_az_would_raise_filenotfound_but_resolved_path_succeeds) that simulates the exact Windows CreateProcessW behaviour: argv[0] == 'az' raises FileNotFoundError, argv[0] == resolved .cmd path succeeds. Pins both halves of the #1430 contract in a single test. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes a Windows-specific Azure DevOps authentication failure during apm update by making Azure CLI invocation deterministic (resolving az via shutil.which so az.cmd works) and by relaxing the ADO preflight probe environment to preserve Git Credential Manager as a fallback channel.
Changes:
- Resolve and store the
azexecutable path atAzureCliBearerProviderconstruction time and reuse it for all subprocess calls. - Expand the preflight probe environment carve-out to include Azure DevOps hosts so credential helpers (notably GCM) can participate in the
ls-remoteprobe. - Add/adjust unit tests covering Windows
az.cmdresolution behavior and the updated preflight environment expectations; add a changelog entry.
Show a summary per file
| File | Description |
|---|---|
src/apm_cli/core/azure_cli.py |
Resolves az once at construction and uses the resolved path for subprocess calls; makes is_available() state-based and adds a guard in get_current_tenant_id(). |
src/apm_cli/install/pipeline.py |
Widens the preflight env carve-out to also apply to Azure DevOps hosts to preserve credential helper behavior for the probe. |
tests/unit/test_azure_cli.py |
Adds regression tests for Windows az.cmd resolution and stability of is_available() after construction. |
tests/unit/install/test_pipeline_auth_preflight.py |
Updates/renames the ADO preflight env test to assert credential-blocking vars are omitted. |
CHANGELOG.md |
Adds an Unreleased Fixed entry documenting the behavior change. |
Copilot's findings
- Files reviewed: 5/5 changed files
- Comments generated: 3
|
|
||
| ### Fixed | ||
|
|
||
| - `apm update` against private Azure DevOps deps no longer fails on Windows with a misleading "az present but not logged in" diagnostic when the user IS signed in via `az login`. Root cause: Python's `subprocess.run(["az", ...])` -> `CreateProcessW` does not honor `PATHEXT` for non-`.exe` executables, so the Windows `az.cmd` wrapper could not be invoked even though `shutil.which("az")` resolved it. `AzureCliBearerProvider` now resolves the `az` binary via `shutil.which` once at construction and passes the absolute path to every subprocess call. As a defense-in-depth measure, the ADO `--update` preflight probe no longer strips `GIT_CONFIG_GLOBAL` / `GIT_CONFIG_NOSYSTEM` / `GIT_ASKPASS`, so Git Credential Manager can answer for Entra-cached ADO credentials whenever bearer acquisition is unavailable for any reason (sandbox, proxy, future PATH quirks). The actual clone path keeps its full gitconfig isolation. (#1430) |
Comment on lines
114
to
+124
| def is_available(self) -> bool: | ||
| """Return True iff the ``az`` binary is on PATH. | ||
| """Return True iff the ``az`` binary was found on PATH at construction. | ||
|
|
||
| Resolution is one-shot (performed in :meth:`__init__`). PATH changes | ||
| made after the provider was constructed will NOT be observed; restart | ||
| the CLI if you have installed ``az`` mid-session. | ||
|
|
||
| Does NOT check whether the user is logged in -- that requires a | ||
| subprocess call and is deferred to :meth:`get_bearer_token`. | ||
| """ | ||
| return shutil.which(self._az_command) is not None | ||
| return self._az_command is not None |
Comment on lines
+160
to
176
| # GIT_CONFIG_GLOBAL / GIT_CONFIG_NOSYSTEM carve-out: GitAuthEnvBuilder | ||
| # forces an empty global gitconfig for ALL hosts to prevent a user's | ||
| # ~/.gitconfig insteadOf rewrites or credential helpers from leaking | ||
| # tokens during a clone. But for preflight probes (a single ls-remote | ||
| # against the same host the dep targets), the redirection surface is | ||
| # nil and killing the user's global config kills Git Credential | ||
| # Manager along with it -- the helper most Windows ADO users rely on | ||
| # for Entra-cached credentials. For ADO specifically that matters | ||
| # because bearer acquisition can fail for reasons unrelated to login | ||
| # state (sandbox, proxy, microsoft/apm#1430-style PATH quirks), and | ||
| # GCM is the only remaining channel that can save us. Generic hosts | ||
| # have the same logic; widening the carve-out to ADO keeps the | ||
| # actual clone path isolated (it builds its own clean env) while | ||
| # giving the preflight probe the best chance to succeed. | ||
| if is_generic or is_azure_devops_hostname(host): | ||
| for _key in ("GIT_CONFIG_GLOBAL", "GIT_CONFIG_NOSYSTEM", "GIT_ASKPASS"): | ||
| probe_env.pop(_key, None) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
TL;DR
apm updateagainst private Azure DevOps deps fails on Windows withAuthentication failed for dev.azure.comand the diagnostic "az present but not logged in" — even when the user IS signed in viaaz login. The root cause is thatsubprocess.run(["az", ...])callsCreateProcessW, which does not honorPATHEXTfor non-.exeexecutables, so the Windowsaz.cmdwrapper cannot be invoked despiteshutil.which("az")resolving it. This PR resolvesazviashutil.whichonce at construction and widens the ADO preflight carve-out so Git Credential Manager survives as a fallback. Closes #1430.Problem (WHY)
Important
The user is shown a remediation message instructing them to run
az login— the exact action they have already taken. Worst-possible-failure-mode for diagnostics.The bug surfaces only on
apm update, notapm install, because the preflight is gated onupdate_refs. The Windows-only cascade:AzureCliBearerProvider.is_available()returnsTrue(shutil.whichhonorsPATHEXT).subprocess.run([self._az_command, ...])raisesFileNotFoundError(CreateProcessWdoes not honorPATHEXT)._run_get_access_tokenwraps theOSErrorasAzureCliBearerError(kind="subprocess_error")._resolve_tokenincore/auth.pyswallows it ->(None, "none", "basic")._preflight_auth_checkbuilds a probe env that stripsGIT_CONFIG_GLOBAL/GIT_CONFIG_NOSYSTEM, killing Git Credential Manager — the only remaining channel that could have rescued the probe with an Entra-cached credential.git ls-remotereturns 401.build_error_contextre-hits the same broken subprocess inget_current_tenant_id(), swallows the failure, and renders Case 3: "az present but not logged in".PROSE anchor: "Grounding outputs in deterministic tool execution transforms probabilistic generation into verifiable action." The fix replaces a probabilistic PATH lookup (left to
CreateProcessW) with a deterministic resolution (shutil.which).Approach (WHAT)
Two layers, both required:
src/apm_cli/core/azure_cli.pyazviashutil.whichonce in__init__; pass the absolute path to everysubprocess.run.src/apm_cli/install/pipeline.py~/.gitconfigavailable, preserving Git Credential Manager as a fallback channel.Implementation (HOW)
src/apm_cli/core/azure_cli.py—__init__resolvesaz_commandviashutil.whichonce and stores either the absolute path orNone. Absolute paths supplied by callers (test injection) are trusted verbatim; everything else flows throughshutil.whichso a CWD-relative token likesubdir/azcannot bypass resolution.is_available()becomes state-only (no per-callshutil.which), so it can no longer disagree withget_bearer_token()'s pre-check.get_current_tenant_id()gains an explicit early-return when_az_command is Noneto avoid passingNoneas argv[0] (previously masked by a broadexcept).src/apm_cli/install/pipeline.py— the carve-out at line 160 changes fromif is_generic:toif is_generic or is_azure_devops_hostname(host):._dl.git_env(built byGitAuthEnvBuilder.setup_environment) still forces an empty global gitconfig for the actual clone path; only the singlels-remoteprobe leg gains GCM access. The bearer-fallback_bearer_opbuilds its own clean env via_build_git_env(scheme="bearer")and is unaffected.Diagrams
Cascade before and after the fix — every step is a real call site in the diff.
flowchart LR subgraph Before["Before (Windows, az.cmd installed, user logged in)"] A1["is_available()<br/>shutil.which → True"] --> B1["subprocess.run(['az', ...])<br/>CreateProcessW: FileNotFoundError"] B1 --> C1["AzureCliBearerError<br/>kind=subprocess_error"] C1 --> D1["_resolve_token<br/>→ (None, 'none', 'basic')"] D1 --> E1["preflight strips<br/>GIT_CONFIG_GLOBAL"] E1 --> F1["GCM disabled<br/>ls-remote 401"] F1 --> G1["Case 3:<br/>'az present, not logged in'"] end subgraph After["After"] A2["__init__<br/>shutil.which → C:\\...\\az.cmd"] --> B2["subprocess.run(['C:\\...\\az.cmd', ...])<br/>CreateProcessW OK"] B2 --> C2["bearer token<br/>preflight passes"] end style G1 stroke:#d00,stroke-width:2px style C2 stroke:#070,stroke-width:2pxTrade-offs
is_available()is now construction-time only. PATH changes mid-process won't be observed; restart the CLI if you installazafter APM is running. Acceptable: tokens are cached anyway and the provider is a process-wide singleton.~/.gitconfig. A user'sinsteadOfrewrite could redirect thels-remoteprobe to a different host. Mitigations: the probe is read-only; the actual clone path retains full gitconfig isolation; "attacker controls~/.gitconfig" implies the attacker already owns the git client. Generic hosts already had this carve-out ([BUG] apm install --update fails for GHES/generic hosts due to pipeline probe blocking credential helpers #1082); the security argument applies equally to ADO.Benefits
az loginpath canapm updateagain.is_available()andget_bearer_token()now agree.Validation
Note
CI lint mirror green; affected suites all pass locally.
Scenario evidence
az.cmdresolves to subprocess argv[0]tests/unit/test_azure_cli.py::TestWindowsAzCmdResolution::test_get_bearer_token_invokes_resolved_az_cmd_pathFileNotFoundErrorshape no longer surfaces assubprocess_errortests/unit/test_azure_cli.py::TestWindowsAzCmdResolution::test_bare_az_would_raise_filenotfound_but_resolved_path_succeedsis_available()is consistent withget_bearer_token()pre-checktests/unit/test_azure_cli.py::TestIsAvailable::test_is_available_stable_after_constructiontests/unit/install/test_pipeline_auth_preflight.py::TestPreflightGenericHostAllowsCredentialHelpers::test_ado_host_omits_credential_blocking_envtests/integration/test_ado_preflight_bearer_fallback_e2e.py(unchanged, passes)get_current_tenant_idshort-circuits cleanly whenazmissingtests/unit/test_azure_cli.py::TestWindowsAzCmdResolution::test_get_current_tenant_id_returns_none_without_subprocess_when_az_missingHow to test
az.cmdinstalled andaz logincompleted: clone a repo with a private ADO dep inapm.yml, runapm update. Pre-fix: fails with "az present but not logged in". Post-fix: succeeds.azinstalled but with GCM configured fordev.azure.com: runapm update. Post-fix: GCM resolves the credential and the probe passes.uv run --extra dev pytest tests/unit/test_azure_cli.py tests/unit/install/test_pipeline_auth_preflight.py tests/integration/test_ado_preflight_bearer_fallback_e2e.py.uv run --extra dev ruff check src/ tests/ && uv run --extra dev ruff format --check src/ tests/.Closes #1430.
Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com