fix: flaky prompt termination on reader close test #4957

Benehiko · 2024-03-21T09:55:48Z

- What I did
Increased the context timeout from 100ms to 500ms. Removed the manual close on the result channels to prevent panics on test failures.

- How I did it

- How to verify it

$ go test -v -run=TestPromptForConfirmation/case=reader_closed -count=1000 ./cli/command/utils_test.go

=== RUN   TestPromptForConfirmation/case=reader_closed
--- PASS: TestPromptForConfirmation (0.00s)
    --- PASS: TestPromptForConfirmation/case=reader_closed (0.00s)
PASS
ok  	command-line-arguments	0.263s

- Description for the changelog

Fix TestPromptForConfirmation flakiness

- A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Alano Terblanche <[email protected]>

codecov-commenter · 2024-03-21T09:57:55Z

Codecov Report

Merging #4957 (7ea10d5) into master (2ae903e) will decrease coverage by 0.27%.
Report is 16 commits behind head on master.
The diff coverage is n/a.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4957      +/-   ##
==========================================
- Coverage   61.44%   61.18%   -0.27%     
==========================================
  Files         289      294       +5     
  Lines       20241    20538     +297     
==========================================
+ Hits        12437    12566     +129     
- Misses       6903     7077     +174     
+ Partials      901      895       -6

krissetto

LGTM.

Generally speaking, could you reproduce the exact error locally by making this timeout really small?

Benehiko · 2024-03-21T11:16:06Z

> LGTM.
>
> Generally speaking, could you reproduce the exact error locally by making this timeout really small?

Yes, but it had to be really really small.

resultCtx, resultCancel := context.WithTimeout(ctx, 1*time.Nanosecond)

    utils_test.go:222: PromptForConfirmation did not return after promptReader was closed
--- FAIL: TestPromptForConfirmation (0.00s)
    --- FAIL: TestPromptForConfirmation/case=reader_closed (0.00s)
=== RUN   TestPromptForConfirmation
panic: send on closed channel

goroutine 36 [running]:
command-line-arguments_test.TestPromptForConfirmation.func8.2()
	/home/benehiko/Github/docker-cli/cli/command/utils_test.go:209 +0x6d
created by command-line-arguments_test.TestPromptForConfirmation.func8 in goroutine 35
	/home/benehiko/Github/docker-cli/cli/command/utils_test.go:207 +0x32a
FAIL	command-line-arguments	0.011s
FAIL

This was my laptop's flake level:

resultCtx, resultCancel := context.WithTimeout(ctx, 100000*time.Nanosecond)

laurazard · 2024-03-21T13:35:52Z

Changing the timeout length causing a panic is a code smell – and simply not closing the channel "fixes" it, but isn't really correct. The issue here is we're launching a goroutine in https://github.com/Benehiko/docker-cli/blob/d2ea5adfe401205d39050abe117cd1cb6811764b/cli/command/utils_test.go#L205-L208 without a way to signal it to end, or to tell it that we don't care about its result anymore.

Benehiko · 2024-03-21T13:52:20Z

> Changing the timeout length causing a panic is a code smell – and simply not closing the channel "fixes" it, but isn't really correct. The issue here is we're launching a goroutine in https://github.com/Benehiko/docker-cli/blob/d2ea5adfe401205d39050abe117cd1cb6811764b/cli/command/utils_test.go#L205-L208 without a way to signal it to end/that we don't care about its result anymore.

The panic is not caused by a too-short timeout; it is just a side effect of it. Closing the channel in this function was actually a mistake, since a test failure would always end in a panic, which is not what we want. Omitting the lines that close the channel keeps it open for as long as the goroutine is active; after that, the garbage collector cleans it up.
Eventually the goroutine will be terminated due to context cancellation.

Please see #4948 (comment)
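
As a rough sketch of the runtime behaviour described above (this is not the test's actual code, and `trySend` is a hypothetical helper written only for this illustration): sending on a channel that has already been closed panics, while sending on an open buffered channel does not, even if nobody ever receives from it.

```go
package main

import "fmt"

// trySend (hypothetical helper) sends v on ch and reports whether the
// send panicked, recovering so the program can keep running.
func trySend(ch chan int, v int) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	ch <- v
	return false
}

func main() {
	// The receiver closed the channel before the goroutine sent its result.
	closed := make(chan int, 1)
	close(closed)
	fmt.Println("send after close panicked:", trySend(closed, 1))

	// The channel is left open; capacity 1 lets the send complete
	// without a receiver.
	open := make(chan int, 1)
	fmt.Println("send on open channel panicked:", trySend(open, 1))
}
```

An open channel that is no longer referenced by anything is reclaimed by the garbage collector like any other value, so leaving it unclosed does not leak it.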

laurazard · 2024-03-25T15:17:10Z

> The panic is not caused by a too short timeout - it's just a side-effect of it.

Gotcha :')

I still think we can do a bit better – instead of calling functions that know how to verify whether a test has passed (and whether it has timed out), we can have test cases tell us what their expected results are and check them ourselves within the main test flow. For example:

	for _, tc := range []struct {
		desc           string
		f              func() error
		expectedResult promptResult
	}{
		{
			"SIGINT", func() error {
				syscall.Kill(syscall.Getpid(), syscall.SIGINT)
				return nil
			},
			promptResult{
				result: false,
				err:    command.ErrPromptTerminated,
			},
		},
		{
			"no", func() error {
				_, err := fmt.Fprint(promptWriter, "n\n")
				return err
			},
			promptResult{
				result: false,
			},
		},
		{
			"yes", func() error {
				_, err := fmt.Fprint(promptWriter, "y\n")
				return err
			},
			promptResult{
				result: true,
			},
		},
		{
			"any", func() error {
				_, err := fmt.Fprint(promptWriter, "a\n")
				return err
			},
			promptResult{
				result: false,
			},
		},
		{
			"with space", func() error {
				_, err := fmt.Fprint(promptWriter, " y\n")
				return err
			},
			promptResult{
				result: true,
			},
		},
		{
			"reader closed", func() error {
				return promptReader.Close()
			},
			promptResult{
				result: false,
			},
		},
	} {
		t.Run("case="+tc.desc, func(t *testing.T) {
			buf.Reset()
			promptReader, promptWriter = io.Pipe()

			wroteHook := make(chan struct{}, 1)
			promptOut := test.NewWriterWithHook(bufioWriter, func(_ []byte) {
				wroteHook <- struct{}{}
			})

			result := make(chan promptResult, 1)
			go func() {
				r, err := command.PromptForConfirmation(ctx, promptReader, promptOut, "")
				result <- promptResult{r, err}
			}()

			select {
			case <-time.After(100 * time.Millisecond):
			case <-wroteHook:
			}

			drainChannel(ctx, wroteHook)

			assert.NilError(t, bufioWriter.Flush())
			assert.Equal(t, strings.TrimSpace(buf.String()), "Are you sure you want to proceed? [y/N]")

			assert.NilError(t, tc.f())

			select {
			case r := <-result:
				assert.Equal(t, r, tc.expectedResult)
			case <-time.After(500 * time.Millisecond):
				t.Fatal("test timed out - " + tc.desc)
			}
		})
	}
}

func drainChannel(ctx context.Context, ch <-chan struct{}) {
	go func() {
		for {
			select {
			case <-ctx.Done():
				return
			case <-ch:
			}
		}
	}()
}

I think this is easier to follow/less prone to errors than the other way around. As a bonus, we get to delete pollForPromptOutput.

Contexts are great, but if we're just trying to time things out, a simple select with the result channel and a time.After([duration]) is really clear.

Signed-off-by: Alano Terblanche <[email protected]>

Benehiko · 2024-03-26T12:48:38Z

@laurazard could you take another look? I've refactored the test according to your suggestion

laurazard · 2024-03-26T13:02:28Z

Thanks!

laurazard

LGTM

fix: flaky prompt termination on reader close test

d2ea5ad

Signed-off-by: Alano Terblanche <[email protected]>

Benehiko self-assigned this Mar 21, 2024

Benehiko added area/testing kind/bugfix PR's that fix bugs labels Mar 21, 2024

Benehiko requested a review from thaJeztah March 21, 2024 10:25

thaJeztah requested a review from laurazard March 21, 2024 11:06

krissetto approved these changes Mar 21, 2024

refactor: prompt tests

7ea10d5

Signed-off-by: Alano Terblanche <[email protected]>

laurazard approved these changes Mar 26, 2024

laurazard merged commit b8d5454 into docker:master Mar 26, 2024

Benehiko deleted the prompt-test-flakiness branch March 26, 2024 13:11

thaJeztah added this to the 26.1.0 milestone Sep 18, 2024
