Skip to content

Conversation

@Benehiko
Copy link
Member

@Benehiko Benehiko commented Mar 21, 2024

Fixes #4948

- What I did
Increase the context timeout from 100ms to 500ms. Removed the manual close on the result channels to prevent panics on test failures.

- How I did it

- How to verify it

$ go test -v -run=TestPromptForConfirmation/case=reader_closed -count=1000 ./cli/command/utils_test.go

=== RUN   TestPromptForConfirmation/case=reader_closed
--- PASS: TestPromptForConfirmation (0.00s)
    --- PASS: TestPromptForConfirmation/case=reader_closed (0.00s)
PASS
ok  	command-line-arguments	0.263s

- Description for the changelog

Fix TestPromptForConfirmation flakiness

- A picture of a cute animal (not mandatory but encouraged)

@codecov-commenter
Copy link

codecov-commenter commented Mar 21, 2024

Codecov Report

Merging #4957 (7ea10d5) into master (2ae903e) will decrease coverage by 0.27%.
Report is 16 commits behind head on master.
The diff coverage is n/a.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4957      +/-   ##
==========================================
- Coverage   61.44%   61.18%   -0.27%     
==========================================
  Files         289      294       +5     
  Lines       20241    20538     +297     
==========================================
+ Hits        12437    12566     +129     
- Misses       6903     7077     +174     
+ Partials      901      895       -6     

@Benehiko Benehiko self-assigned this Mar 21, 2024
@Benehiko Benehiko added area/testing kind/bugfix PR's that fix bugs labels Mar 21, 2024
@Benehiko Benehiko requested a review from thaJeztah March 21, 2024 10:25
@thaJeztah thaJeztah requested a review from laurazard March 21, 2024 11:06
Copy link
Contributor

@krissetto krissetto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Generally speaking, could you reproduce the exact error locally by making this timeout really small?

@Benehiko
Copy link
Member Author

Benehiko commented Mar 21, 2024

LGTM.

Generally speaking, could you reproduce the exact error locally by making this timeout really small?

Yes, but it had to be really really small.

resultCtx, resultCancel := context.WithTimeout(ctx, 1*time.Nanosecond)
    utils_test.go:222: PromptForConfirmation did not return after promptReader was closed
--- FAIL: TestPromptForConfirmation (0.00s)
    --- FAIL: TestPromptForConfirmation/case=reader_closed (0.00s)
=== RUN   TestPromptForConfirmation
panic: send on closed channel

goroutine 36 [running]:
command-line-arguments_test.TestPromptForConfirmation.func8.2()
	/home/benehiko/Github/docker-cli/cli/command/utils_test.go:209 +0x6d
created by command-line-arguments_test.TestPromptForConfirmation.func8 in goroutine 35
	/home/benehiko/Github/docker-cli/cli/command/utils_test.go:207 +0x32a
FAIL	command-line-arguments	0.011s
FAIL

This was my laptops flake level:

resultCtx, resultCancel := context.WithTimeout(ctx, 100000*time.Nanosecond)

@laurazard
Copy link
Collaborator

Changing the timeout length causing a panic is a code smell – and simply not closing the channel "fixes" it, but isn't really correct. The issue here is we're launching a goroutine in https://github.com/Benehiko/docker-cli/blob/d2ea5adfe401205d39050abe117cd1cb6811764b/cli/command/utils_test.go#L205-L208 without a way to signal it to end/that we don't care about it's result anymore.

@Benehiko
Copy link
Member Author

Changing the timeout length causing a panic is a code smell – and simply not closing the channel "fixes" it, but isn't really correct. The issue here is we're launching a goroutine in https://github.com/Benehiko/docker-cli/blob/d2ea5adfe401205d39050abe117cd1cb6811764b/cli/command/utils_test.go#L205-L208 without a way to signal it to end/that we don't care about it's result anymore.

The panic is not caused by a too short timeout - it's just a side-effect of it. Closing the channel in this function was actually a mistake since a test error would always throw a panic, which is not what we want. Omitting the lines closing the channel will keep the channel open for as long as the goroutine is active and will be cleaned up by the garbage collector.
Eventually the goroutine will be terminated due to context cancellation.

Please see #4948 (comment)

@laurazard
Copy link
Collaborator

laurazard commented Mar 25, 2024

The panic is not caused by a too short timeout - it's just a side-effect of it.

Gotcha :')

I still think we can do a bit better – instead of calling functions that know how to verify if a test has passed (and if they've timed out), we can have test cases tell us what their expected results are and check it ourselves within the main test flow. For example:

	for _, tc := range []struct {
		desc           string
		f              func() error
		expectedResult promptResult
	}{
		{
			"SIGINT", func() error {
				syscall.Kill(syscall.Getpid(), syscall.SIGINT)
				return nil
			},
			promptResult{
				result: false,
				err:    command.ErrPromptTerminated,
			},
		},
		{
			"no", func() error {
				_, err := fmt.Fprint(promptWriter, "n\n")
				return err
			},
			promptResult{
				result: false,
			},
		},
		{
			"yes", func() error {
				_, err := fmt.Fprint(promptWriter, "y\n")
				return err
			},
			promptResult{
				result: true,
			},
		},
		{
			"any", func() error {
				_, err := fmt.Fprint(promptWriter, "a\n")
				return err
			},
			promptResult{
				result: false,
			},
		},
		{
			"with space", func() error {
				_, err := fmt.Fprint(promptWriter, " y\n")
				return err
			},
			promptResult{
				result: true,
			},
		},
		{
			"reader closed", func() error {
				return promptReader.Close()
			},
			promptResult{
				result: false,
			},
		},
	} {
		t.Run("case="+tc.desc, func(t *testing.T) {
			buf.Reset()
			promptReader, promptWriter = io.Pipe()

			wroteHook := make(chan struct{}, 1)
			promptOut := test.NewWriterWithHook(bufioWriter, func(_ []byte) {
				wroteHook <- struct{}{}
			})

			result := make(chan promptResult, 1)
			go func() {
				r, err := command.PromptForConfirmation(ctx, promptReader, promptOut, "")
				result <- promptResult{r, err}
			}()

			select {
			case <-time.After(100 * time.Millisecond):
			case <-wroteHook:
			}

			drainChannel(ctx, wroteHook)

			assert.NilError(t, bufioWriter.Flush())
			assert.Equal(t, strings.TrimSpace(buf.String()), "Are you sure you want to proceed? [y/N]")

			assert.NilError(t, tc.f())

			select {
			case r := <-result:
				assert.Equal(t, r, tc.expectedResult)
			case <-time.After(500 * time.Millisecond):
				t.Fatal("test timed out - " + tc.desc)
			}
		})
	}
}

func drainChannel(ctx context.Context, ch <-chan struct{}) {
	go func() {
		for {
			select {
			case <-ctx.Done():
				return
			case <-ch:
			}
		}
	}()
}

I think this is easier to follow/less prone to errors than the other way around. As a bonus, we get to delete pollForPromptOutput.

Contexts are great, but if we're just trying to time things out, a simple select with the result channel and a time.After([duration]) is really clear.

Signed-off-by: Alano Terblanche <[email protected]>
@Benehiko
Copy link
Member Author

@laurazard could you take another look? I've refactored the test according to your suggestion

@laurazard
Copy link
Collaborator

Thanks!

Copy link
Collaborator

@laurazard laurazard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@laurazard laurazard merged commit b8d5454 into docker:master Mar 26, 2024
@Benehiko Benehiko deleted the prompt-test-flakiness branch March 26, 2024 13:11
@thaJeztah thaJeztah added this to the 26.1.0 milestone Sep 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/testing kind/bugfix PR's that fix bugs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

flaky test: TestPromptForConfirmation/case=reader_closed

5 participants