
Conversation

Contributor

@Uk1288 Uk1288 commented Apr 28, 2025

Fixes #

  • This PR implements the subcommand terraform stacks
  • The new command terraform stacks exposes some Terraform Stack operations through the CLI. The available subcommands depend on the stacks plugin implementation.

NOTE: terraform stacks -help will be implemented in a separate PR

Target Release

1.13.x

CHANGELOG entry

  • The new command terraform stacks exposes some Terraform Stack operations through the CLI. The available subcommands depend on the stacks plugin implementation. Use terraform stacks -help to see available commands.

  • This change is user-facing and I added a changelog entry.


NOTE: Test steps are written in the project slack channel

@Uk1288 Uk1288 force-pushed the TF-25178-add-stacks-plugin-proto branch from 03462d8 to 3bc1359 Compare May 5, 2025 20:08
@Uk1288 Uk1288 marked this pull request as ready for review May 5, 2025 21:37
@Uk1288 Uk1288 requested review from a team as code owners May 5, 2025 21:37
Contributor

@Maed223 Maed223 left a comment


Smoke testing is looking good from the perspective of the Stacks plugin 🚀! Mostly just one point about an additional bit of needed info in the StacksPluginConfig.

Otherwise, I'm realizing that we aren't fully honoring the -json flag here when we have diagnostics returned. It's an awkward situation, though, since not all stacks subcommands implement a JSON view. I don't think it's a massive problem, since we'd only return diags here if the hostname/token env vars aren't configured, or if the plugin dev override is in use. Wanted to call that out, but I don't have a strong opinion on how/if to handle it.

Contributor

Maed223 commented May 12, 2025

Found out the cause (and the fixes) of the occasional nil panics we were seeing in the prototype, stemming from the services here not being properly managed. One note: in addition to what we were seeing in stacks validate, I also saw a somewhat similar issue in stacks init with the packages and dependencies server. In any case, we can get both solved here!

Accessing the service before it's fully set up

In stacks init we were seeing rpc error: code = Unimplemented desc = unknown service terraform1.packages.Packages, caused by trying to access the service before it was fully set up.

This looks to be a race condition, since we start the services within asynchronous goroutines:

dependenciesBrokerID := c.Broker.NextId()
go c.Broker.AcceptAndServe(dependenciesBrokerID, dependenciesServerFunc)

packagesBrokerID := c.Broker.NextId()
go c.Broker.AcceptAndServe(packagesBrokerID, packagesServerFunc)

stacksBrokerID := c.Broker.NextId()
go c.Broker.AcceptAndServe(stacksBrokerID, stacksServerFunc)

but registerBrokers immediately returns the broker IDs, so there's no guarantee that the servers are fully initialized and ready to accept connections when the Stacks Plugin tries to use them.

Solution

Implement a mechanism to ensure all broker services are fully initialized before proceeding:

func (c GRPCStacksClient) registerBrokers(stdout, stderr io.Writer) brokerIDs {
    // ... existing code ...
    
    // Create channels to signal when each service is ready
    dependenciesReady := make(chan struct{})
    packagesReady := make(chan struct{})
    stacksReady := make(chan struct{})
    
    dependenciesServerFunc := func(opts []grpc.ServerOption) *grpc.Server {
        s := grpc.NewServer(opts...)
        dependencies.RegisterDependenciesServer(s, dependenciesServer)
        dependenciesServer.ActivateRPCServer(newDependenciesServer(handles, c.Services))
        close(dependenciesReady) // Signal that this service is ready
        return s
    }
    
    // Similar modifications for other services...
    
    // Wait for all services to be ready
    <-dependenciesReady
    <-packagesReady
    <-stacksReady
    
    return brokerIDs{...}
}
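The readiness-channel pattern above can be sketched as a small runnable program; fakeServer and the service names here are hypothetical stand-ins for the real gRPC servers, not the actual plugin code:

```go
package main

import "fmt"

// fakeServer stands in for *grpc.Server in this sketch.
type fakeServer struct{ name string }

// registerBrokers starts each service asynchronously (as the broker's
// AcceptAndServe would) and closes that service's ready channel once
// setup is complete, then blocks until every service has signalled.
func registerBrokers() []string {
	names := []string{"dependencies", "packages", "stacks"}
	servers := make([]*fakeServer, len(names))
	readies := make([]chan struct{}, len(names))
	for i := range names {
		readies[i] = make(chan struct{})
		go func(i int) {
			// simulate grpc.NewServer + Register* for this service
			servers[i] = &fakeServer{name: names[i]}
			close(readies[i]) // signal that this service is ready
		}(i)
	}
	// Wait for every service before returning broker IDs to the caller;
	// the channel close happens-before the receive, so the servers
	// slice reads below are safe.
	out := make([]string, len(names))
	for i, r := range readies {
		<-r
		out[i] = servers[i].name
	}
	return out
}

func main() {
	fmt.Println(registerBrokers())
}
```

The key property is that registerBrokers cannot return until all three goroutines have finished their setup, which is exactly the guarantee the broker IDs were missing before.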

The service closing before the command finishes executing

This was the class of problem we were seeing in the stacks validate panic.

In registerBrokers, each server function assigns its new server to the shared var s *grpc.Server variable declared outside the function scopes. Each server function overwrites that variable, so the previous server loses its last reference and becomes eligible for garbage collection. This leaves only the last server function with a stable reference, while the earlier servers could be garbage collected once their goroutines finished setting up the services. This seems to be why we only saw the panics for the dependencies and packages services (the stacks service being the last server function created).

Solution

Get rid of the var s *grpc.Server and create independent variables in each function's scope:

dependenciesServerFunc := func(opts []grpc.ServerOption) *grpc.Server {
    s := grpc.NewServer(opts...) // := declares a new variable scoped to this function
    // ... other code ...
    return s
}
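A minimal runnable illustration of why the per-closure := matters when the server functions run concurrently; server and newServerFunc are hypothetical stand-ins for the grpc.Server factories, not the actual plugin code:

```go
package main

import (
	"fmt"
	"sync"
)

type server struct{ name string }

// newServerFunc returns a factory whose := declares a fresh server
// variable on every call, so concurrently running factories can never
// overwrite each other's reference the way a shared outer variable can.
func newServerFunc(name string) func() *server {
	return func() *server {
		s := &server{name: name} // := scopes s to this closure
		return s
	}
}

// runFactories calls each factory in its own goroutine and records
// which server each one got back.
func runFactories(names []string) map[string]string {
	var wg sync.WaitGroup
	var mu sync.Mutex
	results := make(map[string]string)
	for _, name := range names {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			srv := newServerFunc(name)() // each goroutine gets its own *server
			mu.Lock()
			results[name] = srv.name
			mu.Unlock()
		}(name)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(runFactories([]string{"dependencies", "packages", "stacks"}))
}
```

With the shared-variable version, a factory could register services on a server that another goroutine had just replaced; with per-closure variables, every factory's reference is stable for as long as it is needed.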

ctx: ctx,
serviceURL: serviceURL,
httpClient: retryableClient,
pluginName: "cloudplugin",
Contributor


It looks like you are scrubbing the term "cloudplugin" from this package, which I think is a good idea. But it's not clear to me if you went as far as you could go or if you just missed a few of them. This is just one example.

IMO, it's OK to remove or manipulate all cloudplugin features from terraform and shift those over to stacks

Contributor Author


I went about the rearrangement by creating a base client so that each plugin could use an instance of the base client; that's why there are still both cloudclient and stacksclient in the package.

Maed223 previously approved these changes May 13, 2025
Contributor

@Maed223 Maed223 left a comment


Smoke tested, and looks great from my perspective 🚀

Comment on lines +51 to +54
var serverWG sync.WaitGroup
// wait for all 3 servers to start
serverWG.Add(3)
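A runnable sketch of the sync.WaitGroup pattern quoted above: each server function calls Done once its (simulated) server is registered, and we block on Wait before handing anything back to the caller. The startServers name and the service setup here are hypothetical stand-ins for the real registerBrokers code:

```go
package main

import (
	"fmt"
	"sync"
)

// startServers mirrors the merged WaitGroup approach for waiting on
// all three broker services before returning.
func startServers() []string {
	var serverWG sync.WaitGroup
	// wait for all 3 servers to start
	serverWG.Add(3)

	started := make([]string, 3)
	for i, name := range []string{"dependencies", "packages", "stacks"} {
		go func(i int, name string) {
			started[i] = name // simulate grpc.NewServer + service registration
			serverWG.Done()   // signal that this server is up
		}(i, name)
	}
	serverWG.Wait() // all three servers are ready once this returns
	return started
}

func main() {
	fmt.Println(startServers())
}
```

Compared with one channel per service, a single WaitGroup counts all three startups with one Wait call, which is why it reads as the more elegant fix.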

Contributor


A very elegant fix here!

Contributor

@liamcervante liamcervante left a comment


Nice! 🚀

@Uk1288 Uk1288 merged commit f7cb909 into main May 14, 2025
12 checks passed
@Uk1288 Uk1288 deleted the TF-25178-add-stacks-plugin-proto branch May 14, 2025 16:15
@henrythor

I'm getting the following error when I use terraform stacks:

╷
│ Error: Stacks plugin download error
│
│ could not resolve stacksplugin version for host "app.terraform.io": failed to fetch stacksplugin manifest: plugin is not supported by the remote version of Terraform Enterprise

@SarahFrench
Member

@henrythor - Could you please open a bug report describing the issue you've encountered? Then your report can go through our triage process.

Contributor

github-actions bot commented Jul 3, 2025

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 3, 2025
