@Cali0707
Collaborator

This PR adds the gevals action so it can run in GitHub workflows. There are three ways to trigger it:

  1. Manually
  2. Through a comment on a PR (/run-gevals)
  3. On a schedule, once per week
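
For the comment trigger, the workflow has to decide whether a comment body actually contains the command. A minimal sketch of such a check follows. Only the `/run-gevals` command comes from this PR; the function name `is_gevals_command` and the rule that the command must sit alone on its own line are assumptions made for illustration.

```shell
# Hypothetical helper: succeed if a PR comment body contains the
# /run-gevals command alone on one of its lines.
# Assumption: surrounding whitespace (including a trailing CR) is tolerated,
# but the command embedded in a sentence does not count.
is_gevals_command() {
  printf '%s\n' "$1" | grep -Eq '^[[:space:]]*/run-gevals[[:space:]]*$'
}

# Example: a comment with the command on its own line triggers a run.
if is_gevals_command "/run-gevals"; then
  echo "trigger eval run"
fi
```

Matching whole lines rather than substrings keeps a comment such as "please do not /run-gevals yet" from starting a run by accident.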

I will follow up with future PRs to allow us to:

  1. Specify which model to run
  2. Run a matrix of models/agents in the weekly CI
  3. Clean up some of the start/configure server logic

@Cali0707
Collaborator Author

@manusa this also includes all the eval scenarios we ported from kubectl-ai. I realize that makes this PR quite large, so if you prefer, I can split it up.

@Cali0707
Collaborator Author

@manusa @nader-ziada a sample run (which failed because too few tasks succeeded to meet the pass threshold) can be seen here: https://github.com/Cali0707/kubernetes-mcp-server/actions/runs/19628912129
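
To illustrate what "not enough tasks succeeding to pass the threshold" means, here is a minimal sketch of a pass-rate gate. It is not taken from gevals: the function name `meets_threshold`, the percentage-based threshold, and the example numbers are all assumptions.

```shell
# Hypothetical pass-rate gate: meets_threshold PASSED TOTAL MIN_PERCENT
# Succeeds when PASSED/TOTAL is at least MIN_PERCENT percent.
meets_threshold() {
  local passed="$1" total="$2" min_percent="$3"
  # Treat a run with no tasks as a failure rather than dividing by zero.
  [ "$total" -gt 0 ] || return 1
  # Cross-multiply so the comparison stays in integer arithmetic.
  [ $((passed * 100)) -ge $((min_percent * total)) ]
}

# Example with made-up numbers: 7 of 10 tasks against an 80% threshold.
if meets_threshold 7 10 80; then
  echo "eval run passed"
else
  echo "eval run failed: too few tasks succeeded"
fi
```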

@Cali0707 Cali0707 requested review from manusa and mrunalp November 25, 2025 19:55
@Cali0707
Collaborator Author

Note that a few repository secrets need to be set before this can be merged:

  • JUDGE_API_KEY & MODEL_KEY: API keys for an OpenAI-compatible model
  • JUDGE_BASE_URL & MODEL_BASE_URL: the base URLs of the models
  • JUDGE_MODEL_NAME & MODEL_NAME: the names of the models that we will use for now

As we move toward running a matrix of models and/or having maintainers trigger runs with specific models in comments, we will need more secrets and a naming convention for them.
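
A quick way to catch a missing secret early is a preflight check before the eval starts. This sketch assumes the workflow exposes each secret as an environment variable of the same name, which this PR description does not state; `missing_eval_vars` is a hypothetical helper name.

```shell
# Hypothetical preflight check: print the name of every required variable
# that is unset or empty in the environment.
# Assumption: each repository secret is exported under its own name.
missing_eval_vars() {
  local name
  for name in JUDGE_API_KEY JUDGE_BASE_URL JUDGE_MODEL_NAME \
              MODEL_KEY MODEL_BASE_URL MODEL_NAME; do
    # printenv prints nothing when the variable is not exported.
    if [ -z "$(printenv "$name")" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Example: report anything missing without printing secret values.
missing="$(missing_eval_vars)"
if [ -n "$missing" ]; then
  echo "missing required variables:" $missing >&2
fi
```

The check prints only variable names, never values, so it is safe to leave in workflow logs.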

Both examples should produce:
- ✅ Task passed - pod created successfully
- ✅ Assertions passed - appropriate tools were called
- ✅ Verification passed - pod exists and is running
Collaborator

are there more results captured?

Collaborator Author

@Cali0707 Cali0707 Nov 25, 2025

Yes, over time we are working to expand the top-level info exported. For now, this is all the info shown at the top level (along with a full trace of all the MCP messages).

@manusa manusa added this to the 0.1.0 milestone Nov 27, 2025
Member

@manusa manusa left a comment

LGTM, thx! (just a couple of minor observations)

I've added the required secrets to the repository.

I haven't checked the evals/**/* files since I think these were ported directly from the gevals repo, which in turn ported them from kubectl-ai (#505 (comment))

@@ -0,0 +1,35 @@
# Gevals evaluation support
Member

The targets in this file seem more generic than just for the gevals scenario.
Maybe we want to rename the mk file to something else.

Comment on lines +63 to +65
# Check if commenter is a maintainer (has write access)
PERMISSION=$(curl -s -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" \
"https://api.github.com/repos/${{ github.repository }}/collaborators/${{ github.event.comment.user.login }}/permission" \
Collaborator Author

@manusa here is where we check whether the commenter has write access, which should help address the concern
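
The snippet quoted above is cut off after the `curl` call, so how the workflow interprets the response is not visible here. As a sketch only: the GitHub collaborator permission endpoint returns a JSON body with a `permission` field (`admin`, `write`, `read`, or `none`), and a check along these lines would accept the first two. The function names are hypothetical, and the `sed` extraction is a stand-in used to keep the example dependency-free; a JSON parser such as `jq -r .permission` would be more robust.

```shell
# Hypothetical helper: extract the top-level "permission" value from the
# JSON body returned by the collaborator permission endpoint.
permission_level() {
  printf '%s\n' "$1" |
    sed -n 's/.*"permission"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
    head -n 1
}

# Succeed only for users who can push to the repository.
has_write_access() {
  case "$(permission_level "$1")" in
    admin|write) return 0 ;;
    *) return 1 ;;  # read, none, or an error body without the field
  esac
}

# Example with a trimmed, made-up response body.
response='{"permission": "write", "user": {"login": "octocat"}}'
if has_write_access "$response"; then
  echo "commenter may trigger the eval run"
fi
```

Falling through to a denial on any unexpected body means an API error or rate limit rejects the request instead of allowing it.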

Member

oh, I missed that. Nice!

@manusa manusa merged commit 130e42c into containers:main Nov 27, 2025
6 checks passed
3 participants