Skip to content

Conversation

@Jameswlepage
Copy link
Contributor

Fixes #130

Summary

  • add embedding DTOs/results/operations and expose helper transformers
  • extend PromptBuilder, AiClient, and the CLI with embedding-specific inputs, result helpers, and output formats
  • introduce embedding contracts in the provider layer plus an OpenAI embedding model implementation alongside updated mocks/tests/documentation

Testing

  • composer phpstan
  • composer phpunit
  • php cli.php "["Embed this doc"]" --capability=embeddings --providerId=openai --modelId=text-embedding-3-large --outputFormat=embeddings-vectors

@Jameswlepage Jameswlepage marked this pull request as ready for review November 26, 2025 21:39
@github-actions
Copy link

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: Jameswlepage <[email protected]>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds comprehensive embedding generation support to the PHP AI Client, enabling developers to generate vector embeddings using compatible AI models. The implementation extends the existing architecture with minimal disruption while maintaining consistency with the established patterns for text and image generation.

Key Changes:

  • Introduces embedding-specific DTOs (Embedding, EmbeddingResult, EmbeddingOperation) with full serialization support
  • Adds embedding generation interfaces for model implementations
  • Implements OpenAI embedding model support with proper request/response handling
  • Extends PromptBuilder and AiClient with embedding-specific methods
  • Updates CLI with embedding capability support and multiple output formats

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/unit/Results/DTO/EmbeddingResultTest.php Comprehensive test coverage for EmbeddingResult including validation, transformations, and helpers
tests/unit/Providers/ProviderRegistryTest.php Adds test for discovering embedding-capable models through registry
tests/unit/Providers/Models/DTO/ModelRequirementsTest.php Ensures embeddings don't incorrectly require chat history capability
tests/unit/ProviderImplementations/OpenAi/OpenAiEmbeddingModelTest.php Tests OpenAI embedding model with mocked HTTP responses
tests/unit/Operations/DTO/EmbeddingOperationTest.php Tests embedding operation state management and serialization
tests/unit/Embeddings/DTO/EmbeddingTest.php Tests embedding vector validation and normalization
tests/unit/Builders/PromptBuilderTest.php Tests new embedding-specific builder methods
tests/unit/AiClientTest.php Tests static helper methods for embedding generation
tests/traits/MockModelCreationTrait.php Adds mock model creation helpers for embedding tests
tests/mocks/MockProvider.php Registers mock embedding model for testing
src/Results/DTO/EmbeddingResult.php Core result DTO with vector extraction helpers
src/Providers/Models/EmbeddingGeneration/Contracts/EmbeddingGenerationOperationModelInterface.php Interface for async embedding operations
src/Providers/Models/EmbeddingGeneration/Contracts/EmbeddingGenerationModelInterface.php Interface for synchronous embedding generation
src/Providers/Models/DTO/ModelRequirements.php Excludes chat history requirement for embedding capability
src/Providers/Models/Contracts/WithEmbeddingOperationsInterface.php Interface for retrieving embedding operations by ID
src/ProviderImplementations/OpenAi/OpenAiProvider.php Adds embedding model instantiation in factory method
src/ProviderImplementations/OpenAi/OpenAiModelMetadataDirectory.php Detects and categorizes embedding models from OpenAI API
src/ProviderImplementations/OpenAi/OpenAiEmbeddingModel.php Full OpenAI embedding API implementation
src/Operations/DTO/EmbeddingOperation.php Operation DTO for async embedding workflows
src/Embeddings/DTO/Embedding.php Single embedding vector DTO with validation
src/Builders/PromptBuilder.php Adds withEmbeddingInputs and generate methods for embeddings
src/AiClient.php Adds static helper methods for embedding generation
cli.php Extends CLI with embeddings capability and output formats
README.md Documents embedding generation usage and CLI examples

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +194 to +196
/** @var list<CapabilityEnum> $modelCaps */
$modelCaps = [];
/** @var list<SupportedOption> $modelOptions */
Copy link

Copilot AI Dec 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These PHPStan type annotations for $modelCaps and $modelOptions are unnecessary. The variables are immediately assigned values from the closure's use clause parameters ($embeddingCapabilities, $ttsCapabilities, etc.), which already have their types defined. Since these annotations don't add value and clutter the code, consider removing them.

Suggested change
/** @var list<CapabilityEnum> $modelCaps */
$modelCaps = [];
/** @var list<SupportedOption> $modelOptions */
$modelCaps = [];

Copilot uses AI. Check for mistakes.
Copy link
Member

@felixarntz felixarntz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jameswlepage This is coming along nicely - sorry for the delay in review.

I think there are a few things we can simplify, but most importantly let's not add full operation support yet, as we don't have that elsewhere and it requires a bit more thinking through holistically.

* @param mixed ...$inputs The embedding inputs to add.
* @return self
*/
public function withEmbeddingInputs(...$inputs): self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need for this. The input for embeddings is fully compatible with regular prompts. So you can use the same consistent API.

/**
* @var list<Message> Embedding-specific input messages.
*/
protected array $embeddingInputs = [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comment, we can simply use $messages for embeddings too. They won't hold prompts in that case, but input data.

Comment on lines +319 to +324
$builder = self::prompt(null, $registry);
if ($modelOrConfig instanceof ModelInterface) {
$builder->usingModel($modelOrConfig);
} elseif ($modelOrConfig instanceof ModelConfig) {
$builder->usingModelConfig($modelOrConfig);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the getConfiguredPromptBuilder method instead of duplicating this logic? You can pass $input as the first parameter to the prompt method - they're supposed to be interoperable so you can use the same consistent API.

$builder->usingModelConfig($modelOrConfig);
}

$builder->withEmbeddingInputs($input);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comments on PromptBuilder, this won't be needed. Input data can be passed via prompt.

* @param ProviderRegistry|null $registry Optional custom registry.
* @return EmbeddingOperation The embeddings operation.
*/
public static function generateEmbeddingsOperation(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not implement this yet - we don't support long-running operations for anything yet, so doing that here would be preliminary.

* @param mixed $input The embedding input to append.
* @return void
*/
private function appendEmbeddingInput($input): void
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably not needed? Or, why is it?

*
* @since 0.2.0
*
* @param list<float|int> $vector The embedding vector values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these really be both? Or should they always be float?

*
* @since 0.2.0
*/
interface WithEmbeddingOperationsInterface
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, we don't need this yet.

*
* @extends AbstractDataTransferObject<EmbeddingResultArrayShape>
*/
class EmbeddingResult extends AbstractDataTransferObject implements ResultInterface
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this was named like that in ARCHITECTURE.md, but I think it may be better to call it EmbeddingsResult, in alignment with the other method naming and the fact that 95% of the time you'll be generating multiple embeddings.

*
* @extends AbstractDataTransferObject<EmbeddingOperationArrayShape>
*/
class EmbeddingOperation extends AbstractDataTransferObject implements OperationInterface
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my other comment on EmbeddingResult, maybe we should call this EmbeddingsResult?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for embedding generation

3 participants