-
Notifications
You must be signed in to change notification settings - Fork 41
Add embedding generation support #132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message. To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds comprehensive embedding generation support to the PHP AI Client, enabling developers to generate vector embeddings using compatible AI models. The implementation extends the existing architecture with minimal disruption while maintaining consistency with the established patterns for text and image generation.
Key Changes:
- Introduces embedding-specific DTOs (Embedding, EmbeddingResult, EmbeddingOperation) with full serialization support
- Adds embedding generation interfaces for model implementations
- Implements OpenAI embedding model support with proper request/response handling
- Extends PromptBuilder and AiClient with embedding-specific methods
- Updates CLI with embedding capability support and multiple output formats
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/Results/DTO/EmbeddingResultTest.php | Comprehensive test coverage for EmbeddingResult including validation, transformations, and helpers |
| tests/unit/Providers/ProviderRegistryTest.php | Adds test for discovering embedding-capable models through registry |
| tests/unit/Providers/Models/DTO/ModelRequirementsTest.php | Ensures embeddings don't incorrectly require chat history capability |
| tests/unit/ProviderImplementations/OpenAi/OpenAiEmbeddingModelTest.php | Tests OpenAI embedding model with mocked HTTP responses |
| tests/unit/Operations/DTO/EmbeddingOperationTest.php | Tests embedding operation state management and serialization |
| tests/unit/Embeddings/DTO/EmbeddingTest.php | Tests embedding vector validation and normalization |
| tests/unit/Builders/PromptBuilderTest.php | Tests new embedding-specific builder methods |
| tests/unit/AiClientTest.php | Tests static helper methods for embedding generation |
| tests/traits/MockModelCreationTrait.php | Adds mock model creation helpers for embedding tests |
| tests/mocks/MockProvider.php | Registers mock embedding model for testing |
| src/Results/DTO/EmbeddingResult.php | Core result DTO with vector extraction helpers |
| src/Providers/Models/EmbeddingGeneration/Contracts/EmbeddingGenerationOperationModelInterface.php | Interface for async embedding operations |
| src/Providers/Models/EmbeddingGeneration/Contracts/EmbeddingGenerationModelInterface.php | Interface for synchronous embedding generation |
| src/Providers/Models/DTO/ModelRequirements.php | Excludes chat history requirement for embedding capability |
| src/Providers/Models/Contracts/WithEmbeddingOperationsInterface.php | Interface for retrieving embedding operations by ID |
| src/ProviderImplementations/OpenAi/OpenAiProvider.php | Adds embedding model instantiation in factory method |
| src/ProviderImplementations/OpenAi/OpenAiModelMetadataDirectory.php | Detects and categorizes embedding models from OpenAI API |
| src/ProviderImplementations/OpenAi/OpenAiEmbeddingModel.php | Full OpenAI embedding API implementation |
| src/Operations/DTO/EmbeddingOperation.php | Operation DTO for async embedding workflows |
| src/Embeddings/DTO/Embedding.php | Single embedding vector DTO with validation |
| src/Builders/PromptBuilder.php | Adds withEmbeddingInputs and generate methods for embeddings |
| src/AiClient.php | Adds static helper methods for embedding generation |
| cli.php | Extends CLI with embeddings capability and output formats |
| README.md | Documents embedding generation usage and CLI examples |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /** @var list<CapabilityEnum> $modelCaps */ | ||
| $modelCaps = []; | ||
| /** @var list<SupportedOption> $modelOptions */ |
Copilot
AI
Dec 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These PHPStan type annotations for $modelCaps and $modelOptions are unnecessary. The variables are immediately assigned values from the closure's use clause parameters ($embeddingCapabilities, $ttsCapabilities, etc.), which already have their types defined. Since these annotations don't add value and clutter the code, consider removing them.
| /** @var list<CapabilityEnum> $modelCaps */ | |
| $modelCaps = []; | |
| /** @var list<SupportedOption> $modelOptions */ | |
| $modelCaps = []; |
felixarntz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jameswlepage This is coming along nicely - sorry for the delay in review.
I think there are a few things we can simplify, but most importantly let's not add full operation support yet, as we don't have that elsewhere and it requires a bit more thinking through holistically.
| * @param mixed ...$inputs The embedding inputs to add. | ||
| * @return self | ||
| */ | ||
| public function withEmbeddingInputs(...$inputs): self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need for this. The input for embeddings is fully compatible with regular prompts. So you can use the same consistent API.
| /** | ||
| * @var list<Message> Embedding-specific input messages. | ||
| */ | ||
| protected array $embeddingInputs = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See other comment, we can simply use $messages for embeddings too. They won't hold prompts in that case, but input data.
| $builder = self::prompt(null, $registry); | ||
| if ($modelOrConfig instanceof ModelInterface) { | ||
| $builder->usingModel($modelOrConfig); | ||
| } elseif ($modelOrConfig instanceof ModelConfig) { | ||
| $builder->usingModelConfig($modelOrConfig); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use the getConfiguredPromptBuilder method instead of duplicating this logic? You can pass $input as the first parameter to the prompt method - they're supposed to be interoperable so you can use the same consistent API.
| $builder->usingModelConfig($modelOrConfig); | ||
| } | ||
|
|
||
| $builder->withEmbeddingInputs($input); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comments on PromptBuilder, this won't be needed. Input data can be passed via prompt.
| * @param ProviderRegistry|null $registry Optional custom registry. | ||
| * @return EmbeddingOperation The embeddings operation. | ||
| */ | ||
| public static function generateEmbeddingsOperation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not implement this yet - we don't support long-running operations for anything yet, so doing that here would be preliminary.
| * @param mixed $input The embedding input to append. | ||
| * @return void | ||
| */ | ||
| private function appendEmbeddingInput($input): void |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably not needed? Or, why is it?
| * | ||
| * @since 0.2.0 | ||
| * | ||
| * @param list<float|int> $vector The embedding vector values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these really be both? Or should they always be float?
| * | ||
| * @since 0.2.0 | ||
| */ | ||
| interface WithEmbeddingOperationsInterface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above, we don't need this yet.
| * | ||
| * @extends AbstractDataTransferObject<EmbeddingResultArrayShape> | ||
| */ | ||
| class EmbeddingResult extends AbstractDataTransferObject implements ResultInterface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize this was named like that in ARCHITECTURE.md, but I think it may be better to call it EmbeddingsResult, in alignment with the other method naming and the fact that 95% of the time you'll be generating multiple embeddings.
| * | ||
| * @extends AbstractDataTransferObject<EmbeddingOperationArrayShape> | ||
| */ | ||
| class EmbeddingOperation extends AbstractDataTransferObject implements OperationInterface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my other comment on EmbeddingResult, maybe we should call this EmbeddingsResult?
Fixes #130
Summary
Testing