Skip to content

Conversation

@strogiyotec
Copy link
Contributor

@strogiyotec strogiyotec requested a review from a team as a code owner January 14, 2026 07:28
@strogiyotec strogiyotec requested a review from katcharov January 14, 2026 07:28
rozza
rozza previously requested changes Jan 14, 2026
Copy link
Member

@rozza rozza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its looking good, add some unit tests to AggregatesTest, so that the shape of the produced Bson is as expected. So there is some test coverage for now.

Also don't forget to add the license headers to the new files!

@strogiyotec
Copy link
Contributor Author

Its looking good, add some unit tests to AggregatesTest, so that the shape of the produced Bson is as expected. So there is some test coverage for now.

Also don't forget to add the license headers to the new files!

hey @rozza thanks for your feedback

  1. I added license headers
  2. added a unit test
  3. fixed a few minor things to make checkstyle to pass

@Beta(Reason.SERVER)
public static Bson vectorSearch(
final FieldSearchPath path,
final Bson query,
Copy link
Member

@vbabanin vbabanin Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this was considered already, but I wanted to propose we consider an alternative: model the auto-embedding query as a typed object instead of accepting raw Bson.

Why raw Bson is tricky here
Using raw Bson for the query is “stringly-typed”:

  • typos / wrong types / wrong structure compile and fail at runtime .
  • it’s not very discoverable (users would have to read the docs to learn the required document shape).
//What goes in `???`? User must read documentation.  IDE provides no help.
vectorSearch(path, ???, index, limit, options)

Also, adding model as a separate parameter feels easy to mix up since model and index are both String.

This matches existing patterns in the driver
The alternative proposed below follows patterns already in the driver's API:

  • Static factory creates a variant:
public interface SearchPath extends Bson {
    static FieldSearchPath fieldPath(final String path) {
        return new SearchConstructibleBson(new BsonDocument(SEARCH_PATH_VALUE_KEY, new BsonString(path)));
    }
  • Optional fluent config (e.g., FieldSearchPath.multi(...))
public interface FieldSearchPath extends SearchPath {
    FieldSearchPath multi(String analyzerName);  // optional
}

// usage:
fieldPath("plot") 
fieldPath("plot").multi("english")     // with optional analyzer

The equivalent here would be:
VectorSearchQuery.textQuery("...") (factory).model("...") (optional fluent config)

API surface:

a typed query abstraction:

 @Beta(Reason.SERVER)
 public interface VectorSearchQuery extends Bson {
     static TextVectorSearchQuery textQuery(final String text) {
         return new TextVectorSearchQueryImpl(notNull("text", text), null);
     }
 }

 @Beta(Reason.SERVER)
 public interface TextVectorSearchQuery extends VectorSearchQuery {
     TextVectorSearchQuery model(String modelName);
 }
Implementation remains package-private and immutable: (expandable)
final class TextVectorSearchQueryImpl implements TextVectorSearchQuery {
    private final String text;
    private final @Nullable String model;

    TextVectorSearchQueryImpl(String text, @Nullable String model) { ... }

    @Override public TextVectorSearchQuery model(String modelName) {
        return new TextVectorSearchQueryImpl(text, notNull("modelName", modelName));
    }

    @Override public @Nullable String getModel() { return model; }

    @Override public <TDocument> BsonDocument toBsonDocument(Class<TDocument> dc, CodecRegistry cr) {
        return new Document("text", text).toBsonDocument(dc, cr);
    }
}

Then Aggregate methods becomes:

@Beta(Reason.SERVER)
public static Bson vectorSearch(
    FieldSearchPath path,
    VectorSearchQuery query,
    String index,
    long limit,
    VectorSearchOptions options
) { ... }

The usage of the API is type safe:

vectorSearch(fieldPath("plot"),
    textQuery("movies about cars").model("voyage-4-large"),
    "my_index",
    10,
    options
)

API Evolution Support (future modalities):

This approach can evolve without changing the Aggregates.vectorSearch(...) signature by adding new factories and subtypes later (e.g. imageQuery(...), audioQuery(...)) with modality-specific options.

  • add VectorSearchQuery.imageQuery(...) returning ImageVectorSearchQuery
  • add image-specific fluent methods on that subtype (resolution(...), etc.)
VectorSearchQuery (interface factory)
  - textQuery(String) -> TextVectorSearchQuery
  - (future) imageQuery(BsonBinary) -> ImageVectorSearchQuery
  - (future) audioQuery(BsonBinary) -> AudioVectorSearchQuery

TextVectorSearchQuery (current)
  - model(String)

ImageVectorSearchQuery (future)
  - model(String)
  - resolution(int, int)

AudioVectorSearchQuery (future)
  - model(String)
  - sampleRate(int)

This keeps the stage discoverable, compile-time safe, and provides a potential path for future query modalities. What do you think?

P.S : the server may ship modalities incrementally (e.g., text could reach GA first, with image/audio staying Beta).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the PR to include fluent builders thanks Slav really good feedback

* Creates a {@code $vectorSearch} pipeline stage supported by MongoDB Atlas.
* You may use the {@code $meta: "vectorSearchScore"} expression, e.g., via {@link Projections#metaVectorSearchScore(String)},
* to extract the relevance score assigned to each found document.
* This overload is used for auto-embedding in atlas community edition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it also be used in Enterprise Edition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only in community edition right now, but I believe in the future it will be extended to atlas too

@Beta(Reason.SERVER)
public static Bson vectorSearch(
final FieldSearchPath path,
final TextVectorSearchQuery query,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now I added this specific Text interface because with plain Vector we won't have an access to get model without casting

Copy link
Member

@vbabanin vbabanin Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see why TextVectorSearchQuery is used in the signature today as it’s the only way to access getModel() without a cast. However, I think using VectorSearchQuery as parameter helps discoverability. We could keep VectorSearchQuery as the parameter type and casting to AbstractVectorSearchQuery inside the stage builder.

If users see this signature:

public static Bson vectorSearch(
    FieldSearchPath path,
    TextVectorSearchQuery query,
    String index, ...)

a natural question is: “How do I create a TextVectorSearchQuery?” If they type TextVectorSearchQuery in the IDE to look for clues (method factories in this case), none will show up (since the factory lives on VectorSearchQuery). Using VectorSearchQuery keeps construction discoverable via VectorSearchQuery.textQuery(...).

Internally, getModel() could be accessed via the internal abstract base, similar to the approach used in the Client Bulk Write internal models:
AbstractClientDeleteModel.java#L23-L39

Suggested change: commit

/**
* Specifies the embedding model to use for generating embeddings from the query text.
* <p>
* If not specified, the model configured in the vector search index definition will be used.
Copy link
Member

@vbabanin vbabanin Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit]

Suggested change
* If not specified, the model configured in the vector search index definition will be used.
* If not specified, the model configured in the vector {@link SearchIndexModel} definition will be used.

Note: requires an import.

@Beta(Reason.SERVER)
public static Bson vectorSearch(
final FieldSearchPath path,
final TextVectorSearchQuery query,
Copy link
Member

@vbabanin vbabanin Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see why TextVectorSearchQuery is used in the signature today as it’s the only way to access getModel() without a cast. However, I think using VectorSearchQuery as parameter helps discoverability. We could keep VectorSearchQuery as the parameter type and casting to AbstractVectorSearchQuery inside the stage builder.

If users see this signature:

public static Bson vectorSearch(
    FieldSearchPath path,
    TextVectorSearchQuery query,
    String index, ...)

a natural question is: “How do I create a TextVectorSearchQuery?” If they type TextVectorSearchQuery in the IDE to look for clues (method factories in this case), none will show up (since the factory lives on VectorSearchQuery). Using VectorSearchQuery keeps construction discoverable via VectorSearchQuery.textQuery(...).

Internally, getModel() could be accessed via the internal abstract base, similar to the approach used in the Client Bulk Write internal models:
AbstractClientDeleteModel.java#L23-L39

Suggested change: commit

@vbabanin
Copy link
Member

Approved, given functional tests aren’t currently run in CI; we should enable them in a follow-up PR under JAVA-6059.

@katcharov katcharov dismissed rozza’s stale review January 16, 2026 15:27

@rozza is out, concerns should be addressed

@strogiyotec strogiyotec merged commit 6630fe2 into mongodb:main Jan 16, 2026
56 checks passed
nhachicha pushed a commit that referenced this pull request Jan 19, 2026
* JAVA-5988

* JAVA-5988 address PR comments

* JAVA-5988 add a unit test to AggregatesTest

* JAVA-5988 embedding fluent API with builders

* JAVA-5988 introduce AbstractVectorSearchQuery

* JAVA-5988 fix checkstyle

* JAVA05988 use VectorSearchQuery as param

* JAVA-5988 add more unit tests

* JAVA-5988 add autoembedding to Scala

---------

Co-authored-by: Almas Abdrazak <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants