
Commit 3a21030

feat (ai/core): add embedMany function (#1617)
1 parent 339aafa commit 3a21030

17 files changed: +444 -24 lines changed


‎.changeset/five-knives-deny.md‎

Lines changed: 5 additions & 0 deletions

@@ -0,0 +1,5 @@
+---
+'ai': patch
+---
+
+feat (ai/core): add embedMany function

‎content/docs/03-ai-sdk-core/30-embeddings.mdx‎

Lines changed: 29 additions & 2 deletions

@@ -10,15 +10,42 @@ In this space, similar words are close to each other, and the distance between w
 
 ## Embedding a Single Value
 
-The Vercel AI SDK provides the `embed` function to embed single values, which is useful for tasks such as finding similar words
-or phrases or clustering text. You can use it with embeddings models, e.g. `openai.embedding('text-embedding-3-large')` or `mistral.embedding('mistral-embed')`.
+The Vercel AI SDK provides the [`embed`](/docs/reference/ai-sdk-core/embed) function to embed single values, which is useful for tasks such as finding similar words
+or phrases or clustering text.
+You can use it with embeddings models, e.g. `openai.embedding('text-embedding-3-large')` or `mistral.embedding('mistral-embed')`.
 
 ```tsx
 import { embed } from 'ai';
 import { openai } from '@ai-sdk/openai';
 
+// 'embedding' is a single embedding object (number[])
 const { embedding } = await embed({
   model: openai.embedding('text-embedding-3-small'),
   value: 'sunny day at the beach',
 });
 ```
+
+## Embedding Many Values
+
+When loading data, e.g. when preparing a data store for retrieval-augmented generation (RAG),
+it is often useful to embed many values at once (batch embedding).
+
+The Vercel AI SDK provides the `embedMany` function for this purpose.
+Similar to `embed`, you can use it with embeddings models,
+e.g. `openai.embedding('text-embedding-3-large')` or `mistral.embedding('mistral-embed')`.
+
+```tsx
+import { openai } from '@ai-sdk/openai';
+import { embedMany } from 'ai';
+
+// 'embeddings' is an array of embedding objects (number[][]).
+// It is sorted in the same order as the input values.
+const { embeddings } = await embedMany({
+  model: openai.embedding('text-embedding-3-small'),
+  values: [
+    'sunny day at the beach',
+    'rainy afternoon in the city',
+    'snowy night in the mountains',
+  ],
+});
+```
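
Because the returned `embeddings` array preserves the order of the input `values`, the two arrays can be zipped together when loading a data store. A minimal sketch building on the example above (the record shape is illustrative, not part of this change):

```ts
import { openai } from '@ai-sdk/openai';
import { embedMany } from 'ai';

const values = [
  'sunny day at the beach',
  'rainy afternoon in the city',
  'snowy night in the mountains',
];

const { embeddings } = await embedMany({
  model: openai.embedding('text-embedding-3-small'),
  values,
});

// embeddings[i] corresponds to values[i], so the two arrays can be
// combined into records for a vector store (shape is illustrative).
const records = values.map((value, i) => ({
  value,
  embedding: embeddings[i],
}));
```
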
Lines changed: 18 additions & 0 deletions

@@ -0,0 +1,18 @@
+---
+title: embedMany
+description: Embed several values using the AI SDK Core (batch embedding)
+---
+
+# `embedMany`
+
+Embed several values using an embedding model. The type of the value is defined
+by the embedding model.
+
+`embedMany` automatically splits large requests into smaller chunks if the model
+has a limit on how many embeddings can be generated in a single call.
+
+## Import
+
+<Snippet text={`import { embedMany } from "ai"`} prompt={false} />
+
+<ReferenceTable packageName="core" functionName="embedMany" />
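
Since embeddings place semantically similar values close together, the vectors returned by `embedMany` can be compared directly. A sketch computing the cosine similarity of two returned embeddings (the `cosineSimilarity` helper is written out by hand here; it is not imported from the SDK):

```ts
import { openai } from '@ai-sdk/openai';
import { embedMany } from 'ai';

// Plain cosine similarity over two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const { embeddings } = await embedMany({
  model: openai.embedding('text-embedding-3-small'),
  values: ['sunny day at the beach', 'rainy afternoon in the city'],
});

// Values with related meanings should score closer to 1.
console.log(cosineSimilarity(embeddings[0], embeddings[1]));
```
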

‎content/docs/07-reference/ai-sdk-core/index.mdx‎

Lines changed: 6 additions & 0 deletions

@@ -33,5 +33,11 @@ description: Reference documentation for the AI SDK Core
         'Generate an embedding for a single value using an embedding model.',
       href: '/docs/reference/ai-sdk-core/embed',
     },
+    {
+      title: 'embedMany',
+      description:
+        'Generate embeddings for several values using an embedding model (batch embedding).',
+      href: '/docs/reference/ai-sdk-core/embed-many',
+    },
   ]}
 />

‎content/providers/01-ai-sdk-providers/01-openai.mdx‎

Lines changed: 46 additions & 8 deletions

@@ -72,9 +72,9 @@ You can use the following optional settings to customize the OpenAI provider ins
   and `compatible` when using 3rd party providers. In `compatible` mode, newer
   information such as streamOptions are not being sent. Defaults to 'compatible'.
 
-## Models
+## Language Models
 
-The OpenAI provider instance is a function that you can invoke to create a model:
+The OpenAI provider instance is a function that you can invoke to create a language model:
 
 ```ts
 const model = openai('gpt-3.5-turbo');
@@ -92,6 +92,14 @@ const model = openai('gpt-3.5-turbo', {
 The available options depend on the API that's automatically chosen for the model (see below).
 If you want to explicitly select a specific model API, you can use `.chat` or `.completion`.
 
+### Model Capabilities
+
+| Model           | Image Input         | Object Generation   | Tool Usage          | Tool Streaming      |
+| --------------- | ------------------- | ------------------- | ------------------- | ------------------- |
+| `gpt-4-turbo`   | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
+| `gpt-4`         | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
+| `gpt-3.5-turbo` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
+
 ### Chat Models
 
 You can create models that call the [OpenAI chat API](https://platform.openai.com/docs/api-reference/chat) using the `.chat()` factory method.
@@ -215,10 +223,40 @@ The following optional settings are available for OpenAI completion models:
   A unique identifier representing your end-user, which can help OpenAI to
   monitor and detect abuse. Learn more.
 
-## Model Capabilities
+## Embedding Models
 
-| Model           | Image Input         | Object Generation   | Tool Usage          | Tool Streaming      |
-| --------------- | ------------------- | ------------------- | ------------------- | ------------------- |
-| `gpt-4-turbo`   | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
-| `gpt-4`         | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
-| `gpt-3.5-turbo` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
+You can create models that call the [OpenAI embeddings API](https://platform.openai.com/docs/api-reference/embeddings)
+using the `.embedding()` factory method.
+
+```ts
+const model = openai.embedding('text-embedding-3-large');
+```
+
+OpenAI embedding models support several additional settings.
+You can pass them as an options argument:
+
+```ts
+const model = openai.embedding('text-embedding-3-large', {
+  dimensions: 512, // optional, number of dimensions for the embedding
+  user: 'test-user', // optional unique user identifier
+});
+```
+
+The following optional settings are available for OpenAI embedding models:
+
+- **dimensions** _number_
+
+  The number of dimensions the resulting output embeddings should have (only supported in `text-embedding-3` and later models).
+
+- **user** _string_
+
+  A unique identifier representing your end-user, which can help OpenAI to
+  monitor and detect abuse. Learn more.
+
+### Model Capabilities
+
+| Model                    | Default Dimensions | Custom Dimensions   |
+| ------------------------ | ------------------ | ------------------- |
+| `text-embedding-3-large` | 3072               | <Check size={18} /> |
+| `text-embedding-3-small` | 1536               | <Check size={18} /> |
+| `text-embedding-ada-002` | 1536               | <Cross size={18} /> |
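
The `dimensions` setting above can be combined with `embed` to request smaller vectors; a sketch (model id, value, and the length check are illustrative):

```ts
import { embed } from 'ai';
import { openai } from '@ai-sdk/openai';

// Request 512-dimensional embeddings instead of the model default
// (3072 for text-embedding-3-large).
const { embedding } = await embed({
  model: openai.embedding('text-embedding-3-large', { dimensions: 512 }),
  value: 'sunny day at the beach',
});

console.log(embedding.length); // 512
```
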

‎content/providers/01-ai-sdk-providers/02-anthropic.mdx‎

Lines changed: 2 additions & 2 deletions

@@ -60,7 +60,7 @@ You can use the following optional settings to customize the Google Generative A
 
   Custom headers to include in the requests.
 
-## Models
+## Language Models
 
 You can create models that call the [Anthropic Messages API](https://docs.anthropic.com/claude/reference/messages_post) using the provider instance.
 The first argument is the model id, e.g. `claude-3-haiku-20240307`.
@@ -88,7 +88,7 @@ The following optional settings are available for Anthropic models:
   Used to remove "long tail" low probability responses.
   Recommended for advanced use cases only. You usually only need to use temperature.
 
-## Model Capabilities
+### Model Capabilities
 
 | Model                      | Image Input         | Object Generation   | Tool Usage          | Tool Streaming      |
 | -------------------------- | ------------------- | ------------------- | ------------------- | ------------------- |

‎content/providers/01-ai-sdk-providers/03-google-generative-ai.mdx‎

Lines changed: 2 additions & 2 deletions

@@ -58,7 +58,7 @@ You can use the following optional settings to customize the Google Generative A
 
   Custom headers to include in the requests.
 
-## Models
+## Language Models
 
 You can create models that call the [Google Generative AI API](https://ai.google.dev/api/rest) using the provider instance.
 The first argument is the model id, e.g. `models/gemini-pro`.
@@ -87,7 +87,7 @@ The following optional settings are available for Google Generative AI models:
   Top-k sampling considers the set of topK most probable tokens.
   Models running with nucleus sampling don't allow topK setting.
 
-## Model Capabilities
+### Model Capabilities
 
 | Model                          | Image Input         | Object Generation   | Tool Usage          | Tool Streaming      |
 | ------------------------------ | ------------------- | ------------------- | ------------------- | ------------------- |

‎content/providers/01-ai-sdk-providers/04-mistral.mdx‎

Lines changed: 12 additions & 2 deletions

@@ -23,6 +23,7 @@ The Mistral provider is available in the `@ai-sdk/mistral` module. You can insta
   <Snippet text="yarn add @ai-sdk/mistral" dark />
 </Tab>
 </Tabs>
+
 ## Provider Instance
 
 You can import the default provider instance `mistral` from `@ai-sdk/mistral`:
@@ -58,7 +59,7 @@ You can use the following optional settings to customize the Mistral provider in
 
   Custom headers to include in the requests.
 
-## Models
+## Language Models
 
 You can create models that call the [Mistral chat API](https://docs.mistral.ai/api/#operation/createChatCompletion) using the provider instance.
 The first argument is the model id, e.g. `mistral-large-latest`.
@@ -85,9 +86,18 @@ The following optional settings are available for Mistral models:
 
   Defaults to `false`.
 
-## Model Capabilities
+### Model Capabilities
 
 | Model                  | Image Input         | Object Generation   | Tool Usage          | Tool Streaming      |
 | ---------------------- | ------------------- | ------------------- | ------------------- | ------------------- |
 | `mistral-large-latest` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
 | `mistral-small-latest` | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
+
+## Embedding Models
+
+You can create models that call the [Mistral embeddings API](https://docs.mistral.ai/api/#operation/createEmbedding)
+using the `.embedding()` factory method.
+
+```ts
+const model = mistral.embedding('mistral-embed');
+```
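
As with the OpenAI embedding models, the Mistral embedding model can be passed to `embed` or `embedMany`. A sketch with `embed` (the value is illustrative):

```ts
import { mistral } from '@ai-sdk/mistral';
import { embed } from 'ai';

// 'embedding' is a single embedding (number[]) produced by mistral-embed.
const { embedding } = await embed({
  model: mistral.embedding('mistral-embed'),
  value: 'sunny day at the beach',
});
```
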
Lines changed: 20 additions & 0 deletions

@@ -0,0 +1,20 @@
+import { mistral } from '@ai-sdk/mistral';
+import { embedMany } from 'ai';
+import dotenv from 'dotenv';
+
+dotenv.config();
+
+async function main() {
+  const { embeddings } = await embedMany({
+    model: mistral.embedding('mistral-embed'),
+    values: [
+      'sunny day at the beach',
+      'rainy afternoon in the city',
+      'snowy night in the mountains',
+    ],
+  });
+
+  console.log(embeddings);
+}
+
+main().catch(console.error);
Lines changed: 20 additions & 0 deletions

@@ -0,0 +1,20 @@
+import { openai } from '@ai-sdk/openai';
+import { embedMany } from 'ai';
+import dotenv from 'dotenv';
+
+dotenv.config();
+
+async function main() {
+  const { embeddings } = await embedMany({
+    model: openai.embedding('text-embedding-3-small'),
+    values: [
+      'sunny day at the beach',
+      'rainy afternoon in the city',
+      'snowy night in the mountains',
+    ],
+  });
+
+  console.log(embeddings);
+}
+
+main().catch(console.error);
