Overview
InsForge supports pgvector, a PostgreSQL extension for vector similarity search. Use it to build semantic search, recommendations, RAG pipelines, or anything that needs “find similar items” functionality.
InsForge also provides a built-in embeddings SDK through the AI gateway, so you can generate and store vectors without configuring external providers.
Enabling the Extension
Enable pgvector via SQL:
```sql
create extension if not exists vector;
```
The extension is named vector in PostgreSQL, though the package is commonly called “pgvector”.
Creating Vector Columns
Create a table with a vector column. The dimension must match your embedding model:
```sql
create table documents (
  id bigserial primary key,
  content text,
  embedding vector(1536),
  created_at timestamptz default now()
);
```
Generating Embeddings with InsForge AI
Use the InsForge SDK to generate embeddings directly through the AI gateway. No external API keys are needed in your application code.
```typescript
const response = await insforge.ai.embeddings.create({
  model: 'openai/text-embedding-3-small',
  input: 'Your text here',
});

console.log(response.data[0].embedding);
```
The method accepts a single string or an array of strings.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Embedding model identifier |
| input | string or string[] | Yes | Text to embed |
| encoding_format | string | No | float (default) or base64 |
| dimensions | number | No | Output dimensionality override |
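For models that support it (such as OpenAI's text-embedding-3 family), the dimensions parameter shortens the returned vector server-side by truncating and renormalizing. The same effect can be approximated client-side; a minimal sketch (the helper name is ours, not part of the SDK):

```typescript
// Approximate the `dimensions` override client-side: keep the first
// `dims` components, then renormalize to unit length so cosine
// distance still behaves as expected.
function truncateAndNormalize(embedding: number[], dims: number): number[] {
  const truncated = embedding.slice(0, dims);
  const norm = Math.sqrt(truncated.reduce((sum, x) => sum + x * x, 0));
  return truncated.map(x => x / norm);
}

// A 4-dimensional vector shortened to 2 dimensions:
truncateAndNormalize([3, 4, 0, 0], 2); // → [0.6, 0.8]
```

Prefer the server-side parameter when the model supports it; client-side truncation is mainly useful for vectors you have already stored.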
Available Embedding Models
| Model | Dimensions |
|---|---|
| openai/text-embedding-3-small | 1536 |
| openai/text-embedding-3-large | 3072 |
| openai/text-embedding-ada-002 | 1536 |
| google/gemini-embedding-001 | 768 |
You can also use any external embedding provider (OpenAI, Cohere, Hugging Face, etc.) and store the resulting vectors in pgvector. The InsForge AI gateway is optional.
Storing Embeddings
Generate an embedding and insert it in one step:
```typescript
async function storeDocument(content: string) {
  const response = await insforge.ai.embeddings.create({
    model: 'openai/text-embedding-3-small',
    input: content,
  });

  const { data, error } = await insforge.database
    .from('documents')
    .insert([{
      content,
      embedding: response.data[0].embedding,
    }])
    .select();

  return { data, error };
}
```
Querying Vectors
Use distance operators to find similar vectors:
```sql
select * from documents
order by embedding <=> '[0.1, 0.2, ...]'
limit 5;
```
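The bracketed string in the order by clause is pgvector's text input format. If you query through a raw SQL client rather than the SDK, a small helper can build it from an embedding array (a sketch; the function name is ours):

```typescript
// pgvector accepts vector values as a comma-separated list in brackets.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(',')}]`;
}

// Pass the result as a bind parameter, e.g.
//   order by embedding <=> $1
// with toVectorLiteral(queryEmbedding) supplied for $1.
```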
Distance Operators
| Operator | Description |
|---|---|
| <-> | L2 distance |
| <#> | Inner product (negative) |
| <=> | Cosine distance |
For normalized embeddings (like OpenAI’s), use cosine distance <=>. Similarity = 1 - distance.
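To make the numbers concrete, cosine distance can be reproduced client-side. This is what <=> computes, sketched in TypeScript (illustrative only; queries should still run in Postgres):

```typescript
// Cosine distance, matching pgvector's <=> operator: 1 - cos(theta).
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// For normalized embeddings, similarity = 1 - distance:
const similarity = 1 - cosineDistance([1, 0], [1, 0]); // identical vectors → 1
```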
Similarity Search with RPC
For production use, create a Postgres function that handles similarity search server-side. This avoids round trips and keeps the logic close to the data.
```sql
create or replace function match_documents(
  query_embedding vector(1536),
  match_count int default 5,
  match_threshold float default 0.78
)
returns table (
  id bigint,
  content text,
  similarity float
)
language sql stable
as $$
  select
    id,
    content,
    1 - (embedding <=> query_embedding) as similarity
  from documents
  where 1 - (embedding <=> query_embedding) > match_threshold
  order by embedding <=> query_embedding
  limit match_count;
$$;
```
Call it from the SDK:
```typescript
const queryResponse = await insforge.ai.embeddings.create({
  model: 'openai/text-embedding-3-small',
  input: 'What is machine learning?',
});

const { data, error } = await insforge.database.rpc('match_documents', {
  query_embedding: queryResponse.data[0].embedding,
  match_count: 5,
  match_threshold: 0.78,
});
```
Adjust match_threshold based on your use case. Higher values return fewer but more relevant results. Start with 0.78 and tune from there.
Building a RAG Pipeline
InsForge provides the core primitives for retrieval-augmented generation: embeddings, vector storage, similarity search, and chat completions.
Basic RAG Flow
```typescript
async function askQuestion(question: string) {
  // 1. Embed the question
  const embeddingResponse = await insforge.ai.embeddings.create({
    model: 'openai/text-embedding-3-small',
    input: question,
  });

  // 2. Retrieve relevant documents
  const { data: documents } = await insforge.database.rpc('match_documents', {
    query_embedding: embeddingResponse.data[0].embedding,
    match_count: 5,
    match_threshold: 0.78,
  });

  // 3. Build context from the retrieved documents
  //    (data is null when the RPC fails or nothing matches)
  const context = (documents ?? [])
    .map((doc: any) => doc.content)
    .join('\n\n');

  // 4. Generate a response grounded in that context
  const response = await insforge.ai.chat.completions.create({
    model: 'openai/gpt-4o-mini',
    messages: [
      {
        role: 'system',
        content: `Answer the question based on the following context:\n\n${context}`,
      },
      {
        role: 'user',
        content: question,
      },
    ],
  });

  return response;
}
```
Production RAG
The basic flow above works for prototypes. For production systems, you will likely need:
- Chunking strategies — split documents by semantic boundaries, not fixed token counts
- Query rewriting — rephrase user questions to improve retrieval recall
- Re-ranking — score retrieved chunks with a cross-encoder before passing them to the LLM
- Context assembly — format and truncate retrieved chunks to fit the model’s context window
- Evaluation — measure retrieval precision, answer faithfulness, and hallucination rates
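As one example of the first point, a paragraph-aware chunker keeps semantic units intact instead of cutting at fixed offsets. A simplified sketch (real pipelines usually count tokens rather than characters and add overlap between chunks):

```typescript
// Split text on blank lines, then pack whole paragraphs into chunks
// of at most maxChars, so no paragraph is cut in the middle.
function chunkByParagraph(text: string, maxChars = 1000): string[] {
  const paragraphs = text.split(/\n\s*\n/).map(p => p.trim()).filter(Boolean);
  const chunks: string[] = [];
  let current = '';
  for (const p of paragraphs) {
    if (current && current.length + p.length + 2 > maxChars) {
      chunks.push(current);
      current = p;
    } else {
      current = current ? `${current}\n\n${p}` : p;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Note that a single paragraph longer than maxChars is kept whole here; a production chunker would split it further.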
We recommend pairing InsForge with an orchestration framework for production RAG:
| Framework | Language | Best For |
|---|---|---|
| LangChain | Python / TypeScript | Full pipeline orchestration, extensive integrations |
| LlamaIndex | Python / TypeScript | Document indexing, structured retrieval, query engines |
| Haystack | Python | Modular pipelines, evaluation, production components |
| Vercel AI SDK | TypeScript | Streaming UI, React/Next.js integration |
These frameworks integrate with any Postgres-backed vector store. Point them at your InsForge database, use insforge.ai.embeddings.create() for embedding generation, and use insforge.ai.chat.completions.create() for the generation step.
Indexing
Without an index, pgvector does exact nearest neighbor search — accurate but slow on large datasets. Add an index for faster approximate search.
HNSW (Recommended)
Faster queries, uses more memory:
```sql
create index on documents
using hnsw (embedding vector_cosine_ops);
```
IVFFlat
Lower memory, but create it after inserting data:
```sql
create index on documents
using ivfflat (embedding vector_cosine_ops)
with (lists = 100);
```
Operator Classes
Match your distance operator:
| Distance | Operator Class |
|---|---|
| L2 | vector_l2_ops |
| Inner product | vector_ip_ops |
| Cosine | vector_cosine_ops |
Create IVFFlat indexes after inserting initial data: IVFFlat needs representative data to build effective clusters, and a common starting point for the lists parameter is rows / 1000 for tables up to about one million rows. HNSW can be created on an empty table.
Best Practices
- Match dimensions — vector dimensions must match your embedding model
- Use the InsForge AI gateway — generate embeddings with insforge.ai.embeddings.create() to keep API keys out of your application code
- Use RPC functions for search — keep similarity logic in Postgres rather than computing distances client-side
- Normalize embeddings — use cosine distance for scores between 0 and 1
- Index at scale — add indexes when you have ~10k+ vectors
- Batch inserts — generate and insert embeddings in batches to respect rate limits
- Use an orchestration framework for production RAG — raw retrieval works for prototypes, but production systems need chunking, re-ranking, and evaluation
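The batching advice above can be sketched as follows. The embed callback is injected so the same helper works with insforge.ai.embeddings.create() or any other provider; the batch size and delay values are illustrative, not provider limits:

```typescript
// Split items into fixed-size batches and embed each batch in turn,
// pausing between requests to stay under provider rate limits.
async function embedInBatches(
  texts: string[],
  embed: (batch: string[]) => Promise<number[][]>,
  batchSize = 100,
  delayMs = 200,
): Promise<number[][]> {
  const all: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);
    all.push(...await embed(batch));
    if (i + batchSize < texts.length) {
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
  return all;
}
```

Pair this with a bulk insert (e.g. passing the whole array of rows to .insert()) so each database round trip also carries a full batch.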