
Overview

InsForge supports pgvector, a PostgreSQL extension for vector similarity search. Use it to build semantic search, recommendations, RAG pipelines, or anything that needs “find similar items” functionality. InsForge also provides a built-in embeddings SDK through the AI gateway, so you can generate and store vectors without configuring external providers.

Enabling the Extension

Enable pgvector via SQL:
create extension if not exists vector;
The extension is named vector in PostgreSQL, though the package is commonly called “pgvector”.

Creating Vector Columns

Create a table with a vector column. The dimension must match your embedding model:
create table documents (
  id bigserial primary key,
  content text,
  embedding vector(1536),
  created_at timestamptz default now()
);

Generating Embeddings with InsForge AI

Use the InsForge SDK to generate embeddings directly through the AI gateway. No external API keys are needed in your application code.
const response = await insforge.ai.embeddings.create({
  model: 'openai/text-embedding-3-small',
  input: 'Your text here',
});

console.log(response.data[0].embedding);
The method accepts a single string or an array of strings. Parameters:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Embedding model identifier |
| input | string or string[] | Yes | Text to embed |
| encoding_format | string | No | float (default) or base64 |
| dimensions | number | No | Output dimensionality override |

Available Embedding Models

| Model | Dimensions |
| --- | --- |
| openai/text-embedding-3-small | 1536 |
| openai/text-embedding-3-large | 3072 |
| openai/text-embedding-ada-002 | 1536 |
| google/gemini-embedding-001 | 768 |
You can also use any external embedding provider (OpenAI, Cohere, Hugging Face, etc.) and store the resulting vectors in pgvector. The InsForge AI gateway is optional.
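When inserting vectors through raw SQL (for example, with an external provider), pgvector expects literals in the text form [0.1,0.2,...]. A minimal formatting sketch; toPgvectorLiteral is an illustrative name, not part of any SDK:

```typescript
// Format a numeric embedding as a pgvector literal, e.g. '[0.25,-0.5]'.
// pgvector parses this text form in INSERT statements and query parameters.
function toPgvectorLiteral(embedding: number[]): string {
  return `[${embedding.join(',')}]`;
}
```

The SDK examples below pass the embedding array directly, so this helper is only needed when you build SQL strings yourself.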

Storing Embeddings

Generate an embedding and insert it in one step:
async function storeDocument(content: string) {
  const response = await insforge.ai.embeddings.create({
    model: 'openai/text-embedding-3-small',
    input: content,
  });

  const { data, error } = await insforge.database
    .from('documents')
    .insert([{
      content,
      embedding: response.data[0].embedding,
    }])
    .select();

  return { data, error };
}
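Because input also accepts an array of strings, you can embed many documents in one call; the response contains one data entry per input, in order. A hypothetical pairing helper for a batch insert (toInsertRows is not part of the SDK):

```typescript
// Pair each text with its embedding, producing rows shaped for the
// documents table. Assumes the embeddings API returns one data entry
// per input, in input order.
function toInsertRows(
  contents: string[],
  embeddings: number[][],
): { content: string; embedding: number[] }[] {
  if (contents.length !== embeddings.length) {
    throw new Error('contents and embeddings must be the same length');
  }
  return contents.map((content, i) => ({ content, embedding: embeddings[i] }));
}
```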

Querying Vectors

Use distance operators to find similar vectors:
select * from documents
order by embedding <=> '[0.1, 0.2, ...]'
limit 5;

Distance Operators

| Operator | Description |
| --- | --- |
| <-> | L2 distance |
| <#> | Negative inner product |
| <=> | Cosine distance |
For normalized embeddings (like OpenAI’s), use cosine distance <=>. Similarity = 1 - distance.
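To build intuition for what <=> computes, here is cosine distance in plain TypeScript (a reference sketch only; in production, let Postgres compute distances server-side):

```typescript
// Cosine distance, as computed by pgvector's <=> operator:
// 1 - (a . b) / (|a| * |b|). Similarity is then 1 - distance.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function cosineDistance(a: number[], b: number[]): number {
  const norm = (v: number[]) => Math.sqrt(dot(v, v));
  return 1 - dot(a, b) / (norm(a) * norm(b));
}

// Orthogonal vectors have distance 1 (similarity 0);
// parallel vectors have distance 0 (similarity 1).
```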

Similarity Search with RPC

For production use, create a Postgres function that handles similarity search server-side. This avoids round trips and keeps the logic close to the data.
create or replace function match_documents(
  query_embedding vector(1536),
  match_count int default 5,
  match_threshold float default 0.78
)
returns table (
  id bigint,
  content text,
  similarity float
)
language sql stable
as $$
  select
    id,
    content,
    1 - (embedding <=> query_embedding) as similarity
  from documents
  where 1 - (embedding <=> query_embedding) > match_threshold
  order by embedding <=> query_embedding
  limit match_count;
$$;
Call it from the SDK:
const queryResponse = await insforge.ai.embeddings.create({
  model: 'openai/text-embedding-3-small',
  input: 'What is machine learning?',
});

const { data, error } = await insforge.database.rpc('match_documents', {
  query_embedding: queryResponse.data[0].embedding,
  match_count: 5,
  match_threshold: 0.78,
});
Adjust match_threshold based on your use case. Higher values return fewer but more relevant results. Start with 0.78 and tune from there.

Building a RAG Pipeline

InsForge provides the core primitives for retrieval-augmented generation: embeddings, vector storage, similarity search, and chat completions.

Basic RAG Flow

async function askQuestion(question: string) {
  // 1. Embed the question
  const embeddingResponse = await insforge.ai.embeddings.create({
    model: 'openai/text-embedding-3-small',
    input: question,
  });

  // 2. Retrieve relevant documents
  const { data: documents } = await insforge.database.rpc('match_documents', {
    query_embedding: embeddingResponse.data[0].embedding,
    match_count: 5,
    match_threshold: 0.78,
  });

  // 3. Build context from retrieved documents
  // (data is null when the RPC call returns an error)
  const context = (documents ?? [])
    .map((doc: { content: string }) => doc.content)
    .join('\n\n');

  // 4. Generate a response with context
  const response = await insforge.ai.chat.completions.create({
    model: 'openai/gpt-4o-mini',
    messages: [
      {
        role: 'system',
        content: `Answer the question based on the following context:\n\n${context}`,
      },
      {
        role: 'user',
        content: question,
      },
    ],
  });

  return response;
}

Production RAG

The basic flow above works for prototypes. For production systems, you will likely need:
  • Chunking strategies — split documents by semantic boundaries, not fixed token counts
  • Query rewriting — rephrase user questions to improve retrieval recall
  • Re-ranking — score retrieved chunks with a cross-encoder before passing them to the LLM
  • Context assembly — format and truncate retrieved chunks to fit the model’s context window
  • Evaluation — measure retrieval precision, answer faithfulness, and hallucination rates
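As a starting point for the chunking step, here is a minimal paragraph-based chunker (an illustrative sketch; production chunkers typically also handle overlap and token-aware limits):

```typescript
// Split text on blank lines, then pack paragraphs into chunks of at
// most maxChars. Paragraph boundaries are a cheap proxy for semantic
// boundaries; a lone paragraph longer than maxChars becomes its own chunk.
function chunkByParagraph(text: string, maxChars = 1000): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter(Boolean);

  const chunks: string[] = [];
  let current = '';
  for (const p of paragraphs) {
    if (current && current.length + p.length + 2 > maxChars) {
      chunks.push(current);
      current = p;
    } else {
      current = current ? `${current}\n\n${p}` : p;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk would then be embedded and stored as its own row in the documents table.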
We recommend pairing InsForge with an orchestration framework for production RAG:
| Framework | Language | Best For |
| --- | --- | --- |
| LangChain | Python / TypeScript | Full pipeline orchestration, extensive integrations |
| LlamaIndex | Python / TypeScript | Document indexing, structured retrieval, query engines |
| Haystack | Python | Modular pipelines, evaluation, production components |
| Vercel AI SDK | TypeScript | Streaming UI, React/Next.js integration |
These frameworks integrate with any Postgres-backed vector store. Point them at your InsForge database, use insforge.ai.embeddings.create() for embedding generation, and use insforge.ai.chat.completions.create() for the generation step.

Indexing

Without an index, pgvector performs an exact nearest-neighbor scan: accurate, but slow on large datasets. Add an index for faster approximate search.

HNSW

Faster queries at the cost of higher memory usage:
create index on documents
using hnsw (embedding vector_cosine_ops);

IVFFlat

Lower memory usage than HNSW, but create it after inserting data:
create index on documents
using ivfflat (embedding vector_cosine_ops)
with (lists = 100);

Operator Classes

Match your distance operator:
| Distance | Operator Class |
| --- | --- |
| L2 | vector_l2_ops |
| Inner product | vector_ip_ops |
| Cosine | vector_cosine_ops |
Create IVFFlat indexes after inserting initial data. IVFFlat needs representative data to build effective clusters. HNSW can be created on an empty table.

Best Practices

  • Match dimensions — vector dimensions must match your embedding model
  • Use the InsForge AI gateway — generate embeddings with insforge.ai.embeddings.create() to keep API keys out of your application code
  • Use RPC functions for search — keep similarity logic in Postgres rather than computing distances client-side
  • Normalize embeddings — use cosine distance for scores between 0 and 1
  • Index at scale — add indexes when you have ~10k+ vectors
  • Batch inserts — generate and insert embeddings in batches to respect rate limits
  • Use an orchestration framework for production RAG — raw retrieval works for prototypes, but production systems need chunking, re-ranking, and evaluation
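For the batch-insert recommendation above, a small generic helper can split documents into groups before calling the embeddings API (illustrative; the right batch size depends on your provider's rate limits):

```typescript
// Split an array into consecutive batches of at most batchSize items,
// preserving order. Each batch can then be passed as the array `input`
// of a single embeddings call.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```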