ObjectStack

RAG Pipeline

RAG Pipeline protocol schemas

RAG (Retrieval-Augmented Generation) Pipeline Protocol

Defines schemas for building context-aware AI assistants using RAG techniques.

Enables vector search, document chunking, embeddings, and retrieval configuration.

Source: packages/spec/src/ai/rag-pipeline.zod.ts

TypeScript Usage

```typescript
import { ChunkingStrategy, DocumentChunk, DocumentLoaderConfig, DocumentMetadata, EmbeddingModel, FilterExpression, FilterGroup, MetadataFilter, RAGPipelineConfig, RAGPipelineStatus, RAGQueryRequest, RAGQueryResponse, RerankingConfig, RetrievalStrategy, VectorStoreConfig, VectorStoreProvider } from '@objectstack/spec/ai';
// Type-only alternative; use one form per file, since importing the same names both ways is a duplicate-identifier error:
// import type { ChunkingStrategy, DocumentChunk, DocumentLoaderConfig, DocumentMetadata, EmbeddingModel, FilterExpression, FilterGroup, MetadataFilter, RAGPipelineConfig, RAGPipelineStatus, RAGQueryRequest, RAGQueryResponse, RerankingConfig, RetrievalStrategy, VectorStoreConfig, VectorStoreProvider } from '@objectstack/spec/ai';

// Validate data: parse() throws a ZodError when the input does not match the schema
const data: unknown = { type: 'fixed', chunkSize: 512, chunkOverlap: 50, unit: 'tokens' };
const result = ChunkingStrategy.parse(data);
```

ChunkingStrategy

Union Options

This schema accepts one of the following structures:

Option 1

Type: fixed

Properties

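| Property | Type | Required | Description |
|---|---|---|---|
| `type` | string | required | Discriminator: `'fixed'` |
| `chunkSize` | integer | required | Fixed chunk size, in tokens or characters |
| `chunkOverlap` | integer | required | Overlap between consecutive chunks |
| `unit` | `Enum<'tokens' \| 'characters'>` | required | Unit for `chunkSize` and `chunkOverlap` |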
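Here is a minimal sketch of how the `fixed` option could be applied with `unit: 'characters'`. It produces windows of `chunkSize` characters, and each window starts `chunkSize - chunkOverlap` characters after the previous one. The `FixedChunking` interface mirrors the table above. `chunkFixed` is a hypothetical helper and is not part of `@objectstack/spec`. `unit: 'tokens'` would need a tokenizer, which this sketch leaves out.

```typescript
// Local mirror of the `fixed` option above (assumption: not imported from the spec package)
interface FixedChunking {
  type: 'fixed';
  chunkSize: number;
  chunkOverlap: number;
  unit: 'tokens' | 'characters';
}

// Hypothetical helper: character-based sliding window with overlap
function chunkFixed(text: string, cfg: FixedChunking): string[] {
  if (cfg.unit !== 'characters') {
    throw new Error('this sketch only handles unit "characters"');
  }
  if (cfg.chunkSize <= 0 || cfg.chunkOverlap < 0 || cfg.chunkOverlap >= cfg.chunkSize) {
    throw new Error('require chunkSize > 0 and 0 <= chunkOverlap < chunkSize');
  }
  const step = cfg.chunkSize - cfg.chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + cfg.chunkSize, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break; // the last window reached the end of the text
  }
  return chunks;
}

console.log(chunkFixed('abcdefghij', { type: 'fixed', chunkSize: 4, chunkOverlap: 1, unit: 'characters' }));
// ['abcd', 'defg', 'ghij']
```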

Option 2

Type: semantic

Properties

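| Property | Type | Required | Description |
|---|---|---|---|
| `type` | string | required | Discriminator: `'semantic'` |
| `model` | string | optional | Model used for semantic chunking |
| `minChunkSize` | integer | required | Minimum chunk size |
| `maxChunkSize` | integer | required | Maximum chunk size |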

Option 3

Type: recursive

Properties

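| Property | Type | Required | Description |
|---|---|---|---|
| `type` | string | required | Discriminator: `'recursive'` |
| `separators` | string[] | required | Separators used to split the text |
| `chunkSize` | integer | required | |
| `chunkOverlap` | integer | required | |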
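Recursive splitting is commonly done in three steps. The text is split on the first separator. Small pieces are merged back together up to `chunkSize`. Any piece that is still too large is split again with the remaining separators. The sketch below follows that approach under a few assumptions: sizes are measured in characters, separators are tried in list order, and `chunkOverlap` is accepted but not applied. `splitRecursive` and `chunkRecursive` are hypothetical names.

```typescript
// Local mirror of the `recursive` option above (assumption)
interface RecursiveChunking {
  type: 'recursive';
  separators: string[];
  chunkSize: number;
  chunkOverlap: number;
}

// Split with the first separator; re-split oversized pieces with the remaining ones
function splitRecursive(text: string, separators: string[], size: number): string[] {
  if (text.length <= size) return [text];
  if (separators.length === 0) {
    // No separators left: fall back to a hard split every `size` characters
    const out: string[] = [];
    for (let i = 0; i < text.length; i += size) out.push(text.slice(i, i + size));
    return out;
  }
  const [sep, ...rest] = separators;
  const chunks: string[] = [];
  let current = '';
  for (const piece of text.split(sep).filter((p) => p.length > 0)) {
    if (piece.length > size) {
      // Flush what we have, then break the oversized piece down further
      if (current) chunks.push(current);
      current = '';
      chunks.push(...splitRecursive(piece, rest, size));
      continue;
    }
    const candidate = current ? current + sep + piece : piece;
    if (candidate.length <= size) {
      current = candidate; // merge small pieces back together with their separator
    } else {
      chunks.push(current);
      current = piece;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Hypothetical entry point; chunkOverlap is ignored in this sketch
function chunkRecursive(text: string, cfg: RecursiveChunking): string[] {
  if (cfg.chunkSize < 1) throw new Error('chunkSize must be at least 1');
  return splitRecursive(text, cfg.separators, cfg.chunkSize);
}

console.log(chunkRecursive('aaa bbb ccc ddd', { type: 'recursive', separators: ['\n\n', ' '], chunkSize: 7, chunkOverlap: 0 }));
```

With `chunkSize: 7` the example yields `['aaa bbb', 'ccc ddd']`. When a piece has no separator left to split on, the sketch falls back to fixed-width slices.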

Option 4

Type: markdown

Properties

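| Property | Type | Required | Description |
|---|---|---|---|
| `type` | string | required | Discriminator: `'markdown'` |
| `maxChunkSize` | integer | required | Maximum chunk size |
| `respectHeaders` | boolean | required | Keep headers with their content |
| `respectCodeBlocks` | boolean | required | Keep code blocks intact |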
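The following sketch approximates `respectHeaders: true` and `respectCodeBlocks: true`. A new section starts at each Markdown heading, except when the heading-like line sits inside a code fence. Sections are then packed greedily up to `maxChunkSize` characters. `chunkMarkdown` is hypothetical, and sizes are assumed to be in characters. A section larger than the limit is kept whole rather than split.

```typescript
// A code-fence marker, built with repeat() so this example can sit inside a fenced block
const FENCE = '`'.repeat(3);

// Hypothetical helper: sections start at headings found outside code fences,
// then sections are packed greedily up to maxChunkSize characters.
function chunkMarkdown(md: string, maxChunkSize: number): string[] {
  const sections: string[][] = [[]];
  let inFence = false;
  for (const line of md.split('\n')) {
    if (line.trim().startsWith(FENCE)) inFence = !inFence;
    const isHeading = !inFence && /^#{1,6}\s/.test(line);
    const last = sections[sections.length - 1];
    // A heading opens a new section, so it stays with the content that follows it
    if (isHeading && last.length > 0) sections.push([line]);
    else last.push(line);
  }
  const chunks: string[] = [];
  let current = '';
  for (const section of sections.map((s) => s.join('\n'))) {
    if (!current) current = section;
    else if (current.length + 1 + section.length <= maxChunkSize) current += '\n' + section;
    else {
      chunks.push(current);
      current = section;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(chunkMarkdown('# Intro\nHello\n# Usage\nRun it', 20));
```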


DocumentChunk

Properties

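| Property | Type | Required | Description |
|---|---|---|---|
| `id` | string | required | Unique chunk identifier |
| `content` | string | required | Chunk text content |
| `embedding` | number[] | optional | Embedding vector |
| `metadata` | Object | required | Chunk metadata (see DocumentMetadata) |
| `chunkIndex` | integer | required | Position of the chunk within its document |
| `tokens` | integer | optional | Token count |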
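Whatever chunking strategy is used, its output ends up as `DocumentChunk` records. A minimal sketch of that step follows. The local interface mirrors the table, with `metadata` simplified to a plain record. The helper name and the `documentId#index` id format are illustrative assumptions and are not part of the spec.

```typescript
// Local mirror of the DocumentChunk table (assumption; metadata simplified to a plain record)
interface DocumentChunk {
  id: string;
  content: string;
  embedding?: number[];
  metadata: Record<string, unknown>;
  chunkIndex: number;
  tokens?: number;
}

// Hypothetical helper: wrap raw chunk texts as DocumentChunk records.
// The `${documentId}#${index}` id format is an illustrative choice.
function toDocumentChunks(documentId: string, texts: string[], metadata: Record<string, unknown>): DocumentChunk[] {
  return texts.map((content, chunkIndex) => ({
    id: `${documentId}#${chunkIndex}`,
    content,
    metadata: { ...metadata }, // copy so chunks do not share one mutable object
    chunkIndex,
  }));
}

console.log(toDocumentChunks('handbook', ['Intro text', 'Setup steps'], { source: 'docs/handbook.md' }));
```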

DocumentLoaderConfig

Properties

| Property | Type | Required | Description |
|---|---|---|---|
| `type` | `Enum<'file' \| 'directory' \| 'url' \| 'api' \| 'database' \| 'custom'>` | required | |
| `source` | string | required | Source path, URL, or identifier |
| `fileTypes` | string[] | optional | Accepted file extensions (e.g., `[".pdf", ".md"]`) |
| `recursive` | boolean | required | Process directories recursively |
| `maxFileSize` | integer | optional | Maximum file size in bytes |
| `excludePatterns` | string[] | optional | Patterns to exclude |
| `extractImages` | boolean | required | Extract text from images (OCR) |
| `extractTables` | boolean | required | Extract and format tables |
| `loaderConfig` | `Record<string, any>` | optional | Custom loader-specific configuration |
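The sketch below shows one way a loader might apply `fileTypes`, `maxFileSize`, and `excludePatterns` before reading a file. The table does not say how patterns are matched. This sketch assumes simple globs in which `*` matches any run of characters. `shouldLoad` and `globToRegExp` are hypothetical helpers.

```typescript
// Only the filtering fields of DocumentLoaderConfig (assumption)
interface LoaderFilter {
  fileTypes?: string[];
  maxFileSize?: number;
  excludePatterns?: string[];
}

// Assumed glob syntax: '*' matches any run of characters; everything else is literal
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*');
  return new RegExp(`^${escaped}$`);
}

// Hypothetical pre-load check for a single file
function shouldLoad(path: string, sizeBytes: number, cfg: LoaderFilter): boolean {
  if (cfg.maxFileSize !== undefined && sizeBytes > cfg.maxFileSize) return false;
  if (cfg.fileTypes && cfg.fileTypes.length > 0) {
    const lower = path.toLowerCase();
    if (!cfg.fileTypes.some((ext) => lower.endsWith(ext.toLowerCase()))) return false;
  }
  if (cfg.excludePatterns && cfg.excludePatterns.some((p) => globToRegExp(p).test(path))) return false;
  return true;
}

console.log(shouldLoad('docs/drafts/wip.md', 2048, { fileTypes: ['.md'], excludePatterns: ['*/drafts/*'] }));
```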

DocumentMetadata

Properties

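| Property | Type | Required | Description |
|---|---|---|---|
| `source` | string | required | Document source (file path, URL, etc.) |
| `sourceType` | `Enum<'file' \| 'url' \| 'api' \| 'database' \| 'custom'>` | optional | |
| `title` | string | optional | |
| `author` | string | optional | Document author |
| `createdAt` | string | optional | Creation time (ISO 8601 timestamp) |
| `updatedAt` | string | optional | Last update time (ISO 8601 timestamp) |
| `tags` | string[] | optional | |
| `category` | string | optional | |
| `language` | string | optional | Document language (ISO 639-1 code) |
| `custom` | `Record<string, any>` | optional | Custom metadata fields |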

EmbeddingModel

Properties

| Property | Type | Required | Description |
|---|---|---|---|
| `provider` | `Enum<'openai' \| 'cohere' \| 'huggingface' \| 'azure_openai' \| 'local' \| 'custom'>` | required | |
| `model` | string | required | Model name (e.g., `"text-embedding-3-large"`) |
| `dimensions` | integer | required | Embedding vector dimensions |
| `maxTokens` | integer | optional | Maximum tokens per embedding |
| `batchSize` | integer | required | Batch size for embedding requests |
| `endpoint` | string | optional | Custom endpoint URL |
| `apiKey` | string | optional | API key |
| `secretRef` | string | optional | Reference to a stored secret |
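`batchSize` and `dimensions` point to the usual embedding loop. Texts are sent in batches, and every returned vector is checked against the configured length. In this sketch `EmbedBatchFn` is a hypothetical stand-in for a provider SDK call, and `embedAll` is an illustrative name.

```typescript
// Local mirror of the EmbeddingModel fields used here (assumption)
interface EmbeddingBatchConfig {
  model: string;
  dimensions: number;
  batchSize: number;
}

// Hypothetical provider call: returns one vector per input text, in order
type EmbedBatchFn = (model: string, texts: string[]) => Promise<number[][]>;

async function embedAll(texts: string[], cfg: EmbeddingBatchConfig, embedBatch: EmbedBatchFn): Promise<number[][]> {
  if (cfg.batchSize < 1) throw new Error('batchSize must be at least 1');
  const vectors: number[][] = [];
  for (let i = 0; i < texts.length; i += cfg.batchSize) {
    const batch = texts.slice(i, i + cfg.batchSize);
    const result = await embedBatch(cfg.model, batch);
    if (result.length !== batch.length) throw new Error('provider returned the wrong number of vectors');
    for (const v of result) {
      // Every vector must have the configured number of dimensions
      if (v.length !== cfg.dimensions) throw new Error(`expected ${cfg.dimensions} dimensions, got ${v.length}`);
      vectors.push(v);
    }
  }
  return vectors;
}
```

A mismatch between `dimensions` and what the provider returns is caught here, before anything is written to the vector store.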

FilterExpression

Properties

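| Property | Type | Required | Description |
|---|---|---|---|
| `field` | string | required | Metadata field to filter on |
| `operator` | `Enum<'eq' \| 'neq' \| 'gt' \| 'gte' \| 'lt' \| 'lte' \| 'in' \| 'nin' \| 'contains'>` | required | |
| `value` | `string \| number \| boolean \| (string \| number)[]` | required | Filter value (typically an array for `in` and `nin`) |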
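The sketch below evaluates a `FilterExpression` against a chunk's metadata in memory. Vector stores normally translate filters into their own query syntax instead. The table does not specify how the ordering operators or `contains` behave, so the comments mark the choices made here. `matchesExpression` is a hypothetical helper.

```typescript
type FilterValue = string | number | boolean | (string | number)[];

// Local mirror of the FilterExpression table (assumption)
interface FilterExpression {
  field: string;
  operator: 'eq' | 'neq' | 'gt' | 'gte' | 'lt' | 'lte' | 'in' | 'nin' | 'contains';
  value: FilterValue;
}

const isNum = (x: unknown): x is number => typeof x === 'number';

// Hypothetical in-memory evaluator against a metadata record
function matchesExpression(meta: Record<string, unknown>, expr: FilterExpression): boolean {
  const { field, operator, value } = expr;
  const actual = meta[field];
  switch (operator) {
    case 'eq': return actual === value;
    case 'neq': return actual !== value;
    // Assumption: ordering operators compare numbers only
    case 'gt': return isNum(actual) && isNum(value) && actual > value;
    case 'gte': return isNum(actual) && isNum(value) && actual >= value;
    case 'lt': return isNum(actual) && isNum(value) && actual < value;
    case 'lte': return isNum(actual) && isNum(value) && actual <= value;
    case 'in': return Array.isArray(value) && (value as unknown[]).includes(actual);
    case 'nin': return Array.isArray(value) && !(value as unknown[]).includes(actual);
    // Assumption: substring match on strings, membership test on array fields
    case 'contains':
      if (typeof actual === 'string' && typeof value === 'string') return actual.includes(value);
      return Array.isArray(actual) && actual.includes(value);
    default: return false;
  }
}

console.log(matchesExpression({ category: 'faq', year: 2024 }, { field: 'year', operator: 'gte', value: 2023 }));
```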

FilterGroup

Properties

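| Property | Type | Required | Description |
|---|---|---|---|
| `logic` | `Enum<'and' \| 'or'>` | required | How the entries in `filters` are combined |
| `filters` | `(FilterExpression \| FilterGroup)[]` | required | Filter expressions or nested groups (recursive) |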
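`filters` may contain nested groups, so evaluation is naturally recursive. `and` requires every entry to match, and `or` requires at least one. In this sketch the leaf check is passed in as a function, so any expression evaluator can be plugged in. `evaluateGroup` is hypothetical. An empty group evaluates to `true` for `and` and `false` for `or`, the usual vacuous results of `every` and `some`.

```typescript
// Local mirrors of the filter tables (assumption; operator/value loosened for brevity)
interface FilterExpression {
  field: string;
  operator: string;
  value: unknown;
}

interface FilterGroup {
  logic: 'and' | 'or';
  filters: (FilterExpression | FilterGroup)[];
}

const isGroup = (f: FilterExpression | FilterGroup): f is FilterGroup => 'logic' in f;

// Recursively evaluate a group; `leaf` decides single expressions
function evaluateGroup(group: FilterGroup, leaf: (f: FilterExpression) => boolean): boolean {
  const test = (f: FilterExpression | FilterGroup): boolean => (isGroup(f) ? evaluateGroup(f, leaf) : leaf(f));
  return group.logic === 'and' ? group.filters.every(test) : group.filters.some(test);
}

// Usage with a toy equality-only leaf check
const docMeta: Record<string, unknown> = { lang: 'en', category: 'faq' };
const eqOnly = (f: FilterExpression) => f.operator === 'eq' && docMeta[f.field] === f.value;
console.log(evaluateGroup({ logic: 'or', filters: [{ field: 'lang', operator: 'eq', value: 'de' }, { field: 'category', operator: 'eq', value: 'faq' }] }, eqOnly));
```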

MetadataFilter

Union Options

This schema accepts one of the following structures:

Option 1

Properties

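| Property | Type | Required | Description |
|---|---|---|---|
| `field` | string | required | Metadata field to filter on |
| `operator` | `Enum<'eq' \| 'neq' \| 'gt' \| 'gte' \| 'lt' \| 'lte' \| 'in' \| 'nin' \| 'contains'>` | required | |
| `value` | `string \| number \| boolean \| (string \| number)[]` | required | Filter value |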

Option 2

Reference: `__schema0` (the recursive FilterGroup schema)


Option 3

Type: `Record<string, string | number | boolean | (string | number)[]>`
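The record option reads like a shorthand. Suppose it means "every listed field must match", with scalars compared by equality and arrays treated as `in`. Under that reading it can be normalized into the explicit forms, as sketched below. That interpretation and the `normalizeFilter` name are assumptions. Note that a record whose keys happen to be `field`, `operator`, and `value` would be read as an expression.

```typescript
type FilterValue = string | number | boolean | (string | number)[];

// Local mirrors of the MetadataFilter options (assumption)
interface FilterExpression {
  field: string;
  operator: 'eq' | 'neq' | 'gt' | 'gte' | 'lt' | 'lte' | 'in' | 'nin' | 'contains';
  value: FilterValue;
}
interface FilterGroup {
  logic: 'and' | 'or';
  filters: (FilterExpression | FilterGroup)[];
}
type MetadataFilter = FilterExpression | FilterGroup | Record<string, FilterValue>;

// Hypothetical normalizer: explicit forms pass through, the record shorthand becomes an AND group
function normalizeFilter(filter: MetadataFilter): FilterExpression | FilterGroup {
  if ('logic' in filter && 'filters' in filter) return filter as FilterGroup;
  if ('field' in filter && 'operator' in filter && 'value' in filter) return filter as FilterExpression;
  const record = filter as Record<string, FilterValue>;
  return {
    logic: 'and',
    filters: Object.entries(record).map(([field, value]): FilterExpression =>
      Array.isArray(value) ? { field, operator: 'in', value } : { field, operator: 'eq', value }),
  };
}

console.log(JSON.stringify(normalizeFilter({ category: 'faq', year: [2023, 2024] })));
```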



RAGPipelineConfig

Properties

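| Property | Type | Required | Description |
|---|---|---|---|
| `name` | string | required | Pipeline name (snake_case) |
| `label` | string | required | Display name |
| `description` | string | optional | |
| `embedding` | Object | required | Embedding model (see EmbeddingModel) |
| `vectorStore` | Object | required | Vector store (see VectorStoreConfig) |
| `chunking` | Object (one of four variants) | required | Chunking strategy (see ChunkingStrategy) |
| `retrieval` | Object (one of four variants) | required | Retrieval strategy (see RetrievalStrategy) |
| `reranking` | Object | optional | Reranking (see RerankingConfig) |
| `loaders` | Object[] | optional | Document loaders (see DocumentLoaderConfig) |
| `maxContextTokens` | integer | required | Maximum number of tokens in the assembled context |
| `contextWindow` | integer | optional | Context window size of the LLM |
| `metadataFilters` | `Object \| __schema0 \| Record<string, string \| number \| boolean \| (string \| number)[]>` | optional | Global filters for retrieval (same options as MetadataFilter) |
| `enableCache` | boolean | required | |
| `cacheTTL` | integer | required | Cache TTL in seconds |
| `cacheInvalidationStrategy` | `Enum<'time_based' \| 'manual' \| 'on_update'>` | optional | |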
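Some constraints span several fields, and the table does not say whether the schema enforces them. A hypothetical pre-flight check could verify two things. First, the embedding and index dimensions must agree. Second, `maxContextTokens` must fit within `contextWindow` when the latter is set. In practice the context budget should also leave room for the prompt and the answer.

```typescript
// Only the fields this check needs, mirrored from the tables above (assumption)
interface PipelineSizing {
  embedding: { dimensions: number };
  vectorStore: { dimensions: number };
  maxContextTokens: number;
  contextWindow?: number;
}

// Hypothetical cross-field check; returns a list of human-readable problems
function checkPipeline(cfg: PipelineSizing): string[] {
  const problems: string[] = [];
  // Vectors produced by the embedding model must fit the index
  if (cfg.embedding.dimensions !== cfg.vectorStore.dimensions) {
    problems.push(`embedding.dimensions (${cfg.embedding.dimensions}) differs from vectorStore.dimensions (${cfg.vectorStore.dimensions})`);
  }
  // Retrieved context cannot exceed what the LLM accepts
  if (cfg.contextWindow !== undefined && cfg.maxContextTokens > cfg.contextWindow) {
    problems.push(`maxContextTokens (${cfg.maxContextTokens}) exceeds contextWindow (${cfg.contextWindow})`);
  }
  return problems;
}

console.log(checkPipeline({ embedding: { dimensions: 3072 }, vectorStore: { dimensions: 1536 }, maxContextTokens: 4000, contextWindow: 128000 }));
```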

RAGPipelineStatus

Properties

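| Property | Type | Required | Description |
|---|---|---|---|
| `name` | string | required | |
| `status` | `Enum<'active' \| 'indexing' \| 'error' \| 'disabled'>` | required | |
| `documentsIndexed` | integer | required | |
| `lastIndexed` | string | optional | ISO timestamp |
| `errorMessage` | string | optional | |
| `health` | Object | optional | |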

RAGQueryRequest

Properties

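| Property | Type | Required | Description |
|---|---|---|---|
| `query` | string | required | User query |
| `pipelineName` | string | required | Pipeline to use |
| `topK` | integer | optional | |
| `metadataFilters` | `Record<string, any>` | optional | |
| `conversationHistory` | Object[] | optional | |
| `includeMetadata` | boolean | required | |
| `includeSources` | boolean | required | |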

RAGQueryResponse

Properties

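| Property | Type | Required | Description |
|---|---|---|---|
| `query` | string | required | |
| `results` | Object[] | required | |
| `context` | string | required | Assembled context for the LLM |
| `tokens` | Object | optional | Token usage for this query |
| `cost` | number | optional | Cost of this query in USD |
| `retrievalTime` | number | optional | Retrieval time in milliseconds |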
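The `context` string is typically built by packing the highest-ranked results into a token budget such as `maxContextTokens`. The sketch below does this greedily. It uses a chunk's `tokens` count when present. Otherwise it assumes roughly four characters per token, which is a crude heuristic. Separators and source labels are not counted. `assembleContext` and `RetrievedChunk` are hypothetical.

```typescript
// Hypothetical shape of a retrieved result (the `results` item schema is not detailed above)
interface RetrievedChunk {
  content: string;
  score: number;
  tokens?: number; // like DocumentChunk.tokens
  source?: string;
}

// Crude assumption: about 4 characters per token for English text
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Greedily add the highest-scoring chunks until the token budget is spent
function assembleContext(results: RetrievedChunk[], maxContextTokens: number, separator = '\n\n---\n\n'): string {
  const ranked = results.slice().sort((a, b) => b.score - a.score);
  const parts: string[] = [];
  let used = 0;
  for (const r of ranked) {
    const cost = r.tokens ?? estimateTokens(r.content);
    if (used + cost > maxContextTokens) continue; // too big; a smaller, lower-ranked chunk may still fit
    parts.push(r.source ? `[${r.source}]\n${r.content}` : r.content);
    used += cost;
  }
  return parts.join(separator);
}

console.log(assembleContext([
  { content: 'Reset the password from the login page.', score: 0.92, source: 'faq.md' },
  { content: 'Passwords expire after 90 days.', score: 0.71, source: 'policy.md' },
], 50));
```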

RerankingConfig

Properties

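| Property | Type | Required | Description |
|---|---|---|---|
| `enabled` | boolean | required | |
| `model` | string | optional | Reranking model name |
| `provider` | `Enum<'cohere' \| 'huggingface' \| 'custom'>` | optional | |
| `topK` | integer | required | Final number of results after reranking |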
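Reranking re-scores the first-stage candidates with a model that is usually more expensive, then keeps the best `topK`. Here `RerankScorer` stands in for a cross-encoder or a provider's rerank API. `rerank` is a hypothetical helper. Returning candidates unchanged when `enabled` is false is an assumption.

```typescript
// Local mirror of the RerankingConfig table (assumption)
interface RerankingConfig {
  enabled: boolean;
  model?: string;
  provider?: 'cohere' | 'huggingface' | 'custom';
  topK: number;
}

// Hypothetical relevance scorer standing in for a cross-encoder or rerank API
type RerankScorer = (query: string, passage: string) => number;

// Re-score candidates against the query and keep the best topK
function rerank<T extends { content: string }>(query: string, candidates: T[], cfg: RerankingConfig, score: RerankScorer): T[] {
  if (!cfg.enabled) return candidates; // assumption: without reranking, first-stage order is kept
  return candidates
    .map((candidate) => ({ candidate, s: score(query, candidate.content) }))
    .sort((a, b) => b.s - a.s)
    .slice(0, cfg.topK)
    .map((x) => x.candidate);
}

// Toy scorer for illustration only: counts query words found in the passage
const wordOverlap: RerankScorer = (q, p) => q.split(' ').filter((w) => p.includes(w)).length;
console.log(rerank('rag vector store', [{ content: 'cats and dogs' }, { content: 'rag pipeline vector store' }], { enabled: true, topK: 1 }, wordOverlap));
```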

RetrievalStrategy

Union Options

This schema accepts one of the following structures:

Option 1

Type: similarity

Properties

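| Property | Type | Required | Description |
|---|---|---|---|
| `type` | string | required | Discriminator: `'similarity'` |
| `topK` | integer | required | Number of results to retrieve |
| `scoreThreshold` | number | optional | Minimum similarity score |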
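Assuming scores are cosine similarities, the `similarity` strategy works in three steps. Every candidate is scored, candidates below `scoreThreshold` are dropped, and the `topK` best are kept. The sketch below is a brute-force version. `retrieveSimilar` is hypothetical, and real stores answer this from an approximate nearest-neighbor index.

```typescript
// Local mirror of the `similarity` option above (assumption)
interface SimilarityRetrieval {
  type: 'similarity';
  topK: number;
  scoreThreshold?: number;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Brute-force scan for illustration; a vector store would use its index
function retrieveSimilar(query: number[], docs: { id: string; embedding: number[] }[], cfg: SimilarityRetrieval) {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.embedding) }))
    .filter((r) => cfg.scoreThreshold === undefined || r.score >= cfg.scoreThreshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, cfg.topK);
}

console.log(retrieveSimilar([1, 0], [{ id: 'a', embedding: [1, 0] }, { id: 'b', embedding: [0, 1] }], { type: 'similarity', topK: 5, scoreThreshold: 0.5 }));
```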

Option 2

Type: mmr

Properties

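| Property | Type | Required | Description |
|---|---|---|---|
| `type` | string | required | Discriminator: `'mmr'` |
| `topK` | integer | required | |
| `fetchK` | integer | required | Number of candidates fetched before MMR selection |
| `lambda` | number | required | Trade-off between diversity and relevance (0 = maximum diversity, 1 = maximum relevance) |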
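Maximal Marginal Relevance (MMR) first fetches `fetchK` candidates by relevance. It then repeatedly selects the candidate that maximizes `lambda * relevance - (1 - lambda) * (max similarity to already-selected results)`, until `topK` results are chosen. This matches the table's reading of `lambda`: 1 gives pure relevance and 0 gives pure diversity. The sketch uses cosine similarity. The function name and the in-memory candidate list are assumptions.

```typescript
// Local mirror of the `mmr` option above (assumption)
interface MmrRetrieval {
  type: 'mmr';
  topK: number;
  fetchK: number;
  lambda: number;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Greedy MMR selection; returns the ids of the chosen documents in selection order
function mmr(query: number[], docs: { id: string; embedding: number[] }[], cfg: MmrRetrieval): string[] {
  // 1. Keep the fetchK most relevant candidates
  const candidates = docs
    .map((d) => ({ ...d, rel: cosine(query, d.embedding) }))
    .sort((a, b) => b.rel - a.rel)
    .slice(0, cfg.fetchK);
  const selected: typeof candidates = [];
  // 2. Repeatedly take the candidate with the best relevance/redundancy trade-off
  while (selected.length < cfg.topK && candidates.length > 0) {
    let bestIdx = 0;
    let bestScore = -Infinity;
    candidates.forEach((c, i) => {
      const redundancy = selected.length === 0 ? 0 : Math.max(...selected.map((s) => cosine(c.embedding, s.embedding)));
      const score = cfg.lambda * c.rel - (1 - cfg.lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIdx = i;
      }
    });
    selected.push(candidates.splice(bestIdx, 1)[0]);
  }
  return selected.map((s) => s.id);
}
```

With two near-duplicate documents and `lambda: 0.5`, the second pick goes to a less similar but still relevant document, where pure relevance (`lambda: 1`) would have returned the duplicate.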

Option 3

Type: hybrid

Properties

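| Property | Type | Required | Description |
|---|---|---|---|
| `type` | string | required | Discriminator: `'hybrid'` |
| `topK` | integer | required | |
| `vectorWeight` | number | required | Weight for vector search |
| `keywordWeight` | number | required | Weight for keyword search |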
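The table gives weights but not the fusion method. This sketch assumes one common approach: min-max normalize each score list to 0..1, then add the weighted scores. Reciprocal rank fusion is another option. `fuseHybrid` is a hypothetical helper that takes precomputed vector scores and keyword scores (for example BM25), keyed by document id.

```typescript
// Local mirror of the `hybrid` option above (assumption)
interface HybridRetrieval {
  type: 'hybrid';
  topK: number;
  vectorWeight: number;
  keywordWeight: number;
}

// Min-max normalize so vector similarities and keyword scores share a 0..1 scale
function normalize(scores: Map<string, number>): Map<string, number> {
  const values = Array.from(scores.values());
  const min = Math.min(...values);
  const max = Math.max(...values);
  const out = new Map<string, number>();
  scores.forEach((s, id) => out.set(id, max === min ? 1 : (s - min) / (max - min)));
  return out;
}

// Weighted fusion; a document absent from one result list scores 0 for that list
function fuseHybrid(vector: Map<string, number>, keyword: Map<string, number>, cfg: HybridRetrieval) {
  const v = normalize(vector);
  const k = normalize(keyword);
  const ids = Array.from(new Set(Array.from(v.keys()).concat(Array.from(k.keys()))));
  return ids
    .map((id) => ({ id, score: cfg.vectorWeight * (v.get(id) ?? 0) + cfg.keywordWeight * (k.get(id) ?? 0) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, cfg.topK);
}
```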

Option 4

Type: parent_document

Properties

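| Property | Type | Required | Description |
|---|---|---|---|
| `type` | string | required | Discriminator: `'parent_document'` |
| `topK` | integer | required | |
| `retrieveParent` | boolean | required | Retrieve the full parent document |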
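With `retrieveParent: true`, matching chunks are mapped back to the documents they came from. The sketch makes two assumptions. Each chunk carries a `parentId`, a hypothetical field not shown in the tables, and `topK` applies to distinct parent documents. `retrieveParents` is an illustrative name.

```typescript
// Local mirror of the `parent_document` option above (assumption)
interface ParentDocumentRetrieval {
  type: 'parent_document';
  topK: number;
  retrieveParent: boolean;
}

// Hypothetical chunk shape: `parentId` links a chunk to its source document
interface ScoredChunk {
  parentId: string;
  content: string;
  score: number;
}

// Map the best-scoring chunks to distinct parent documents (topK counts parents here)
function retrieveParents(chunks: ScoredChunk[], parents: Map<string, string>, cfg: ParentDocumentRetrieval): string[] {
  const ranked = chunks.slice().sort((a, b) => b.score - a.score);
  if (!cfg.retrieveParent) return ranked.slice(0, cfg.topK).map((c) => c.content);
  const seen = new Set<string>();
  const out: string[] = [];
  for (const c of ranked) {
    if (out.length >= cfg.topK) break;
    const parent = parents.get(c.parentId);
    if (parent === undefined || seen.has(c.parentId)) continue; // skip unknown or already-added parents
    seen.add(c.parentId);
    out.push(parent);
  }
  return out;
}
```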


VectorStoreConfig

Properties

| Property | Type | Required | Description |
|---|---|---|---|
| `provider` | `Enum<'pinecone' \| 'weaviate' \| 'qdrant' \| 'milvus' \| 'chroma' \| 'pgvector' \| 'redis' \| 'opensearch' \| 'elasticsearch' \| 'custom'>` | required | |
| `indexName` | string | required | Index or collection name |
| `namespace` | string | optional | Namespace for multi-tenancy |
| `host` | string | optional | Vector store host |
| `port` | integer | optional | Vector store port |
| `secretRef` | string | optional | Reference to a stored secret |
| `apiKey` | string | optional | API key or reference to a secret |
| `dimensions` | integer | required | Vector dimensions |
| `metric` | `Enum<'cosine' \| 'euclidean' \| 'dotproduct'>` | required | |
| `batchSize` | integer | required | |
| `connectionPoolSize` | integer | required | |
| `timeout` | integer | required | Timeout in milliseconds |
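`metric` determines how closeness between vectors is measured, and every vector must have `dimensions` entries. The sketch computes each metric so that a larger value always means closer, which is why Euclidean distance is negated. Stores differ in whether they report a distance or a similarity, so this orientation is a convention of the sketch. `similarity` is a hypothetical function name.

```typescript
type Metric = 'cosine' | 'euclidean' | 'dotproduct';

// Closeness under each metric, oriented so that higher always means "more similar"
function similarity(a: number[], b: number[], metric: Metric): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let sq = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    sq += (a[i] - b[i]) ** 2;
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  if (metric === 'dotproduct') return dot;
  if (metric === 'cosine') return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
  return -Math.sqrt(sq); // euclidean: negate the distance so larger is closer
}

console.log(similarity([1, 0], [2, 0], 'cosine'), similarity([1, 0], [2, 0], 'dotproduct'));
```

Cosine ignores vector length, while dot product rewards it. The two agree only when embeddings are normalized to unit length.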

VectorStoreProvider

Allowed Values

  • pinecone
  • weaviate
  • qdrant
  • milvus
  • chroma
  • pgvector
  • redis
  • opensearch
  • elasticsearch
  • custom
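A type guard over these values helps when a provider name comes from untyped input, such as an environment variable. If `VectorStoreProvider` is a Zod enum, as the `.zod.ts` source suggests, its `safeParse` covers the same check at runtime. The sketch below avoids the package dependency, and its constant and function names are hypothetical.

```typescript
// Mirrors the allowed values listed above
const VECTOR_STORE_PROVIDERS = [
  'pinecone', 'weaviate', 'qdrant', 'milvus', 'chroma',
  'pgvector', 'redis', 'opensearch', 'elasticsearch', 'custom',
] as const;

type VectorStoreProviderName = (typeof VECTOR_STORE_PROVIDERS)[number];

// Narrow an arbitrary string to a known provider name
function isVectorStoreProvider(value: string): value is VectorStoreProviderName {
  return (VECTOR_STORE_PROVIDERS as readonly string[]).includes(value);
}

console.log(isVectorStoreProvider('qdrant'), isVectorStoreProvider('faiss'));
```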
