Vector DB
Discover more about the Vector DB connector and how to use it on the Digibee Integration Platform.
Overview
The Vector DB connector plays a central role in your pipeline by performing the data ingestion process. It converts information into a vector representation that can later be used for semantic search and retrieval. When a prompt is received, similarity calculations identify the most relevant vectors, and their corresponding text is retrieved to enrich the context provided to the language model (LLM).
Unlike traditional databases, which store text or structured data, a vector database stores embeddings, which are numerical representations that capture the meaning of content. These embeddings make it possible for AI models to find related information based on similarity rather than exact keyword matches.
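To make the similarity idea concrete, here is a minimal Python sketch (illustrative only, not the connector's internal code). The document names and toy 3-dimensional vectors are invented for the example; real embedding models produce hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: values near 1.0 mean the vectors point in the
    same direction (similar meaning); values near 0 mean unrelated content."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical documents and vectors for illustration).
doc_vectors = {
    "invoice policy": [0.9, 0.1, 0.0],
    "refund rules":   [0.3, 0.9, 0.1],
    "team offsite":   [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]

# Retrieval picks the document whose vector is closest to the query vector.
best = max(doc_vectors, key=lambda k: cosine_similarity(query, doc_vectors[k]))
print(best)  # "invoice policy"
```

This is the essence of semantic retrieval: the match is driven by vector proximity, not by shared keywords between the query and the stored text.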
How it works
The connector’s operation involves a sequential process with three main stages:
Embedding generation
The received content is processed using the configured Embedding Model, which converts the data into a vector (a list of numbers that represents its semantic meaning). These vectors are not human-readable but are essential for AI-based search and retrieval in later stages.
Supported embedding model providers include:
- Local (default): A lightweight local embedding model (all-MiniLM-L6-v2) that is useful for basic use cases or testing.
- External providers: You can select more advanced options, such as:
  - Hugging Face: Offers a variety of text and multimodal models.
  - OpenAI: Supports models like `text-embedding-3-small` and `text-embedding-3-large`.
  - Google Vertex AI: Enables enterprise-grade embedding generation.
Vector dimensions
Each embedding model produces vectors with a specific dimension (for example, 3072 values). The model's dimension must exactly match the dimension defined in the target vector store table. If they differ, the ingestion process fails.
When the Auto Create option is enabled, the connector automatically creates a new table with the correct vector dimension according to the selected embedding model.
Supported operations
The connector currently supports only ingestion operations.
- Insert: Stores the generated embeddings in the vector store. 
- Metadata: You can include metadata (additional key–value pairs) when storing embeddings, but metadata-based filters are not yet available. 
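For illustration, a stored record can be pictured as follows. The field names and toy vector below are hypothetical, not the connector's actual storage schema.

```python
# Hypothetical shape of an ingested record with metadata key-value pairs.
record = {
    "text": "Refunds are processed within 5 business days.",
    "embedding": [0.12, -0.48, 0.33],  # truncated toy vector
    "metadata": {                      # illustrative keys only
        "source": "refund-policy.pdf",
        "department": "support",
    },
}
# Metadata is stored alongside the vector, but metadata-based
# filtering is not yet available at query time.
print(record["metadata"]["source"])
```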
Output
The connector returns a confirmation message indicating the result of the ingestion process.
If supported by the embedding model, the response may also include additional information such as the number of tokens processed during embedding generation.
Parameters configuration
Configure the connector using the parameters below. Fields that support Double Braces expressions are marked in the Supports DB column.
| Parameter | Description | Type | Supports DB | Default |
| --- | --- | --- | --- | --- |
| Alias | Name (alias) for this connector's output, allowing you to reference it later in the flow using Double Braces expressions. | String | ✅ | vector-db-1 |
| Source Type | Defines the type of data that the connector will process. Supported types: Text and File. | String | ❌ | N/A |
| Metadata | Stores extra information to identify the vectors. | Key-value pairs | ❌ | N/A |
An embedding model converts text or other types of data into numerical vectors that represent their semantic meaning. These vectors allow the system to measure similarity between pieces of content based on meaning rather than exact wording.
Embedding models are commonly used for tasks such as semantic search, clustering, and Retrieval-Augmented Generation (RAG), where they enable efficient comparison and retrieval of contextually relevant information.
Local (all-MiniLM-L6-v2)
There are no configurable parameters for this provider. However, the connector still uses an internal Vector Dimension, which defines the dimension of the embedding vectors and must exactly match the model's vector size. If the table doesn't exist, auto-create uses this dimension; mismatched tables cause errors. For this provider, the default dimension is 384.
OpenAI
| Parameter | Description | Type | Supports DB | Default |
| --- | --- | --- | --- | --- |
| Embedding Model Name | Defines the name of the embedding model to use, such as text-embedding-3-large. | String | ✅ | N/A |
| Embedding Account | Specifies the account configured with OpenAI credentials. Supported type: Secret Key. | Select | ❌ | N/A |
| Vector Dimension | Sets the dimension of the embedding vectors. Must exactly match the model's vector size. If the table doesn't exist, auto-create uses this dimension; mismatched tables cause errors. | Integer | ❌ | N/A |
| Timeout | Defines the maximum time limit (in seconds) for the operation before it is aborted. For example, 120 equals 2 minutes. | Integer | ❌ | 30 |
Google Vertex AI
| Parameter | Description | Type | Supports DB | Default |
| --- | --- | --- | --- | --- |
| Embedding Model Name | Defines the name of the embedding model to use, such as textembedding-gecko@003. | String | ✅ | N/A |
| Embedding Account | Specifies the account configured with Google Cloud credentials. Supported type: Google Key. | Select | ❌ | N/A |
| Vector Dimension | Sets the dimension of the embedding vectors. Must exactly match the model's vector size. If the table doesn't exist, auto-create uses this dimension; mismatched tables cause errors. | Integer | ❌ | N/A |
| Project ID | Defines the ID of the Google Cloud project associated with the account. | String | ✅ | N/A |
| Location | Specifies the region where the Vertex AI model is deployed, such as us-central1. | String | ✅ | N/A |
| Endpoint | Defines the endpoint of the embedding model, such as us-central1-aiplatform.googleapis.com:443. | String | ✅ | N/A |
| Publisher | Specifies the publisher of the model, typically google. | String | ✅ | N/A |
| Max Retries | Defines the maximum number of retry attempts in case of temporary API failures. | Integer | ❌ | 3 |
| Timeout | Defines the maximum time limit (in seconds) for the operation before it is aborted. For example, 120 equals 2 minutes. | Integer | ❌ | 30 |
Hugging Face
| Parameter | Description | Type | Supports DB | Default |
| --- | --- | --- | --- | --- |
| Embedding Model Name | Defines the name of the embedding model to use, such as sentence-transformers/all-mpnet-base-v2. | String | ✅ | N/A |
| Embedding Account | Specifies the account configured with Hugging Face credentials. Supported type: Secret Key. | Select | ❌ | N/A |
| Vector Dimension | Sets the dimension of the embedding vectors. Must exactly match the model's vector size. If the table doesn't exist, auto-create uses this dimension; mismatched tables cause errors. | Integer | ❌ | N/A |
| Wait for Model | Determines whether the system should wait for the model to load before generating embeddings (true) or return an error if the model is not ready (false). | Boolean | ❌ | True |
A vector store is a specialized database designed to store and retrieve vector representations of data (embeddings). It enables similarity searches by comparing numerical vectors instead of exact text matches, allowing more relevant and semantic results.
| Parameter | Description | Type | Supports DB | Default |
| --- | --- | --- | --- | --- |
| Vector Store Provider | Defines the database provider used for storing and querying embeddings. Options are: PostgreSQL (PGVector) and Neo4j. | Select | ❌ | N/A |
PostgreSQL (PGVector)
| Parameter | Description | Type | Supports DB | Default |
| --- | --- | --- | --- | --- |
| Host | Defines the hostname or IP address of the PostgreSQL server. | String | ✅ | N/A |
| Port | Defines the port number used to connect to the PostgreSQL server. | Number | ❌ | 5432 |
| Database Name | Defines the name of the PostgreSQL database containing the vector table. | String | ✅ | N/A |
| Vector Store Account | Specifies the account configured with PostgreSQL credentials. | Select | ❌ | N/A |
| Table Name | Defines the name of the table where vectors are stored. | String | ✅ | N/A |
| Auto-Create Table | Automatically creates the table if it doesn't exist (PGVector only). | Boolean | ❌ | True |
| Clear Table Before Ingest | Deletes all existing records before ingesting new data (PGVector only). | Boolean | ❌ | False |
| Auto-Create Index | Automatically creates the vector index if it doesn't exist. | Boolean | ❌ | True |
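To picture what Auto-Create Table does, here is a conceptual sketch of an equivalent PGVector table definition. This is an assumption for illustration, not the connector's actual DDL: the table name and column names are hypothetical, and the real table requires the pgvector extension to be installed.

```python
# Conceptual sketch of an auto-created PGVector table (illustrative only).
def pgvector_ddl(table: str, dimension: int) -> str:
    """Build a CREATE TABLE statement whose vector column dimension
    matches the selected embedding model (e.g. 384 for all-MiniLM-L6-v2)."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id UUID PRIMARY KEY, "
        "text TEXT, "
        "metadata JSONB, "
        f"embedding vector({dimension}));"
    )

print(pgvector_ddl("documents", 384))
```

The key point is the `vector(384)` column: its dimension is fixed at creation time, which is why a mismatched embedding model causes ingestion to fail.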
Neo4j
| Parameter | Description | Type | Supports DB | Default |
| --- | --- | --- | --- | --- |
| Database Name | Defines the name of the Neo4j database where the vector index is stored. | String | ✅ | N/A |
| Vector Store Account | Specifies the account configured with Neo4j credentials. | Select | ❌ | N/A |
| Index Name | Defines the name of the index used to store and query vectors. | String | ✅ | N/A |
| URI | Defines the connection URI to the Neo4j instance. | String | ✅ | N/A |
| Node Label | Defines the label assigned to nodes containing embedding data. | String | ✅ | N/A |
| Embedding Property | Defines the node property used to store the embedding vector. | String | ✅ | N/A |
| Text Property | Defines the node property used to store the original text or document. | String | ✅ | N/A |
| Parameter | Description | Type | Supports DB | Default |
| --- | --- | --- | --- | --- |
| Splitting Strategy | Defines how documents are split into smaller chunks for embedding. | String | ❌ | Recursive Character Splitter (Recommended) |
| Max Segment Size | Maximum number of characters allowed per chunk. Larger values generate fewer, longer segments. | Integer | ❌ | 500 |
| Segment Overlap | Number of characters shared between consecutive chunks to preserve context. | Integer | ❌ | 50 |
| Documentation | Optional field to describe the connector configuration and any relevant business rules. | String | ❌ | N/A |
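The interaction between Max Segment Size and Segment Overlap can be sketched as a sliding window. This is a simplified illustration, not the connector's implementation: the Recursive Character Splitter also respects natural boundaries such as paragraphs and sentences, while this sketch splits on raw character counts only.

```python
# Fixed-size character chunking with overlap, mirroring the
# Max Segment Size and Segment Overlap parameters (illustrative only).
def split_text(text: str, max_segment_size: int = 500, overlap: int = 50) -> list[str]:
    if max_segment_size <= overlap:
        raise ValueError("max_segment_size must be greater than overlap")
    chunks = []
    step = max_segment_size - overlap  # how far the window advances each time
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_segment_size])
        if start + max_segment_size >= len(text):
            break
        start += step
    return chunks

document = "lorem ipsum " * 150  # 1,800 characters of sample text
chunks = split_text(document)
print([len(c) for c in chunks])  # [500, 500, 500, 450]
```

With the defaults, each chunk starts 450 characters after the previous one, so the last 50 characters of one chunk reappear at the start of the next; this shared context helps retrieval when a relevant passage straddles a chunk boundary.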