Vector DB

Discover more about the Vector DB connector and how to use it on the Digibee Integration Platform.

Overview

The Vector DB connector handles data ingestion in your pipeline. It converts information into a vector representation that can later be used for semantic search and retrieval. When a prompt is later received, similarity calculations identify the most relevant vectors, and their corresponding text is retrieved to enrich the context provided to the large language model (LLM).

Unlike traditional databases, which store text or structured data, a vector database stores embeddings, which are numerical representations that capture the meaning of content. These embeddings make it possible for AI models to find related information based on similarity rather than exact keyword matches.
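To make "similarity rather than exact keyword matches" concrete, the following minimal sketch ranks stored vectors against a query vector. It assumes cosine similarity, which is one common metric; this page does not state which metric is used at retrieval time. The function names and the toy 3-value vectors are illustrative only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def most_similar(query, stored, top_k=1):
    """Rank stored (text, vector) pairs by similarity to the query vector."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]

# Toy 3-dimensional vectors; real embeddings have hundreds or thousands of values.
stored = [
    ("How to reset a password", [0.9, 0.1, 0.0]),
    ("Quarterly sales report", [0.0, 0.2, 0.9]),
]
query_vector = [0.8, 0.2, 0.1]  # stands in for the embedding of a login-related prompt
print(most_similar(query_vector, stored))
```

The query shares no keywords with either stored text; it is matched purely by vector direction.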

How it works

The connector’s operation involves a sequential process with three main stages:

Data ingestion

The connector receives data from a previous pipeline step. This data can come from various sources, such as the trigger or other Platform connectors. You can define the source type through the Source Type parameter:

- Text: To process raw text content.
- File: To process a stored document.

Embedding generation

The received content is processed using the configured Embedding Model, which converts the data into a vector (a list of numbers that represents its semantic meaning). These vectors are not human-readable but are essential for AI-based search and retrieval in later stages.

Supported embedding model providers include:

Local (default): A lightweight local embedding model (all-MiniLM-L6-v2) that is useful for basic use cases or testing.
External providers: You can select more advanced options, such as:
- Hugging Face: Offering a variety of text and multimodal models.
- OpenAI: Supporting models like text-embedding-3-small and text-embedding-3-large.
- Google Vertex AI: Enabling enterprise-grade embedding generation.

Vector storage

After the embeddings are generated, they are stored in the configured Vector Store. Currently, the connector supports:

Neo4j (graph-based database).
Postgres-compatible databases.
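The three stages can be sketched end to end as follows. This is only an illustration of the flow, not the connector's implementation: `load_content`, `toy_embed`, and `store` are hypothetical helpers, the hash-based "embedding" carries no semantic meaning, and a Python list stands in for Neo4j or Postgres.

```python
import hashlib

def load_content(source_type, value):
    """Stage 1: resolve the input according to the source type ("Text" or "File")."""
    if source_type == "Text":
        return value
    if source_type == "File":
        with open(value, encoding="utf-8") as handle:
            return handle.read()
    raise ValueError(f"unsupported source type: {source_type}")

def toy_embed(text, dimension=8):
    """Stage 2 stand-in: derive a deterministic pseudo-vector from a hash.

    A real embedding model is required for vectors that capture meaning.
    SHA-256 yields 32 bytes, so this toy supports dimensions up to 32.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dimension]]

def store(vector_store, text, vector):
    """Stage 3: append the text and its vector to an in-memory list."""
    vector_store.append({"text": text, "embedding": vector})
    return len(vector_store)

vector_store = []
content = load_content("Text", "Refunds are processed within 5 business days.")
count = store(vector_store, content, toy_embed(content))
print(count, len(vector_store[0]["embedding"]))  # prints: 1 8
```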

Vector dimensions

Each embedding model produces vectors with a specific dimension (for example, 3072 values). The model's dimension must exactly match the dimension defined in the target vector store table. If they differ, the ingestion process fails.

When the Auto Create option is enabled, the connector automatically creates a new table with the correct vector dimension according to the selected embedding model.
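The dimension rule can be illustrated with a small check and a pgvector-style table definition. This is a sketch under assumptions: the table and column names are hypothetical, and the schema the connector actually creates is not documented on this page. The dimensions listed are the default output sizes of the named models.

```python
# Default output dimensions for models named on this page.
MODEL_DIMENSIONS = {
    "all-MiniLM-L6-v2": 384,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def check_dimension(model_name, table_dimension):
    """Raise if the model's vector size differs from the table's vector column."""
    expected = MODEL_DIMENSIONS[model_name]
    if expected != table_dimension:
        raise ValueError(
            f"{model_name} produces {expected} values, "
            f"but the table expects {table_dimension}"
        )
    return expected

def auto_create_sql(table_name, model_name):
    """Build pgvector-style DDL sized for the selected model (hypothetical schema)."""
    if not table_name.isidentifier():
        raise ValueError("table name must be a plain identifier")
    dimension = MODEL_DIMENSIONS[model_name]
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} ("
        "id bigserial PRIMARY KEY, content text, metadata jsonb, "
        f"embedding vector({dimension}))"
    )

print(check_dimension("all-MiniLM-L6-v2", 384))
print(auto_create_sql("documents", "text-embedding-3-large"))
```

Calling `check_dimension("text-embedding-3-large", 384)` raises a `ValueError`, which mirrors the ingestion failure described above.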

Supported operations

At the current stage, the connector supports only ingestion operations.

Insert: Stores the generated embeddings in the vector store.
Metadata: You can include metadata (additional key–value pairs) when storing embeddings, but metadata-based filters are not yet available.
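As a rough illustration of an insert that carries metadata, the sketch below stores key-value pairs next to each embedding. The record layout and the `insert` helper are assumptions made for this example; the connector's actual storage format is not described here.

```python
def insert(vector_store, text, embedding, metadata=None):
    """Store the text, its embedding, and optional key-value metadata together."""
    metadata = dict(metadata or {})
    if not all(isinstance(key, str) for key in metadata):
        raise TypeError("metadata keys must be strings")
    record = {"text": text, "embedding": list(embedding), "metadata": metadata}
    vector_store.append(record)
    return record

records = []
insert(
    records,
    "Refunds are processed within 5 business days.",
    [0.12, 0.48, 0.31],  # toy vector standing in for a real embedding
    metadata={"source": "faq", "language": "en"},
)
print(records[0]["metadata"]["source"])  # prints: faq
```

The metadata travels with the vector and can help identify it later, even though it cannot yet be used as a search filter.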

Output

The connector returns a confirmation message indicating the result of the ingestion process.

If supported by the embedding model, the response may also include additional information such as the number of tokens processed during embedding generation.

Parameters configuration

Configure the connector using the parameters below. Fields that support Double Braces expressions are marked in the Supports DB column.

| Parameter | Description | Type | Supports DB | Default |
| --- | --- | --- | --- | --- |
| Alias | Name (alias) for this connector’s output, allowing you to reference it later in the flow using Double Braces expressions. | String | ✅ | vector-db-1 |
| Source Type | Defines the type of data that the connector will process. Supported types: Text and File. | String | ❌ | N/A |
| Metadata | Stores extra information to identify the vectors. | Key-value pairs | ❌ | N/A |

An embedding model converts text or other types of data into numerical vectors that represent their semantic meaning. These vectors allow the system to measure similarity between pieces of content based on meaning rather than exact wording.

Embedding models are commonly used for tasks such as semantic search, clustering, and Retrieval-Augmented Generation (RAG), where they enable efficient comparison and retrieval of contextually relevant information.

Local (all-MiniLM-L6-v2)

There are no configurable parameters for this provider. However, the connector still uses an internal Vector Dimension, which defines the dimension of the embedding vectors and must exactly match the model’s vector size. For this provider, the default dimension is 384. If the table doesn’t exist, auto-create uses this dimension; an existing table with a different dimension causes errors.

OpenAI

| Parameter | Description | Data type | Supports DB | Default |
| --- | --- | --- | --- | --- |
| Embedding Model Name | Defines the name of the embedding model to use, such as text-embedding-3-large. | String | ✅ | N/A |
| Embedding Account | Specifies the account configured with OpenAI credentials. Supported type: Secret Key. | Select | ❌ | N/A |
| Vector Dimension | Sets the dimension of the embedding vectors. Must exactly match the model’s vector size. If the table doesn’t exist, auto-create uses this dimension; mismatched tables cause errors. | Integer | ❌ | N/A |
| Timeout | Defines the maximum time limit (in seconds) for the operation before it is aborted. For example, 120 equals 2 minutes. | Integer | ❌ | 30 |

Google Vertex AI

| Parameter | Description | Data type | Supports DB | Default |
| --- | --- | --- | --- | --- |
| Embedding Model Name | Defines the name of the embedding model to use, such as textembedding-gecko@003. | String | ✅ | N/A |
| Embedding Account | Specifies the account configured with Google Cloud credentials. Supported type: Google Key. | Select | ❌ | N/A |
| Vector Dimension | Sets the dimension of the embedding vectors. Must exactly match the model’s vector size. If the table doesn’t exist, auto-create uses this dimension; mismatched tables cause errors. | Integer | ❌ | N/A |
| Project ID | Defines the ID of the Google Cloud project associated with the account. | String | ✅ | N/A |
| Location | Specifies the region where the Vertex AI model is deployed, such as us-central1. | String | ✅ | N/A |
| Endpoint | Defines the endpoint of the embedding model, such as us-central1-aiplatform.googleapis.com:443. | String | ✅ | N/A |
| Publisher | Specifies the publisher of the model, typically google. | String | ✅ | N/A |
| Max Retries | Defines the maximum number of retry attempts in case of temporary API failures. | Integer | ❌ | |
| Timeout | Defines the maximum time limit (in seconds) for the operation before it is aborted. For example, 120 equals 2 minutes. | Integer | ❌ | 30 |
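The interaction between Max Retries and Timeout can be sketched as a retry loop with an overall deadline. This is one possible interpretation, not the connector's documented behavior: `TemporaryError`, `call_with_retries`, and `backoff_seconds` are hypothetical names, and the sketch only checks the deadline between attempts rather than aborting an attempt that is already running.

```python
import time

class TemporaryError(Exception):
    """Hypothetical marker for a transient API failure."""

def call_with_retries(operation, max_retries=3, timeout_seconds=30, backoff_seconds=0.0):
    """Run operation, retrying transient failures up to max_retries times.

    Gives up early if the next attempt would start after the deadline.
    """
    deadline = time.monotonic() + timeout_seconds
    attempts = 0
    while True:
        attempts += 1
        try:
            return operation()
        except TemporaryError:
            if attempts > max_retries:
                raise  # retries exhausted: surface the last failure
            if time.monotonic() + backoff_seconds > deadline:
                raise TimeoutError("time limit reached before the operation succeeded")
            time.sleep(backoff_seconds)

calls = {"count": 0}

def flaky_embedding_call():
    """Fails twice, then succeeds, to simulate a temporary outage."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TemporaryError("service unavailable")
    return "embedded"

result = call_with_retries(flaky_embedding_call, max_retries=3, timeout_seconds=30)
print(result, calls["count"])  # prints: embedded 3
```

With `max_retries=3`, the operation runs at most four times: the initial attempt plus three retries.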

Hugging Face

| Parameter | Description | Data type | Supports DB | Default |
| --- | --- | --- | --- | --- |
| Embedding Model Name | Defines the name of the embedding model to use, such as sentence-transformers/all-mpnet-base-v2. | String | ✅ | N/A |
| Embedding Account | Specifies the account configured with Hugging Face credentials. Supported type: Secret Key. | Select | ❌ | N/A |
| Vector Dimension | Sets the dimension of the embedding vectors. Must exactly match the model’s vector size. If the table doesn’t exist, auto-create uses this dimension; mismatched tables cause errors. | Integer | ❌ | N/A |
| Wait for Model | Determines whether the system should wait for the model to load before generating embeddings (true) or return an error if the model is not ready (false). | Boolean | ❌ | True |
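For context on Wait for Model, Hugging Face's hosted Inference API has documented a `wait_for_model` request option with the same meaning. The sketch below only builds such a request body and sends nothing; whether the connector forwards the parameter in exactly this form is an assumption, and `build_feature_extraction_body` is a hypothetical helper.

```python
import json

def build_feature_extraction_body(texts, wait_for_model=True):
    """Serialize the inputs plus the wait_for_model option as a JSON request body."""
    if isinstance(texts, str):
        texts = [texts]
    return json.dumps({
        "inputs": list(texts),
        "options": {"wait_for_model": bool(wait_for_model)},
    })

body = build_feature_extraction_body("What is the refund policy?")
print(body)
```

With the option set to false, a request sent while the model is still loading returns an error instead of waiting.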

