Vector DB

Discover more about the Vector DB connector and how to use it on the Digibee Integration Platform.

Overview

The Vector DB connector plays a central role in your pipeline by performing data ingestion. It converts information into a vector representation that can later be used for semantic search and retrieval. When a prompt is received, similarity calculations identify the most relevant vectors, and their corresponding text is retrieved to enrich the context provided to the large language model (LLM).

Unlike traditional databases, which store text or structured data, a vector database stores embeddings, which are numerical representations that capture the meaning of content. These embeddings make it possible for AI models to find related information based on similarity rather than exact keyword matches.
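
For intuition, semantic similarity between embeddings is a plain numerical comparison. The sketch below computes cosine similarity over toy vectors; it is illustrative only and not the connector’s internal code:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: values near 1.0 mean the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions.
query = [0.9, 0.1, 0.0]
related_text = [0.8, 0.2, 0.1]
unrelated_text = [0.0, 0.1, 0.9]

print(cosine_similarity(query, related_text))    # high (~0.98): similar meaning
print(cosine_similarity(query, unrelated_text))  # low (~0.01): unrelated meaning
```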

How it works

The connector operates in three sequential stages:

1. Data ingestion

The connector receives data from a previous pipeline step. This data can come from various sources, such as the trigger or another Platform connector. You define the source type through the Source Type parameter:

  • Text: Processes raw text content.

  • File: Processes a stored document.
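
For intuition, the difference between the two source types might look like this in an incoming message (the field names here are illustrative, not the connector’s actual schema):

```python
# Illustrative payloads; the actual field names are defined by the connector.
text_ingestion = {"sourceType": "Text", "content": "Plain text to embed."}
file_ingestion = {"sourceType": "File", "fileName": "handbook.pdf"}
```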

2. Embedding generation

The received content is processed using the configured Embedding Model, which converts the data into a vector (a list of numbers that represents its semantic meaning). These vectors are not human-readable but are essential for AI-based search and retrieval in later stages.

Supported embedding model providers include:

  • Local (default): A lightweight local embedding model (all-MiniLM-L6-v2) that is useful for basic use cases or testing.

  • External providers: You can select more advanced options, such as:

    • Hugging Face: Offering a variety of text and multimodal models.

    • OpenAI: Supporting models like text-embedding-3-small and text-embedding-3-large.

    • Google Vertex AI: Enabling enterprise-grade embedding generation.
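
The Platform handles embedding generation internally, but the local default can be reproduced with the open-source sentence-transformers library. A minimal sketch, assuming that library is installed:

```python
from sentence_transformers import SentenceTransformer

# The connector's local default model, as named above.
model = SentenceTransformer("all-MiniLM-L6-v2")

# encode() returns a numerical vector that captures the text's meaning.
embedding = model.encode("How do I ingest documents into a vector store?")
print(len(embedding))  # 384: the dimension produced by all-MiniLM-L6-v2
```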

3. Vector storage

After the embeddings are generated, they are stored in the configured Vector Store. Currently, the connector supports:

  • Neo4j (graph-based database).

  • Postgres-compatible databases.
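
The connector performs storage internally; for intuition, writing an embedding to a Postgres-compatible database typically looks like the sketch below, assuming the pgvector extension and the psycopg driver (the table and column names are illustrative):

```python
import psycopg  # psycopg 3

# Illustrative schema; the names are hypothetical, not the connector's own.
with psycopg.connect("dbname=vectors user=app") as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embeddings ("
        "id bigserial PRIMARY KEY, "
        "content text, "
        "embedding vector(384))"  # dimension must match the embedding model
    )
    vector_literal = "[" + ",".join(["0.1"] * 384) + "]"  # placeholder values
    conn.execute(
        "INSERT INTO embeddings (content, embedding) VALUES (%s, %s)",
        ("some ingested text", vector_literal),
    )
```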

Vector dimensions

Each embedding model produces vectors with a fixed dimension (for example, 3072 values). The model’s dimension must exactly match the dimension defined in the target vector store table; if they differ, the ingestion process fails.

When the Auto Create option is enabled, the connector automatically creates a new table with the correct vector dimension according to the selected embedding model.
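
For example, OpenAI’s text-embedding-3-large produces 3072-dimensional vectors, so a table created for the 384-dimensional local model would reject them. The sketch below shows the kind of pre-flight check that Auto Create makes unnecessary (the mapping is illustrative):

```python
# Dimensions of the models mentioned on this page.
MODEL_DIMENSIONS = {
    "all-MiniLM-L6-v2": 384,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def assert_dimension_match(model_name: str, table_dimension: int) -> None:
    """Raise if the model's vector dimension differs from the table's."""
    model_dim = MODEL_DIMENSIONS[model_name]
    if model_dim != table_dimension:
        raise ValueError(
            f"{model_name} produces {model_dim}-dimensional vectors, but the "
            f"table column is vector({table_dimension}); ingestion would fail."
        )

assert_dimension_match("text-embedding-3-large", 3072)   # OK
# assert_dimension_match("text-embedding-3-large", 1536) # raises ValueError
```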

Supported operations

At the current stage, the connector supports only the ingestion operation:

  • Insert: Stores the generated embeddings in the vector store.

You can include metadata (additional key–value pairs) when storing embeddings, but metadata-based filters are not yet available.
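
Metadata travels alongside the vector as plain key-value pairs. A hypothetical illustration of an insert request carrying metadata (the field names are not the connector’s documented schema):

```python
# Hypothetical payload; the actual shape is defined by the connector.
insert_request = {
    "operation": "Insert",
    "content": "Refund policy: items may be returned within 30 days.",
    "metadata": {              # stored with the vector for identification...
        "source": "policies.pdf",
        "section": "refunds",
    },                         # ...but not yet usable as a search filter
}
```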

Output

The connector returns a confirmation message indicating the result of the ingestion process.

If supported by the embedding model, the response may also include additional information such as the number of tokens processed during embedding generation.
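
The exact response shape depends on the configuration; the example below is a hypothetical illustration, not a documented schema:

```python
# Hypothetical confirmation message; the field names are illustrative only.
example_output = {
    "status": "success",
    "insertedVectors": 1,
    "tokenUsage": {"totalTokens": 42},  # present only if the model reports it
}
```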

Parameters configuration

Configure the connector using the parameters below. Fields that support Double Braces expressions are marked in the Supports DB column.

| Parameter | Description | Type | Supports DB | Default |
| --- | --- | --- | --- | --- |
| Alias | Name (alias) for this connector’s output, allowing you to reference it later in the flow using Double Braces expressions. | String |  | vector-db-1 |
| Source Type | Defines the type of data that the connector will process. Supported types: Text and File. | String |  | N/A |
| Metadata | Stores extra information to identify the vectors. | Key-value pairs |  | N/A |
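
Putting the parameters together, a text-ingestion setup might conceptually look like the mapping below. In practice these fields are configured through the Platform UI, and the field names here are illustrative:

```python
# Conceptual parameter set; configured through the Platform UI in practice.
connector_parameters = {
    "alias": "vector-db-1",           # referenced later via Double Braces
    "sourceType": "Text",             # "Text" or "File"
    "metadata": {"team": "support"},  # optional key-value pairs
}
```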
