Stream DB V3

Discover more about the Stream DB V3 component and how to use it on the Digibee Integration Platform.

Stream DB V3 lets you establish a connection with a service that supports the JDBC (Java Database Connectivity) protocol and execute SQL (Structured Query Language) instructions. To see all the databases supported by this component, read the Supported databases documentation.

Unlike the DB V1 component, Stream DB V3 was designed to execute in batches: each return (resulting line or row) of the executed SQL instruction is treated individually through a subpipeline and can have its own processing flow. Read the Subpipelines article to learn more.

Parameters

Take a look at the component's configuration parameters. Parameters that support Double Braces expressions are marked with (DB).

| Parameter | Description | Default value | Data type |
| --- | --- | --- | --- |
| Account | For the component to authenticate to a JDBC service, a Basic or Kerberos-type account is required (see the Authentication via Kerberos topic). | N/A | String |
| Database URL (DB) | URL (Uniform Resource Locator) used to establish the connection to a database server that supports the JDBC protocol. | jdbc:mysql://35.223.175.97/db-training | String |
| SQL Statement | SQL instruction to be executed (see the example after this table). | select * from clientes LIMIT 2 | String |
| Column Name | If an error occurs while processing the onProcess subpipeline, the value of the column defined in this field is added to the error message in a new field called "processedId", which can be handled by the onException subpipeline. | codigo | String |
| Parallel Execution Of Each Iteration | When activated, each iteration is executed in parallel, reducing the total execution time. However, there's no guarantee that the items will be processed in the order returned by the database. | False | Boolean |
| Blob As File | If activated, blob-type fields are stored in the pipeline context as files; otherwise, they are stored as base64-encoded strings. | False | Boolean |
| Clob As File | If activated, clob-type fields are stored in the pipeline context as files; otherwise, they are stored as plain strings. | False | Boolean |
| Charset | Shown only when the Clob As File option is activated. Sets the encoding of the clob file. | UTF-8 | String |
| Fail On Error | When activated, this parameter suspends the pipeline execution only if there's a severe occurrence in the iteration structure, preventing its complete conclusion. Activating Fail On Error has no relation to errors that occur in the components used to build the subpipelines (onProcess and onException). | False | Boolean |
| Custom Connection Properties | Specific connection properties defined by the user. | N/A | String |
| Keep Connections | If activated, the connection with the database is kept for up to 30 minutes; otherwise, it's kept for only 5 minutes. | True | Boolean |
| Advanced | When activated, the following configurations become available: | False | Boolean |
| Pool Size By Actual Consumers | If activated, the number of pooled connections equals the number of consumers configured during pipeline deployment. Otherwise, the pool size is determined by the pipeline deployment size, regardless of the number of consumers. | False | Boolean |
| Exclusive DB Pool | If activated, a new non-shared pool is always created for the sole use of this component. Otherwise, the pool may be shared between components when the database URL is the same. | False | Boolean |
| Output Column From Label | For some databases, it's important to keep this option activated if your SELECT uses an alias, because it guarantees that the column name is returned exactly as the configured alias. | False | Boolean |
| Connection Test Query | SQL instruction to be executed before each connection is established (for example, select 1 from dual). This parameter is optional and should be applied only to databases that don't provide reliable information about the connection status. | N/A | String |
| Raw SQL Statement (DB) | If activated, the SQL Statement parameter accepts dynamic queries built through Double Braces statements. When using this feature, you must ensure the pipeline has security measures against unwanted SQL statements (SQL Injection). See more about this parameter in the section below. | False | Boolean |
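
In the default mode (Raw SQL Statement inactive), Double Braces expressions used in the SQL Statement parameter are resolved by the component and bound as PreparedStatement parameters rather than concatenated as text. Below is a minimal sketch, assuming the clientes table from the default value above and a hypothetical codigo field in the incoming message:

-- "Raw SQL Statement" false: the Double Braces expression below is
-- resolved by Stream DB V3 and bound as a PreparedStatement parameter.
select * from clientes where codigo = {{ message.codigo }}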

When an Apache Hive database is used, the Updatecount data may be unavailable due to a system behavior. This information is available only if updated-row control is enabled on the Apache Hive server. To learn more about Apache Hive support on the Digibee Integration Platform, refer to Supported databases.

Parameters additional information

Column Name

See the following example of a Column Name error message:

{
  "timestamp": 1600797662733,
  "error": "Error message",
  "code": 500,
  "processedId": "2"
}

Blob As File

If Blob As File is active, the blob-type field is stored as follows:

// "Blob As File" true
{
  "id": 12,
  "blob": "d3X8YK.file"
}

Otherwise, the blob-type field is stored as follows:


// "Blob As File" false
{
  "id": 12,
  "blob": "iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAIAAACQkWg2AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAAeSURBVDhPY1Da6EMSYiBJNVDxqAZiQmw0lAZHKAEAaskfEED3lr0AAAAASUVORK5CYII="
}

Clob As File

If Clob As File is active, the clob-type field is stored as follows:

// "Clob As File" true
{
  "id": 15,
  "clob": "f7X9AS.file"
}

Otherwise, the clob-type field is stored as follows:

// "Clob As File" false
{
  "id": 15,
  "clob": "AAAAABBBBBCCCC"
}

Raw SQL Statement

To bring more flexibility to Stream DB V3, you can activate the Raw SQL Statement option, configure a query beforehand, and reference it through Double Braces in the SQL Statement parameter, as follows:

  • Query previously defined via Template Transformer

  • Raw SQL Statement activation

  • Query referenced in the SQL Statement parameter

Important: as a good practice, we strongly recommend that, when the Raw SQL Statement option is activated, queries be defined beforehand using the Template Transformer component. Template Transformer allows you to validate parameters through FreeMarker technology and to declare parameters through Double Braces. These parameters are not resolved by Template Transformer but by the Stream DB V3 component itself, which by default configures and validates the parameters of the SQL statement beforehand (PreparedStatement). Applying these security measures reduces the risk of attacks such as SQL Injection.

The image below shows an example of recommended use of the component on the left (with Double Braces in the WHERE clause, highlighted in green) and an example of non-recommended use on the right (with FreeMarker in the WHERE clause, highlighted in red), which may pose risks to pipeline security:
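
A minimal sketch of the two approaches, using the hypothetical clientes table and codigo field from the examples above:

-- Recommended: the Double Braces parameter is resolved by Stream DB V3
-- and bound as a PreparedStatement parameter.
select * from clientes where codigo = {{ message.codigo }}

-- Not recommended: the FreeMarker expression is interpolated as raw text
-- by Template Transformer before execution, exposing the pipeline to SQL Injection.
select * from clientes where codigo = ${codigo}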

Technology

Authentication via Kerberos

To authenticate to a database via Kerberos, it's necessary to:

  • inform a KERBEROS-type account;

  • set a Kerberos principal;

  • set a keytab (the base64 encoding of the generated keytab file itself).

Messages flow

Message structure available in the onProcess subpipeline

Once the SQL instruction is executed, the onProcess subpipeline is triggered, receiving each execution result through a message in the following structure:

{
    "column1": "data1",
    "column2": "data2",
    "column3": "data3"
}
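
For instance, a statement like the sketch below (column names are hypothetical) triggers the onProcess subpipeline once for each returned row, and each row arrives as a message whose keys are the column names:

-- Each returned row (at most 2 here) triggers one onProcess execution,
-- delivered as { "codigo": ..., "nome": ..., "email": ... }.
select codigo, nome, email from clientes LIMIT 2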

Output with error

{
    "code": error_code,
    "error": "error message",
    "processedId": "the_id_column_value"
}

Output

After the component execution, a message is returned in the following structure:

{
    "total": 0,
    "success": 0,
    "failed": 0
}
  • total: total number of processed lines.

  • success: total number of successfully processed lines.

  • failed: total number of lines whose processing failed.

Important: to detect whether a line has been correctly processed, each onProcess subpipeline must respond with { "success": true } for each processed element.

Stream DB V3 performs batch processing. Refer to the Batch processing article to learn more about this concept.

Connection pool

By default, the pool size is based on the configuration of the deployed pipeline: 10 connections for a Small pipeline, 20 for a Medium, and 40 for a Large.

It's also possible to manage the pool size at deployment time. To do that, enable the Pool Size By Actual Consumers parameter in the component; the pool will then use whatever value is manually configured on the deployment screen.

For example, suppose a SMALL pipeline is deployed with 5 consumers. If you want the pool of the database components (DB V2 and Stream DB V3) to use this size, you must enable the Pool Size By Actual Consumers parameter in all the existing components.

Be extra careful when configuring the pool size manually so there's no deadlock in concurrent calls to the same database.

Our pool is shared between the database components that access the same database inside the pipeline. If you need an exclusive pool for a specific component, enable the Exclusive DB Pool parameter.
