Stream DB V3

Stream DB V3 allows you to establish a connection with a service that supports the JDBC (Java Database Connectivity) protocol and to execute SQL (Structured Query Language) instructions.
Unlike the DB V1 component, Stream DB was designed to execute in batches: each return (resulting line or row) of the executed SQL instruction is treated individually through a subpipeline and can have its own processing flow. Read the Subpipelines article to learn more.
IMPORTANT: when an Apache Hive database is used, the Updatecount data may be unavailable due to a system behavior. This information is available only if the updated row control is enabled on the Apache Hive server. To learn more about Apache Hive support for the Digibee Integration Platform, refer to Supported databases.
Take a look at the configuration parameters of the component:
  • Account: for the component to authenticate to a JDBC service, a BASIC or KERBEROS-type account must be used (check the Authentication via Kerberos topic).
  • Database URL: URL (Uniform Resource Locator) to establish connection to the database server with support to the JDBC protocol. This parameter supports Double Braces.
  • SQL Statement: SQL instruction to be executed. An illustrative example of this parameter and of Database URL is shown right after this list.
  • Column Name: if an error occurs while processing the onProcess subpipeline, the value associated with the column defined in this field will be added to the error message in a new field called "processedId" that can be handled by the onException subpipeline. See the following example:
{
"timestamp": 1600797662733,
"error": "Error message",
"code": 500,
"processedId": "2"
}
  • Parallel Execution Of Each Iteration: when activated, this option causes each of the iterations to be executed in parallel, reducing the total execution time. However, there's no guarantee that the items will be executed in the order returned by the database.
  • Blob As File: if activated, this option causes blob-type fields to be stored in the pipeline context as files; otherwise, the fields are stored as normal texts (strings) encoded in base64, as follows:
// "Blob As File" true
{
"id": 12,
"blob": "d3X8YK.file",
}
// "Blob As File" false
{
"id": 12,
"blob": "iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAIAAACQkWg2AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAAeSURBVDhPY1Da6EMSYiBJNVDxqAZiQmw0lAZHKAEAaskfEED3lr0AAAAASUVORK5CYII="
}
  • Clob As File: if activated, this option causes clob-type fields to be stored in the pipeline context as files; otherwise, the fields are stored as normal texts (strings), as follows:
// "Clob As File" true
{
"id": 15,
"clob": "f7X9AS.file"
}
// "Clob As File" false
{
"id": 15,
"clob": "AAAAABBBBBCCCC"
}
  • Charset: this option is shown only if the Clob As File option is activated. It allows you to set the encoding of the Clob file.
  • Fail On Error: when activated, this parameter suspends the pipeline execution only if a severe error occurs in the iteration structure and prevents it from being completed. Activating the Fail On Error parameter has no relation to errors that occur in the components used to build the subpipelines (onProcess and onException).
  • Custom Connection Properties: specific connection properties defined by the user.
  • Keep Connections: if activated, this option keeps the connection with the database for a maximum of 30 minutes; otherwise, the connection is kept for only 5 minutes.
  • Advanced: if activated, the following configurations will be available:
  • Pool Size By Actual Consumers: if the option is activated, the number of pooled connections will be equivalent to the number of consumers configured during pipeline deployment. Otherwise, the pool size is given by the pipeline deployment size, irrespective of the number of consumers.
  • Exclusive DB Pool: if the option is activated, a new non-shared pool will always be created for the sole use of this component. Otherwise, a pool might be shared between components if the URL is the same.
  • Output Column From Label: for some databases, it's important to keep this option activated if your SELECT uses an alias, because it guarantees that the column name will be returned exactly as the configured alias.
  • Connection Test Query: SQL instruction to be executed before each connection is established (e.g. select 1 from dual). This parameter is optional and should be applied only to databases that don't provide reliable information about the connection status.
  • Raw SQL Statement: if the option is active, the SQL Statement parameter allows the use of dynamic queries through Double Braces statements. When using this functionality, you must ensure that the pipeline has security measures against unwanted SQL statements (SQL Injection). See more about this parameter in the section below.
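As an illustration of the Database URL and SQL Statement parameters mentioned above, a hypothetical configuration could look like the lines below. The host, database, table, and field names are assumptions made up for this example; the {{ }} expressions are Double Braces references resolved at runtime (in SQL Statement, by default, as PreparedStatement parameters):
// Hypothetical "Database URL" using Double Braces
jdbc:postgresql://{{ message.dbHost }}:5432/sales
// Hypothetical "SQL Statement" with a bound parameter
SELECT id, name, email FROM customers WHERE status = {{ message.status }}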

Raw SQL Statement

To bring more flexibility when using Stream DB V3, you can activate the Raw SQL Statement option, define a query beforehand, and reference it through Double Braces in the SQL Statement parameter, as follows:
  • query previously defined via Template Transformer
  • Raw SQL Statement activation
  • query referenced in the SQL Statement parameter
IMPORTANT: as a good practice, we strongly recommend that, when activating the Raw SQL Statement option, queries be defined beforehand using the Template Transformer component. The Template Transformer allows you to validate parameters through the FreeMarker technology and also to declare parameters through Double Braces. These parameters are not resolved by the Template Transformer but by the Stream DB V3 component, which by default configures and validates the parameters of the SQL statement beforehand (PreparedStatement). By applying these security measures, you reduce the risk of attacks such as SQL Injection.
In the image below, there's an example of recommended use of the component on the left (with Double Braces in the WHERE clause, highlighted in green) and an example of non-recommended use on the right (with FreeMarker in the WHERE clause, highlighted in red), which may pose risks to pipeline safety.
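As a sketch of the recommended pattern described above (the table, column, and field names are assumptions made up for this example), the query is defined in a Template Transformer component and the Double Braces parameter is left for Stream DB V3 to resolve as a PreparedStatement parameter:
// query defined beforehand via Template Transformer
SELECT id, name FROM customers WHERE id = {{ message.customerId }}
// SQL Statement parameter, with Raw SQL Statement activated, referencing the
// previously generated query (assuming it was stored in the "query" field of the message)
{{ message.query }}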

Technology

Authentication via Kerberos

To authenticate to a database via Kerberos, it's necessary to:
  • inform a KERBEROS-type account
  • set a Kerberos principal
  • set a keytab (which must be the base64 encoding of the generated keytab file), as sketched after this list
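The sketch below is purely illustrative of these values; the principal and the keytab shown are assumptions made up for this example:
// the keytab value is the base64 of the generated keytab file (truncated here)
{
"principal": "svc_streamdb@EXAMPLE.COM",
"keytab": "BQIAAAA3AAEAC0VYQU1QTEUuQ09N..."
}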

Messages flow

Message structure available in the onProcess subpipeline

Once the SQL instruction is executed, the onProcess subpipeline will be triggered and will receive the execution result through a message in the following structure:
{
"column1": "data1",
"column2": "data2",
"column3": "data3"
}

Output with error

{
"code": error_code,
"error": error_message,
"processedId": the_id_column_value
}

Output

After the component execution, a message is returned in the following structure:
{
"total": 0,
"success": 0,
"failed": 0
}
  • total: total number of processed lines
  • success: total number of successfully processed lines
  • failed: total number of lines whose processing failed
IMPORTANT: to detect if a line has been correctly processed, each onProcess subpipeline must respond with { "success": true } to each processed element.
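For example (with illustrative values), if the SQL instruction returns 3 rows and the onProcess subpipeline responds { "success": true } for 2 of them, the final output would be:
{
"total": 3,
"success": 2,
"failed": 1
}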
Stream DB V3 performs batch processing. Refer to the Batch processing article to learn more about this concept.

Connection pool

By default, we use a pool based on the configurations of the deployed pipeline. If it's a SMALL pipeline, the pool size will be 10; for MEDIUM, it will be 20; and for LARGE, 40.
It's also possible to manage the pool size at deployment time. To do that, you must enable the Pool Size By Actual Consumers parameter in the component. With it, whatever is manually configured on the deployment screen will be used.
For example, if a SMALL pipeline is deployed with 5 consumers and you want the pool of the database components (DB V2 and Stream DB V3) to use this size, you must enable the Pool Size By Actual Consumers parameter in all the existing components.
Be extra careful when configuring the pool size manually so there's no deadlock in concurrent calls to the same database.
Our pool is shared between the database components that access the same database inside the pipeline. If you need an exclusive pool for a determined component, enable the Exclusive DB Pool parameter.