Stream Parquet File Reader

Learn more about the Stream Parquet File Reader connector and how to use it in the Digibee Integration Platform.

The Stream Parquet File Reader connector allows you to read Parquet files, triggering a subpipeline to process each message individually. This connector is recommended for large files.

Parquet is a columnar file format designed for efficient data storage and retrieval. For more information, see the official Apache Parquet website.

Parameters

Configure the connector using the parameters below. Some fields support Double Braces expressions.

| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| Alias | Name (alias) for this connector's output, allowing you to reference it later in the flow using Double Braces expressions. | String | stream-parquet-reader-1 |
| File Name | The name of the Parquet file to be read. | String | {{ message.fileName }} |
| Parallel Execution | If enabled, subpipelines are executed in parallel instead of sequentially, one per message. | Boolean | false |
| Convert Date Fields | If enabled, DATE/TIMESTAMP fields from the file are converted to string format (yyyy-MM-dd for DATE, ISO-8601 for TIMESTAMP). If disabled, dates remain as numeric values (days or milliseconds since the epoch). | Boolean | false |
| Date Field Paths (optional) | Manually indicates date fields when the schema does not declare the DATE logical type. | String | N/A |
| Decode Base64 Fields | If enabled, the connector recursively scans the output JSON nodes. Any string identified as a valid Base64 sequence is automatically decoded to UTF-8 and replaced in place. | Boolean | false |
| Fail On Error | If enabled, pipeline execution is interrupted when an error occurs. Otherwise, execution proceeds, but the "success" property is set to false. | Boolean | false |
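The Convert Date Fields behavior described in the table can be sketched as follows, assuming Parquet's standard physical encodings (DATE as days since 1970-01-01, TIMESTAMP as milliseconds since the epoch); the function names are illustrative, not part of the connector:

```python
from datetime import date, datetime, timedelta, timezone

# Sketch of the Convert Date Fields conversion: numeric Parquet
# DATE/TIMESTAMP values become human-readable strings.
def convert_date(days_since_epoch: int) -> str:
    # DATE: days since 1970-01-01 -> "yyyy-MM-dd"
    return (date(1970, 1, 1) + timedelta(days=days_since_epoch)).isoformat()

def convert_timestamp(millis_since_epoch: int) -> str:
    # TIMESTAMP: milliseconds since the epoch -> ISO-8601 string
    return datetime.fromtimestamp(
        millis_since_epoch / 1000, tz=timezone.utc
    ).isoformat()

print(convert_date(19723))    # 2024-01-01
print(convert_timestamp(0))   # 1970-01-01T00:00:00+00:00
```

With the option disabled, the raw integers (19723, 0) would appear in the output JSON instead.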

Note: a compressed Parquet file generates JSON content larger than the file itself when it is read. Check whether the pipeline has enough memory to handle the data, as it will be stored in the pipeline's memory.

Usage examples

Reading Parquet file

  • File Name: file.parquet

  • Parallel Execution: deactivated

Output:

If the lines are processed correctly, the respective subpipelines return { "success": true } for each individual line.
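The per-message contract above can be sketched as follows, with a hypothetical process_message function standing in for the subpipeline (the name and body are illustrative, not part of the Platform):

```python
# Each Parquet row becomes one message; the subpipeline processes it
# and produces a {"success": true/false} result per line.
def process_message(message: dict) -> dict:
    try:
        # ... business logic for one row would go here ...
        return {"success": True}
    except Exception:
        return {"success": False}

rows = [{"id": i} for i in range(3)]
results = [process_message(row) for row in rows]
print(results)  # one {'success': True} per processed line
```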
