Parquet File Reader

Learn more about the Parquet File Reader connector and how to use it in the Digibee Integration Platform.

The Parquet File Reader connector allows you to read Parquet files.

Parquet is a columnar file format designed for efficient data storage and retrieval. For more information, see the official Apache Parquet website.

Parameters

Configure the connector using the parameters below. Fields that support Double Braces expressions are marked in the Supports DB column.

| Parameter | Description | Type | Supports DB | Default |
|---|---|---|---|---|
| Alias | Name (alias) for this connector’s output, allowing you to reference it later in the flow using Double Braces expressions. | String | ✅ | parquet-file-reader-1 |
| File Name | Name of the Parquet file to be read. | String | ✅ | |
| Check File Size | If enabled, the file size is compared against Maximum File Size. If the file is larger, the connector returns an error. | Boolean | ❌ | False |
| Convert Date Fields | If enabled, DATE and TIMESTAMP fields in the file are converted to strings (for example, yyyy-MM-dd for DATE and ISO-8601 for TIMESTAMP). If disabled (default), they remain numeric values: days since the Unix epoch for DATE and milliseconds since the epoch for TIMESTAMP. | Boolean | ❌ | False |
| Date Field Paths (optional) | Manually specifies which fields contain dates when the file schema does not declare the DATE logical type. | String | ❌ | N/A |
| Decode Base64 Fields | If enabled, the connector recursively scans the nodes of the output JSON and decodes any string identified as a valid Base64 sequence to UTF-8, replacing it in place. | Boolean | ❌ | False |
| Maximum File Size | Maximum allowed size of the file to be read, in bytes. Applies when Check File Size is enabled. | Integer | ❌ | N/A |
| Fail On Error | If enabled, pipeline execution is interrupted when an error occurs. If disabled, execution continues, but the result contains the "success" property set to false. | Boolean | ❌ | False |
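To illustrate what Convert Date Fields does, the Python sketch below turns the numeric epoch values described above into strings. It assumes DATE values are days since the Unix epoch and TIMESTAMP values are milliseconds since the epoch in UTC. The exact string the connector emits (for example, a `Z` suffix versus `+00:00` for UTC) is not documented here, and the helper names and the `date_fields`/`timestamp_fields` arguments (a hypothetical stand-in for what Date Field Paths might select) are illustrative only.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_DATETIME = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_from_epoch_days(days: int) -> str:
    """Render a DATE stored as days since the Unix epoch as yyyy-MM-dd."""
    return (EPOCH_DATE + timedelta(days=days)).isoformat()


def timestamp_from_epoch_millis(millis: int) -> str:
    """Render a TIMESTAMP stored as milliseconds since the epoch as ISO-8601 (UTC)."""
    # Integer timedelta arithmetic avoids the float rounding of millis / 1000.
    instant = EPOCH_DATETIME + timedelta(milliseconds=millis)
    return instant.isoformat(timespec="milliseconds")


def convert_record(record: dict, date_fields: set, timestamp_fields: set) -> dict:
    """Return a copy of a record with the listed numeric fields rendered as strings.

    date_fields and timestamp_fields are hypothetical top-level field names.
    """
    converted = dict(record)
    for name in date_fields:
        if converted.get(name) is not None:
            converted[name] = date_from_epoch_days(converted[name])
    for name in timestamp_fields:
        if converted.get(name) is not None:
            converted[name] = timestamp_from_epoch_millis(converted[name])
    return converted
```

For example, a DATE value of 19723 becomes "2024-01-01", and a TIMESTAMP value of 1704067200000 becomes "2024-01-01T00:00:00.000+00:00" in this sketch.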

Note that a compressed Parquet file produces JSON content larger than the file itself when it’s read. Because this data is held in the pipeline's memory, check that the pipeline has enough memory to handle it before reading large files.
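The Decode Base64 Fields option can be pictured as a recursive walk over the output JSON. The Python sketch below is an assumption-based illustration, not the connector's implementation: the rule the connector uses to decide that a string is "valid Base64" is not documented here, so this sketch treats a string as Base64 only if it decodes strictly under the standard alphabet with padding (RFC 4648) and the decoded bytes are valid UTF-8. The function names are hypothetical.

```python
import base64
import binascii
from typing import Any


def try_decode_base64(value: str) -> str:
    """Return value decoded from Base64 as UTF-8, or value unchanged if that fails."""
    # Padded standard Base64 always has a length that is a multiple of 4.
    if not value or len(value) % 4 != 0:
        return value
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        raw = base64.b64decode(value, validate=True)
        return raw.decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return value


def decode_base64_fields(node: Any) -> Any:
    """Recursively walk a JSON-like structure, replacing Base64 strings in place."""
    if isinstance(node, dict):
        for key, child in node.items():
            # Reassigning existing keys does not change the dict size during iteration.
            node[key] = decode_base64_fields(child)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            node[index] = decode_base64_fields(child)
    elif isinstance(node, str):
        return try_decode_base64(node)
    return node
```

With this heuristic, the sample record in the usage examples is left unchanged: values such as "Some details" contain characters outside the Base64 alphabet, and "Aquiles" is not a multiple of four characters long. Short plain-text strings can still be valid Base64 by coincidence (for example, "AAAA" decodes to three NUL bytes, which is valid UTF-8), so a heuristic like this one can produce false positives.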

Usage examples

Reading a file

Reading a Parquet file without checking the file size:

File Name: file.parquet
Check File Size: deactivated

Output:

{
  "data": [
    {
      "name": "Aquiles",
      "phoneNumbers": [
        "11 99999-9999",
        "11 93333-3333"
      ],
      "active": true,
      "address": "St. Example",
      "score": 71.3,
      "details": "Some details"
    }
  ],
  "fileName": "file.parquet",
  "total": 1
}

Reading a file with a file size check

Reading a Parquet file and checking that its size does not exceed the Maximum File Size:

File Name: file.parquet
Check File Size: activated
Maximum File Size: 5000000

Output:

{
  "data": [
    {
      "name": "Aquiles",
      "phoneNumbers": [
        "11 99999-9999",
        "11 93333-3333"
      ],
      "active": true,
      "address": "St. Example",
      "score": 71.3,
      "details": "Some details"
    }
  ],
  "fileName": "file.parquet",
  "total": 1
}
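The interaction between Check File Size, Maximum File Size, and Fail On Error can be sketched in Python as follows. This is an illustration under assumptions: `FileTooLargeError`, `apply_size_check`, and the `"message"` key in the error payload are hypothetical. Only the behavior of raising an error when the file is larger than the maximum, and the `"success": false` property when Fail On Error is disabled, come from the parameter descriptions.

```python
import os
from typing import Optional


class FileTooLargeError(Exception):
    """Hypothetical error raised when a file exceeds the configured maximum size."""


def apply_size_check(
    path: str,
    check_enabled: bool,
    max_size_bytes: Optional[int],
    fail_on_error: bool,
) -> Optional[dict]:
    """Return None if the file may be read; otherwise raise or return an error payload."""
    if not check_enabled or max_size_bytes is None:
        return None
    size = os.path.getsize(path)
    # Only a file strictly larger than the maximum is rejected.
    if size <= max_size_bytes:
        return None
    error = FileTooLargeError(
        f"{path} is {size} bytes, exceeding the maximum of {max_size_bytes} bytes"
    )
    if fail_on_error:
        # Fail On Error enabled: stop the execution.
        raise error
    # Fail On Error disabled: continue, reporting success = false.
    return {"success": False, "message": str(error)}
```

With the configuration above (Maximum File Size: 5000000), a file of up to 5,000,000 bytes passes the check, and a larger file either stops the execution or yields a result with "success" set to false, depending on Fail On Error.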
