Parquet File Reader

Learn more about the Parquet File Reader connector and how to use it in the Digibee Integration Platform.

Parquet File Reader is a connector exclusive to Pipeline Engine v2.

The Parquet File Reader connector reads Parquet files and returns their content as JSON.

Parquet is a columnar file format designed for efficient data storage and retrieval. Further information is available on the official Apache Parquet website.
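"Columnar" means that the values of each column are stored together, while the connector returns one JSON object per record. The sketch below illustrates only that difference in layout, using plain Python and hypothetical sample records built from fields in the usage examples. It does not reproduce Parquet's actual on-disk encoding, and `to_columns` and `to_rows` are illustrative helpers, not part of the connector or of any Parquet library.

```python
from typing import Any

# Hypothetical sample records; field names are taken from the usage examples.
rows = [
    {"name": "Aquiles", "active": True, "score": 71.3},
    {"name": "Example", "active": False, "score": 50.0},
]


def to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Group values by field, which is the layout a columnar format stores."""
    keys = rows[0].keys() if rows else []
    return {key: [row[key] for row in rows] for key in keys}


def to_rows(columns: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Rebuild one dict per record, the row-shaped form of the JSON output."""
    keys = list(columns)
    length = len(columns[keys[0]]) if keys else 0
    return [{key: columns[key][i] for key in keys} for i in range(length)]


columns = to_columns(rows)
print(columns["score"])  # [71.3, 50.0]
assert to_rows(columns) == rows
```

Storing a column's values together lets a reader scan or compress one field without touching the others. A reader that returns JSON records, such as this connector, has to reassemble the rows.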

Parameters

Take a look at the configuration parameters of the connector. Parameters supported by Double Braces expressions are marked with (DB).

General tab

Documentation tab

Note that when a compressed Parquet file is read, the resulting JSON content is larger than the file itself. Because this content is stored in the pipeline's memory, check that the pipeline has enough memory to handle it before reading large files.
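The following rough illustration shows why the JSON form grows. JSON repeats every field name in every record, while a compressed columnar file stores each column's values together and compresses them. The sketch uses `zlib` on column-grouped data as a stand-in for Parquet's own encodings, which differ. The records are hypothetical and identical, so the ratio it prints is exaggerated. Real expansion depends on the data and the compression codec.

```python
import json
import zlib

# Hypothetical records shaped like the connector's "data" array.
records = [
    {"name": "Aquiles", "active": True, "score": 71.3, "details": "Some details"}
    for _ in range(10_000)
]

# JSON form: every record repeats every field name and value.
json_bytes = len(json.dumps({"data": records}).encode("utf-8"))

# Stand-in for a compressed columnar file: values grouped by column, then
# compressed with zlib. This only shows the trend, not Parquet's real size.
columns = {key: [r[key] for r in records] for key in records[0]}
compressed_bytes = len(zlib.compress(json.dumps(columns).encode("utf-8")))

ratio = json_bytes / compressed_bytes
print(f"JSON: {json_bytes} bytes, compressed columns: {compressed_bytes} bytes, ratio ~{ratio:.0f}x")
```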

Usage examples

Reading a file

Reading a Parquet file without checking the file size:

  • File Name: file.parquet

  • Check File Size: deactivated

Output:

{
  "data": [
    {
      "name": "Aquiles",
      "phoneNumbers": [
        "11 99999-9999",
        "11 93333-3333"
      ],
      "active": true,
      "address": "St. Example",
      "score": 71.3,
      "details": "Some details"
    }
  ],
  "fileName": "file.parquet",
  "total": 1
}
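The output contains three fields: `data`, which holds one object per record, `fileName`, and `total`. If this output is passed to code outside the pipeline, it can be consumed as ordinary JSON. The Python sketch below parses the example output. It assumes that `total` is the number of records in `data`. That matches the example but is an assumption, and `records_from_output` is a hypothetical helper.

```python
import json

# Connector output copied from the example above.
output_text = """
{
  "data": [
    {
      "name": "Aquiles",
      "phoneNumbers": ["11 99999-9999", "11 93333-3333"],
      "active": true,
      "address": "St. Example",
      "score": 71.3,
      "details": "Some details"
    }
  ],
  "fileName": "file.parquet",
  "total": 1
}
"""


def records_from_output(output: dict) -> list[dict]:
    """Return the records in "data", checking their count against "total".

    Assumption: "total" is the number of records read from the file.
    """
    data = output["data"]
    if output["total"] != len(data):
        raise ValueError(f"expected {output['total']} records, found {len(data)}")
    return data


output = json.loads(output_text)
records = records_from_output(output)
for record in records:
    print(output["fileName"], record["name"], record["phoneNumbers"][0])
```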

Reading a file and checking its size

Reading a Parquet file and checking whether its size exceeds the Maximum File Size:

  • File Name: file.parquet

  • Check File Size: activated

  • Maximum File Size: 5000000

Output:

{
  "data": [
    {
      "name": "Aquiles",
      "phoneNumbers": [
        "11 99999-9999",
        "11 93333-3333"
      ],
      "active": true,
      "address": "St. Example",
      "score": 71.3,
      "details": "Some details"
    }
  ],
  "fileName": "file.parquet",
  "total": 1
}
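The following sketch approximates the guard that the Check File Size option provides: the file's size is compared with a limit before any content is loaded. It assumes that the Maximum File Size is expressed in bytes and that an oversized file causes an error. The connector's actual unit and error behavior may differ. `check_file_size` is a hypothetical helper, and a temporary file stands in for `file.parquet`.

```python
import os
import tempfile


def check_file_size(path: str, max_size: int) -> int:
    """Hypothetical guard mirroring the Check File Size option.

    Assumes max_size is in bytes. Raises ValueError if the file is larger.
    """
    size = os.path.getsize(path)
    if size > max_size:
        raise ValueError(f"{path} is {size} bytes, larger than the limit of {max_size} bytes")
    return size


# Usage: a 1024-byte temporary file stands in for file.parquet.
with tempfile.NamedTemporaryFile(suffix=".parquet", delete=False) as tmp:
    tmp.write(b"\x00" * 1024)
    sample_path = tmp.name

within_limit = check_file_size(sample_path, 5_000_000)
print(within_limit)  # 1024

rejected = False
try:
    check_file_size(sample_path, 512)
except ValueError as err:
    rejected = True
    print(err)

os.remove(sample_path)
```

Checking the size on disk is inexpensive and happens before the file is read. It limits only the compressed size, however, not the larger JSON that reading produces.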
