Parquet File Writer

Learn more about the Parquet File Writer connector and how to use it in the Digibee Integration Platform.

Parquet File Writer is a Pipeline Engine v2 exclusive connector.

The Parquet File Writer connector allows you to write Parquet files based on Avro files.

Parquet is a columnar file format designed for efficient data storage and retrieval. Further information can be found on the official website.

Parameters

Take a look at the configuration parameters of the connector. Parameters that support Double Braces expressions are marked with (DB).

General tab

Advanced tab

Documentation tab

Important information

The Parquet File Writer connector can only generate Parquet files based on Avro files. It's not possible to create them directly from a JSON payload.

To work around this limitation, you can generate Avro files in the Digibee Integration Platform with the Avro File Writer connector. The Parquet File Writer connector can also handle Avro files generated by a source outside the platform.

When writing a Parquet file using the connector, Avro files containing the data types BINARY and FIXED are treated as binary data. When reading the generated file with the Parquet File Reader connector, the data for these types is displayed in base64 format.
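
If a downstream consumer needs the original bytes of a BINARY or FIXED field, the base64 text can be decoded back. The following is a minimal Python sketch, assuming a hypothetical record with a field named `thumbnail`; neither the field name nor the sample bytes come from the connector.

```python
import base64


def decode_binary_field(value: str) -> bytes:
    """Decode a base64 string (as displayed for BINARY/FIXED fields) into raw bytes."""
    # validate=True rejects characters outside the base64 alphabet
    return base64.b64decode(value, validate=True)


# Hypothetical record as it might look after reading the Parquet file back;
# the "thumbnail" field and its content are assumptions for illustration.
record = {"id": 1, "thumbnail": base64.b64encode(b"\x89PNG\r\n").decode("ascii")}

raw = decode_binary_field(record["thumbnail"])
print(raw)
```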

You should also note that performance differences can occur when writing compressed and uncompressed Parquet files. Since compression requires more memory and processing, it's important to validate the limits supported by the pipeline when you apply it.
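
The trade-off can be illustrated outside the platform with a rough Python sketch. It uses `zlib` from the standard library as a stand-in for a Parquet codec such as Snappy, and JSON-encoded rows as a stand-in for the file content, so the numbers only indicate the general pattern (smaller output, extra CPU time) and say nothing about the connector's actual limits.

```python
import json
import time
import zlib

# Hypothetical sample rows; the structure is an assumption for illustration.
rows = [{"name": "Aquiles", "active": True, "score": 71.3, "row": i} for i in range(10_000)]
raw = json.dumps(rows).encode("utf-8")

# Time the compression step, which is the extra work a codec adds to a write.
start = time.perf_counter()
compressed = zlib.compress(raw)
elapsed = time.perf_counter() - start

print(f"uncompressed: {len(raw)} bytes")
print(f"compressed:   {len(compressed)} bytes in {elapsed * 1000:.1f} ms")
```

Running a comparable measurement with representative data is one way to estimate whether compression fits within the pipeline's memory and time limits.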

Usage examples

Uncompressed Parquet file

Writing an uncompressed Parquet file based on an Avro file:

  • Parquet File Name: file.parquet

  • Avro File Name: file.avro

  • File Exists Policy: Overwrite

  • Compression Codec: Uncompressed

Example of Avro file content in JSON format:

{
  "name": "Aquiles",
  "phoneNumbers": [
    "11 99999-9999",
    "11 93333-3333"
  ],
  "active": true,
  "address": "St. Example",
  "score": 71.3,
  "details": "Some details"
}

Output:

{
  "success": true,
  "fileName": "file.parquet"
}
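
Because the connector works from an Avro file rather than a JSON payload, a record like the one above needs an Avro schema behind it. The sketch below shows one schema that could describe that record, together with a simplified conformance check in Python. The record name `Contact` and the choice of `double` for `score` are assumptions, and the check covers only the types used here; it is not a full Avro validator.

```python
# Hypothetical Avro schema for the example record; "Contact" is an assumed name.
schema = {
    "type": "record",
    "name": "Contact",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "phoneNumbers", "type": {"type": "array", "items": "string"}},
        {"name": "active", "type": "boolean"},
        {"name": "address", "type": "string"},
        {"name": "score", "type": "double"},
        {"name": "details", "type": "string"},
    ],
}

record = {
    "name": "Aquiles",
    "phoneNumbers": ["11 99999-9999", "11 93333-3333"],
    "active": True,
    "address": "St. Example",
    "score": 71.3,
    "details": "Some details",
}

# Only the Avro primitives used by this schema are mapped.
PRIMITIVES = {"string": str, "boolean": bool, "double": float}


def conforms(value, avro_type) -> bool:
    """Check a value against a primitive or array type from the schema above."""
    if isinstance(avro_type, dict) and avro_type.get("type") == "array":
        return isinstance(value, list) and all(conforms(v, avro_type["items"]) for v in value)
    return isinstance(value, PRIMITIVES[avro_type])


def matches_schema(rec: dict, sch: dict) -> bool:
    """Return True when the record has exactly the schema's fields with matching types."""
    fields = sch["fields"]
    return set(rec) == {f["name"] for f in fields} and all(
        conforms(rec[f["name"]], f["type"]) for f in fields
    )


print(matches_schema(record, schema))
```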

Compressed Parquet file

Writing a compressed Parquet file based on an Avro file:

  • Parquet File Name: file.parquet

  • Avro File Name: file.avro

  • File Exists Policy: Overwrite

  • Compression Codec: Snappy

Example of Avro file content in JSON format:

{
  "name": "Aquiles",
  "phoneNumbers": [
    "11 99999-9999",
    "11 93333-3333"
  ],
  "active": true,
  "address": "St. Example",
  "score": 71.3,
  "details": "Some details"
}

Output:

{
  "success": true,
  "fileName": "file.parquet"
}

File Exists Policy as Fail

Writing a Parquet file with the same name as an existing file in the pipeline file directory:

  • Parquet File Name: file.parquet

  • Avro File Name: file.avro

  • File Exists Policy: Fail

Output:

{
  "success": false,
  "message": "Something went wrong while trying to execute the Parquet Writer connector",
  "error": "com.digibee.pipelineengine.exception.PipelineEngineRuntimeException: Parquet file file.parquet already exists."
}
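
A consumer of the connector output can branch on the `success` flag to tell the two outcomes apart. The following is a minimal Python sketch of that check, assuming the output is available as a JSON string with the fields shown in the examples above; the helper name `written_file` is hypothetical and not part of the platform.

```python
import json


def written_file(output_json: str) -> str:
    """Return the written file name, or raise if the connector reported a failure."""
    output = json.loads(output_json)
    if not output.get("success"):
        # Prefer the detailed "error" field and fall back to the general "message".
        raise RuntimeError(output.get("error") or output.get("message", "unknown error"))
    return output["fileName"]


ok = '{"success": true, "fileName": "file.parquet"}'
failed = json.dumps({
    "success": False,
    "message": "Something went wrong while trying to execute the Parquet Writer connector",
    "error": "com.digibee.pipelineengine.exception.PipelineEngineRuntimeException: "
             "Parquet file file.parquet already exists.",
})

print(written_file(ok))
try:
    written_file(failed)
except RuntimeError as exc:
    print(f"write failed: {exc}")
```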
