# Parquet File Writer

{% hint style="info" %}
**Parquet File Writer** is a Pipeline Engine v2 exclusive connector.
{% endhint %}

The **Parquet File Writer** connector allows you to write Parquet files based on Avro files.

Parquet is a columnar file format designed for efficient data storage and retrieval. Further information can be found [on the official website](https://parquet.apache.org/).
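To make the columnar idea concrete, here is a minimal conceptual sketch (not the connector itself) contrasting row-oriented and column-oriented layouts of the same records. The record values are hypothetical, reusing the sample data from the usage examples below.

```python
# Conceptual illustration: the same records laid out row-wise and column-wise.
# Columnar formats such as Parquet store each column contiguously, which makes
# scans over a few columns cheaper and compression more effective.
rows = [
    {"name": "Aquiles", "score": 71.3},
    {"name": "Helena", "score": 64.0},  # hypothetical second record
]

# Row-oriented layout: one whole record after another.
row_layout = rows

# Column-oriented layout: one list per column.
column_layout = {
    "name": [r["name"] for r in rows],
    "score": [r["score"] for r in rows],
}

# Reading a single column touches only that column's data.
print(column_layout["score"])
```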

## **Parameters**

Take a look at the configuration parameters of the connector. Parameters supported by [Double Braces expressions](https://docs.digibee.com/documentation/connectors-and-triggers/double-braces/overview) are marked with `(DB)`.

### **General tab**

<table data-full-width="true"><thead><tr><th>Parameter</th><th>Description</th><th>Default value</th><th>Data type</th></tr></thead><tbody><tr><td><strong>Parquet File Name</strong> <code>(DB)</code></td><td>The file name of the Parquet file to be written.</td><td>file.parquet</td><td>String</td></tr><tr><td><strong>Avro File Name</strong> <code>(DB)</code></td><td><p>The file name of the Avro file that contains the data to be written to the Parquet file.</p><p>Only Avro files whose schema has the type <code>RECORD</code> as the root data type are accepted.</p></td><td>file.avro</td><td>String</td></tr><tr><td><strong>File Exists Policy</strong></td><td><p>Defines the behavior to follow when a file with the same name (<strong>Parquet File Name</strong> parameter) already exists in the current pipeline execution.</p><p>You can select one of the following options: <strong>Overwrite</strong> (overwrite the existing file) or <strong>Fail</strong> (interrupt the execution with an error if the file already exists).</p></td><td>Overwrite</td><td>String</td></tr><tr><td><strong>Fail On Error</strong></td><td>If the option is active, the pipeline execution is interrupted when an error occurs. Otherwise, the pipeline execution proceeds, but the result will show a false value for the <code>"success"</code> property.</td><td>False</td><td>Boolean</td></tr></tbody></table>
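When **Fail On Error** is disabled, the pipeline keeps running after a failed write, so a downstream step is expected to inspect the `"success"` property of the connector output. A minimal sketch of that check, using a hypothetical output payload modeled on the error example shown later in this page:

```python
import json

# Hypothetical connector output when "Fail On Error" is disabled and the
# write failed: execution continues, but "success" is false.
connector_output = (
    '{"success": false, "message": "Something went wrong while trying to '
    'execute the Parquet Writer connector"}'
)

result = json.loads(connector_output)
if not result["success"]:
    # A later pipeline step could branch here, e.g. log the message and retry.
    print("Parquet write failed:", result.get("message"))
```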

### **Advanced tab**

<table data-full-width="true"><thead><tr><th>Parameter</th><th>Description</th><th>Default value</th><th>Data type</th></tr></thead><tbody><tr><td><strong>Dictionary Encoding</strong></td><td>Defines whether dictionary encoding must be enabled for columns.</td><td>False</td><td>Boolean</td></tr><tr><td><strong>Compression Codec</strong></td><td><p>The compression codec to be used when compressing the Parquet file.</p><p>Options:</p><ul><li><strong>Uncompressed</strong></li><li><strong>Snappy</strong></li><li><strong>GZIP</strong></li><li><strong>LZ4</strong></li><li><strong>LZ4 Raw</strong></li></ul></td><td>Uncompressed</td><td>String</td></tr><tr><td><strong>Row Group Size</strong></td><td>Defines the size, in bytes, of row groups in the Parquet file.</td><td>134217728</td><td>Integer</td></tr><tr><td><strong>Page Size</strong></td><td>Defines the size, in bytes, of pages in the Parquet file.</td><td>1048576</td><td>Integer</td></tr></tbody></table>

### **Documentation tab**

<table data-full-width="true"><thead><tr><th>Parameter</th><th>Description</th><th>Default value</th><th>Data type</th></tr></thead><tbody><tr><td><strong>Documentation</strong></td><td>Section for documenting any necessary information about the connector configuration and business rules.</td><td>N/A</td><td>String</td></tr></tbody></table>

## Important information

The **Parquet File Writer** connector can only generate Parquet files based on Avro files. It's not possible to create them directly from a JSON payload.

Despite this limitation, the Digibee Integration Platform provides a way to generate Avro files through the **Avro File Writer** connector. The **Parquet File Writer** connector can also handle Avro files generated by sources outside the platform.

When writing a Parquet file using the connector, Avro files containing the data types `BINARY` and `FIXED` are treated as binary data. When reading the generated file with the **Parquet File Reader** connector, the data for these types is displayed in base64 format.
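Since `BINARY` and `FIXED` values come back base64-encoded from the **Parquet File Reader**, a downstream step that needs the raw bytes must decode them. A minimal sketch with the standard library, using a hypothetical encoded value:

```python
import base64

# Hypothetical value as the Parquet File Reader would display a BINARY/FIXED
# column: the raw bytes are shown base64-encoded.
encoded = base64.b64encode(b"\x00\x01binary-data").decode("ascii")

# Decoding recovers the original bytes.
raw = base64.b64decode(encoded)
print(raw)  # b'\x00\x01binary-data'
```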

You should also note that performance differences can occur when writing compressed and uncompressed Parquet files. Since compression requires more memory and processing, it's important to validate the limits supported by the pipeline when you apply it.
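The trade-off above can be illustrated with the standard library, using `gzip` as a stand-in for Parquet codecs such as Snappy or GZIP: compression shrinks the stored data but costs extra CPU time, which matters when validating pipeline limits. The payload is a hypothetical repetition of the sample record below.

```python
import gzip
import time

# Repetitive JSON-like payload standing in for the Avro data to be written.
payload = b'{"name": "Aquiles", "score": 71.3}' * 10_000

start = time.perf_counter()
compressed = gzip.compress(payload)
elapsed = time.perf_counter() - start

print(f"original:   {len(payload)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"time:       {elapsed:.4f} s")  # compression costs CPU time
```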

## **Usage examples**

### **Uncompressed Parquet file**

Writing an uncompressed Parquet File based on an Avro file:

* **Parquet File Name:** file.parquet
* **Avro File Name:** file.avro
* **File Exists Policy:** Overwrite
* **Compression Codec:** Uncompressed

**Example of Avro file content in JSON format:**

```json
{
  "name": "Aquiles",
  "phoneNumbers": [
    "11 99999-9999",
    "11 93333-3333"
  ],
  "active": true,
  "address": "St. Example",
  "score": 71.3,
  "details": "Some details"
}
```

**Output:**

```json
{
  "success": true,
  "fileName": "file.parquet"
}
```

### **Compressed Parquet file**

Writing a compressed Parquet File based on an Avro file:

* **Parquet File Name:** file.parquet
* **Avro File Name:** file.avro
* **File Exists Policy:** Overwrite
* **Compression Codec:** Snappy

**Example of Avro file content in JSON format:**

```json
{
  "name": "Aquiles",
  "phoneNumbers": [
    "11 99999-9999",
    "11 93333-3333"
  ],
  "active": true,
  "address": "St. Example",
  "score": 71.3,
  "details": "Some details"
}
```

**Output:**

```json
{
  "success": true,
  "fileName": "file.parquet"
}
```

### **File Exists Policy as Fail**

Writing a Parquet File with the same name as an existing file in the pipeline file directory:

* **Parquet File Name:** file.parquet
* **Avro File Name:** file.avro
* **File Exists Policy:** Fail

**Output:**

{% code overflow="wrap" %}

```json
{
  "success": false,
  "message": "Something went wrong while trying to execute the Parquet Writer connector",
  "error": "com.digibee.pipelineengine.exception.PipelineEngineRuntimeException: Parquet file file.parquet already exists."
}
```

{% endcode %}
