Avro File Writer

Learn more about the Avro File Writer connector and how to use it in the Digibee Integration Platform.

Avro File Writer is a Pipeline Engine v2 exclusive connector.

The Avro File Writer connector allows you to write Avro files based on Avro schemas.

Avro is a popular data serialization framework used within the Hadoop Big Data ecosystem, known for its schema evolution support and compact binary format. For more information, see the official Apache Avro website.

Parameters

Take a look at the configuration parameters of the connector. Parameters that support Double Braces expressions are marked with (DB).

General tab

Advanced tab

Documentation tab

Note that performance can differ between writing compressed and uncompressed Avro files. Because compression consumes more memory and processing power, validate the limits that the pipeline must support before applying it.
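To get a rough feel for the trade-off between size and processing cost, the sketch below compresses a sample payload with Python's standard `bz2` module and reports size and elapsed time. It is only an illustration under stated assumptions: the sample records are hypothetical, the payload is JSON text rather than Avro binary, and the numbers say nothing about how the connector itself behaves.

```python
import bz2
import json
import time

# Hypothetical sample: many similar records serialized as JSON bytes.
# A real Avro file uses a binary encoding, so absolute sizes will differ.
records = [
    {"name": f"user-{i}", "active": i % 2 == 0, "score": 71.3}
    for i in range(5000)
]
raw = json.dumps(records).encode("utf-8")


def measure(label, transform):
    """Apply `transform` to the raw payload and report size and time."""
    start = time.perf_counter()
    out = transform(raw)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{label:>12}: {len(out):>8} bytes in {elapsed_ms:.1f} ms")
    return out


uncompressed = measure("uncompressed", lambda b: b)
bzipped = measure("bzip2", lambda b: bz2.compress(b, 9))
```

Running a comparison like this against a payload that resembles production data is one way to decide whether the smaller file is worth the extra processing.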

Usage examples

File from JSON object

Writing an Avro File based on a JSON object payload:

  • File Name: file.avro

  • Data: {{ message.data }}

  • Schema: {{ message.schema }}

  • File Exists Policy: Overwrite

Data example:

{
  "data": {
    "name": "Aquiles",
    "phoneNumbers": [
      "11 99999-9999",
      "11 93333-3333"
    ],
    "active": true,
    "address": "St. Example",
    "score": 71.3,
    "details": "Some details"
  }
}

Schema example:

{
  "schema": {
    "type": "record",
    "name": "Record",
    "fields": [
        {
            "name": "name",
            "type": "string"
        },
        {
            "name": "phoneNumbers",
            "type": {
                "type": "array",
                "items": "string"
            }
        },
        {
            "name": "active",
            "type": "boolean"
        },
        {
            "name": "address",
            "type": "string"
        },
        {
            "name": "score",
            "type": "double"
        },
        {
            "name": "details",
            "type": [
                "string",
                "null"
            ]
        }
    ]
  }
}

Output:

{
  "success": true,
  "fileName": "file.avro"
}
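Before sending a payload to the connector, it can help to check that the data fits the schema. The following is a minimal sketch of such a check, covering only the schema features used in this example (primitive types, arrays, records, and unions). The function `matches` is a hypothetical helper, not part of the connector, and it assumes every field must be present in the payload.

```python
def matches(schema, value):
    """Return True if `value` fits the given subset of an Avro schema."""
    if isinstance(schema, list):  # union: any branch may match
        return any(matches(branch, value) for branch in schema)
    if isinstance(schema, str):  # primitive type name
        if schema == "null":
            return value is None
        if schema == "boolean":
            return isinstance(value, bool)
        if schema == "string":
            return isinstance(value, str)
        if schema in ("int", "long"):
            return isinstance(value, int) and not isinstance(value, bool)
        if schema in ("float", "double"):
            return isinstance(value, (int, float)) and not isinstance(value, bool)
        return False  # type not covered by this sketch
    kind = schema["type"]
    if kind == "array":
        return isinstance(value, list) and all(
            matches(schema["items"], item) for item in value
        )
    if kind == "record":
        # Simplification: every field must be present in the payload.
        return isinstance(value, dict) and all(
            field["name"] in value and matches(field["type"], value[field["name"]])
            for field in schema["fields"]
        )
    return matches(kind, value)  # e.g. {"type": "string"}


schema = {
    "type": "record",
    "name": "Record",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "phoneNumbers", "type": {"type": "array", "items": "string"}},
        {"name": "active", "type": "boolean"},
        {"name": "address", "type": "string"},
        {"name": "score", "type": "double"},
        {"name": "details", "type": ["string", "null"]},
    ],
}
data = {
    "name": "Aquiles",
    "phoneNumbers": ["11 99999-9999", "11 93333-3333"],
    "active": True,
    "address": "St. Example",
    "score": 71.3,
    "details": "Some details",
}

valid = matches(schema, data)
invalid = matches(schema, {**data, "score": "71.3"})  # score as a string
print(valid, invalid)
```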

File from JSON array of objects

Writing an Avro File based on a JSON array of objects payload:

  • File Name: file.avro

  • Data: {{ message.data }}

  • Schema: {{ message.schema }}

  • File Exists Policy: Overwrite

Data example:

{
  "data": [ 
    {
      "name": "Aquiles",
      "phoneNumbers": [
        "11 99999-9999",
        "11 93333-3333"
      ],
      "active": true,
      "address": "St. Example",
      "score": 71.3,
      "details": "Some details"
    },
    {
      "name": "Vitor",
      "phoneNumbers": [
        "11 97777-7777"
      ],
      "active": false,
      "address": "St. Example 2",
      "score": 80.0,
      "details": null
    }
  ]
}

Schema example:

{
  "schema": {
    "type": "record",
    "name": "Record",
    "fields": [
        {
            "name": "name",
            "type": "string"
        },
        {
            "name": "phoneNumbers",
            "type": {
                "type": "array",
                "items": "string"
            }
        },
        {
            "name": "active",
            "type": "boolean"
        },
        {
            "name": "address",
            "type": "string"
        },
        {
            "name": "score",
            "type": "double"
        },
        {
            "name": "details",
            "type": [
                "string",
                "null"
            ]
        }
    ]
  }
}

Output:

{
  "success": true,
  "fileName": "file.avro"
}
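The two payload shapes shown so far (a single object and an array of objects) can be treated uniformly as a list of records, and the union type `["string", "null"]` is what allows `details` to be null in the second record. The sketch below illustrates both ideas. The helpers `as_records` and `nullable_fields` are hypothetical and only assume that each JSON object corresponds to one Avro record.

```python
def as_records(data):
    """Normalize a JSON object or an array of objects into a list of records."""
    if isinstance(data, dict):
        return [data]
    if isinstance(data, list) and all(isinstance(item, dict) for item in data):
        return data
    raise TypeError("expected a JSON object or an array of JSON objects")


def nullable_fields(schema):
    """List record fields whose type is a union that includes "null"."""
    return [
        field["name"]
        for field in schema["fields"]
        if isinstance(field["type"], list) and "null" in field["type"]
    ]


schema = {
    "type": "record",
    "name": "Record",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "details", "type": ["string", "null"]},
    ],
}
single = {"name": "Aquiles", "details": "Some details"}
many = [single, {"name": "Vitor", "details": None}]

print(len(as_records(single)), len(as_records(many)))
print(nullable_fields(schema))
```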

Uncompressed Avro file

Writing an uncompressed Avro File:

  • File Name: file.avro

  • Data: {{ message.data }}

  • Schema: {{ message.schema }}

  • File Exists Policy: Overwrite

  • Compression Codec: Uncompressed

Output:

{
  "success": true,
  "fileName": "file.avro"
}

Compressed Avro file

Writing a compressed Avro File:

  • File Name: file.avro

  • Data: {{ message.data }}

  • Schema: {{ message.schema }}

  • File Exists Policy: Overwrite

  • Compression Codec: BZIP2

Output:

{
  "success": true,
  "fileName": "file.avro"
}

File Exists Policy as Fail

Writing an Avro File with the same name as an existing file in the pipeline file directory:

  • File Name: file.avro

  • Data: {{ message.data }}

  • Schema: {{ message.schema }}

  • File Exists Policy: Fail

Output:

{
  "success": false,
  "message": "Something went wrong while trying to execute the Avro Writer connector",
  "error": "com.digibee.pipelineengine.exception.PipelineEngineRuntimeException: Avro file file.avro already exists."
}
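The difference between the Overwrite and Fail policies can be sketched with Python's file modes: mode `"x"` refuses to create a file that already exists, while mode `"w"` replaces it. This is an analogy only. The function `write_file` and its return values are hypothetical and merely mimic the shape of the outputs shown on this page.

```python
import os
import tempfile


def write_file(path, payload, policy="Overwrite"):
    """Write `payload` bytes to `path`, honoring a file-exists policy."""
    # "xb" raises FileExistsError if the file is present; "wb" truncates it.
    mode = {"Overwrite": "wb", "Fail": "xb"}[policy]
    name = os.path.basename(path)
    try:
        with open(path, mode) as fh:
            fh.write(payload)
    except FileExistsError:
        return {"success": False, "error": f"Avro file {name} already exists."}
    return {"success": True, "fileName": name}


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "file.avro")
    first = write_file(target, b"v1", policy="Fail")       # file is new
    second = write_file(target, b"v2", policy="Fail")      # file already exists
    third = write_file(target, b"v3", policy="Overwrite")  # file is replaced
    with open(target, "rb") as fh:
        final_content = fh.read()

print(first, second, third, sep="\n")
```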

Writing file from another Avro file - Explicit schema

Writing an Avro File whose data comes from other Avro files instead of a JSON payload, using an explicit Schema configuration:

  • File Name: file.avro

  • Data From File: activated

  • Files:

    • File Name: {{ message.existingAvroFile }}

  • Schema: {{ message.schema }}

  • File Exists Policy: Overwrite

Output:

{
  "success": true,
  "fileName": "file.avro"
}

Writing file from another Avro file - Infer Schema

Writing an Avro File whose data comes from other Avro files instead of a JSON payload, with the schema inferred from the source file:

  • File Name: file.avro

  • Data From File: activated

  • Files:

    • File Name: {{ message.existingAvroFile }}

  • Infer Schema: activated

  • File Exists Policy: Overwrite

Output:

{
  "success": true,
  "fileName": "file.avro"
}
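Inferring a schema from a file is possible because an Avro object container file stores the writer's schema in its header, under the `avro.schema` metadata key. The sketch below builds a header in memory and reads the schema back, following the layout described in the Avro specification (magic bytes, metadata map, 16-byte sync marker). How the connector implements Infer Schema is not documented here, so the helper names are hypothetical and this is a format illustration only.

```python
import io
import json
import os


def encode_long(n):
    """Encode an integer as an Avro long (zigzag, then base-128 varint)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(buf):
    """Decode one Avro long from a binary stream."""
    shift = result = 0
    while True:
        byte = buf.read(1)[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def read_bytes(buf):
    """Read a length-prefixed byte string."""
    return buf.read(read_long(buf))


def build_header(schema, codec=b"null"):
    """Build a container-file header holding the schema and codec name."""
    meta = {"avro.schema": json.dumps(schema).encode("utf-8"), "avro.codec": codec}
    out = bytearray(b"Obj\x01")
    out += encode_long(len(meta))  # one map block with all entries
    for key, value in meta.items():
        key_bytes = key.encode("utf-8")
        out += encode_long(len(key_bytes)) + key_bytes
        out += encode_long(len(value)) + value
    out += encode_long(0)          # end of the metadata map
    out += os.urandom(16)          # sync marker
    return bytes(out)


def read_header(buf):
    """Return (metadata, sync marker) from a container-file header."""
    if buf.read(4) != b"Obj\x01":
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(buf)
        if count == 0:
            break
        if count < 0:              # negative count is followed by a byte size
            count = -count
            read_long(buf)
        for _ in range(count):
            key = read_bytes(buf).decode("utf-8")
            meta[key] = read_bytes(buf)
    return meta, buf.read(16)


schema = {
    "type": "record",
    "name": "Record",
    "fields": [{"name": "name", "type": "string"}],
}
header = build_header(schema)
meta, sync = read_header(io.BytesIO(header))
inferred = json.loads(meta["avro.schema"])
print(inferred == schema, meta["avro.codec"])
```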
