Stream XML File Reader

Discover more about the Stream XML File Reader connector and how to use it on the Digibee Integration Platform.

Stream XML File Reader reads a local XML file and identifies nodes according to the configured target node and context fields. For each node found, it generates an XML structure along with its context properties and triggers subpipelines to process each resulting message independently. It is used to efficiently read and process large XML files in parts, without loading the entire file into memory.
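To make the streaming idea concrete, here is a minimal Python sketch that uses the standard library's `xml.etree.ElementTree.iterparse` to emit one serialized node at a time instead of loading the whole document. It illustrates the general technique only and is not the connector's implementation. The function name `stream_nodes`, the path handling (a plain absolute path rather than full XPath), and the abbreviated sample data are assumptions made for this example.

```python
import io
import xml.etree.ElementTree as ET

# Abbreviated, whitespace-free sample data (an assumption for this sketch).
SAMPLE = (
    b"<root><products>"
    b"<product><price>20.75</price><product>Chair</product></product>"
    b"<product><price>399.99</price><product>TV</product></product>"
    b"</products></root>"
)


def stream_nodes(source, node_path):
    """Yield each element found at node_path as a serialized XML string.

    node_path uses the '//root/a/b' form; it is treated here as a plain
    absolute path, not as a full XPath expression.
    """
    target = [part for part in node_path.split("/") if part]
    stack = []  # tag names of the elements that are currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            continue
        if stack == target:
            elem.tail = None  # do not serialize text that follows the closing tag
            yield ET.tostring(elem, encoding="unicode")
            elem.clear()  # drop the element's children once it has been emitted
        stack.pop()


for node in stream_nodes(io.BytesIO(SAMPLE), "//root/products/product"):
    print({"node": node})
```

Comparing the whole stack of open tags with the target path, rather than only the tag name, keeps the inner `<product>` child from being mistaken for a target node.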

Parameters

Configure the connector using the parameters below. Fields that support Double Braces expressions are marked in the Supports DB column.

| Parameter | Description | Type | Supports DB | Default |
| --- | --- | --- | --- | --- |
| Alias | Name (alias) for this connector’s output, allowing you to reference it later in the flow using Double Braces expressions. | String | | stream-xml-f-reader-1 |
| File Name | File name or full file path (for example, tmp/processed/file.txt) of the local XML file. | String | | data.xml |
| Charset | Character encoding used to read the file. | String | | UTF-8 |
| Node Path | Path of the node to stream from the XML file (for example, //root/level1/level2/desirednode). | String | | N/A |
| Context Paths | Tag paths whose fields add context to the desired node (for example, //root/node1/code or //root/node2/description). | String | | N/A |
| Ignore Paths | Paths to ignore; the matching elements are excluded from the returned node (for example, //root/node1/email,//root/node2/city). | String | | N/A |
| Ignore Nested Child Nodes | If enabled, nested child nodes (nodes that are not direct children of the target node) are ignored. The direct children of the target node are returned, while deeper nested nodes are excluded. | Boolean | | N/A |
| Element Identifier | Attribute to be sent when an error occurs. | String | | N/A |
| Parallel Execution Of Each Iteration | If enabled, the iterations of the loop are executed in parallel. | Boolean | | N/A |
| Remove whitespaces | If enabled, whitespace at the beginning and end of all XML character values is removed. | Boolean | | N/A |
| Coalesce | If enabled, XML character values are read as single strings. | Boolean | | N/A |
| Escape Special Characters | If enabled, reserved XML characters (&, <, >) are escaped automatically to prevent parsing errors during data transformation. | Boolean | | N/A |
| Tolerate Invalid XML | If enabled, a fallback JSON object is returned instead of throwing an exception for invalid XML. | Boolean | | N/A |
| Fail On Error | If enabled, pipeline execution stops if a critical error occurs during iteration. This does not apply to errors in connectors used within the subpipelines (onProcess and onException). | Boolean | | N/A |
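As a rough illustration of what the Remove whitespaces and Escape Special Characters options describe, the Python sketch below trims a character value and escapes the three reserved characters listed above. The helper name `clean_value` and its flags are hypothetical, and the connector's exact behavior may differ; the only library call used is `xml.sax.saxutils.escape`, which escapes `&`, `<`, and `>`.

```python
from xml.sax.saxutils import escape


def clean_value(text, remove_whitespace=True, escape_special=True):
    """Apply the two value-level options to one XML character value."""
    if remove_whitespace:
        text = text.strip()  # leading and trailing whitespace only
    if escape_special:
        text = escape(text)  # '&' -> '&amp;', '<' -> '&lt;', '>' -> '&gt;'
    return text


print(clean_value("  Tom & Jerry <3  "))
```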

Messages flow

Input

No specific input message is required. However, a valid XML file must exist in the pipeline’s local directory, and the File Name and Node Path fields must be properly configured for processing.

Output

{
    "total": 0,
    "success": 0,
    "failed": 0
}
  • total: Total number of processed nodes.

  • success: Number of successfully processed nodes.

  • failed: Number of nodes whose processing failed.

When a node is successfully processed, its corresponding subpipeline returns { "success": true }.
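The counters can be pictured as a simple tally over the per-node results. The Python sketch below is an assumption-laden illustration, not the platform's code: `summarize` and the `on_process` callback are hypothetical names standing in for the subpipeline, and treating a raised exception as a failed item is an assumption of this example.

```python
def summarize(nodes, on_process):
    """Run on_process for every node and tally the results."""
    summary = {"total": 0, "success": 0, "failed": 0}
    for node in nodes:
        summary["total"] += 1
        try:
            result = on_process({"node": node})
            ok = isinstance(result, dict) and result.get("success") is True
        except Exception:
            ok = False  # assumption: an error in the handler counts as a failure
        summary["success" if ok else "failed"] += 1
    return summary


def on_process(message):
    # Hypothetical handler: rejects any node that has no price element.
    if "<price>" not in message["node"]:
        raise ValueError("missing price")
    return {"success": True}


print(summarize(["<product><price>1</price></product>", "<product/>"], on_process))
```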

File handling & Batch processing

  • The connector throws an exception if the File Name does not exist or cannot be read.

  • File handling within a pipeline is protected: all files are accessed through a temporary directory, and each pipeline key provides access only to its own set of files.

  • Stream XML File Reader performs batch processing, meaning it continuously processes data in smaller, controlled batches for better efficiency and resource management.

Event handling: Unsupported events

The Stream XML File Reader uses an event-based reading mechanism, in which each type of data in the XML file is treated as an event to be processed. The following event types are not handled during streaming:

  • PROCESSING INSTRUCTION

  • START DOCUMENT

  • END DOCUMENT

  • SPACE

  • ENTITY REFERENCE

  • ENTITY DECLARATION

  • DTD

  • NOTATION DECLARATION

These events are not required for typical XML data processing and their omission helps improve performance when handling large XML files.
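The effect of skipping event types can be sketched with Python's `xml.parsers.expat`, used here purely as a stand-in for an event-based reader; the connector's actual parser is not described in this document. Only element and character-data handlers are registered, so the document type declaration and the processing instruction in the hypothetical sample produce no events.

```python
import xml.parsers.expat

# Hypothetical sample containing a DOCTYPE and a processing instruction.
DATA = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b"<!DOCTYPE root>"
    b"<root><?render fast?><item>A</item></root>"
)


def read_events(data):
    """Collect only start, end, and text events; everything else is skipped."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver contiguous character data in one call
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda text: events.append(("text", text))
    # No handlers are set for processing instructions, comments, or the DOCTYPE,
    # so the parser silently passes over them.
    parser.Parse(data, True)
    return events


for event in read_events(DATA):
    print(event)
```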

Stream XML File Reader in Action

The scenarios below use this XML file:

  • File name: file.xml

  • Content:

<?xml version="1.0" encoding="UTF-8"?>
<root>
<list-info qty="4">products</list-info>
<products>
<product>
<price>20.75</price>
<product>Chair</product>
<tags>
<element>NEW</element>
<element>FURNITURE</element>
</tags>
</product>
<product>
<price>399.99</price>
<product>TV</product>
<tags>
<element>NEW</element>
<element>FURNITURE</element>
</tags>
</product>
<product>
<price>100</price>
<product>Couch</product>
<tags>
<element>NEW</element>
<element>FURNITURE</element>
</tags>
</product>
<product>
<price>78.99</price>
<product>Table</product>
<tags>
<element>NEW</element>
<element>FURNITURE</element>
</tags>
</product>
</products>
</root>

Scenario 1: Streaming the file by specifying the desired node

Input

  • File Name: file.xml

  • Node Path: //root/products/product

Output

{
    "total": 4,
    "success": 4,
    "failed": 0
}

Each element identified by the desired node path will be processed independently:

  • First subflow:

{
    "node":"<product><price>20.75</price><product>Chair</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
  • Second subflow:

{
    "node":"<product><price>399.99</price><product>TV</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
  • Third subflow:

{
    "node":"<product><price>100</price><product>Couch</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
  • Fourth subflow:

{
    "node":"<product><price>78.99</price><product>Table</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}

Scenario 2: Streaming the file by specifying the desired node and context fields

Input

  • File Name: file.xml

  • Node Path: //root/products/product

  • Context Paths: //root/list-info

Output

{
    "total": 4,
    "success": 4,
    "failed": 0
}

Each element identified by the desired node path will be processed independently:

  • First subflow:

{
    "context": {
        "root": {
            "list-info": {
                "attributes": {
                    "qty": "4"
                },
                "value": "products"
            }
        }
    },
    "node": "<product><price>20.75</price><product>Chair</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
  • Second subflow:

{
    "context": {
        "root": {
            "list-info": {
                "attributes": {
                    "qty": "4"
                },
                "value": "products"
            }
        }
    },
    "node": "<product><price>399.99</price><product>TV</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
  • Third subflow:

{
    "context": {
        "root": {
            "list-info": {
                "attributes": {
                    "qty": "4"
                },
                "value": "products"
            }
        }
    },
    "node": "<product><price>100</price><product>Couch</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
  • Fourth subflow:

{
    "context": {
        "root": {
            "list-info": {
                "attributes": {
                    "qty": "4"
                },
                "value": "products"
            }
        }
    },
    "node": "<product><price>78.99</price><product>Table</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
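The shape of the context object in this scenario can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than the connector's logic: `stream_with_context` and `split_path` are hypothetical names, paths are treated as plain absolute paths, the sample is abbreviated, and the context element is assumed to appear in the file before the nodes that use it.

```python
import copy
import io
import xml.etree.ElementTree as ET

# Abbreviated, whitespace-free sample data (an assumption for this sketch).
SAMPLE = (
    b'<root><list-info qty="4">products</list-info><products>'
    b"<product><price>20.75</price><product>Chair</product></product>"
    b"</products></root>"
)


def split_path(path):
    return [part for part in path.split("/") if part]


def stream_with_context(source, node_path, context_paths):
    """Yield {'context': ..., 'node': ...} for every element at node_path."""
    target = split_path(node_path)
    wanted = [split_path(path) for path in context_paths]
    context, stack = {}, []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            continue
        if stack in wanted:
            # Nest the entry under its ancestors, e.g. context["root"]["list-info"].
            holder = context
            for name in stack[:-1]:
                holder = holder.setdefault(name, {})
            holder[stack[-1]] = {
                "attributes": dict(elem.attrib),
                "value": (elem.text or "").strip(),
            }
        elif stack == target:
            elem.tail = None
            yield {
                "context": copy.deepcopy(context),
                "node": ET.tostring(elem, encoding="unicode"),
            }
            elem.clear()
        stack.pop()


for message in stream_with_context(
    io.BytesIO(SAMPLE), "//root/products/product", ["//root/list-info"]
):
    print(message)
```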

Scenario 3: Streaming the file by specifying the desired node, context fields, and nodes to be ignored

Input

  • File Name: file.xml

  • Node Path: //root/products/product

  • Context Paths: //root/list-info

  • Ignore Paths: //root/products/product/tags

Output

{
    "total": 4,
    "success": 4,
    "failed": 0
}

Each element identified by the desired node path will be processed independently:

  • First subflow:

{
    "context": {
        "root": {
            "list-info": {
                "attributes": {
                    "qty": "4"
                },
                "value": "products"
            }
        }
    },
    "node": "<product><price>20.75</price><product>Chair</product></product>"
}
  • Second subflow:

{
    "context": {
        "root": {
            "list-info": {
                "attributes": {
                    "qty": "4"
                },
                "value": "products"
            }
        }
    },
    "node": "<product><price>399.99</price><product>TV</product></product>"
}
  • Third subflow:

{
    "context": {
        "root": {
            "list-info": {
                "attributes": {
                    "qty": "4"
                },
                "value": "products"
            }
        }
    },
    "node": "<product><price>100</price><product>Couch</product></product>"
}
  • Fourth subflow:

{
    "context": {
        "root": {
            "list-info": {
                "attributes": {
                    "qty": "4"
                },
                "value": "products"
            }
        }
    },
    "node": "<product><price>78.99</price><product>Table</product></product>"
}
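The effect of Ignore Paths on a single node can be sketched in Python as a pruning step applied before serialization. This is a hedged illustration, not the connector's code: `remove_ignored` and `split_path` are hypothetical helpers, and paths are compared as plain absolute paths.

```python
import xml.etree.ElementTree as ET


def split_path(path):
    return tuple(part for part in path.split("/") if part)


def remove_ignored(node_xml, node_path, ignore_paths):
    """Return node_xml without the elements whose absolute path is ignored."""
    root = ET.fromstring(node_xml)
    ignored = {split_path(path) for path in ignore_paths}

    def prune(elem, path):
        for child in list(elem):  # copy the child list because it is modified
            child_path = path + (child.tag,)
            if child_path in ignored:
                elem.remove(child)
            else:
                prune(child, child_path)

    prune(root, split_path(node_path))
    return ET.tostring(root, encoding="unicode")


NODE = (
    "<product><price>20.75</price><product>Chair</product>"
    "<tags><element>NEW</element><element>FURNITURE</element></tags></product>"
)
print(remove_ignored(NODE, "//root/products/product", ["//root/products/product/tags"]))
```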

Scenario 4: Streaming the file by specifying the desired node and ignoring nested child nodes

Input

  • File Name: file.xml

  • Node Path: //root/products/product

  • Ignore Nested Child Nodes: active

Output

{
    "total": 4,
    "success": 4,
    "failed": 0
}

Each element identified by the desired node path will be processed independently:

  • First subflow:

{
    "node": "<product><price>20.75</price><product>Chair</product><tags></tags></product>"
}
  • Second subflow:

{
    "node": "<product><price>399.99</price><product>TV</product><tags></tags></product>"
}
  • Third subflow:

{
    "node": "<product><price>100</price><product>Couch</product><tags></tags></product>"
}
  • Fourth subflow:

{
    "node": "<product><price>78.99</price><product>Table</product><tags></tags></product>"
}
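The result in this scenario, where direct children are kept and anything deeper is dropped, can be mimicked with the following Python sketch. It is only an approximation under assumptions: `drop_nested_children` is a hypothetical helper, and `short_empty_elements=False` is used so that the emptied element is written as `<tags></tags>`, matching the output shown above.

```python
import xml.etree.ElementTree as ET


def drop_nested_children(node_xml):
    """Keep the node's direct children but remove everything below them."""
    root = ET.fromstring(node_xml)
    for child in root:
        for grandchild in list(child):  # copy the list because it is modified
            child.remove(grandchild)
    return ET.tostring(root, encoding="unicode", short_empty_elements=False)


NODE = (
    "<product><price>20.75</price><product>Chair</product>"
    "<tags><element>NEW</element><element>FURNITURE</element></tags></product>"
)
print(drop_nested_children(NODE))
```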
