Stream XML File Reader

The Documentation Portal provides guides on all options for components on the Digibee Integration Platform. This article covers Stream XML File Reader.

Stream XML File Reader reads a local XML file and, based on the configured desired node and context fields, delivers an XML structure and context properties for each node found, triggering a subpipeline to process each message. Use the component for large files when parts of the whole need to be read efficiently.

Parameters

Take a look at the configuration parameters of the component. Parameters that support Double Braces expressions are marked with (DB).

| Parameter | Description | Default value | Data type |
| --- | --- | --- | --- |
| File name (DB) | File name or full file path (e.g. tmp/processed/file.txt) of the local XML file. | data.xml | String |
| Charset | Name of the character encoding used to read the file. | UTF-8 | String |
| Node Path | Path of the desired node to stream from the XML file (e.g. //root/level1/level2/desirednode). | N/A | String |
| Context Paths | Comma-separated tag paths that represent fields adding context to the desired node (e.g. //root/node1/code,//root/node2/description). | N/A | String |
| Ignore Paths | Comma-separated paths that are ignored and not returned in the desired node (e.g. //root/node1/email,//root/node2/city). | N/A | String |
| Ignore Nested Child Nodes | If active, nested child nodes (nodes that are not direct children of the desired node) are ignored. In this case, the direct children of the desired node are returned, but the nodes below them are ignored. | N/A | Boolean |
| Element Identifier | Attribute that will be sent in case of errors. | N/A | String |
| Parallel Execution Of Each Iteration | If active, the iterations of the loop are executed in parallel. | N/A | Boolean |
| Fail On Error | When active, this parameter suspends pipeline execution if a severe error occurs in the iteration structure, preventing the execution from completing. Activating "Fail On Error" is unrelated to errors occurring in the components used to build the subpipelines (onProcess and onException). | N/A | Boolean |
| Remove whitespaces | If active, whitespace at the beginning and end of all XML character values is removed. | N/A | Boolean |
| Coalesce | If active, each XML character value is read as a single string. | N/A | Boolean |

Be careful not to compromise the integrity of the data when activating the Remove whitespaces option. When streaming files with large character values, the component processes each of these values in several steps before consolidating them into a single value, and whitespace removal is applied at each of these steps.

One way to use Remove whitespaces safely is to combine it with the Coalesce option. This ensures that the character values inside XML tags are read at once, without first being split into several parts. However, keep in mind that enabling Coalesce may demand more pipeline resources, because huge chunks of data are read at once.
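To make the risk concrete, here is a minimal Python sketch that imitates a parser delivering one character value in several chunks. The chunk boundaries and the function names are hypothetical and do not reflect the component's internal implementation; the sketch only shows why trimming each chunk gives a different result from trimming the consolidated value.

```python
def trim_per_chunk(chunks):
    """Trim every chunk before joining (Remove whitespaces without Coalesce)."""
    return "".join(chunk.strip() for chunk in chunks)


def trim_after_coalesce(chunks):
    """Join the chunks first, then trim once (Remove whitespaces with Coalesce)."""
    return "".join(chunks).strip()


# Hypothetical chunking of the single value "  large text value  ".
chunks = ["  large ", "text ", "value  "]

print(trim_per_chunk(chunks))       # largetextvalue
print(trim_after_coalesce(chunks))  # large text value
```

In the first call the spaces between words are lost because they happen to sit at chunk boundaries, which is the integrity problem described above.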

Messages flow

Input

No specific input message is expected. The component only requires that an XML file exist in the pipeline's local directory and that the File Name and Node Path fields be filled in.

Output

{
"total": 0,
"success": 0,
"failed": 0
}
  • total: total number of processed nodes.

  • success: total number of successfully processed nodes.

  • failed: total number of nodes whose processing failed.

Important: when the nodes are correctly processed, their respective subpipelines return { "success": true } for each of them.
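As a rough illustration of how the three counters relate to the subpipeline results, the Python sketch below aggregates a list of per-item results into the output shape shown above. The `summarize` function is hypothetical, and treating any result without "success": true as a failure is an assumption, not documented behavior.

```python
def summarize(results):
    """Aggregate per-item subpipeline results into total/success/failed."""
    # Assumption: anything other than {"success": True} counts as failed.
    success = sum(1 for result in results if result.get("success") is True)
    return {
        "total": len(results),
        "success": success,
        "failed": len(results) - success,
    }


results = [{"success": True}, {"success": True}, {"success": False}]
print(summarize(results))  # {'total': 3, 'success': 2, 'failed': 1}
```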

The component throws an exception if the file specified in File Name doesn't exist or can't be read.

File manipulation inside a pipeline occurs in a protected way. Files can be accessed only through a temporary directory, and each pipeline key gives access only to its own set of files.

Stream XML File Reader performs batch processing. To better understand the concept, read the article about Batch processing.

Stream XML File Reader in Action

The scenarios below are based on this XML file:

  • File name: file.xml

  • Content:

<?xml version="1.0" encoding="UTF-8"?>
<root>
<list-info qty="4">products</list-info>
<products>
<product>
<price>20.75</price>
<product>Chair</product>
<tags>
<element>NEW</element>
<element>FURNITURE</element>
</tags>
</product>
<product>
<price>399.99</price>
<product>TV</product>
<tags>
<element>NEW</element>
<element>FURNITURE</element>
</tags>
</product>
<product>
<price>100</price>
<product>Couch</product>
<tags>
<element>NEW</element>
<element>FURNITURE</element>
</tags>
</product>
<product>
<price>78.99</price>
<product>Table</product>
<tags>
<element>NEW</element>
<element>FURNITURE</element>
</tags>
</product>
</products>
</root>

Streaming the file by specifying the desired node

Input

  • File Name: file.xml

  • Node Path: //root/products/product

Output

{
"total": 4,
"success": 4,
"failed": 0
}

Each element identified by the desired node path will be processed independently:

  • First subflow:

{
"node":"<product><price>20.75</price><product>Chair</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
  • Second subflow:

{
"node":"<product><price>399.99</price><product>TV</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
  • Third subflow:

{
"node":"<product><price>100</price><product>Couch</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
  • Fourth subflow:

{
"node":"<product><price>78.99</price><product>Table</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
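The behavior in this scenario can be approximated outside the platform with a short Python sketch based on the standard library's `xml.etree.ElementTree.iterparse`. It is not the component's implementation: the `stream_nodes` function is hypothetical, and the path matching is a simple tag-by-tag comparison rather than full XPath. The sample XML is a shortened version of file.xml.

```python
import io
import xml.etree.ElementTree as ET


def stream_nodes(source, node_path):
    """Yield {"node": "<xml>"} for each element whose full path equals node_path."""
    target = [part for part in node_path.split("/") if part]  # "//root/a" -> ["root", "a"]
    stack = []  # tags of the elements currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            continue
        if stack == target:
            elem.tail = None  # drop any text that follows the element
            yield {"node": ET.tostring(elem, encoding="unicode")}
            elem.clear()  # release the subtree that was already delivered
        stack.pop()


xml_data = (
    "<root><products>"
    "<product><price>20.75</price><product>Chair</product></product>"
    "<product><price>399.99</price><product>TV</product></product>"
    "</products></root>"
)

for message in stream_nodes(io.BytesIO(xml_data.encode("utf-8")), "//root/products/product"):
    print(message)
```

Comparing the whole stack of open tags, instead of only the tag name, keeps the inner `<product>` name element from being mistaken for the desired node.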

Streaming the file by specifying the desired node and context fields

Input

  • File Name: file.xml

  • Node Path: //root/products/product

  • Context Paths: //root/list-info

Output

{
"total": 4,
"success": 4,
"failed": 0
}

Each element identified by the desired node path will be processed independently:

  • First subflow:

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>20.75</price><product>Chair</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
  • Second subflow:

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>399.99</price><product>TV</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
  • Third subflow:

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>100</price><product>Couch</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
  • Fourth subflow:

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>78.99</price><product>Table</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
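The shape of the "context" object can be reproduced with a small Python sketch. The `stream_with_context` and `add_context` functions are hypothetical. Two things in the sketch are assumptions rather than documented behavior: the "attributes" key is always emitted, and context fields are captured only if they appear in the file before the desired node.

```python
import copy
import io
import xml.etree.ElementTree as ET


def split_path(path):
    """Turn "//root/list-info" into ["root", "list-info"]."""
    return [part for part in path.strip().split("/") if part]


def add_context(context, parts, elem):
    """Store the element's attributes and text under its nested path."""
    level = context
    for part in parts[:-1]:
        level = level.setdefault(part, {})
    level[parts[-1]] = {"attributes": dict(elem.attrib), "value": elem.text}


def stream_with_context(source, node_path, context_paths):
    """Yield each desired node together with the context collected so far."""
    target = split_path(node_path)
    wanted = [split_path(path) for path in context_paths.split(",")]
    context, stack = {}, []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            continue
        if stack in wanted:
            add_context(context, stack, elem)
        elif stack == target:
            elem.tail = None
            yield {
                "context": copy.deepcopy(context),  # snapshot for this message
                "node": ET.tostring(elem, encoding="unicode"),
            }
            elem.clear()
        stack.pop()


xml_data = (
    '<root><list-info qty="2">products</list-info><products>'
    "<product><price>20.75</price></product>"
    "<product><price>100</price></product>"
    "</products></root>"
)

source = io.BytesIO(xml_data.encode("utf-8"))
for message in stream_with_context(source, "//root/products/product", "//root/list-info"):
    print(message)
```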

Streaming the file by specifying the desired node, context fields and nodes to be ignored

Input

  • File Name: file.xml

  • Node Path: //root/products/product

  • Context Paths: //root/list-info

  • Ignore Paths: //root/products/product/tags

Output

{
"total": 4,
"success": 4,
"failed": 0
}

Each element identified by the desired node path will be processed independently:

  • First subflow:

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>20.75</price><product>Chair</product></product>"
}
  • Second subflow:

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>399.99</price><product>TV</product></product>"
}
  • Third subflow:

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>100</price><product>Couch</product></product>"
}
  • Fourth subflow:

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>78.99</price><product>Table</product></product>"
}
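The effect of Ignore Paths on a single node can be sketched in Python as follows. The `strip_ignored` function is hypothetical and, for brevity, prunes an element that has already been parsed instead of filtering during streaming, so it illustrates the result rather than the component's mechanism.

```python
import xml.etree.ElementTree as ET


def strip_ignored(node, node_path, ignore_paths):
    """Remove descendants of `node` whose full path is listed in ignore_paths."""
    ignored = {
        tuple(part for part in path.strip().split("/") if part)
        for path in ignore_paths.split(",")
    }
    base = tuple(part for part in node_path.split("/") if part)

    def prune(elem, path):
        for child in list(elem):  # copy, because children are removed while iterating
            child_path = path + (child.tag,)
            if child_path in ignored:
                elem.remove(child)
            else:
                prune(child, child_path)

    prune(node, base)
    return node


node = ET.fromstring(
    "<product><price>20.75</price><product>Chair</product>"
    "<tags><element>NEW</element><element>FURNITURE</element></tags></product>"
)
strip_ignored(node, "//root/products/product", "//root/products/product/tags")
print(ET.tostring(node, encoding="unicode"))
# <product><price>20.75</price><product>Chair</product></product>
```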

Streaming the file by specifying the desired node and ignoring nested child nodes

Input

  • File Name: file.xml

  • Node Path: //root/products/product

  • Ignore Nested Child Nodes: active

Output

{
"total": 4,
"success": 4,
"failed": 0
}

Each element identified by the desired node path will be processed independently:

  • First subflow:

{
"node": "<product><price>20.75</price><product>Chair</product><tags></tags></product>"
}
  • Second subflow:

{
"node": "<product><price>399.99</price><product>TV</product><tags></tags></product>"
}
  • Third subflow:

{
"node": "<product><price>100</price><product>Couch</product><tags></tags></product>"
}
  • Fourth subflow:

{
"node": "<product><price>78.99</price><product>Table</product><tags></tags></product>"
}
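A minimal Python sketch of the same transformation, assuming that "nested child nodes" means everything below the direct children of the desired node, as the outputs above suggest. The `drop_nested_children` function is hypothetical and works on an already parsed element.

```python
import xml.etree.ElementTree as ET


def drop_nested_children(node):
    """Keep the direct children of `node` but remove everything below them."""
    for child in node:
        for grandchild in list(child):  # copy, because items are removed while iterating
            child.remove(grandchild)
    return node


node = ET.fromstring(
    "<product><price>20.75</price><product>Chair</product>"
    "<tags><element>NEW</element><element>FURNITURE</element></tags></product>"
)
drop_nested_children(node)

# short_empty_elements=False writes <tags></tags> instead of <tags />.
print(ET.tostring(node, encoding="unicode", short_empty_elements=False))
# <product><price>20.75</price><product>Chair</product><tags></tags></product>
```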

Additional information

Stream XML File Reader uses an event-based reading mechanism, in which each type of data present in the file is an event to be processed. As a result, some event types are not handled while the file is streamed:

  • PROCESSING INSTRUCTION

  • START DOCUMENT

  • END DOCUMENT

  • SPACE

  • ENTITY REFERENCE

  • ENTITY DECLARATION

  • DTD

  • NOTATION DECLARATION
