# Stream XML File Reader

**Stream XML File Reader** reads a local XML file and identifies nodes according to the configured target node and context fields. For each node found, it generates an XML structure along with its context properties and triggers subpipelines to process each resulting message independently. It is used to efficiently read and process large XML files in parts, without loading the entire file into memory.

## Parameters

Configure the connector using the parameters below. Fields that support [Double Braces expressions](https://docs.digibee.com/documentation/connectors-and-triggers/double-braces) are marked in the **Supports DB** column.

<table data-full-width="true"><thead><tr><th>Parameter</th><th width="248">Description</th><th>Type</th><th>Supports DB</th><th>Default</th></tr></thead><tbody><tr><td><strong>Alias</strong></td><td>Name (alias) for this connector’s output, allowing you to <a href="../../double-braces/how-to-reference-data-using-double-braces">reference it later in the flow using Double Braces expressions</a>.</td><td>String</td><td>✅</td><td>stream-xml-f-reader-1</td></tr><tr><td><strong>File name</strong> </td><td>File name or full file path (<code>tmp/processed/file.txt</code>) of the local XML file.</td><td>String</td><td>✅</td><td>data.xml</td></tr><tr><td><strong>Charset</strong></td><td>Name of the character encoding used to read the file.</td><td>String</td><td>❌</td><td>UTF-8</td></tr><tr><td><strong>Node Path</strong> </td><td>Path of the desired node to stream from the XML file (<code>//root/level1/level2/desirednode</code>).</td><td>String</td><td>❌</td><td>N/A</td></tr><tr><td><strong>Context Paths</strong></td><td>Tag paths whose fields add context to the desired node (<code>//root/node1/code</code> or <code>//root/node2/description</code>).</td><td>String</td><td>❌</td><td>N/A</td></tr><tr><td><strong>Ignore Paths</strong></td><td>Paths to ignore, so that they are not returned in the desired node (<code>//root/node1/email,//root/node2/city</code>).</td><td>String</td><td>❌</td><td>N/A</td></tr><tr><td><strong>Ignore Nested Child Nodes</strong></td><td>If enabled, nested child nodes (nodes that are not direct children of the target node) are ignored. In this case, only the direct children of the target node are returned, while deeper nested nodes are excluded.</td><td>Boolean</td><td>❌</td><td>N/A</td></tr><tr><td><strong>Element Identifier</strong></td><td>Attribute to be sent when an error occurs.</td><td>String</td><td>❌</td><td>N/A</td></tr><tr><td><strong>Parallel Execution Of Each Iteration</strong></td><td>If enabled, each iteration of the loop is executed in parallel.</td><td>Boolean</td><td>❌</td><td>N/A</td></tr><tr><td><strong>Remove whitespaces</strong></td><td>If enabled, whitespace at the beginning and end of all XML character values is removed.</td><td>Boolean</td><td>❌</td><td>N/A</td></tr><tr><td><strong>Coalesce</strong></td><td>If enabled, XML character values are read as single strings.</td><td>Boolean</td><td>❌</td><td>N/A</td></tr><tr><td><strong>Escape Special Characters</strong></td><td>If enabled, it automatically escapes reserved XML characters (<code>&#x26;</code>, <code>&#x3C;</code>, <code>></code>) to prevent parsing errors during data transformation.</td><td>Boolean</td><td>❌</td><td>N/A</td></tr><tr><td><strong>Tolerate Invalid XML</strong></td><td>If enabled, a fallback JSON object is returned instead of throwing an exception for invalid XML.</td><td>Boolean</td><td>❌</td><td>N/A</td></tr><tr><td><strong>Fail On Error</strong></td><td>If enabled, stops the pipeline execution if a critical error occurs during iteration. It does not apply to errors in connectors used within subpipelines (<code>onProcess</code> and <code>onException</code>).</td><td>Boolean</td><td>❌</td><td>N/A</td></tr></tbody></table>

{% hint style="warning" %}
Use the **Remove Whitespaces** parameter carefully to avoid compromising data integrity, as it removes spaces at each processing step. To prevent this, combine it with **Coalesce**, which reads character values as a single block. Keep in mind, however, that **Coalesce** may increase resource usage when processing large data sets.
{% endhint %}
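The interaction described in the hint can be illustrated with a small Python sketch. It is not the connector's implementation: it assumes, for illustration only, that a single character value may reach the reader as several chunks, and the function names `trim_each_chunk` and `coalesce_then_trim` are hypothetical.

```python
def trim_each_chunk(chunks):
    """Trim every chunk separately, then join (trimming without coalescing)."""
    return "".join(chunk.strip() for chunk in chunks)


def coalesce_then_trim(chunks):
    """Join all chunks into one string first, then trim once."""
    return "".join(chunks).strip()


# One logical text value, assumed to arrive as three separate chunks.
chunks = ["  Living room ", "chair &", " table  "]

per_chunk = trim_each_chunk(chunks)
coalesced = coalesce_then_trim(chunks)

print(repr(per_chunk))  # inner spaces at chunk boundaries are lost
print(repr(coalesced))  # only leading and trailing whitespace is removed
```

Under that assumption, trimming per chunk silently glues words together, while joining first preserves the inner spacing at the cost of holding the whole value in memory.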

## Messages flow <a href="#h_2cf37c23dc" id="h_2cf37c23dc"></a>

### Input <a href="#h_273a73a6f0" id="h_273a73a6f0"></a>

No specific input message is required. However, a valid XML file must exist in the pipeline’s local directory, and the **File Name** and **Node Path** fields must be properly configured for processing.

### Output <a href="#h_e9b66e2893" id="h_e9b66e2893"></a>

```json
{
    "total": 0,
    "success": 0,
    "failed": 0
}
```

* **`total`:** Total number of processed nodes.
* **`success`:** Number of nodes processed successfully.
* **`failed`:** Number of nodes whose processing failed.

{% hint style="info" %}
When a node is successfully processed, its corresponding subpipeline returns `{ "success": true }`.
{% endhint %}
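As a rough illustration of how the three counters relate, the sketch below tallies results in the same shape as the output above. It is a Python approximation, not the connector's code: `run_subpipelines` and `demo_handler` are hypothetical names, and treating a raised exception or a missing `"success": true` as a failure is an assumption made for the example.

```python
from typing import Callable, Iterable


def run_subpipelines(nodes: Iterable[str], on_process: Callable[[str], dict]) -> dict:
    """Call on_process once per node and tally total/success/failed."""
    summary = {"total": 0, "success": 0, "failed": 0}
    for node in nodes:
        summary["total"] += 1
        try:
            ok = on_process(node).get("success") is True
        except Exception:
            # Assumption: an exception in the handler counts as a failed node.
            ok = False
        summary["success" if ok else "failed"] += 1
    return summary


def demo_handler(node: str) -> dict:
    """Hypothetical handler that fails for one specific node."""
    if "bad" in node:
        raise ValueError("simulated processing error")
    return {"success": True}


summary = run_subpipelines(["<a>1</a>", "<a>bad</a>", "<a>3</a>"], demo_handler)
print(summary)
```

In this sketch `total` always equals `success + failed`, which matches the example outputs shown in the scenarios.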

## File handling & Batch processing

* The connector throws an exception if the file specified in **File Name** does not exist or cannot be read.
* File handling within a pipeline is protected: all files are accessed through a temporary directory, and each pipeline key provides access only to its own set of files.
* **Stream XML File Reader** performs **batch processing**, meaning it continuously processes data in smaller, controlled batches for better efficiency and resource management.

### Event handling: Unsupported events

The Stream XML File Reader uses an **event-based reading mechanism**, where each type of data in the XML file is treated as an event to be processed. However, the following event types are not handled during streaming:

* PROCESSING INSTRUCTION
* START DOCUMENT
* END DOCUMENT
* SPACE
* ENTITY REFERENCE
* ENTITY DECLARATION
* DTD
* NOTATION DECLARATION

{% hint style="info" %}
These events are not required for typical XML data processing and their omission helps improve performance when handling large XML files.
{% endhint %}
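To make the idea of event-based reading concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree.iterparse`. It is only an analogy for the mechanism described above, not the connector's parser: the sample document, the `stream_items` function, and the `item` tag are invented for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: a declaration, a processing instruction and a comment
# precede the data. None of them is needed to extract the <item> elements.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<?app-hint ignored?>
<!-- a comment -->
<root><item>1</item><item>2</item></root>"""


def stream_items(source, tag):
    """Yield each <tag> element as a string, one at a time."""
    # Only element "end" events are requested, so processing instructions
    # and comments never reach this loop.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield ET.tostring(elem, encoding="unicode")
            elem.clear()  # drop the element's content once it has been emitted


items = list(stream_items(io.BytesIO(SAMPLE), "item"))
print(items)
```

Because the loop reacts to one event at a time and discards each element after use, memory stays roughly proportional to a single node rather than to the whole file.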

## Stream XML File Reader in Action <a href="#h_bbd1c5a904" id="h_bbd1c5a904"></a>

The scenarios below are based on this XML file:

* **File name:** `file.xml`
* **Content:**

```xml
<?xml version="1.0" encoding="UTF-8"?>
<root>
<list-info qty="4">products</list-info>
<products>
<product>
<price>20.75</price>
<product>Chair</product>
<tags>
<element>NEW</element>
<element>FURNITURE</element>
</tags>
</product>
<product>
<price>399.99</price>
<product>TV</product>
<tags>
<element>NEW</element>
<element>FURNITURE</element>
</tags>
</product>
<product>
<price>100</price>
<product>Couch</product>
<tags>
<element>NEW</element>
<element>FURNITURE</element>
</tags>
</product>
<product>
<price>78.99</price>
<product>Table</product>
<tags>
<element>NEW</element>
<element>FURNITURE</element>
</tags>
</product>
</products>
</root>
```

### Scenario 1: Streaming the file by specifying the desired node <a href="#h_fde9ff01b0" id="h_fde9ff01b0"></a>

#### **Input**

* **File Name:** file.xml
* **Node Path:** //root/products/product

#### **Output**

```json
{
    "total": 4,
    "success": 4,
    "failed": 0
}
```

Each element identified by the desired node path will be processed independently:

* **First subflow:**

```json
{
    "node":"<product><price>20.75</price><product>Chair</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
```

* **Second subflow:**

```json
{
    "node":"<product><price>399.99</price><product>TV</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
```

* **Third subflow:**

```json
{
    "node":"<product><price>100</price><product>Couch</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
```

* **Fourth subflow:**

```json
{
    "node":"<product><price>78.99</price><product>Table</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
```
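The behavior in this scenario can be approximated with the Python sketch below. It is an illustration under stated assumptions rather than the connector's logic: `stream_nodes` is a hypothetical function, the sample is a shortened version of `file.xml`, and the path is matched by simple segment comparison instead of a full XPath engine. Matching on the full path matters here because each `<product>` contains an inner `<product>` element with the product name.

```python
import io
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of file.xml with two products.
SAMPLE = b"""<root>
<list-info qty="2">products</list-info>
<products>
<product><price>20.75</price><product>Chair</product></product>
<product><price>399.99</price><product>TV</product></product>
</products>
</root>"""


def stream_nodes(source, node_path):
    """Yield {"node": xml} for every element whose full path equals node_path."""
    target = node_path.lstrip("/").split("/")
    stack = []  # tags of the elements currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            continue
        if stack == target:
            elem.tail = None  # ignore whitespace after the closing tag
            yield {"node": ET.tostring(elem, encoding="unicode")}
            elem.clear()  # free the node's content before moving on
        stack.pop()


messages = list(stream_nodes(io.BytesIO(SAMPLE), "//root/products/product"))
for message in messages:
    print(message)
```

Each yielded dictionary plays the role of the message that one subflow receives; the inner `<product>` name elements are not emitted on their own because their path has one extra segment.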

### Scenario 2: Streaming the file by specifying the desired node and context fields <a href="#h_525528055c" id="h_525528055c"></a>

#### **Input**

* **File Name:** file.xml
* **Node Path:** //root/products/product
* **Context Paths:** //root/list-info

#### **Output**

```json
{
    "total": 4,
    "success": 4,
    "failed": 0
}
```

Each element identified by the desired node path will be processed independently:

* **First subflow:**

```json
{
    "context": {
        "root": {
            "list-info": {
                "attributes": {
                    "qty": "4"
                },
                "value": "products"
            }
        }
    },
    "node": "<product><price>20.75</price><product>Chair</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
```

* **Second subflow:**

```json
{
    "context": {
        "root": {
            "list-info": {
                "attributes": {
                    "qty": "4"
                },
                "value": "products"
            }
        }
    },
    "node": "<product><price>399.99</price><product>TV</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
```

* **Third subflow:**

```json
{
    "context": {
        "root": {
            "list-info": {
                "attributes": {
                    "qty": "4"
                },
                "value": "products"
            }
        }
    },
    "node": "<product><price>100</price><product>Couch</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
```

* **Fourth subflow:**

```json
{
    "context": {
        "root": {
            "list-info": {
                "attributes": {
                    "qty": "4"
                },
                "value": "products"
            }
        }
    },
    "node": "<product><price>78.99</price><product>Table</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
```
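The `context` object in these messages can be approximated as follows. This Python sketch is an assumption-laden illustration, not the connector's implementation: `stream_with_context` is a hypothetical function, the sample is a shortened `file.xml`, and the sketch only captures context elements that appear in the file before the target nodes.

```python
import copy
import io
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of file.xml with two products.
SAMPLE = b"""<root>
<list-info qty="2">products</list-info>
<products>
<product><price>20.75</price><product>Chair</product></product>
<product><price>399.99</price><product>TV</product></product>
</products>
</root>"""


def stream_with_context(source, node_path, context_paths):
    """Yield {"context": ..., "node": ...} for every element at node_path."""
    target = node_path.lstrip("/").split("/")
    wanted = [path.lstrip("/").split("/") for path in context_paths]
    context, stack = {}, []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            continue
        if stack in wanted:
            # Nest the entry under its ancestors, e.g. root -> list-info.
            holder = context
            for name in stack[:-1]:
                holder = holder.setdefault(name, {})
            holder[stack[-1]] = {
                "attributes": dict(elem.attrib),
                "value": (elem.text or "").strip(),
            }
        elif stack == target:
            elem.tail = None  # ignore whitespace after the closing tag
            yield {
                "context": copy.deepcopy(context),
                "node": ET.tostring(elem, encoding="unicode"),
            }
            elem.clear()
        stack.pop()


messages = list(
    stream_with_context(io.BytesIO(SAMPLE), "//root/products/product", ["//root/list-info"])
)
print(messages[0])
```

Every message carries the same context because `list-info` is read once, before the first `<product>` is reached.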

### Scenario 3: Streaming the file by specifying the desired node, context fields, and nodes to be ignored <a href="#h_bc3121d833" id="h_bc3121d833"></a>

#### **Input**

* **File Name:** file.xml
* **Node Path:** //root/products/product
* **Context Paths:** //root/list-info
* **Ignore Paths:** //root/products/product/tags

#### **Output**

```json
{
    "total": 4,
    "success": 4,
    "failed": 0
}
```

Each element identified by the desired node path will be processed independently:

* **First subflow:**

```json
{
    "context": {
        "root": {
            "list-info": {
                "attributes": {
                    "qty": "4"
                },
                "value": "products"
            }
        }
    },
    "node": "<product><price>20.75</price><product>Chair</product></product>"
}
```

* **Second subflow:**

```json
{
    "context": {
        "root": {
            "list-info": {
                "attributes": {
                    "qty": "4"
                },
                "value": "products"
            }
        }
    },
    "node": "<product><price>399.99</price><product>TV</product></product>"
}
```

* **Third subflow:**

```json
{
    "context": {
        "root": {
            "list-info": {
                "attributes": {
                    "qty": "4"
                },
                "value": "products"
            }
        }
    },
    "node": "<product><price>100</price><product>Couch</product></product>"
}
```

* **Fourth subflow:**

```json
{
    "context": {
        "root": {
            "list-info": {
                "attributes": {
                    "qty": "4"
                },
                "value": "products"
            }
        }
    },
    "node": "<product><price>78.99</price><product>Table</product></product>"
}
```
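The effect of **Ignore Paths** on a single node can be sketched in Python as shown below. This is a simplified illustration, not the connector's code: `drop_ignored` is a hypothetical helper that works on an already extracted node, and it assumes each ignore path starts with the node path so it can be turned into a path relative to the node.

```python
import xml.etree.ElementTree as ET


def drop_ignored(node_xml, node_path, ignore_paths):
    """Return node_xml without the elements matched by the ignore paths."""
    node = ET.fromstring(node_xml)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in node.iter() for child in parent}
    prefix = node_path.rstrip("/") + "/"
    for path in ignore_paths.split(","):
        path = path.strip()
        if not path.startswith(prefix):
            continue  # assumption: paths outside the target node are skipped
        relative = path[len(prefix):]
        for match in node.findall(relative):
            parents[match].remove(match)
    return ET.tostring(node, encoding="unicode")


full = (
    "<product><price>20.75</price><product>Chair</product>"
    "<tags><element>NEW</element><element>FURNITURE</element></tags></product>"
)
trimmed = drop_ignored(full, "//root/products/product", "//root/products/product/tags")
print(trimmed)
```

With the scenario's configuration, the whole `<tags>` subtree disappears from the node, which matches the messages shown above.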

### Scenario 4: Streaming the file by specifying the desired node and ignoring nested child nodes <a href="#h_657cfa45c9" id="h_657cfa45c9"></a>

#### **Input**

* **File Name:** file.xml
* **Node Path:** //root/products/product
* **Ignore Nested Child Nodes:** active

#### **Output**

```json
{
    "total": 4,
    "success": 4,
    "failed": 0
}
```

Each element identified by the desired node path will be processed independently:

* **First subflow:**

```json
{
    "data": {
        "node": "<product><price>20.75</price><product>Chair</product><tags></tags></product>"
    },
    "success": true
}
```

* **Second subflow:**

```json
{
    "node": "<product><price>399.99</price><product>TV</product><tags></tags></product>"
}
```

* **Third subflow:**

```json
{
    "node": "<product><price>100</price><product>Couch</product><tags></tags></product>"
}
```

* **Fourth subflow:**

```json
{
    "node": "<product><price>78.99</price><product>Table</product><tags></tags></product>"
}
```
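The outputs above keep the direct children of `<product>` (including an empty `<tags>`) and drop everything nested deeper. A minimal Python sketch of that trimming step follows; `drop_nested_children` is a hypothetical helper applied to an already extracted node, and it is meant as an approximation of the documented result rather than the connector's actual logic.

```python
import xml.etree.ElementTree as ET


def drop_nested_children(node_xml):
    """Keep the node's direct children but remove their child elements."""
    node = ET.fromstring(node_xml)
    for child in node:
        # Copy the list first: removing while iterating would skip elements.
        for grandchild in list(child):
            child.remove(grandchild)
    # short_empty_elements=False renders <tags></tags> instead of <tags />.
    return ET.tostring(node, encoding="unicode", short_empty_elements=False)


full = (
    "<product><price>20.75</price><product>Chair</product>"
    "<tags><element>NEW</element><element>FURNITURE</element></tags></product>"
)
trimmed = drop_nested_children(full)
print(trimmed)
```

Compared with **Ignore Paths**, which removes a named subtree entirely, this option leaves the `<tags>` element in place and only empties it.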
