Digibee Documentation
Request documentationBook a demo
English
English
  • Quick start
  • Highlights
    • Release notes
      • Release notes 2025
        • May
        • April
        • March
        • February
        • January
      • Release notes 2024
        • December
        • November
        • October
        • September
        • August
          • Connectors release 08/20/2024
        • July
        • June
        • May
        • April
        • March
        • February
        • January
      • Release notes 2023
        • December
        • November
        • October
        • September
        • August
        • July
        • June
        • May
        • April
        • March
        • February
        • January
      • Release notes 2022
        • December
        • November
        • October
        • September
        • August
        • July
        • June
        • May
        • April
        • March
        • February
        • January
      • Release notes 2021
      • Release notes 2020
    • AI Pair Programmer
    • Digibeectl
      • Getting started
        • How to install Digibeectl on Windows
      • Digibeectl syntax
      • Digibeectl operations
  • Digibee in action
    • Use Cases in Action
      • Improving integration performance with API pagination
      • Automating file storage with Digibee
      • Reprocessing strategy in event-driven integrations
      • Key practices for securing sensitive information in pipelines with Digibee
      • OAuth2 for secure API access
      • Secure your APIs with JWT in Digibee
      • Integration best practices for developers on the Digibee Integration Platform
      • How to use Event-driven architecture on the Digibee Integration Platform
      • Dynamic file download with Digibee
      • Microservices: Circuit Breaker pattern for improving resilience
      • Error handling strategy in event-driven integrations
    • Troubleshooting
      • Integration guidance
        • How to resolve common pipeline issues
        • How to resolve Error 409: “You cannot update a pipeline that is not on draft mode”
        • How to resolve the "Pipeline execution was aborted" error
        • Integrated authentication with Microsoft Entra ID
        • How to resolve the "Failed to initialize pool: ONS configuration failed" error
        • How to perform IP address mapping with Progress database
        • How to build integration flows that send error notifications
        • How to send logs to external services
        • How JSONPath differs in connectors and the Execution panel
        • Using JSONPath to validate numbers with specific initial digits
        • How to analyze the "Network error: Failed to fetch" in the Execution panel
        • How to handle request payloads larger than 5MB
        • How to configure Microsoft Entra ID to display groups on the Digibee Integration Platform
        • How to build an HL7 message
      • Connectors behavior and configuration
        • Timeout in the Pipeline Executor connector
        • How to use DISTINCT and COUNT in the Object Store
        • Understanding @@DGB_TRUNCATED@@ on the Digibee Integration Platform
        • How to resolve names without a DNS - REST, SOAP, SAP (web protocols)
        • How to read and write files inside a folder
        • AuthToken Reuse for Salesforce connector
        • How to resolve the "Invalid payload" error in API Integration
        • Supported databases
          • Functions and uses for databases
      • Connectors implementation and usage examples
        • Google Storage: Usage scenarios
        • DB V2: Usage scenarios
        • For Each: Usage example
        • Template and its uses
        • Digibee JWT implementation
        • Email V1: Usage example (Deprecated)
      • JOLT applications
        • Transformer: Getting to know JOLT
        • Transformer: Transformations with JOLT
        • Transformer: Add values to list elements
        • Transformer: Operations overview
        • Transformer: Date formatting using split and concat
        • Transformer: Simple IF-ELSE logic with JOLT
      • Platform access and performance tips
        • How to solve login problems on the Digibee Integration Platform
        • How to receive updates from Digibee Status Page
        • How to clean the Digibee Integration Platform cache
      • Governance troubleshooting guidance
        • How to consume Internal API pipelines using ZTNA
        • How to use Internal API with and without a VPN
        • How to generate, convert, and register SSH Keys
        • mTLS authentication
          • How to configure mTLS on the Digibee Integration Platform
          • FAQs: Certificates in mTLS
        • How to connect Digibee to Oracle RAC
        • How to connect Digibee to SAP
        • How to connect Digibee to MongoDB Atlas using VPN
        • How to manage IPs on the Digibee Integration Platform
        • Configuring the Dropbox account
        • How to use your Gmail account with the Digibee email component (SMTP)
        • How to use the CORS policy on the Digibee Integration Platform
      • Deployment scenarios
        • Solving the “Out of memory” errors in deployment
        • Warning of route conflicts
    • Best practices
      • Best practices for building a pipeline
      • Best practices on validating messages in a consumer pipeline
      • Avoiding loops and maximizing pipeline efficiency
      • Naming: Global, Accounts, and API Keys
      • Pagination tutorial
        • Pagination tutorial - part 1
        • Pagination tutorial - part 2
        • Pagination tutorial - part 3
        • Pagination tutorial - part 4
      • Pagination example
      • Event-driven architecture
      • Notification model in event-driven integrations
      • OAuth2 integration model with Digibee
      • Best practices for error handling in pipelines
    • Digibee Academy
      • Integration Developer Bootcamp
  • Reference guides
    • Connectors
      • AWS
        • S3 Storage
        • SQS
        • AWS Secrets Manager
        • AWS Athena
        • AWS CloudWatch
        • AWS Elastic Container Service (ECS)
        • AWS Eventbridge
        • AWS Identity and Access Management (IAM)
        • AWS Kinesis
        • AWS Kinesis Firehose
        • AWS Key Management Service (KMS)
        • AWS Lambda
        • AWS MQ
        • AWS Simple Email Service (SES)
        • AWS Simple Notification System (SNS)
        • AWS Security Token Service (STS)
        • AWS Translate
      • Azure
        • Azure CosmosDB
        • Azure Event Hubs
        • Azure Key Vault
        • Azure ServiceBus
        • Azure Storage DataLake Service
        • Azure Storage Queue Service
      • Enterprise applications
        • SAP
        • Salesforce
        • Braintree
        • Facebook
        • GitHub
        • Jira
        • ServiceNow
        • Slack
        • Telegram
        • Twilio
        • WhatsApp
        • Wordpress
        • Workday
        • Zendesk
      • File storage
        • Blob Storage (Azure)
        • Digibee Storage
        • Dropbox
        • FTP
        • Google Drive
        • Google Storage
        • OneDrive
        • SFTP
        • WebDav V2
        • WebDav (Deprecated)
      • Files
        • Append Files
        • Avro File Reader
        • Avro File Writer
        • CSV to Excel
        • Excel
        • File Reader
        • File Writer
        • GZIP V2
        • GZIP V1 (Deprecated)
        • Parquet File Reader
        • Parquet File Writer
        • Stream Avro File Reader
        • Stream Excel
        • Stream File Reader
        • Stream File Reader Pattern
        • Stream JSON File Reader
        • Stream Parquet File Reader
        • Stream XML File Reader
        • XML Schema Validator
        • ZIP File
        • NFS
      • Flow
        • Delayer
      • Google/GCP
        • Google BigQuery
        • Google BigQuery Standard SQL
        • Google Calendar
        • Google Cloud Functions
        • Google Mail
        • Google PubSub
        • Google Secret Manager
        • Google Sheets
      • Industry solutions
        • FHIR (Beta)
        • Gupy Public API
        • HL7
        • HubSpot: Sales and CMS
        • Mailgun API
        • Oracle NetSuite (Beta)
        • Orderful
        • Protheus: Billing and Inventory of Cost
      • Logic
        • Block Execution
        • Choice
        • Do While
        • For Each
        • Retry
        • Parallel Execution
      • Queues and messaging
        • Event Publisher
        • JMS
        • Kafka
        • RabbitMQ
      • Security
        • AES Cryptography
        • Asymmetric Cryptography
        • CMS
        • Digital Signature
        • JWT (Deprecated)
        • JWT V2
        • Google IAP Token
        • Hash
        • Digibee JWT (Generate and Decode)
        • LDAP
        • PBE Cryptography
        • PGP
        • RSA Cryptography
        • Symmetric Cryptography
      • Structured data
        • CassandraDB
        • DB V2
        • DB V1 (Deprecated)
        • DynamoDB
        • Google Big Table
        • Memcached
        • MongoDB
        • Object Store
        • Relationship
        • Session Management
        • Stored Procedure
        • Stream DB V3
        • Stream DB V1 (Deprecated)
        • ArangoDb
        • Caffeine Cache
        • Caffeine LoadCache
        • Couchbase
        • CouchDB
        • Ehcache
        • InfluxDB
      • Tools
        • Assert V2
        • Assert V1 (Deprecated)
        • Base64
        • CSV to JSON V2
        • CSV to JSON V1 (Deprecated)
        • HL7 Message Transformer (Beta)
        • HTML to PDF
        • Transformer (JOLT) V2
        • JSLT
        • JSON String to JSON Transformer
        • JSON to JSON String Transformer
        • JSON to XML Transformer
        • JSON to CSV V2
        • JSON to CSV Transformer (Deprecated)
        • JSON Path Transformer V2
        • JSON Path Transformer
        • JSON Transformer
        • Log
        • Pipeline Executor
        • QuickFix (Beta)
        • SSH Remote Command
        • Script (JavaScript)
        • Secure PDF
        • Store Account
        • Template Transformer
        • Throw Error
        • Transformer (JOLT)
        • Validator V1 (Deprecated)
        • Validator V2
        • XML to JSON Transformer
        • XML Transformer
        • JSON Generator (Mock)
      • Web protocols
        • Email V2
        • Email V1 (Deprecated)
        • REST V2
        • REST V1 (Deprecated)
        • SOAP V1 (Deprecated)
        • SOAP V2
        • SOAP V3
        • WGet (Download HTTP)
        • gRPC
    • Triggers
      • Web Protocols
        • API Trigger
        • Email Trigger
        • Email Trigger V2
        • HTTP Trigger
        • HTTP File Trigger
          • HTTP File Trigger - Downloads
          • HTTP File Trigger - Uploads
        • REST Trigger
      • Scheduling
        • Scheduler Trigger
      • Messaging and Events
        • Event Trigger
        • JMS Trigger
        • Kafka Trigger
        • RabbitMQ Trigger
      • Others
        • DynamoDB Streams Trigger
        • HL7 Trigger
        • Salesforce Trigger - Events
    • Double Braces
      • How to reference data using Double Braces
      • Double Braces functions
        • Math functions
        • Utilities functions
        • Numerical functions
        • String functions
        • JSON functions
        • Date functions
        • Comparison functions
        • File functions
        • Conditional functions
      • Double Braces autocomplete
  • Development cycle
    • Build
      • Canvas
        • AI Assistant
        • Smart Connector User Experience
        • Execution panel
        • Design and Inspect Mode
        • Linter: Canvas building validation
        • Connector Mocking
      • Pipeline
        • How to create a pipeline
        • How to scaffold a pipeline using an OpenAPI specification
        • How to create a project
        • Pipeline version history
        • Pipeline versioning
        • Messages processing
        • Subpipelines
      • Capsules
        • How to use Capsules
          • How to create a Capsule collection
            • Capsule header dimensions
          • How to create a Capsule group
          • How to configure a Capsule
          • How to build a Capsule
          • How to test a Capsule
          • How to save a Capsule
          • How to publish a Capsule
          • How to change a Capsule collection or group
          • How to archive and restore a Capsule
        • Capsules versioning
        • Public capsules
          • SAP
          • Digibee Tools
          • Google Sheets
          • Gupy
          • Send notifications via email
          • Totvs Live
          • Canvas LMS
        • AI Assistant for Capsules Docs Generation
    • Run
      • Run concepts
        • Autoscalling
      • Deployment
        • Deploying a pipeline
        • How to redeploy a pipeline
        • How to promote pipelines across environments
        • How to check the pipeline deployment History
        • How to rollback to a previous deployment version
        • Using deployment history advanced functions
        • Pipeline deployment status
      • How warnings work on pipelines in Run
    • Monitor
      • Monitor Insights (Beta)
      • Completed executions
        • Pipeline execution logs download
      • Pipeline logs
      • Pipeline Metrics
        • Pipeline Metrics API
          • How to set up Digibee API metrics with Datadog
          • How to set up Digibee API metrics with Prometheus
        • Connector Latency
      • Alerts
        • How to create an alert
        • How to edit an alert
        • How to activate, deactivate or duplicate an alert
        • How to delete an alert
        • How to configure alerts on Slack
        • How to configure alerts on Telegram
        • How to configure alerts through a webhook
        • Available metrics
        • Best practices about alerts
        • Use cases for alerts
      • VPN connections monitoring
        • Alerts for VPN metrics
  • Connectivity management
    • Connectivity
    • Zero Trust Network Access (ZTNA)
      • Prerequisites for using ZTNA
      • How to view connections (Edge Routers)
      • How to view the Network Mappings associated with an Edge Router
      • How to add new ZTNA connections (Edge Routers)
      • How to delete connections (Edge Routers)
      • How to view routes (Network Mapping)
      • How to add new routes (Network Mapping)
      • How to add routes in batch for ZTNA
      • How to edit routes (Network Mapping)
      • How to delete routes (Network Mapping)
      • How to generate new keys (Edge Router)
      • How to change the environment of Edge routers
      • ZTNA Inverse Flow
      • ZTNA Groups
    • Virtual Private Network (VPN)
  • Platform administration
    • Administration
      • Audit
      • Access control
        • Users
        • Groups
        • Roles
          • List of permissions by service
          • Roles and responsibilities: Governance and key stakeholder identification
      • Identity provider integration
        • How to integrate an identity provider
        • Authentication rules
        • Integration of IdP groups with Digibee groups
          • How to create a group integration
          • How to test a group integration
          • How to enable group integrations
          • How to edit a group integration
          • How to delete a group integration
      • User authentication and authorization
        • How to activate and deactivate two-factor authentication
        • Login flow
      • Organization groups
    • Settings
      • Globals
        • How to create Globals
        • How to edit or delete Globals
        • How to use Globals
      • Accounts
        • Configuring each account type
        • Monitor changes to account settings in deployed pipelines
        • OAuth2 Architecture
          • Registration of new OAuth providers
      • Consumers (API Keys)
      • Relationship model
      • Multi-Instance
        • Deploying a multi-instance pipeline
      • Log Streaming
        • How to use Log Streaming with Datadog
    • Governance
      • Policies
        • Security
          • Internal API access policy
          • External API access policy
          • Sensitive fields policy
        • Transformation
          • Custom HTTP header
          • CORS HTTP header
        • Limit of Replicas policy
    • Licensing
      • Licensing models
        • Consumption Based model
      • Capacity and quotas
      • License consumption
    • Digibee APIs
      • How to create API credentials
  • Digibee concepts
    • Pipeline Engine
      • Digibee Integration Platform Pipeline Engine v2
      • Support Dynamic Accounts (Restricted Beta)
    • Digibee Integration Platform Dedicated SaaS
      • Digibee Integration Platform architecture on Dedicated Saas model
      • Requirements for Digibee Dedicated Saas model
      • Site-to-Site VPN for dedicated SaaS customer support
      • Dedicated Saas customer responsibilities
      • Custom Images of Kubernetes Nodes
      • Digibee Dedicated SaaS installation on AWS
        • How to install requirements before installing Digibee Integration Platform on EKS
        • Permissions to use Digibee Integration Platform on EKS
        • How to create custom nodes for EKS (Golden Images)
    • Introduction to ZTNA
  • Help & FAQ
    • Digibee Customer Support
    • Request documentation, suggest features, or send feedback
    • Beta Program
    • Security and compliance
    • About Digibee
Powered by GitBook
On this page
  • Parameters
  • Recommended use for Remove whitespaces and Coalesce
  • Messages flow
  • Input
  • Output
  • Stream XML File Reader in Action
  • Streaming the file informing the desired node
  • Streaming the file informing the desired node and context fields
  • Streaming the file informing the desired node, context fields and nodes to be ignored
  • Streaming the file informing the desired node and ignoring nested child nodes
  • Additional information

Was this helpful?

  1. Reference guides
  2. Connectors
  3. Files

Stream XML File Reader

The Documentation Portal provides guides on all options for components on the Digibee Integration Platform. This article covers Stream XML File Reader.

PreviousStream Parquet File ReaderNextXML Schema Validator

Last updated 3 months ago

Was this helpful?

Stream XML File Reader performs the reading of a local XML file and, based on the configuration of a desired node and context fields, delivers a XML structure and context properties for each node found and triggers subpipelines to process each message. The component should be used for large files when parts of the whole need to be read efficiently.

Parameters

Take a look at the configuration parameters of the component. Parameters supported by are marked with (DB).

Parameter
Description
Default value
Data type

File name (DB)

File name or full file path (i.e. tmp/processed/file.txt) of the local XML file.

data.xml

String

Charset

Name of the character code for the file reading.

UTF-8

String

Node Path

Path of the desired node to stream from the XML file (e.g.: //root/level1/level2/desirednode).

N/A

String

Context Paths

Define tag paths that represent fields adding context to the desired node (e.g.: //root/node1/code,//root/node2/description).

N/A

String

Ignore Paths

Define paths that will be ignored and not returned into the desired node (e.g.: //root/node1/email,//root/node2/city).

N/A

String

Ignore Nested Child Nodes

If active, nested child nodes (nodes not direct children of the desired node) will be ignored. In this case, the node at the same level as the desired node will be returned, but nodes below it will be ignored.

N/A

Boolean

Element Identifier

Attribute that will be sent in case of errors.

N/A

String

Parallel Execution Of Each Iteration

Occurs in parallel with loop execution.

N/A

Boolean

Fail On Error

When active, this parameter suspends pipeline execution in the case of a severe occurrence in the iteration structure, preventing its completion. The activation of the "Fail On Error" parameter is not related to errors occurring in components used to construct subpipelines (onProcess and onException).

N/A

Boolean

Remove whitespaces

If the option is active, whitespaces at the beginning/end of all XML character values are removed.

N/A

Boolean

Coalesce

If the option is active, XML character values are read as single strings.

N/A

Boolean

Recommended use for Remove whitespaces and Coalesce

Be careful not to compromise the integrity of the data when activating the Remove whitespaces option. When streaming files within large character values, the component processes these values during many steps before consolidating them in a single value, and whitespaces removal is applied during each of these steps.

One way to safely use Remove whitespaces is to combine it with the Coalesce option. This ensures that the character values inside XML tags will be read at once, without breaking in several parts at first. However, keep in mind that when the Coalesce parameter is enabled, it may demand more pipeline resources when reading huge chunks of data at once.

Messages flow

Input

No specific input message is expected, but the existence of a XML file in the pipeline local directory and the filling of the File Name and Node Path fields for the file processing.

Output

{
"total": 0,
"success": 0,
"failed": 0
}
  • total: total number of processed lines.

  • success: total number of successfully processed lines.

  • failed: total number of lines whose process failed.

Important: when the lines are correctly processed, their respective subpipelines return { "success": true } for each of them.

The component throws an exception if File Name doesn’t exist or can’t be read.

The file manipulation inside a pipeline occurs in a protected way. All the files can be accessed with a temporary directory only, where each pipeline key gives access to its own files set.

Stream XML File Reader makes batch processing, which means processing the data continuously and in a controlled manner in smaller batches.

Stream XML File Reader in Action

The following scenarios are based on the following XML file:

  • File name: file.xml

  • Content:

<?xml version="1.0" encoding="UTF-8"?>
<root>
<list-info qty="4">products</list-info>
<products>
<product>
<price>20.75</price>
<product>Chair</product>
<tags>
<element>NEW</element>
<element>FURNITURE</element>
</tags>
</product>
<product>
<price>399.99</price>
<product>TV</product>
<tags>
<element>NEW</element>
<element>FURNITURE</element>
</tags>
</product>
<product>
<price>100</price>
<product>Couch</product>
<tags>
<element>NEW</element>
<element>FURNITURE</element>
</tags>
</product>
<product>
<price>78.99</price>
<product>Table</product>
<tags>
<element>NEW</element>
<element>FURNITURE</element>
</tags>
</product>
</products>
</root>

Streaming the file informing the desired node

Input

  • File Name: file.xml

  • Node Path: //root/products/product

Output

{
"total": 4,
"success": 4,
"failed": 0
}

Each element identified by the desired node path will be processed independently:

  • First subflow:

{
"node":"<product><price>20.75</price><product>Chair</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
  • Second subflow:

{
"node":"<product><price>399.99</price><product>TV</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
  • Third subflow:

{
"node":"<product><price>100</price><product>Couch</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
  • Forth subflow:

{
"node":"<product><price>78.99</price><product>Table</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}

Streaming the file informing the desired node and context fields

Input

  • File Name: file.xml

  • Node Path: //root/products/product

  • Context Paths: //root/list-info

Output

{
"total": 4,
"success": 4,
"failed": 0
}

Each element identified by the desired node path will be processed independently:

  • First subflow:

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>20.75</price><product>Chair</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
  • Second subflow:

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>399.99</price><product>TV</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
  • Third subflow:

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>100</price><product>Couch</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}
  • Forth subflow:

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>78.99</price><product>Table</product><tags><element>NEW</element><element>FURNITURE</element></tags></product>"
}

Streaming the file informing the desired node, context fields and nodes to be ignored

Input

  • File Name: file.xml

  • Node Path: //root/products/product

  • Context Paths: //root/list-info

  • Ignore Paths: //root/products/product/tags

Output

{
"total": 4,
"success": 4,
"failed": 0
}

Each element identified by the desired node path will be processed independently:

  • First subflow:

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>20.75</price><product>Chair</product></product>"
}
  • Second subflow:

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>399.99</price><product>TV</product></product>"
}
  • Third subflow:

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>100</price><product>Couch</product></product>"
}
  • Forth subflow:

{
"context": {
"root": {
"list-info": {
"attributes": {
"qty": "4"
},
"value": "products"
}
}
},
"node": "<product><price>78.99</price><product>Table</product></product>"
}

Streaming the file informing the desired node and ignoring nested child nodes

Input

  • File Name: file.xml

  • Node Path: //root/products/product

  • Ignore Nested Child Nodes: active

Output

{
"total": 4,
"success": 4,
"failed": 0
}

Each element identified by the desired node path will be processed independently:

  • First subflow:

{
"data": {
"node": "<product><price>20.75</price><product>Chair</product><tags></tags></product>"
},
"success": true
}
  • Second subflow:

{
"node": "<product><price>399.99</price><product>TV</product><tags></tags></product>"
}
  • Third subflow:

{
"node": "<product><price>100</price><product>Couch</product><tags></tags></product>"
}
  • Forth subflow:

{
"node": "<product><price>78.99</price><product>Table</product><tags></tags></product>"
}

Additional information

Stream XML File Reader uses an event reading mechanism, through which each type of data present in the file is an event to be processed. With that, there are some types of events that are not covered during the file stream. These are they:

  • PROCESSING INSTRUCTION

  • START DOCUMENT

  • END DOCUMENT

  • SPACE

  • ENTITY REFERENCE

  • ENTITY DECLARATION

  • DTD

  • NOTATION DECLARATION

Double Braces expressions