Stream File Reader Pattern

Discover more about the Stream File Reader Pattern component and how to use it on the Digibee Integration Platform.

Stream File Reader Pattern reads a local text file in blocks of lines according to the configured pattern and triggers a subpipeline to process each block. This component is intended for large files.

Parameters

Take a look at the configuration parameters of the component. Parameters supported by Double Braces expressions are marked with (DB).

| Parameter | Description | Default value | Data type |
| --- | --- | --- | --- |
| File Name (DB) | Name or full path of the local file (e.g. tmp/processed/file.txt). | N/A | String |
| Tokenizer | Tokenization strategy: XML, PAIR, or REGEX. With XML, you specify the name of an XML tag and the component sends each block delimited by that tag. With PAIR, you configure a start token and an end token, and the component sends the subflow all the lines between both tokens. With REGEX, you specify a regular expression and the component sends the block found between its matches. | XML | String |
| Token | Token used to search for the pattern in the specified file. | N/A | String |
| End Token | End token. This parameter is available only when the PAIR Tokenizer is selected. | N/A | String |
| Include Tokens | If enabled, the start and end tokens are included in each block. This parameter is available only when the PAIR Tokenizer is selected. | False | Boolean |
| Group | Integer that determines how many matches of the defined pattern are grouped into each block returned by the component. | N/A | String |
| Element Identifier | Attribute to be sent in case of errors. | N/A | String |
| Parallel Execution Of Each Iteration | If enabled, the iterations of the loop are executed in parallel. | False | Boolean |
| Fail On Error | If enabled, the pipeline execution is interrupted only when a severe error occurs in the iteration structure itself and prevents it from completing. This parameter is unrelated to errors raised by the components used inside the subpipelines (onProcess and onException). | False | Boolean |

Messages flow

Input

{
    "filename": "fileName"
}

filename: overrides the default local file.

Output

{
    "total": 0,
    "success": 0,
    "failed": 0
}

total: total number of processed lines.
success: total number of successfully processed lines.
failed: total number of lines whose processing failed.

To know if a line has been correctly processed, each processed line must return { "success": true }.
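
The relationship between the per-line results and the output counters can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the component's implementation: the function name `summarize` and the sample results are hypothetical, and the sketch assumes that any result other than { "success": true } counts as failed.

```python
def summarize(results):
    """Build the output counters from the result returned by each subpipeline run."""
    total = len(results)
    # Assumption: only an explicit {"success": true} counts as a success.
    success = sum(1 for r in results if r.get("success") is True)
    return {"total": total, "success": success, "failed": total - success}


# Hypothetical results returned by three subpipeline executions.
sample_results = [{"success": True}, {"error": "timeout"}, {"success": True}]
summary = summarize(sample_results)
print(summary)
```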

The component throws an exception if the file specified in File Name doesn't exist or can't be read.

File handling inside a pipeline is protected. Files can be accessed only through a temporary directory, and each pipeline key gives access to its own set of files.

Stream File Reader Pattern performs batch processing: the data is processed continuously and in a controlled manner, in smaller batches.

Stream File Reader Pattern in Action

See below how the component behaves in specific situations and how it is configured in each one.

Using the XML Tokenizer to search for tag content that can span multiple lines

Given that the following XML file must be read:

file.xml

<m:documents>
<m:hashes>
<m:hashe>4rt4</m:hashe>
<m:hashe>6565g</m:hashe>
</m:hashes>
<m:orders xmlns:m="urn:shop" xmlns:cat="urn:shop:catalog">
<m:order>
<id>1</id><date>2014-02-25</date>
</m:order>
<m:order>
<id>2</id><date>2014-02-25</date>
</m:order>
</m:orders>
</m:documents>

Configuring the component to return just the XML block of the order tag:

File Name: file.xml
Tokenizer: XML
Token: order

The result will be 2 subflows containing the values that are inside the order tag:

First:

<m:order>
<id>1</id><date>2014-02-25</date>
</m:order>

Second:

<m:order>
<id>2</id><date>2014-02-25</date>
</m:order>
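
To make the XML behavior concrete, here is a minimal Python sketch that reproduces the two blocks above. It is an approximation for illustration, not the component's implementation: `xml_blocks` is a hypothetical helper, and it assumes that each opening and closing tag sits on its own line, as in the sample file.

```python
import re


def xml_blocks(lines, tag):
    """Yield each block from a line opening <tag> to the line closing </tag>."""
    # An optional namespace prefix (e.g. "m:") is accepted; "orders" does not match "order".
    open_re = re.compile(r"<(?:\w+:)?" + re.escape(tag) + r"[\s>/]")
    close_re = re.compile(r"</(?:\w+:)?" + re.escape(tag) + r">")
    block = None
    for line in lines:
        if block is None and open_re.search(line):
            block = []
        if block is not None:
            block.append(line)
            if close_re.search(line):
                yield "\n".join(block)
                block = None


sample = """<m:orders xmlns:m="urn:shop">
<m:order>
<id>1</id><date>2014-02-25</date>
</m:order>
<m:order>
<id>2</id><date>2014-02-25</date>
</m:order>
</m:orders>"""

order_blocks = list(xml_blocks(sample.splitlines(), "order"))
for number, block in enumerate(order_blocks, start=1):
    print(f"Subflow {number}:\n{block}\n")
```

Reading line by line and keeping only the current block in memory is what makes this style of processing suitable for large files.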

Using the PAIR Tokenizer to read a file where there's a start token and an end token for each block

file.txt

###
Log1: Log info
Log2: Log info
--###
###
Log1: Log info
--###
###
Log1: Log info
Log2: Log info
Log3: Log info
--###

File Name: file.txt
Tokenizer: PAIR
Token: ###
End Token: --###
Include Tokens: deactivated

The result will be 3 subflows containing the lines between the start token (###) and the end token (--###):

First:

Log1: Log info
Log2: Log info

Second:

Log1: Log info

Third:

Log1: Log info
Log2: Log info
Log3: Log info
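
The PAIR logic can be approximated with the following Python sketch. It is an illustration under stated assumptions, not the component's implementation: `pair_blocks` is a hypothetical helper, and it assumes that each token occupies a whole line, as in the sample file.

```python
def pair_blocks(lines, start, end, include_tokens=False):
    """Yield the lines found between each start token line and the next end token line."""
    block = None
    for line in lines:
        stripped = line.strip()
        if block is None:
            # Outside a block: wait for a line that is exactly the start token.
            if stripped == start:
                block = [line] if include_tokens else []
        elif stripped == end:
            if include_tokens:
                block.append(line)
            yield "\n".join(block)
            block = None
        else:
            block.append(line)


sample = """###
Log1: Log info
Log2: Log info
--###
###
Log1: Log info
--###
###
Log1: Log info
Log2: Log info
Log3: Log info
--###"""

log_blocks = list(pair_blocks(sample.splitlines(), "###", "--###"))
print(len(log_blocks))
print(log_blocks[1])
```

Comparing whole lines rather than substrings matters here, because the end token --### contains the start token ###.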

Using the REGEX Tokenizer to search all the lines between patterns

file.txt

ID-3591d344-d74f-446e-867a-210d17345b50
Some text
xpto
ID-033e8b36-6b1e-42e8-aeb1-dc8498ffa6cb
Other text
xxx

The following pattern must be searched:

ID-\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b

File Name: file.txt
Tokenizer: REGEX
Token: ID-\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b

The result will be 2 subflows containing the lines found between the matches of the specified REGEX pattern.

First:

Some text
xpto

Second:

Other text
xxx
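
The same split can be reproduced with Python's standard `re` module, which helps when testing a pattern before configuring the component. This is a sketch for experimentation only: `regex_blocks` is a hypothetical helper, and the regular expression engine used by the component may differ from Python's in some details.

```python
import re

PATTERN = r"ID-\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b"


def regex_blocks(text, pattern):
    """Split the text at every match of the pattern and drop empty pieces."""
    parts = re.split(pattern, text)
    return [part.strip("\n") for part in parts if part.strip()]


sample = """ID-3591d344-d74f-446e-867a-210d17345b50
Some text
xpto
ID-033e8b36-6b1e-42e8-aeb1-dc8498ffa6cb
Other text
xxx
"""

id_blocks = regex_blocks(sample, PATTERN)
for number, block in enumerate(id_blocks, start=1):
    print(f"Subflow {number}:\n{block}\n")
```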

Using the REGEX Tokenizer to search all the lines between patterns and grouping every 2 results

file.txt

ID-3591d344-d74f-446e-867a-210d17345b50
Some text
xpto
ID-033e8b36-6b1e-42e8-aeb1-dc8498ffa6cb
Other text
xxx

The following pattern must be searched:

ID-\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b

File Name: file.txt
Tokenizer: REGEX
Token: ID-\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b
Group: 2

The result will be 1 subflow containing both blocks found between the matches of the specified REGEX pattern, grouped together:

Some text
xpto
ID-\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b
Other text
xxx

When the REGEX Tokenizer is used with Group, the pattern is shown in the output between the grouped blocks.

If the specified pattern isn't found in the file, the whole file is returned in a single execution. Be careful when specifying the REGEX.
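
The grouping step can be sketched in Python as follows. This mirrors the grouped output shown above and is not the component's implementation: `group_blocks` is a hypothetical helper, and joining the blocks with the pattern text is an assumption taken from that example output.

```python
PATTERN = r"ID-\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b"


def group_blocks(blocks, size, separator):
    """Join every `size` consecutive blocks into one, separated by `separator`."""
    joiner = "\n" + separator + "\n"
    return [joiner.join(blocks[i:i + size]) for i in range(0, len(blocks), size)]


# The two blocks found between the pattern matches in file.txt.
blocks = ["Some text\nxpto", "Other text\nxxx"]
grouped = group_blocks(blocks, 2, PATTERN)
print(len(grouped))
print(grouped[0])
```

With a Group value larger than the number of blocks found, this sketch simply returns all blocks in a single group.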
