Stream Excel

The Documentation Portal provides guides on all streaming options in the Digibee iPaaS. This page covers how to use the Stream Excel component.

Stream Excel reads a local Excel file row by row, represents each row as a JSON structure, and triggers subpipelines to process each line. It is recommended for situations in which large files need to be processed.

Parameters

Take a look at the configuration options for the component. Parameters that support Double Braces expressions are marked with (DB).

| Parameter | Description | Default value | Data type |
| --- | --- | --- | --- |
| File Name (DB) | Name or full path (e.g., tmp/processed/file.txt) of the local file to be read. | file.xlsx | String |
| Sheet Name | Name of the Excel sheet to be read. | Plan1 | String |
| Sheet Index | Index of the Excel sheet to be read. | N/A | Integer |
| Use Sheet Index Instead Of Name | If activated, the sheet is selected by its index instead of its name. | False | Boolean |
| Max Fractional Digits | Determines the number of fractional digits of a numeric cell when the Excel file is read. | 5 | Integer |
| Read Specific Columns As String | Indicates which columns the component must read as strings instead of in their original format. | B,D,F | String |
| Read All Columns As String | If selected, all columns are read as strings. | False | Boolean |
| Column Identifier | In case of errors, this is the column that is sent to the onException subpipeline. | A | String |
| Parallel Execution Of Each Iteration | If selected, the file lines are read in parallel. | False | Boolean |
| Fail On Error | When activated, pipeline execution is suspended only if a severe error in the iteration structure prevents it from completing. This parameter is not affected by errors in the components used to build the subpipelines (onProcess and onException). | False | Boolean |
| Advanced | When selected, advanced parameters must be defined. | False | Boolean |
| Skip | Number of lines to skip before the file is read. | N/A | Integer |
| Limit | Maximum number of lines to be read. | N/A | Integer |
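
As a rough illustration of how some of these parameters interact, the Python sketch below applies Skip, Limit, Read Specific Columns As String, and Read All Columns As String to rows held in memory. It is not the component's implementation: the read_rows and parse_columns helpers, the representation of a row as a dictionary keyed by column letter, and the assumption that Skip is applied before Limit are all hypothetical.

```python
from itertools import islice


def parse_columns(spec):
    """Turn a spec such as "B,D,F" into a set of column letters."""
    return {c.strip().upper() for c in spec.split(",") if c.strip()}


def read_rows(rows, skip=0, limit=None, string_columns="", all_as_string=False):
    """Yield rows after skipping `skip` rows, reading at most `limit` rows.

    Hypothetical helper: each row is assumed to be a dict keyed by
    column letter, and Skip is assumed to be applied before Limit.
    """
    as_string = parse_columns(string_columns)
    stop = None if limit is None else skip + limit
    for row in islice(rows, skip, stop):
        yield {
            col: str(value) if (all_as_string or col in as_string) else value
            for col, value in row.items()
        }


# Made-up sample data standing in for a sheet.
sheet = [
    {"A": 1, "B": 10.5, "C": 7},
    {"A": 2, "B": 20.25, "C": 8},
    {"A": 3, "B": 30.0, "C": 9},
    {"A": 4, "B": 40.75, "C": 10},
]

result = list(read_rows(sheet, skip=1, limit=2, string_columns="B,D,F"))
print(result)
```

With Skip set to 1 and Limit set to 2, only the second and third rows are produced, and only column B is converted to a string because D and F are absent from the sample data.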

Stream Excel performs batch processing. To better understand the concept, read the Batch processing documentation.

Stream Excel can read files in .xlsx format only; the .xls format is not supported.

Messages flow

Input

The component accepts any input message and can use it through Double Braces expressions.

Output

The component returns a JSON with the total number of executions, the number of successful executions, and the number of executions that ended in error.

  • without error

{
  "total": 5,
  "success": 5,
  "failed": 0
}

  • with error

{
  "total": 5,
  "success": 3,
  "failed": 2
}

  • total: total number of processed lines.

  • success: total number of successfully processed lines.

  • failed: total number of lines whose processing failed.

For a line to count as correctly processed, the subpipeline must return { "success": true } for that line.
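
The counting rule can be sketched in Python as follows. This is only a model of the behavior described here, not the platform's code: the stream function and the on_process and on_exception callables are hypothetical stand-ins for the component and its subpipelines, and treating a raised exception as a failed line is an assumption.

```python
def stream(rows, on_process, on_exception=None):
    """Call on_process once per row and count the outcomes.

    Hypothetical sketch: a row counts as a success only when on_process
    returns a dict whose "success" value is exactly True.
    """
    summary = {"total": 0, "success": 0, "failed": 0}
    for row in rows:
        summary["total"] += 1
        try:
            result = on_process(row)
        except Exception as exc:  # assumption: an exception marks the row as failed
            if on_exception is not None:
                on_exception(row, exc)
            result = None
        if isinstance(result, dict) and result.get("success") is True:
            summary["success"] += 1
        else:
            summary["failed"] += 1
    return summary


def on_process(row):
    """Made-up subpipeline: succeeds for odd values, returns nothing useful otherwise."""
    return {"success": True} if row % 2 == 1 else {}


summary = stream([1, 2, 3, 4, 5], on_process)
print(summary)
```

Under this reading, a subpipeline that never returns { "success": true } produces a summary in which every line is counted as failed, even if no component raised an error.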

If the file doesn't exist or can't be read, the component raises an error. With Fail On Error deactivated, the error is delivered at the output as a message describing the exception, as in the examples later on this page.

You may also encounter an error if you upload an .xlsx file to Google Drive and then, in a pipeline, use the Google Drive component to download it and a Stream Excel component to read it.

When you do this, unexpected behavior in Google Sheets modifies the .xlsx file. This causes Stream Excel to read every row in the sheet (including blank rows) instead of only the rows with content. This behavior is not related to the Digibee Integration Platform.

As a workaround, you can copy the contents of the sheet and paste them into a new sheet tab in the same .xlsx file. If you do this, don't copy blank rows, or the same error will occur.
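
If recreating the sheet is not practical, blank rows could also be detected and ignored while each line is processed. The Python sketch below shows only the check itself, for illustration. The row shape (a dictionary keyed by column letter, with empty cells represented as None or empty strings) is an assumption, and inside a pipeline the equivalent check would have to be built with the platform's own components.

```python
def is_blank_row(row):
    """Return True when every cell is None or whitespace-only text.

    Assumption: a row is a dict keyed by column letter.
    """
    return all(
        value is None or (isinstance(value, str) and not value.strip())
        for value in row.values()
    )


# Made-up rows: one with content, two blank.
rows = [
    {"A": "Name", "B": 42},
    {"A": None, "B": ""},
    {"A": "   ", "B": None},
]

non_blank = [row for row in rows if not is_blank_row(row)]
print(non_blank)
```

A cell holding the number 0 is treated as content, so a row such as { "A": 0 } is kept.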

File manipulation inside a pipeline occurs in a protected way. Files can be accessed only through a temporary directory, where each pipeline key gives access to its own set of files.

Stream Excel in action

See below how the component behaves in certain situations and how it must be configured in each case.

Read the Excel file and analyze the results

For this example, let's assume that we already have an Excel file in the pipeline flow that was downloaded through components such as Google Drive, OneDrive, or a similar one. This file is a sheet with the names of the 100 billionaires selected by Forbes.

The Stream Excel component is configured as follows:

  • File Name: file.xlsx

  • Sheet Name: Plan1

  • Use Sheet Index Instead of Name: deactivated

  • Max Fractional Digits: 5

  • Read Specific Columns As String: B,D,F

  • Read All Columns As String: deactivated

  • Column Identifier: A

  • Parallel Execution of Each Iteration: deactivated

  • Fail On Error: deactivated

  • Advanced: deactivated

Input

{
  "fileName": "sheets.xlsx"
}

Output

{
  "total": 102,
  "success": 0,
  "failed": 102
}

Log results

To see this log, we use the Messages tab in the pipeline. In the figure below, all the sheet lines have been read individually by the component, including the names of the columns.

Read the Excel file and analyze a sheet that does not exist in the file

For this example, consider the same sheet as in the example above. However, we will select a sheet that does not exist.

The Stream Excel component will give the following error message (Fail On Error is deactivated):

{
  "success": false,
  "message": "Sheet 'InvalidSheetName' does not exist",
  "exception": "com.monitorjbl.xlsx.exceptions.MissingSheetException"
}

Read a nonexistent Excel file

In this example, let's consider a non-existent file in the pipeline flow.

The Stream Excel component will return the following error message (Fail On Error is deactivated):

{
  "success": false,
  "message": "File invalidsheets.xlsx does not exist.",
  "exception": "com.digibee.pipelineengine.exception.PipelineEngineRuntimeException"
}
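
A consumer of the component's output may need to tell an error message apart from a normal summary, since both carry a "success" field with different meanings. The Python sketch below is one possible way to do that; the describe_result function is hypothetical, and the payloads are the ones shown on this page.

```python
import json


def describe_result(raw):
    """Classify a Stream Excel output message given as a JSON string.

    In an error message "success" is the boolean false, while in a summary
    it is a count, so the check uses `is False` rather than truthiness
    (a summary with "success": 0 must not be mistaken for an error).
    """
    msg = json.loads(raw)
    if msg.get("success") is False:
        # Keep only the class name from the fully qualified exception.
        short_name = msg.get("exception", "").rsplit(".", 1)[-1]
        return f"error ({short_name}): {msg.get('message', '')}"
    return f"summary: {msg['total']} total, {msg['success']} ok, {msg['failed']} failed"


missing_sheet = """{
  "success": false,
  "message": "Sheet 'InvalidSheetName' does not exist",
  "exception": "com.monitorjbl.xlsx.exceptions.MissingSheetException"
}"""
all_failed = '{"total": 102, "success": 0, "failed": 102}'

print(describe_result(missing_sheet))
print(describe_result(all_failed))
```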
