Stream Excel

Learn more about the component and how to use it.
Stream Excel reads a local Excel file row by row, represents each row as a JSON structure, and triggers subpipelines to process each line. It is recommended for situations in which large files need to be processed.
Take a look at the configuration parameters of the component:
  • File Name: determines the name of the local file to be read.
  • Sheet Name: name of the Excel sheet to be read.
  • Sheet Index: Excel sheet index to be read.
  • Use Sheet Index Instead Of Name: if activated, the sheet is selected by its index instead of its name.
  • Max Fractional Digits: determines the maximum number of fractional digits kept when a numeric cell is read (default: 10).
  • Read Specific Columns As String: indicates which columns the component must read as strings instead of in their original format. Separate the columns with commas (e.g., A,B,X,AA).
  • Read All Columns As String: if selected, all columns are read as strings.
  • Parallel Execution Of Each Iteration: if selected, the lines of the file are processed in parallel.
  • Fail On Error: when activated, the pipeline execution is suspended only if a severe error occurs in the iteration structure itself and prevents it from completing. This parameter has no connection with errors that occur in the components used to build the subpipelines (onProcess and onException).
  • Advanced: when selected, the advanced parameters below must be defined.
  • Skip: number of lines to skip before reading starts.
  • Limit: maximum number of lines to be read.
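The column labels accepted by Read Specific Columns As String follow the usual Excel lettering (A to Z, then AA, AB, and so on). The sketch below only illustrates how such a list maps to column positions. The function names are hypothetical and this is not the component's own code.

```python
def column_letters_to_index(letters: str) -> int:
    """Convert an Excel column label such as 'A' or 'AA' to a zero-based index."""
    letters = letters.strip().upper()
    if not letters or not all("A" <= ch <= "Z" for ch in letters):
        raise ValueError(f"invalid column label: {letters!r}")
    index = 0
    for ch in letters:
        # Column labels behave like a base-26 number whose digits run from 1 to 26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def parse_string_columns(spec: str) -> list[int]:
    """Parse a comma-separated list like 'A,B,X,AA' into zero-based indexes."""
    return [column_letters_to_index(part) for part in spec.split(",") if part.strip()]


print(parse_string_columns("A,B,X,AA"))  # [0, 1, 23, 26]
```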
Stream Excel performs batch processing. To better understand the concept, see the batch processing documentation.
IMPORTANT: Stream Excel can't read files in .xls format. Only the .xlsx format is supported.
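The Skip and Limit parameters can be pictured as a window over the lines of the file. The following is a minimal sketch that assumes Skip is applied first and Limit counts the lines read after it. The helper name `window_lines` is hypothetical, and whether the header line counts toward Skip is not covered here.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def window_lines(lines: Iterable, skip: int = 0, limit: Optional[int] = None) -> Iterator:
    """Skip the first `skip` lines, then yield at most `limit` lines."""
    # With no limit, read to the end of the input.
    stop = None if limit is None else skip + limit
    return islice(lines, skip, stop)


lines = [f"line {n}" for n in range(1, 11)]
print(list(window_lines(lines, skip=2, limit=3)))  # ['line 3', 'line 4', 'line 5']
```

Because `islice` consumes its input lazily, the same idea works on a stream of lines without loading the whole file first.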

Messages flow

Input

The component accepts any input message, and the values in that message can be used in its parameters through Double Braces.

Output

The component returns a JSON object with the total number of executions, the number of successful executions, and the number of executions with errors.
  • without error
{
"total": 5,
"success": 5,
"failed": 0
}
  • with error
{
"total": 5,
"success": 3,
"failed": 2
}
  • total: total number of processed lines
  • success: total number of successfully processed lines
  • failed: total number of lines whose processing failed
IMPORTANT: a line is counted as correctly processed only if its processing returns { "success": true }. This return is required for every processed line.
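The counting rule above can be sketched as follows. This is only an illustration of how the three fields relate, not the component's implementation. The function name `summarize` is hypothetical, and treating a result with no "success" field as failed is an assumption based on the rule that { "success": true } must be returned.

```python
import json


def summarize(line_results):
    """Count lines whose result carries "success": true; every other line counts as failed."""
    total = success = 0
    for result in line_results:
        total += 1
        # Assumption: anything other than an explicit true counts as a failure.
        if isinstance(result, dict) and result.get("success") is True:
            success += 1
    return {"total": total, "success": success, "failed": total - success}


results = [{"success": True}, {"success": True}, {"success": True}, {"success": False}, {}]
print(json.dumps(summarize(results)))  # {"total": 5, "success": 3, "failed": 2}
```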
If the file doesn't exist or can't be read, the component throws an exception. Otherwise, when the pipeline execution is not interrupted, a message describing the exception is produced at the output.
You may also run into a problem if you upload a .xlsx file to Google Drive and then, in a pipeline, use the Google Drive component to download it and a Stream Excel component to read it.
In this scenario, an unexpected behavior of Google Sheets modifies the .xlsx file. As a result, Stream Excel reads every row in the sheet (including blank rows) instead of only the rows with content. This behavior is not related to the Digibee Integration Platform.
As a workaround, copy the contents of the sheet and paste them into a new sheet tab in the same .xlsx file. Don't copy blank rows, or the same problem will occur.
File handling inside a pipeline is protected. Files can be accessed only through a temporary directory, and each pipeline key gives access only to its own set of files.

Stream Excel in Action

See below how the component behaves in certain situations and how it must be configured in each case.

Read the Excel file and analyze the results

For this example, let's assume that we already have an Excel file in the pipeline flow that was downloaded through components such as Google Drive, OneDrive, or a similar one. This file is a sheet with the names of the 100 billionaires selected by Forbes.
The Stream Excel component is configured as follows:
  • Step Name: Stream Excel
  • File Name: file.xlsx
  • Sheet Name: Plan1
  • Max Fractional Digits: 5
  • Read Specific Columns As String: B,D,F
  • Column Identifier: A
For this example, all toggle switches are disabled.

Input

{
"fileName": "sheets.xlsx"
}

Output

{
"total": 102,
"success": 0,
"failed": 102
}

Log results

To see this log, use the Messages tab in the pipeline. The log shows that every line of the sheet was read individually by the component, including the line with the column names.

Read the Excel file and analyze a sheet that does not exist in the file

For this example, consider the same file as in the example above. This time, however, Sheet Name is set to a sheet that does not exist in the file.
The Stream Excel component returns the following error message (Fail On Error is disabled):
{
"success": false,
"message": "Sheet 'InvalidSheetName' does not exist",
"exception": "com.monitorjbl.xlsx.exceptions.MissingSheetException"
}

Read a nonexistent Excel file

In this example, the file configured in the component does not exist in the pipeline flow.
The Stream Excel component will return the following error message (Fail On Error is disabled):
{
"success": false,
"message": "File invalidsheets.xlsx does not exist.",
"exception": "com.digibee.pipelineengine.exception.PipelineEngineRuntimeException"
}
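Both error examples above share the same three fields (success, message, exception), while a normal run returns the total/success/failed summary. A step that consumes the output could tell the two shapes apart along these lines. This is a hedged sketch in Python, with the hypothetical helper name `is_error_message`, and not something provided by the platform.

```python
def is_error_message(output: dict) -> bool:
    """Return True when the output looks like an error message rather than a summary."""
    # In a summary, "success" is a count; in an error message it is the boolean false.
    return output.get("success") is False and "exception" in output


summary = {"total": 5, "success": 3, "failed": 2}
error = {
    "success": False,
    "message": "File invalidsheets.xlsx does not exist.",
    "exception": "com.digibee.pipelineengine.exception.PipelineEngineRuntimeException",
}
print(is_error_message(summary), is_error_message(error))  # False True
```

The identity check (`is False`) is deliberate: a summary such as { "total": 102, "success": 0, "failed": 102 } has a success count of 0, and in Python `0 == False` evaluates to true, so an equality check would misclassify it.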