Pipeline Engine

Discover how to use the Pipeline Engine, the Digibee iPaaS engine that interprets and executes the integration pipelines built through the interface.

The Digibee Integration Platform uses an engine that not only interprets the pipelines built through the interface, but also executes them. This engine is called the Pipeline Engine.

Take a look at the concepts and the operation architecture of the Pipeline Engine.

Concepts

Here is a high-level view of the Pipeline Engine in relation to the other components of the Platform:

  • Pipeline Engine: the component responsible for executing the flows built on the Platform

  • Trigger: the component that receives invocations from different technologies and forwards them to the Pipeline Engine

  • Queues mechanism: the Platform's central queue management mechanism

Operation Architecture

Each flow (pipeline) is converted into a Docker container and executed with Kubernetes, the technology at the base of the Digibee Integration Platform. This operation model provides the following main guarantees (see the sketch after the list):

  • Isolation: each container runs exclusively in the infrastructure (memory space and CPU consumption are exclusive to each pipeline)

  • Security: pipelines don't talk to each other, except through the interfaces provided by the Platform

  • Granular scalability: the number of replicas of a specific pipeline can be increased or decreased individually, according to how much performance that pipeline demands
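Conceptually, each pipeline maps to a Kubernetes Deployment whose container has exclusive resource limits. Below is a minimal sketch using the Kubernetes Python client; the image, names, and values are hypothetical, since the real provisioning is managed internally by the Platform:

```python
from kubernetes import client

# Hypothetical container for a SMALL pipeline. The image and all names are
# illustrative; Digibee provisions the real ones internally.
container = client.V1Container(
    name="pipeline-engine",
    image="registry.example.com/pipeline-engine:latest",  # assumed image
    resources=client.V1ResourceRequirements(
        # Exclusive CPU and memory per pipeline: the isolation guarantee.
        requests={"cpu": "200m", "memory": "64Mi"},
        limits={"cpu": "200m", "memory": "64Mi"},
    ),
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="my-pipeline"),
    spec=client.V1DeploymentSpec(
        replicas=1,  # granular scalability: tuned per pipeline
        selector=client.V1LabelSelector(match_labels={"app": "my-pipeline"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "my-pipeline"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)
```

Because the limits equal the requests, Kubernetes guarantees the container its CPU share and memory space regardless of what other pipelines are doing.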

Pipeline Sizes

We use the serverless concept, which means you don't need to worry about infrastructure details for the execution of your pipelines. Instead, each pipeline has its size defined during deployment. The Platform offers pipelines in 3 different sizes:

  • SMALL: 10 consumers, 20% of 1 CPU, 64 MB of memory

  • MEDIUM: 20 consumers, 40% of 1 CPU, 128 MB of memory

  • LARGE: 40 consumers, 80% of 1 CPU, 256 MB of memory

Consumers

Along with the pipeline size, you specify the maximum number of concurrent executions (or consumers) the pipeline allows. Each size defines an upper limit on how many consumers can be configured.

In this way, a pipeline with 10 consumers can process 10 messages in parallel, while a pipeline with 1 consumer can process only 1 message at a time.
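Here is a minimal sketch of the consumer concept, not the engine's actual implementation: each consumer is an independent worker pulling messages from the queue, so the number of consumers bounds the parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
import queue

messages = queue.Queue()
for i in range(30):
    messages.put(f"message-{i}")  # stand-in for the Platform's queues mechanism

def process(msg):
    ...  # stand-in for executing the pipeline flow on one message

def consume():
    # Each consumer handles its messages sequentially, one at a time.
    while True:
        try:
            msg = messages.get_nowait()
        except queue.Empty:
            return
        process(msg)

# A SMALL pipeline allows up to 10 consumers: at most 10 messages in parallel.
with ThreadPoolExecutor(max_workers=10) as pool:
    for _ in range(10):
        pool.submit(consume)
```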

Resources (CPU and Memory)

The pipeline size also defines the processing power and the amount of memory the pipeline has access to. Processing power is determined by the quantity of CPU cycles the pipeline can use.

Memory, in turn, is the addressable space available to process messages and consume information.

Replicas

Messages are sent to the queues mechanism of the Platform through triggers, which are the entry points for pipeline execution. There are triggers for different types of technologies, such as REST, HTTP, Scheduler, Email, etc. Once the messages arrive in the queues mechanism, they become available for consumption by the pipelines.

During the pipeline deployment, you also need to specify the number of replicas. A SMALL pipeline with 2 replicas has twice the processing and scalability capacity, and so on. Replicas not only provide more processing power and scalability, but also guarantee higher availability: if one replica fails, the others take over.

In general, replicas deliver horizontal scalability, while pipeline size delivers vertical scalability. For this reason, even though a LARGE pipeline is equivalent to 4 SMALL pipelines from the infrastructure point of view, they aren't equivalent for every workload. In many situations, especially those that involve processing big messages, only "vertically" bigger pipelines deliver the expected result, as the sketch below illustrates.
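A small illustration of that distinction, using the size table above (illustrative math only, not an official sizing formula):

```python
SIZES = {  # size: (consumers, fraction of 1 CPU, memory in MB)
    "SMALL": (10, 0.20, 64),
    "MEDIUM": (20, 0.40, 128),
    "LARGE": (40, 0.80, 256),
}

def capacity(size, replicas):
    consumers, cpu, mem = SIZES[size]
    return {
        "parallel_executions": consumers * replicas,  # grows horizontally
        "total_cpu": round(cpu * replicas, 2),        # grows horizontally
        "memory_per_replica_mb": mem,                 # grows only vertically
    }

# 4 SMALL replicas match 1 LARGE in total CPU and parallel executions...
print(capacity("SMALL", 4))  # 40 parallel executions, 0.8 CPU, 64 MB per replica
print(capacity("LARGE", 1))  # 40 parallel executions, 0.8 CPU, 256 MB per replica
# ...but a single big message still has to fit in one SMALL replica's 64 MB.
```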

Timeouts and Expiration

Triggers can be configured with two main time controls for message processing:

  • Timeout: the maximum amount of time the trigger waits for the pipeline's response.

  • Expiration: the maximum amount of time a message can stay in the queue before being captured by a Pipeline Engine.

The timeout configuration is possible for all triggers, but only some of them allow the expiration to be configured. That's because triggers can be synchronous or asynchronous in nature.

The Event trigger, for example, is executed asynchronously and can keep messages in the queue for a long time until they are consumed. For this reason, it's useful to define the expiration time of the messages generated by this type of trigger.

The REST trigger, however, depends on the pipeline's response to give a return, so in this case it isn't useful to configure the expiration time. Even so, synchronous triggers internally estimate a non-configurable expiration time for the messages, which ensures that messages don't get lost in the queueing process. The sketch below contrasts the two controls.
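A minimal sketch of the two controls' semantics, with hypothetical names and values: expiration is checked against the time a message has waited in the queue, while timeout bounds how long a synchronous trigger waits for the response.

```python
import queue
import time

EXPIRATION_S = 60.0  # max time a message may wait in the queue (hypothetical)
TIMEOUT_S = 30.0     # max time the trigger waits for the response (hypothetical)

q = queue.Queue()

def enqueue(message):
    q.put((time.monotonic(), message))  # stamp the enqueue time

def consume_one():
    enqueued_at, message = q.get()
    if time.monotonic() - enqueued_at > EXPIRATION_S:
        return None  # expired before any Pipeline Engine captured it
    return {"status": 200, "body": message}  # stand-in for pipeline execution

def synchronous_trigger(message):
    # A synchronous trigger (e.g. REST) waits up to TIMEOUT_S for the result.
    enqueue(message)
    started = time.monotonic()
    result = consume_one()
    if result is None or time.monotonic() - started > TIMEOUT_S:
        return {"status": 504, "body": "timeout waiting for the pipeline"}
    return result

print(synchronous_trigger({"hello": "world"}))
```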

Execution Control

Message processing is sequential for each consumer. Thus, if a pipeline is deployed with 1 consumer, messages are processed one by one. While a message is being processed, it receives a mark, and no other pipeline replica is able to receive it for processing.

If a pipeline has a problem during execution and needs to be restarted (for example, an OOM error or a crash), the messages being executed are returned to the queue mechanism, as sketched below.
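A sketch of this control, illustrative only: the mark keeps an in-flight message exclusive to one consumer, and a failure returns it to the queue instead of losing it.

```python
pending = [{"id": 1, "body": "..."}]  # stand-in for the queue mechanism
in_flight = {}                        # marked messages, keyed by id

def take_next():
    if not pending:
        return None
    msg = pending.pop(0)
    in_flight[msg["id"]] = msg  # mark: no other replica may take this message
    return msg

def execute(msg):
    try:
        run_pipeline(msg)          # stand-in for the real execution
        in_flight.pop(msg["id"])   # success: the message is gone for good
    except Exception:
        # OOM, crash, restart: the message returns to the queue mechanism.
        pending.append(in_flight.pop(msg["id"]))

def run_pipeline(msg):
    ...
```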

Messages returned to the processing queue

Messages returned to the processing queue become available for consumption by other pipeline replicas, or even by the same replica that had an issue and had to be restarted.

In this case, you can configure the pipeline to determine what to do when messages need to be reprocessed. All triggers in the Platform have a configuration called "Allow Redeliveries".

When this option is activated, the pipeline accepts the message for reprocessing. When it's deactivated, the pipeline receives the message, detects that it's a redelivery, and declines it with an error.

The message retry time is the maximum Timeout defined in the pipeline trigger; for a pipeline with an Event trigger, it's the time defined in the Expiration. The sketch below illustrates the two behaviors.
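A minimal sketch of the "Allow Redeliveries" behavior; "redelivered" is a hypothetical name for the mark the queue mechanism puts on a message that returned for reprocessing.

```python
def on_message(msg, allow_redeliveries):
    # A message that came back to the queue carries a redelivery mark
    # ("redelivered" is an assumed name for that flag).
    if msg.get("redelivered") and not allow_redeliveries:
        raise RuntimeError("redelivery declined: Allow Redeliveries is off")
    return {"status": 200}  # stand-in for normal pipeline execution
```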

It's also possible to detect whether the message under execution is being reprocessed or not. To do this, simply use the Double Braces language to access the metadata scope.

Example:
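A minimal sketch, assuming the redelivery flag is exposed in the metadata scope under the name below (the exact field name may vary):

```
{{ metadata.execution.redelivery }}
```

This expression would evaluate to true when the message under execution is a redelivery.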
