Autoscaling
Learn the basics about this serverless technology on the Digibee Integration Platform.
This documentation is exclusive to customers using the Consumption-Based Model and may not be applicable to your realm.
Autoscaling is the process of adjusting the amount of computational resources according to the workload. Pipeline deployments are horizontally scalable, which means that the Autoscaling feature allows replicas to be dynamically deployed and deleted according to the number of messages (invocations) in the queue.
The Digibee Integration Platform continuously monitors the queue and dynamically adjusts the number of replicas to ensure optimal resource utilization and efficient processing, regardless of which trigger generated the messages.
Every pipeline execution starts with the Trigger. Different trigger types, such as Scheduler, Event or HTTP triggers, have unique behaviors and activation criteria. These triggers determine whether a message is sent to the execution queue for further processing or whether it should be skipped or rejected altogether.
The implemented algorithm scales pipelines horizontally. Every 10 seconds, the Digibee Integration Platform monitors the system and can execute any of the following commands:
Activation (from 0 to 1): If the minimum number of replicas is set to zero, the pipeline will be active to receive new messages but will have no replicas running. When the queue receives the first message, the Digibee Integration Platform will deploy the first replica, which will be assigned to process this message and subsequent ones.
Scale up (from 1 to N): If the number of messages in the execution queue exceeds 70% of the total concurrent executions supported by the running replicas, the platform will deploy a new set of replicas.
Scale down (from N to 1): The system calculates the number of replicas needed to handle incoming requests by dividing the total load (message rate × pipeline response time) by the total number of concurrent executions supported by the replicas. Scaling up and down is performed using the formula below (see the worked example after this list):
Formula: Replicas = (Message rate per second × Response time in seconds) / Total concurrent executions
Scale to Zero (from 1 to 0): If the minimum number of replicas is set to zero and there is only one running replica idle for more than 60 seconds, the pipeline will be scaled down to zero. It will be reactivated when the next message arrives in the queue.
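The snippet below is a minimal worked example of the scaling formula above. It assumes that "Total concurrent executions" refers to the concurrent executions supported by a single replica (the deployment's concurrency setting); the numbers, the rounding up, and the floor of one replica are illustrative assumptions, not values from a real deployment.

```python
import math

def replicas_needed(message_rate_per_sec: float,
                    response_time_sec: float,
                    concurrent_executions_per_replica: int) -> int:
    """Apply the scaling formula: Replicas = (rate * response time) / concurrent executions."""
    # Messages in flight at any instant (load), per the formula on this page.
    in_flight = message_rate_per_sec * response_time_sec
    # Rounding up and keeping at least one replica are assumptions of this sketch.
    return max(1, math.ceil(in_flight / concurrent_executions_per_replica))

# Illustrative values: 50 messages/s, 0.4 s response time, 10 concurrent executions per replica.
print(replicas_needed(50, 0.4, 10))  # -> 2 replicas
```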
The system executes an Activation Command to create and start the execution of the first replica. This command triggers the processing of the first message in the queue. As a result, the first container is created, and the components required to process the pipeline are initialized only when the replica is deployed, that is, when messages arrive. This command can add latency to the total response time of the first message. It is recommended to set the minimum number of replicas to 1 or more for latency-sensitive pipelines.
The Platform processes the first set of messages (according to the concurrency settings) after the replica has been deployed and initialized. When new messages arrive in the queue, depending on the deployment configuration (maximum replicas), they can either be executed by the available consumers or wait for a new replica to be started (Scale Up Command).
Although the startup time of a new replica is similar for the Activation and Scale Up commands, Scale Up has a smaller impact on the total response time because the platform scales up new replicas when the queue size reaches 70% of the number of concurrent executions.
If there are only a few messages in the queue, the Platform adjusts the number of running replicas (Scale down) until only one replica remains active. If there are no messages in the queue and the single running replica remains idle for more than 60 seconds, the pipeline will be scaled down to zero (Scale to Zero Command). The replicas will be redeployed only when the next set of messages arrives in the queue.
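The lifecycle described above can be summarized as a simple control loop. The sketch below is illustrative only: the 10-second check interval, the 70% threshold, and the 60-second idle window come from this page, but the function and field names are hypothetical and do not reflect the Platform's internal implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class PipelineState:
    # Hypothetical snapshot of a deployment; field names are illustrative.
    min_replicas: int
    max_replicas: int
    running_replicas: int
    concurrent_executions: int   # concurrent executions supported per replica (assumption)
    queue_size: int              # messages waiting in the execution queue
    message_rate: float          # messages per second
    response_time: float         # average pipeline response time in seconds
    idle_seconds: float          # how long the single remaining replica has been idle

def autoscale_step(p: PipelineState) -> str:
    """One pass of the check the Platform runs every 10 seconds (simplified)."""
    capacity = p.running_replicas * p.concurrent_executions

    # Activation (0 -> 1): a message arrives while no replica is running.
    if p.running_replicas == 0:
        return "activate" if p.queue_size > 0 else "no-op"

    # Scale up (1 -> N): queue reaches 70% of current processing capacity.
    if p.queue_size >= 0.7 * capacity and p.running_replicas < p.max_replicas:
        return "scale-up"

    # Scale to zero (1 -> 0): single replica idle for more than 60 seconds.
    if (p.min_replicas == 0 and p.running_replicas == 1
            and p.queue_size == 0 and p.idle_seconds > 60):
        return "scale-to-zero"

    # Scale down (N -> 1): shrink toward the replica count given by the formula.
    needed = max(p.min_replicas, 1,
                 math.ceil((p.message_rate * p.response_time) / p.concurrent_executions))
    if needed < p.running_replicas:
        return "scale-down"

    return "no-op"
```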
The pipeline deployment starts with no replicas running.
When the first message arrives in the queue, the first replica will be allocated. In other words, the first message will need to wait for the first replica to start before it can be processed. If the only running replica remains idle for more than 60 seconds, the pipeline will scale down to zero replicas, and a new replica will only be initialized when a new message arrives in the queue.
The pipeline deployment ensures that there is always at least 1 replica running.
When the number of messages in the queue reaches 70% of the total processing capacity (simultaneous executions × replicas), new replicas are allocated. For example, if a small pipeline has 1 replica running, its capacity is 10. When 7 or more messages are in the queue, additional replicas will be allocated.
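To make the threshold concrete, the snippet below reproduces the example above: a small pipeline supporting 10 simultaneous executions per replica with 1 replica running scales up once 7 messages are queued. The function name is hypothetical and the per-replica capacity is taken from the example, not from a configuration API.

```python
import math

def scale_up_threshold(concurrent_executions: int, replicas: int) -> int:
    """Queue size at which a new replica is allocated: 70% of total processing capacity."""
    capacity = concurrent_executions * replicas
    return math.ceil(0.7 * capacity)

print(scale_up_threshold(10, 1))  # -> 7: with 1 replica (capacity 10), scale up at 7 queued messages
print(scale_up_threshold(10, 2))  # -> 14: with 2 replicas (capacity 20), the threshold moves to 14
```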