Discover more about the Kafka Trigger and how to use it on the Digibee Integration Platform.
To use this trigger, contact our Support Team to have it enabled.
The Kafka Trigger consumes messages from a Kafka broker.
- Avro format support is currently in Beta. Because Kafka is not designed to transmit large messages, the trigger accepts at most 5 MB of messages per poll. We recommend setting the broker's message.max.bytes property to a maximum of 1 MB. Avro data also counts toward this size limit.
- The Kafka Trigger key and payload settings must match the settings of the topics to be consumed by the Trigger: If Key As Avro is enabled, all keys of the records to be consumed must be in Avro format. If Payload As Avro is enabled, all payloads (values) of records to be consumed must be in Avro format.
This trigger has two configurable offset commit strategies:
All messages received by the trigger are sent to the pipeline faster, but with no delivery guarantee: the trigger does not wait for the pipeline to return before the message is considered processed.
With autocommit enabled, the trigger uses Kafka's default commit implementation. Message dispatch can be configured as follows:
All messages received from consumer polling are sent together in an array. For example, if 10 messages are returned during this poll, the trigger will send an array of those 10 messages.
The messages returned by the poll are dispatched to the pipeline individually (only 1 message at a time).
For example, if 10 messages are returned during this poll, the trigger sends them one at a time, for a total of 10 dispatches to the pipeline.
The trigger is responsible for committing the offsets, and it does so after receiving a success response from the pipeline. Only batch dispatch is possible: all the messages received by the consumer poll are sent together in an array.
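The two dispatch modes described above can be pictured with a small sketch. It is a plain Python simulation, not Digibee or Kafka client code; the function name `build_dispatches` and the `batch_mode` flag are illustrative assumptions.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def build_dispatches(polled: List[Record], batch_mode: bool) -> List[Any]:
    """Return the payloads a pipeline would receive for one poll (hypothetical helper)."""
    if batch_mode:
        # One dispatch that carries the whole poll as an array.
        return [polled] if polled else []
    # One dispatch per record returned by the poll.
    return list(polled)


# A poll that returned 10 records.
polled = [{"data": f"msg-{i}", "offset": i} for i in range(10)]

batch = build_dispatches(polled, batch_mode=True)
single = build_dispatches(polled, batch_mode=False)

print(len(batch), len(batch[0]))  # 1 dispatch holding 10 records
print(len(single))                # 10 dispatches of 1 record each
```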
Example: if during this poll 10 messages are returned, then the trigger will send an array with these 10 messages.
Kafka may redistribute consumers and/or partitions. If this happens between the pipeline response and its return to the trigger, the offsets will receive the commit. This may result in lost or duplicate messages.
In this option, a poll can return an array of messages, and its maximum size is defined by Max Poll Records. The messages are committed only after the pipeline returns successfully. If the pipeline times out, the messages are not committed.
In this option, each poll returns a single message rather than an array of messages. Throughput decreases, but the guarantee of successful processing is stronger, which means no messages are lost.
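The commit-after-success behavior can be sketched as follows. This is a simplified Python model under stated assumptions: `process_poll` and `run_pipeline` are hypothetical names, and the real trigger's internals are not shown in this document. The only Kafka convention relied on is that a committed offset points to the next record to read.

```python
from typing import Callable, Dict, List

Record = Dict[str, int]


def process_poll(
    records: List[Record],
    run_pipeline: Callable[[List[Record]], bool],
    committed_offset: int,
) -> int:
    """Return the committed offset after handling one poll (hypothetical helper)."""
    if not records:
        return committed_offset
    if run_pipeline(records):
        # Success: commit the offset that follows the last processed record.
        return records[-1]["offset"] + 1
    # Failure or timeout: nothing is committed, so the offset does not advance.
    return committed_offset


committed = 0
committed = process_poll(
    [{"offset": 0}, {"offset": 1}, {"offset": 2}], lambda recs: True, committed
)
after_failure = process_poll(
    [{"offset": 3}, {"offset": 4}], lambda recs: False, committed
)
print(committed, after_failure)  # 3 3
```

With a single message per poll, a failure leaves at most one record uncommitted, which is why that mode trades throughput for a stronger processing guarantee.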
If the topic is rebalanced in the Kafka broker while messages are being processed and the consumers have to take over other partitions, the messages will be committed even if an error occurs at the end of the pipeline execution. As a result, those messages won't be processed again in the following poll.
To avoid this issue, set Autocommit to "false" and Batch Mode to "false".
The consumer configuration directly affects message input and output throughput when the Kafka Trigger is active. The ideal scenario is to configure the same number of consumers as there are partitions in a given topic.
If there are more consumers than partitions, the excess consumers remain idle until the number of partitions increases. When that happens, Kafka starts the consumer rebalancing process.
This is the consumer group through which your pipeline subscribes to the Kafka topic. A topic can have "n" consumer groups, and each of them can have "n" consumers that consume the topic's records.
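To see why extra consumers sit idle, here is a minimal sketch that spreads partitions over consumers in round-robin order. Kafka's actual assignors are more elaborate; the `assign` function and the consumer names are assumptions made for illustration only.

```python
from typing import Dict, List


def assign(partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partition ids over consumers in round-robin order (simplified model)."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    if not consumers:
        return assignment
    for partition in range(partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


consumers = ["c0", "c1", "c2", "c3", "c4"]

# 3 partitions, 5 consumers: two consumers get nothing to read.
before = assign(3, consumers)
idle_before = [c for c, parts in before.items() if not parts]

# After the topic grows to 5 partitions, every consumer owns one.
after = assign(5, consumers)
idle_after = [c for c, parts in after.items() if not parts]

print(idle_before)  # ['c3', 'c4']
print(idle_after)   # []
```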
- Scenario 1
Let's say there's a topic named kafka-topic, a pipeline that uses a trigger configured with the consumer group (Consumer Group Name) named digibee, and a second pipeline that uses a trigger configured with the same topic, but with a consumer group named digibee-2. In this case, both pipelines will receive the same messages.
- Scenario 2
Let's say there's a topic named kafka-topic, a pipeline that uses a trigger configured with the consumer group (Consumer Group Name) named digibee, and a second pipeline that uses a trigger configured with the same topic and consumer group (digibee). Both pipelines will receive messages from this topic. However, Kafka is in charge of balancing the partitions between the consumers registered in the two triggers. In this case, the pipelines will receive messages in an interleaved way, according to the partition distribution.
To use Kerberos authentication in the Kafka Trigger, the configuration file “krb5.conf” must be registered in the Realm parameter. If you haven't done this yet, contact the Support Team through the chat service. After that, all you have to do is correctly set up a Kerberos-type account and use it in the component.
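Both scenarios can be reproduced with a short simulation. It assumes a topic with two partitions and a simplified rule where each partition is owned by one member of a group; the `deliver` function and the pipeline names are hypothetical and only illustrate the delivery pattern.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def deliver(
    records: List[Tuple[int, str]],
    triggers: List[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Route (partition, data) records to pipelines given (pipeline, group) pairs."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for pipeline, group in triggers:
        groups[group].append(pipeline)

    received: Dict[str, List[str]] = {pipeline: [] for pipeline, _ in triggers}
    for partition, data in records:
        # Every consumer group sees every record...
        for members in groups.values():
            # ...but inside a group, one member owns each partition.
            owner = members[partition % len(members)]
            received[owner].append(data)
    return received


records = [(0, "m0"), (1, "m1"), (0, "m2"), (1, "m3")]

# Scenario 1: different consumer groups, so both pipelines get everything.
scenario_1 = deliver(records, [("pipeline-a", "digibee"), ("pipeline-b", "digibee-2")])

# Scenario 2: same consumer group, so the partitions are split between them.
scenario_2 = deliver(records, [("pipeline-a", "digibee"), ("pipeline-b", "digibee")])

print(scenario_1)
print(scenario_2)
```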
Pipelines associated with Kafka trigger receive the following message as input:
"data": <STRING message content>,
"topic": <STRING The topic from which the record is received>,
"offset": <LONG The position of the record in the corresponding Kafka partition>,
"partition": <INT The partition from which the record is received>,
"success": <BOOLEAN Indicates whether the individual message was successfully consumed or not>,
"header1": "value1", … (when included)
"success": <BOOLEAN Indicates whether all the messages were successfully consumed or not>