
Azure Service Bus: Tips to Optimize Functions


In today’s world, where distributed systems are increasingly prevalent, asynchronous communication patterns such as publish/subscribe are widely used. With the rise of cloud computing, Azure Functions has emerged as a powerful tool for serverless computing. One of the most common triggers for Azure Functions is Azure Service Bus, a message broker that enables communication between different applications and services.

In this article, we will explore best practices for tuning configuration settings in Azure Service Bus-triggered Azure Functions.

We will cover topics such as:

  • adjusting message lock duration, 
  • setting maximum concurrent calls, 
  • auto-completing messages, 
  • topic filters and custom message headers. 

Service Bus Project Setup 

First, we need to create a Service Bus topic and a subscription that receives the topic's messages. We also need a receiver for those messages, so we will create an Azure function for that purpose.

Since this article focuses on tuning configuration settings, we'll keep the setup overview brief.

Creating Azure Service Bus Topic and Subscription

We can go directly to the Azure portal to create a new topic and its subscription. When creating the topic, we leave all settings at their defaults apart from the name. For the subscription, let’s take a look at the available options:

[Figure: Creating Azure Service Bus Topic and Subscription]

Max delivery count is the setting we care about here, and we leave it at 10. We will come back to the “Move messages that cause filter evaluation exception” checkbox and the “Message lock duration” value later on.
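If you prefer scripting this setup instead of clicking through the portal, the same topic and subscription can be created in code. The sketch below assumes the Azure.Messaging.ServiceBus NuGet package and a namespace connection string; the class and method names are hypothetical, and the values mirror the portal choices described above (max delivery count 10, the default 1-minute lock, filter-exception dead-lettering enabled).

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus.Administration;

public static class ServiceBusSetup
{
    // Subscription settings discussed above: max delivery count of 10, the default
    // 1-minute lock, and dead-lettering of messages whose filter evaluation fails.
    public static CreateSubscriptionOptions BuildSubscriptionOptions(string topicName, string subscriptionName) =>
        new CreateSubscriptionOptions(topicName, subscriptionName)
        {
            MaxDeliveryCount = 10,
            LockDuration = TimeSpan.FromMinutes(1),
            EnableDeadLetteringOnFilterEvaluationExceptions = true
        };

    // Creates the topic with default settings and then the subscription,
    // skipping either one if it already exists.
    public static async Task CreateAsync(string connectionString, string topicName, string subscriptionName)
    {
        var admin = new ServiceBusAdministrationClient(connectionString);

        if (!(await admin.TopicExistsAsync(topicName)).Value)
        {
            await admin.CreateTopicAsync(topicName);
        }

        if (!(await admin.SubscriptionExistsAsync(topicName, subscriptionName)).Value)
        {
            await admin.CreateSubscriptionAsync(BuildSubscriptionOptions(topicName, subscriptionName));
        }
    }
}
```

Keeping the options in a separate method makes it easy to review or unit-test the chosen values without touching a live namespace.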

Creating Azure Function

We can use a predefined Visual Studio template to create a new Azure function. We choose the function trigger (in our case, a Service Bus trigger) and create a default project.

Message Flow

Before jumping into the configuration settings, let’s remind ourselves what one message flow looks like.

Message Processing Modes

Service Bus offers two receive modes: Peek-Lock and Receive-and-Delete. Azure Functions uses Peek-Lock mode by default.

Peek-Lock Mode

In Peek-Lock mode, the receiver takes the message from the subscription and locks it so other receivers cannot process it. You can change the lock period, but as you can see, the default is 1 minute (the Message lock duration property). Within that time, the receiver must process and settle the message. If everything goes well, the receiver completes the message and the broker deletes it from the active queue.

If we don’t complete the message within the lock period, the lock is lost, and settling the message fails with a ServiceBusException whose reason is MessageLockLost. Depending on the number of retries (max delivery count in the subscription settings * function retry count), the message either becomes available in the active queue again or, once it reaches the maximum number of deliveries, moves to the dead-letter queue.

Although this approach is slower and more complex, since there are more configuration settings to tune, it offers delivery assurance, which is especially important when working with high-value messages.
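To make the Peek-Lock flow concrete outside of Azure Functions, here is a minimal sketch using a plain ServiceBusReceiver from the Azure.Messaging.ServiceBus package. The class name and the processing step are hypothetical; the point is the receive, complete, and abandon sequence that the Functions host otherwise performs for you.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

public static class PeekLockReceiver
{
    // Peek-Lock is already the SDK default; it is set explicitly for clarity.
    public static ServiceBusReceiverOptions Options() =>
        new ServiceBusReceiverOptions { ReceiveMode = ServiceBusReceiveMode.PeekLock };

    public static async Task ReceiveOneAsync(ServiceBusClient client, string topicName, string subscriptionName)
    {
        await using ServiceBusReceiver receiver = client.CreateReceiver(topicName, subscriptionName, Options());

        // The received message stays locked for this receiver until LockedUntil.
        ServiceBusReceivedMessage message = await receiver.ReceiveMessageAsync(TimeSpan.FromSeconds(5));
        if (message == null)
        {
            return; // nothing arrived within the wait time
        }

        try
        {
            Console.WriteLine($"Processing {message.MessageId}, locked until {message.LockedUntil:O}");
            // ... hypothetical processing logic goes here ...

            // Settle the message: the broker now removes it from the subscription.
            await receiver.CompleteMessageAsync(message);
        }
        catch (Exception)
        {
            // Release the lock so the message can be redelivered; once its delivery
            // count exceeds the max delivery count, the broker dead-letters it.
            await receiver.AbandonMessageAsync(message);
            throw;
        }
    }
}
```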

Receive-and-Delete mode

In Receive-and-Delete mode, the broker settles the message as soon as it delivers it to the consumer. In other words, it deletes the message on delivery and assumes it will be processed successfully. Since we don’t need to settle messages ourselves, this mode allows higher message throughput.

However, this mode is not suitable for processing high-value messages since we have no delivery guarantee. For example, if the consumer dies while processing a message, it’s lost forever. 
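For comparison, a minimal Receive-and-Delete sketch with the same SDK (Azure.Messaging.ServiceBus; the class name is hypothetical) has no settlement step at all, which is exactly why a crash mid-loop loses messages:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

public static class ReceiveAndDeleteReceiver
{
    public static ServiceBusReceiverOptions Options() =>
        new ServiceBusReceiverOptions { ReceiveMode = ServiceBusReceiveMode.ReceiveAndDelete };

    public static async Task ReceiveBatchAsync(ServiceBusClient client, string topicName, string subscriptionName)
    {
        await using ServiceBusReceiver receiver = client.CreateReceiver(topicName, subscriptionName, Options());

        // These messages are already removed from the subscription. There is no
        // Complete/Abandon step, and a crash in the loop below loses them for good.
        IReadOnlyList<ServiceBusReceivedMessage> messages =
            await receiver.ReceiveMessagesAsync(maxMessages: 10, maxWaitTime: TimeSpan.FromSeconds(5));

        foreach (ServiceBusReceivedMessage message in messages)
        {
            Console.WriteLine(message.Body.ToString());
        }
    }
}
```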

Dead Letter Queue (DLQ)

We mentioned that messages can end up in the dead-letter queue (DLQ). Messages are moved into that queue automatically when they cannot be delivered or processed.

The DLQ receives messages when: 

  • they exceed the maximum delivery count, 
  • their time to live expires (if dead-lettering on message expiration is enabled), 
  • the receiver explicitly dead-letters them, for example because they cannot be processed.

A DLQ with meaningful dead-letter reasons is useful for identifying and debugging problematic messages, and for finding out what causes consumption problems.
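A quick way to see those reasons is to peek at the subscription's dead-letter sub-queue. The sketch below assumes the Azure.Messaging.ServiceBus package; the class and method names are hypothetical. Peeking neither locks nor removes the messages, so it is safe for inspection.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

public static class DeadLetterInspector
{
    // Targets the dead-letter sub-queue of the given subscription.
    public static ServiceBusReceiverOptions Options() =>
        new ServiceBusReceiverOptions { SubQueue = SubQueue.DeadLetter };

    // Prints why each of up to 'max' dead-lettered messages ended up in the DLQ.
    public static async Task PrintReasonsAsync(ServiceBusClient client, string topicName, string subscriptionName, int max = 20)
    {
        await using ServiceBusReceiver receiver = client.CreateReceiver(topicName, subscriptionName, Options());

        foreach (ServiceBusReceivedMessage message in await receiver.PeekMessagesAsync(max))
        {
            Console.WriteLine(
                $"{message.MessageId}: {message.DeadLetterReason} " +
                $"({message.DeadLetterErrorDescription}), deliveries = {message.DeliveryCount}");
        }
    }
}
```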

Configuration of Azure Functions

Finally, we are ready to look at the host.json file. Here is an example of the Service Bus extension settings:

{
    "version": "2.0",
    "extensions": {
        "serviceBus": {
            "clientRetryOptions":{
                "mode": "exponential",
                "tryTimeout": "00:01:00",
                "delay": "00:00:00.80",
                "maxDelay": "00:01:00",
                "maxRetries": 3
            },
            "prefetchCount": 0,
            "transportType": "amqpWebSockets",
            "webProxy": "https://proxyserver:8080",
            "autoCompleteMessages": true,
            "maxAutoLockRenewalDuration": "00:05:00",
            "maxConcurrentCalls": 16,
            "maxConcurrentSessions": 8,
            "maxMessageBatchSize": 1000,
            "sessionIdleTimeout": "00:01:00",
            "enableCrossEntityTransactions": false
        }
    }
}

You can find a detailed explanation of every setting in the Microsoft documentation.

Here, we want to focus on a few settings and see how they can improve our function's performance and scalability.

Don’t Rely on Autocompleting Messages in Azure

If we look at the settings above, we can see that the autoCompleteMessages property is set to true, which is also its default. With this setting, we have no control over the message lifecycle. In other words, if the function executes successfully, the message is automatically marked as completed (i.e., deleted from the subscription). If an error occurs while processing the message, the message is retried until it exceeds the max delivery count and moves into the DLQ.

Let’s say we receive a message with a malformed payload. Ideally, we would reject it after the first attempt. With autoCompleteMessages set to true, however, we have no control over how the message is settled. Also, when it ends up in the DLQ, it carries a generic reason (MaxDeliveryCountExceeded), so we won’t know the real reason it ended up there. All this leads to less performant functions, due to unnecessary retries, and to DLQs that are hard to analyze later.

Taking Control Over Message Lifecycle

Setting autoCompleteMessages to false means we manage the message lifecycle ourselves instead of relying on the function to do it automatically. To do that, we first need to install the Microsoft.Azure.Functions.Worker NuGet package to get access to the ServiceBusMessageActions class, which we explain in more detail later. At the time of writing, the NuGet version used is 4.2.1.

In general, ServiceBusMessageActions is a class you use as an optional input parameter to perform message lifecycle actions such as completing, abandoning, or dead-lettering messages.

To include it in our function, we need to add ServiceBusMessageActions as an input parameter. Following Microsoft documentation, we need to name it messageActions.

The function input parameters should now look similar to the following:

[Function("MyFunctionName")]
public void Run(
    [ServiceBusTrigger("topicName", "subscriptionName", Connection = "myConnection")]
    ServiceBusReceivedMessage serviceBusMessage,
    ServiceBusMessageActions messageActions)
{
    _logger.LogInformation("ServiceBus queue trigger function processing message: {messageId}", serviceBusMessage.MessageId);
}

Now, we can call the corresponding methods of this class and react to the messages accordingly.

In the poison-message case mentioned above, we would call the DeadLetterMessageAsync method of messageActions and state the actual reason for dead-lettering the message. This removes the message from the active queue immediately, freeing up time to process other messages. Another benefit is that when analyzing the DLQ, we will know exactly why the message ended up there.

In contrast to dead lettering, if message consumption was successful, we would call the CompleteMessageAsync method of messageActions.

Another use case is releasing the message lock and making the message available in the queue again, in which case we would use the AbandonMessageAsync method.

One thing to remember is that in real-life scenarios, we can have more complex conditions to check if we want to proceed with the message consumption. This setting can be a huge help in getting rid of poisonous or non-wanted messages as early as possible.
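Putting the three actions together, a function body could look like the sketch below. It assumes autoCompleteMessages is set to false, the isolated-worker Service Bus extension, and a version whose DeadLetterMessageAsync accepts deadLetterReason and deadLetterErrorDescription (check the overloads in the version you install). The JSON validation rule, the reason string, and ProcessAsync are hypothetical placeholders for your own logic.

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;

public class MessageLifecycleFunction
{
    private readonly ILogger<MessageLifecycleFunction> _logger;

    public MessageLifecycleFunction(ILogger<MessageLifecycleFunction> logger) => _logger = logger;

    [Function("MyFunctionName")]
    public async Task Run(
        [ServiceBusTrigger("topicName", "subscriptionName", Connection = "myConnection")]
        ServiceBusReceivedMessage serviceBusMessage,
        ServiceBusMessageActions messageActions)
    {
        // Poison message: dead-letter it right away with an explicit reason instead of retrying.
        if (!IsWellFormedJson(serviceBusMessage.Body.ToString()))
        {
            await messageActions.DeadLetterMessageAsync(
                serviceBusMessage,
                deadLetterReason: "MalformedPayload",
                deadLetterErrorDescription: "The message body is not valid JSON.");
            return;
        }

        try
        {
            await ProcessAsync(serviceBusMessage);
            // Success: remove the message from the subscription.
            await messageActions.CompleteMessageAsync(serviceBusMessage);
        }
        catch (TimeoutException ex)
        {
            // Transient failure: release the lock so the message can be retried.
            _logger.LogWarning(ex, "Timed out processing {messageId}", serviceBusMessage.MessageId);
            await messageActions.AbandonMessageAsync(serviceBusMessage);
        }
    }

    // Hypothetical validation rule used for illustration.
    public static bool IsWellFormedJson(string body)
    {
        try
        {
            using JsonDocument document = JsonDocument.Parse(body);
            return true;
        }
        catch (JsonException)
        {
            return false;
        }
    }

    // Hypothetical business logic.
    private static Task ProcessAsync(ServiceBusReceivedMessage message) => Task.CompletedTask;
}
```

Any other exception propagates out of the function; since nothing settles the message in that case, its lock eventually expires and the message is redelivered until the max delivery count is reached.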

Go Low with prefetchCount

The prefetchCount setting specifies how many messages the function fetches and holds in memory at a time. By fetching multiple messages at once, we reduce the number of round trips between the function and the Service Bus subscription, which improves throughput and reduces latency. By default, it’s 0.

Now you may think, great, I will set it to a high number and not worry about anything. In reality, this can slow functions down and introduce unwanted behavior. Here is why.

First, prefetched messages are locked: their lock period starts counting down, and other receivers can't process them. If we set the prefetch count too high, locks eventually start to expire while messages are still waiting in the prefetch buffer. Each expired lock counts as a delivery attempt, so, depending on the max delivery count, messages can end up in the dead-letter queue without ever being processed. It can also happen that one function instance processes messages faster than the others and runs out of work, because all the remaining messages are waiting in the prefetch buffers of other function instances.

For calculating the prefetch count, we can rely directly on the Microsoft documentation:

The maximum prefetch count and the lock duration configured on the queue or subscription need to be balanced such that the lock timeout at least exceeds the cumulative expected message processing time for the maximum size of the prefetch buffer, plus one message. At the same time, the lock timeout shouldn’t be so long that messages can exceed their maximum time to live when they’re accidentally dropped, so requiring their lock to expire before being redelivered.

Example of Prefetch Count Calculation

Let’s say we have 16 max concurrent calls, WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT set to 1, and a lock duration of 30 seconds. Also, let’s say processing one message takes 2 seconds on average. From the documentation quoted above, we can extract the following condition:

lockDurationTimeInSeconds >= (prefetchCount + 1) * executionTimeInSeconds

With these settings, we process up to 16 messages at once, and each one takes 2 seconds. That means the effective executionTimeInSeconds per message is 2/16 = 0.125 seconds.

Solving for prefetchCount gives prefetchCount <= 30 / 0.125 - 1 = 239, so our prefetch count can be at most 239.
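The same calculation can be wrapped in a small helper so it is easy to rerun when the lock duration or processing time changes. This is a sketch that applies the documented inequality, lockDuration >= (prefetchCount + 1) * executionTime, per instance; it assumes a constant average processing time and that all concurrent slots stay busy. The class and method names are hypothetical.

```csharp
using System;

public static class PrefetchCalculator
{
    // Largest prefetch count P that satisfies
    //   lockDuration >= (P + 1) * effectiveTimePerMessage,
    // where effectiveTimePerMessage = processingTime / maxConcurrentCalls
    // for a single instance that keeps all concurrent slots busy.
    public static int MaxPrefetchCount(TimeSpan lockDuration, TimeSpan processingTime, int maxConcurrentCalls)
    {
        if (maxConcurrentCalls <= 0) throw new ArgumentOutOfRangeException(nameof(maxConcurrentCalls));
        if (processingTime <= TimeSpan.Zero) throw new ArgumentOutOfRangeException(nameof(processingTime));

        double secondsPerMessage = processingTime.TotalSeconds / maxConcurrentCalls;
        double messagesPerLock = lockDuration.TotalSeconds / secondsPerMessage;

        // Subtract the "plus one message" from the documentation; never go below 0.
        return Math.Max(0, (int)Math.Floor(messagesPerLock) - 1);
    }
}
```

For the example above, MaxPrefetchCount(TimeSpan.FromSeconds(30), TimeSpan.FromSeconds(2), 16) returns 239. Treat the result as an upper bound and leave some headroom, since real processing times vary.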

Tweak Lock and Lock Renewal Duration in Azure

MaxLockDuration

When calculating the prefetch count, we saw how the message lock can expire. So far, we have left the lock duration at its default value of 1 minute. You should adjust this value according to the function's performance, the number of concurrent calls, and the time it takes to process one message. As a general rule, set the lock duration higher than the normal processing time, and keep the prefetchCount setting in mind.

It is important not to set this value too high. If the receiver fails to process a message, or the message gets stuck in the prefetch buffer, the message stays locked for too long. Other receivers can't process it in the meantime, which slows down overall throughput. 

To make sure we don’t lose the lock on a message when processing takes longer than expected, we can use the maxAutoLockRenewalDuration setting. It specifies the maximum duration for which the lock on a message can be automatically renewed. In general, this value should be greater than the lock duration. We should also consider other factors that can delay processing, such as:

  • network latency, 
  • performance of external services, 
  • dependencies on other systems.

In the case above, where we know our processing time is roughly 2 seconds, we can set the lock duration time to 30 seconds. That’s enough time to process the message and utilize the prefetch buffer.
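The lock duration lives on the subscription, not in host.json, so changing it to 30 seconds means updating the subscription. A minimal sketch with ServiceBusAdministrationClient from the Azure.Messaging.ServiceBus package follows; the class name is hypothetical, the 5-minute upper bound reflects the Service Bus maximum lock duration, and the lower bound check is a simplification.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus.Administration;

public static class LockDurationSettings
{
    // Service Bus accepts lock durations of up to 5 minutes.
    public static bool IsValid(TimeSpan lockDuration) =>
        lockDuration > TimeSpan.Zero && lockDuration <= TimeSpan.FromMinutes(5);

    // Reads the current subscription, changes only its lock duration, and saves it back.
    public static async Task SetAsync(string connectionString, string topicName, string subscriptionName, TimeSpan lockDuration)
    {
        if (!IsValid(lockDuration))
        {
            throw new ArgumentOutOfRangeException(nameof(lockDuration));
        }

        var admin = new ServiceBusAdministrationClient(connectionString);
        SubscriptionProperties subscription = (await admin.GetSubscriptionAsync(topicName, subscriptionName)).Value;
        subscription.LockDuration = lockDuration;
        await admin.UpdateSubscriptionAsync(subscription);
    }
}
```

For the example in this article, the call would be SetAsync(connectionString, "topicName", "subscriptionName", TimeSpan.FromSeconds(30)).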

maxAutoLockRenewalDuration

For maxAutoLockRenewalDuration, we can use twice the lock duration, which in our case is one minute. That means the lock is renewed automatically for up to one minute in total; after that, renewal stops, and once the current lock expires, the message becomes available to other receivers again. As with the lock duration, it is important not to set this value too high, so messages aren't held for too long. On the other hand, we want it longer than the lock duration, which gives us extra time for cases where message processing takes longer than expected.
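If part of your system consumes the same subscription with a ServiceBusProcessor instead of the Functions trigger, similar knobs exist on ServiceBusProcessorOptions in the Azure.Messaging.ServiceBus package. The sketch below roughly mirrors the values discussed here; the class name is hypothetical, and deriving the renewal window as twice the lock duration simply follows the rule of thumb above.

```csharp
using System;
using Azure.Messaging.ServiceBus;

public static class ProcessorSettings
{
    // Approximates the host.json values discussed in this article
    // for code that uses ServiceBusProcessor directly.
    public static ServiceBusProcessorOptions Build(TimeSpan lockDuration) => new ServiceBusProcessorOptions
    {
        ReceiveMode = ServiceBusReceiveMode.PeekLock,
        AutoCompleteMessages = false,
        MaxConcurrentCalls = 16,
        // Twice the subscription's lock duration, as suggested above.
        MaxAutoLockRenewalDuration = lockDuration + lockDuration
    };
}
```

Usage would look like client.CreateProcessor("topicName", "subscriptionName", ProcessorSettings.Build(TimeSpan.FromSeconds(30))), followed by registering the message and error handlers before starting the processor.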

Maximum Concurrent Calls Scale with Instances

The maxConcurrentCalls setting is the maximum number of concurrent calls to the callback that can be initiated per scaled instance.

The important part here is that it applies to one scaled instance. Suppose we conclude that our function can safely process 200 messages at a time. Setting maxConcurrentCalls to 200 is not enough: if the function scales out under higher load, the total concurrency becomes maxConcurrentCalls multiplied by the number of scaled instances.

So, when working with this option, we need to consider the WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT setting that defines the max number of scale-out instances.

A good practice is to run performance tests with different values of this setting and monitor the function's behavior.

When choosing this number, also keep in mind the prefetch count, the resources available on the hosting plan, the lock duration, and the time it takes to process one message.
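The scale-out arithmetic is simple but easy to forget, so here is a small sketch of it. The class and method names are hypothetical, and the target of 200 concurrent messages is only an illustrative number; the per-instance value is rounded down so the app stays at or below the target even at maximum scale-out.

```csharp
using System;

public static class ConcurrencyPlanner
{
    // Messages processed in parallel across the whole app at full scale-out.
    public static int TotalConcurrency(int maxConcurrentCalls, int maxScaleOutInstances) =>
        maxConcurrentCalls * maxScaleOutInstances;

    // Per-instance maxConcurrentCalls that keeps the app at or below a target total
    // even when it reaches the maximum number of instances (rounded down, minimum 1).
    public static int MaxConcurrentCallsFor(int targetTotal, int maxScaleOutInstances)
    {
        if (maxScaleOutInstances <= 0) throw new ArgumentOutOfRangeException(nameof(maxScaleOutInstances));
        return Math.Max(1, targetTotal / maxScaleOutInstances);
    }
}
```

For example, with WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT set to 5, setting maxConcurrentCalls to 200 allows up to 1,000 concurrent messages, while MaxConcurrentCallsFor(200, 5) suggests 40 per instance.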

Improving Performance Outside of Azure Configuration

As we mentioned earlier, it’s common to include condition checks in our functions to decide whether to process a message. One approach is to set autoCompleteMessages to false and dead-letter unwanted messages immediately. But if the conditions are simple, a better approach is to use topic filters and custom message header values.

Custom Message Headers

Custom message headers in Azure Service Bus messages are user-defined properties we can add to a message to provide additional metadata or context. They are stored as key-value pairs and can carry data such as routing information, priority, or application-specific values. To add custom headers, we set them in the ApplicationProperties property of the message we send.

Since it is a dictionary, when we access it in the receiver application, all we need to know is the key of the value we need.

This lets us quickly access information about a message, run checks, and filter messages out based on that data. It’s particularly useful for large messages stored using the claim-check pattern, since it lets us discard unwanted messages early, before fetching the stored payload.
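Here is a minimal sketch of both sides, assuming the Azure.Messaging.ServiceBus package. The header key messageType and its values are hypothetical. The receiver-side check takes the properties dictionary, so in a function you would pass serviceBusMessage.ApplicationProperties and decide before touching the body.

```csharp
using System;
using System.Collections.Generic;
using Azure.Messaging.ServiceBus;

public static class MessageHeaders
{
    // Hypothetical header name used for illustration.
    public const string MessageTypeKey = "messageType";

    // Sender side: attach metadata to the message without touching the body.
    public static ServiceBusMessage Create(string body, string messageType)
    {
        var message = new ServiceBusMessage(body);
        message.ApplicationProperties[MessageTypeKey] = messageType;
        return message;
    }

    // Receiver side: decide from the headers alone, before reading the body
    // or fetching a claim-check payload.
    public static bool IsWanted(IReadOnlyDictionary<string, object> applicationProperties, string expectedType) =>
        applicationProperties.TryGetValue(MessageTypeKey, out object value)
        && value is string type
        && type == expectedType;
}
```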

Topic Filters

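In addition to custom message headers, Azure Service Bus supports topic filters. Filters are defined as rules on a subscription, and the broker only delivers to that subscription the messages that match its rules. When creating the subscription, we saw the “Move messages that cause filter evaluation exception” checkbox. It determines whether messages whose filter evaluation fails with an error are moved to the DLQ; messages that simply don't match a filter are never delivered to that subscription. Enabling this option is a good practice if we want to examine why filter evaluation failed for certain messages.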

Topic filters help us reduce code complexity since we don’t need to write custom validation logic inside our function but instead rely on the built-in Azure functionality. This approach can also reduce network traffic and processing overhead, resulting in better performance and scalability for our application.
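As a sketch, a SQL filter that only lets through messages whose hypothetical messageType header equals OrderCreated could be added with ServiceBusAdministrationClient from the Azure.Messaging.ServiceBus package. The rule name is hypothetical. New subscriptions come with a $Default rule that matches every message, so the sketch removes it; otherwise the new rule would not narrow anything down.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus.Administration;

public static class SubscriptionFilters
{
    // SQL filter evaluated by the broker against the message's application properties.
    public static CreateRuleOptions OrderCreatedRule() =>
        new CreateRuleOptions("OnlyOrderCreated", new SqlRuleFilter("messageType = 'OrderCreated'"));

    public static async Task ApplyAsync(string connectionString, string topicName, string subscriptionName)
    {
        var admin = new ServiceBusAdministrationClient(connectionString);

        // The "$Default" rule matches everything; remove it so only our rule applies.
        if ((await admin.RuleExistsAsync(topicName, subscriptionName, "$Default")).Value)
        {
            await admin.DeleteRuleAsync(topicName, subscriptionName, "$Default");
        }

        await admin.CreateRuleAsync(topicName, subscriptionName, OrderCreatedRule());
    }
}
```

Because the broker evaluates the rule, unwanted messages never reach the function at all, so they cost no function executions, locks, or retries.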

Wrap-up

We have now covered many settings and options we can tune when working with Service Bus-triggered Azure Functions, so let’s look at the modified version of the host.json file we started with: 

{
    "version": "2.0",
    "extensions": {
        "serviceBus": {
            "clientRetryOptions":{
                "mode": "exponential",
                "tryTimeout": "00:01:00",
                "delay": "00:00:00.80",
                "maxDelay": "00:01:00",
                "maxRetries": 3
            },
            "prefetchCount": 239,
            "transportType": "amqpWebSockets",
            "webProxy": "https://proxyserver:8080",
            "autoCompleteMessages": false,
            "maxAutoLockRenewalDuration": "00:01:00",
            "maxConcurrentCalls": 16,
            "maxConcurrentSessions": 8,
            "maxMessageBatchSize": 1000,
            "sessionIdleTimeout": "00:01:00",
            "enableCrossEntityTransactions": false
        }
    }
}

Do we need to tune all of these configuration settings?

As with everything in programming, it is a good practice to be aware of how we set things up under the hood. Knowing this and following some general guidelines in this article and other resources can greatly improve our function performance and scalability.
