Possible memory leak when running Hangfire in a Linux container

Hi everyone!

We are running Hangfire 1.8.14 with Hangfire.PostgreSql 1.20.9 on .NET 8.

It seems that there is a memory leak that only occurs when running in a Linux container deployed to Azure Container Apps.

When run locally we did not observe the behavior.

Currently we have the following projects:
API - with the endpoints used to enqueue new jobs,
HangfireServer - an ASP.NET Core app where Hangfire is hosted (set up roughly as sketched below), and
Jobs - a separate project/DLL that contains the classes with the methods that represent the jobs.
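
For context, the HangfireServer project is wired up more or less like this (a simplified sketch; the connection string name is just a placeholder rather than our real configuration):

using Hangfire;
using Hangfire.PostgreSql;

var builder = WebApplication.CreateBuilder(args);

// Placeholder connection string name.
var connectionString = builder.Configuration.GetConnectionString("HangfireDb");

builder.Services.AddHangfire(config => config
    .UseSimpleAssemblyNameTypeSerializer()
    .UseRecommendedSerializerSettings()
    .UsePostgreSqlStorage(connectionString));

// The background processing server that picks up and runs the enqueued jobs;
// the Jobs assembly is referenced by this project so the job types resolve.
builder.Services.AddHangfireServer();

var app = builder.Build();

app.UseHangfireDashboard();
app.Run();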

The API only calls IBackgroundJobClient.Enqueue() and does no further processing.

So for example I have a simple class:

public class KillMemoryJob
{
    public async Task Run()
    {
        long bytesToAllocate = (long)(1.5 * 1024 * 1024 * 1024);
        byte[] memory = new byte[bytesToAllocate];

        // Touch every byte so the allocation is actually backed by physical memory, not just reserved address space.
        for (long i = 0; i < memory.Length; i++)
        {
            memory[i] = 0;
        }

        Console.WriteLine("Successfully allocated 1.5 GB of memory.");
        await Task.Delay(1000 * 60 * 5); // wait five minutes to finish
    }
}

that allocates 1.5GB of memory.

The API endpoint is defined as:

app.MapGet("api/debug/kill-memory", 
    (IBackgroundJobClient backgroundJobClient) =>
    {
        string jobId = backgroundJobClient.Enqueue<KillMemoryJob>(zbj =>
            zbj.Run());

        return Results.Ok(jobId);
    });

The behavior I expect to observe is that the API enqueues the job, the server picks it up and starts processing. After the processing finishes and the Run() method exits (after the five-minute delay), the allocated memory is released.
That works as expected when I run it locally on my Windows machine, but when deployed to Azure Container Apps the memory never gets released.

I can provide a dump file if that would help figure out the issue.

I have the exact same configuration, and I’m facing an issue with memory release in a Linux environment. Locally, this problem doesn’t occur. After analyzing the memory dump, I found that the leak happens because TimerQueueTimer objects are not being disposed of. If you make progress and find a solution, please let me know.
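
For anyone else digging through a dump, the general pattern that produces these objects looks like this (a made-up illustration, not code from our application): every active System.Threading.Timer is tracked by the runtime's timer queue as a TimerQueueTimer, and the entry stays there until the timer is disposed or finalized, so timers that are created repeatedly and never disposed pile up in a heap snapshot.

using System;
using System.Threading;

// Made-up example: each call to Start() creates a new timer and silently
// drops the previous one. Until each timer is disposed (or finalized), its
// TimerQueueTimer remains registered in the runtime's timer queue.
public sealed class LeakyComponent : IDisposable
{
    private Timer? _timer;

    public void Start()
    {
        _timer = new Timer(_ => Console.WriteLine("tick"),
                           state: null,
                           dueTime: TimeSpan.FromSeconds(30),
                           period: TimeSpan.FromSeconds(30));
    }

    // Disposing the timer removes its entry from the timer queue.
    public void Dispose() => _timer?.Dispose();
}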

Watching this thread too, as we are also experiencing the same behavior when running Hangfire in a Linux container in Azure; the only difference is that we’re using Hangfire.SqlServer.

Hangfire version 1.8.14, Hangfire.SqlServer 1.8.14 on .NET 8.

Bumping this thread up and hoping for a reply or a resolution.

It is happening to me as well, running in a Linux container on arm64 and using in-memory storage.

Need more details here, like dotMemory output, to see what types are leaking and what their origins are. It can also be related to .NET 8.0 itself, since similar issues have been reported against the runtime.

Hi,

Thanks for the reply and attention. I will try to get those memory snapshots. But one thing is common: everything is fine on Windows; the problem only happens in Linux containers.

So I believe the issues you mentioned aren’t related. We’re running other .NET 8 apps in Linux containers, but they aren’t building up memory over time.

Hangfire itself doesn’t have any Linux-specific code, so we need to understand which component is responsible for this behavior. Unfortunately, it’s unlikely we can get anywhere without additional details, such as the types involved. Another metric of interest is the kind of memory that’s being over-allocated: managed or native.
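
To tell the two apart, something along these lines (a rough sketch, nothing Hangfire-specific, just plain BCL calls) can be run on a schedule inside the container to log the managed heap figures next to the process working set:

using System;
using System.Diagnostics;

public static class MemoryReport
{
    // If the working set keeps growing while the managed numbers stay flat,
    // the growth is mostly native; if they grow together, it is managed.
    public static void Log()
    {
        var gcInfo = GC.GetGCMemoryInfo();
        using var process = Process.GetCurrentProcess();

        Console.WriteLine(
            $"Managed heap: {GC.GetTotalMemory(false) / (1024 * 1024)} MB, " +
            $"GC committed: {gcInfo.TotalCommittedBytes / (1024 * 1024)} MB, " +
            $"Working set: {process.WorkingSet64 / (1024 * 1024)} MB");
    }
}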

Hi everyone!

I analyzed the memory dumps, and after upgrading Hangfire to version 1.8.17 I noticed that the number of TimerQueueTimer objects decreased. From the roots of these objects it became clear that the issue was related to the repeated creation of a logger inside the jobs, instead of using a single logger for the entire application (see the sketch below). I recommend upgrading Hangfire and re-analyzing the memory dumps: before the update, the roots of the TimerQueueTimer objects pointed to the PostgreSQL connection used by Hangfire, and I wasn’t paying attention to the logger.
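
To illustrate the pattern (simplified, with made-up type names rather than our actual job classes):

using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

// Anti-pattern: building a new LoggerFactory on every job execution. Each
// factory creates its own provider infrastructure and is never disposed here.
public class JobWithLocalLogger
{
    public Task Run()
    {
        var factory = LoggerFactory.Create(builder => builder.AddConsole());
        var logger = factory.CreateLogger<JobWithLocalLogger>();
        logger.LogInformation("Job started");
        return Task.CompletedTask;
    }
}

// Preferred: resolve a single, application-wide logger through the job's
// constructor, so Hangfire's DI-based activation reuses the host's logging.
public class JobWithInjectedLogger
{
    private readonly ILogger<JobWithInjectedLogger> _logger;

    public JobWithInjectedLogger(ILogger<JobWithInjectedLogger> logger)
        => _logger = logger;

    public Task Run()
    {
        _logger.LogInformation("Job started");
        return Task.CompletedTask;
    }
}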

Additionally, this issue was not observed on Windows. It seems that Windows handles hanging objects in memory better, which is likely why the problem didn’t appear there.

Hm, Hangfire used timers in the past, but it no longer does, to ensure stable processing under .NET Thread Pool contention. Recent changes indeed reduced allocations in some common code paths, but I don’t remember anything timer-related there. On the other hand, storage implementations may use timers for some things, so updating their packages could theoretically affect this. So if you still see such issues and can share more details about the types that are heavily allocated or not being freed, I would be happy to investigate.