Issues with Sitecore, Hangfire and system stability

thj · May 31, 2021, 10:09am

Hi. We’ve been running Sitecore 9.02 and Hangfire in production for about 3 years.
The infrastructure is hosted in Azure (Web Apps and Azure SQL).
The setup has been more or less stable the entire time. We’re processing around 200K-500K jobs every day.

We’ve recently run into a serious issue where the server hangs completely on startup. A shutdown/startup happens at least once during every deploy. This seems to happen when we have more than 100K jobs enqueued, and no number of restarts fixes it. The only way I can get the server up again is to truncate the Hangfire database or create a new SQL instance.
The .NET profiler shows some Hangfire threads in the WaitOne state. I assume those are expected and come from the workers waiting for new jobs to process.

We also have cases where the environment runs fine for a couple of days. When I come back to it, we might have 2-3 million jobs enqueued and a number of jobs in the Processing state, some of which have been there for the last hour, even though this particular job only takes about 300 ms to run.

We have looked at server and SQL utilization and run the various diagnostic tools in Azure. None of them seem to reveal the issue, and scaling up the infrastructure makes no difference.

To me it looks like the infrastructure has problems shutting down or releasing CPU threads, or something similar. In other words, something is not being shut down gracefully.

Hangfire is configured in the OWIN initialize pipeline (see attached screenshot). It’s not “pure” OWIN, as Sitecore has a layer on top of it.
I’m wondering whether this particular way of initializing the system, together with the shutdown sequence, is what’s causing our issues.
In other words, the issue could be the combination of Sitecore and Hangfire, not Hangfire in itself.

Do any of you have experience with this combination of Sitecore, Hangfire and Azure? Do you have any tips on how Hangfire should be configured in this kind of setup, or on settings we should be particularly careful with?
