I have Hangfire running via IIS in a 4-worker web garden. After a while, the CPU usage spikes and stays consistently high, eventually hitting 100% and virtually freezing the server. Hangfire isn’t showing any active jobs, however. I know it is Hangfire-related because if I disable the Hangfire startup the problem does not occur.
But I’m not sure where to start diagnosing what is going on. Any advice would be great.
Try to use a profiler – it will eliminate almost all the guessing here. I prefer the dotTrace profiler from JetBrains; I believe it’s the most user-friendly profiler for .NET while still being powerful.
Ok. So I downloaded and ran dotTrace. I am new to the tool, so forgive my obvious question. I see that a lot of time is spent in wait mode… but I’m not sure why that would drive CPU usage? Right now that particular worker is using nearly 50% CPU but doesn’t seem to be doing anything. I pasted a screenshot here.
Did you ever resolve this problem? We’re also having what appears to be the same issue. It only happens in our production environment, and rather erratically, so we haven’t been able to reproduce and therefore debug it.
The problem has gone away on its own (at least for now). I did pull one job off of Hangfire that was pinging a series of printers, which might have been a contributor to the problem (though I’m not sure why when I look at the code), but in all truth I really have no idea. It just stopped being naughty.
Ok. So I downloaded and ran dotTrace. I am new to the tool, so forgive my obvious question. I see that a lot of time is spent in wait mode… but I’m not sure why that would drive CPU usage?
The WaitAny method first spins briefly before transitioning into the kernel wait, so in theory it can burn some cycles on a busy wait. In practice, however, this only has an effect if WaitAny is being called over and over again. To avoid that behavior, an AutoResetEvent is used in Hangfire.
Repeated calls can also occur when a lot of jobs are being added to the queue, but in that case there is plenty of work to process anyway.
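To make the difference concrete, here is a minimal sketch (not Hangfire’s actual code; the JobAdded event and the two loop methods are made up purely for illustration) contrasting a polling loop that re-enters WaitAny on a short timeout with an event-driven loop that blocks until the AutoResetEvent is signalled:

```csharp
using System;
using System.Threading;

class WaitSketch
{
    // Hypothetical signal that a new job was enqueued.
    static readonly AutoResetEvent JobAdded = new AutoResetEvent(false);
    static readonly CancellationTokenSource Shutdown = new CancellationTokenSource();

    // Tight loop: WaitAny with a very short timeout is re-entered constantly,
    // and each call spins before falling back to a kernel wait, so CPU cycles
    // get burnt even when the queue is empty.
    static void BusyWorkerLoop()
    {
        var handles = new WaitHandle[] { JobAdded, Shutdown.Token.WaitHandle };
        while (!Shutdown.IsCancellationRequested)
        {
            WaitHandle.WaitAny(handles, TimeSpan.FromMilliseconds(1));
            // ...check the queue, usually finding nothing...
        }
    }

    // Event-driven loop: the thread sleeps until something is actually
    // signalled, so WaitAny is only re-entered when there is work (or a
    // shutdown request) to handle.
    static void BlockingWorkerLoop()
    {
        var handles = new WaitHandle[] { JobAdded, Shutdown.Token.WaitHandle };
        while (!Shutdown.IsCancellationRequested)
        {
            WaitHandle.WaitAny(handles); // no timeout: blocks until signalled
            // ...dequeue and process the job...
        }
    }
}
```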
We finally solved this. The issue was not in WaitAny, but rather in the System.Threading._IOCompletionCallback stack.
We injected a WCF client using dependency injection, and it seems that when things went wrong the clients weren’t disposed until garbage collection, and in the meantime they were stuck in a faulted state or something similar.
Removing it from dependency injection and reliably disposing the client after each use solved the problem.
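For anyone hitting the same thing, this is roughly the shape of the pattern we moved to (a sketch only; IPrinterService and Ping are stand-ins for whatever contract your injected client exposes): create the channel per call, Close() it on success and Abort() it on failure, since Close() throws on a faulted channel.

```csharp
using System;
using System.ServiceModel;

// Hypothetical contract standing in for whatever the injected WCF client exposed.
[ServiceContract]
public interface IPrinterService
{
    [OperationContract]
    void Ping();
}

static class WcfClientUsage
{
    // Create and tear down a fresh channel per call instead of resolving a
    // long-lived client from the DI container.
    public static void CallService(ChannelFactory<IPrinterService> factory)
    {
        IPrinterService channel = factory.CreateChannel();
        var client = (IClientChannel)channel;
        try
        {
            channel.Ping();
            client.Close();      // graceful shutdown of a healthy channel
        }
        catch (CommunicationException)
        {
            client.Abort();      // channel is faulted: Close() would throw
            throw;
        }
        catch (TimeoutException)
        {
            client.Abort();
            throw;
        }
    }
}
```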