Dashboard wrongly shows zero servers when debugging for more than 5 minutes

To me this is a bug, and one I initially found alarming. After some investigation, however, I have concluded that 1) it only manifests in a contrived scenario, and 2) it won't occur in production. IMHO the bug is negligible enough that I haven't raised it as an issue on GitHub, but I thought it should still be documented in case others hit it.

Steps to reproduce:

  1. In Visual Studio, set a breakpoint in application code that a background job will call, then start debugging.
  2. Queue a background job that will hit your breakpoint.
  3. Stay paused on that breakpoint for more than 5 minutes.
  4. Continue.
  5. Open the Dashboard and confirm that Servers shows zero.
  6. Queue more background jobs.

Expected: Jobs are not processed, since the Dashboard reports no servers.
Actual: Jobs are still processed.

I've hit this twice so far, both times after being interrupted while debugging. On returning to my work I was alarmed to see no servers listed and the big scary message saying "no background tasks will be processed". It created niggling doubts about the stability of Hangfire, since I didn't know why the server had gone down. Even more puzzling was the fact that "something" was still processing jobs, even though the Dashboard told me there were no servers!

After some quick (and possibly incorrect) investigation, I think I see what is happening here. Servers regularly save a "heartbeat", and by default a server is removed if no heartbeat has been seen for 5 minutes. A breakpoint stops all code in the process from running, including the code that sends heartbeats, so if you wait long enough the server is removed from the database and the UI. The server code, however, is still in memory, and once the breakpoint is released it resumes running, so it can still process jobs.
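To make that sequence concrete, here is a minimal simulation, assuming that the Servers table is a map of server IDs to last-heartbeat times, that a heartbeat only updates an existing row, and that the watchdog runs before the first post-breakpoint heartbeat. `ServerRegistry`, `Worker` and their methods are hypothetical names, not Hangfire's API; time is passed in explicitly so the run is deterministic.

```python
from dataclasses import dataclass, field

TIMEOUT = 300  # seconds; mirrors the 5-minute default described above


@dataclass
class ServerRegistry:
    """Hypothetical stand-in for the Servers table the Dashboard reads."""
    last_heartbeat: dict[str, float] = field(default_factory=dict)

    def announce(self, server_id: str, now: float) -> None:
        # Called once at server start-up: inserts the row.
        self.last_heartbeat[server_id] = now

    def heartbeat(self, server_id: str, now: float) -> None:
        # Update-only (assumed): a heartbeat for a removed row is silently ignored.
        if server_id in self.last_heartbeat:
            self.last_heartbeat[server_id] = now

    def remove_timed_out(self, now: float) -> list[str]:
        # Watchdog: drop every server whose last heartbeat is older than TIMEOUT.
        stale = [s for s, t in self.last_heartbeat.items() if now - t > TIMEOUT]
        for s in stale:
            del self.last_heartbeat[s]
        return stale


@dataclass
class Worker:
    """The in-memory server object; it keeps running regardless of the registry."""
    server_id: str
    processed: list[str] = field(default_factory=list)

    def process(self, job: str) -> None:
        self.processed.append(job)


registry = ServerRegistry()
worker = Worker("server-1")
registry.announce(worker.server_id, now=0)
registry.heartbeat(worker.server_id, now=30)

# Breakpoint hit at t=30: every thread is frozen, so no heartbeats until t=400.
removed = registry.remove_timed_out(now=400)   # 370 s since the last heartbeat
registry.heartbeat(worker.server_id, now=401)  # resumed, but the row is gone
worker.process("job-42")                       # ...yet jobs are still processed

print(removed, list(registry.last_heartbeat), worker.processed)
```

The registry ends up empty (what the Dashboard shows) while the worker has still processed `job-42`, which is exactly the mismatch described above.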

Potential fixes

Given the low impact of the bug, it might not even be worth fixing, but I can think of two potential fixes:

  1. Override the default timeout, e.g. set it to an hour. You could still hit the bug if you waited long enough, and I'm not sure it's a good idea to extend the timeout when there are genuine server connection problems, since genuinely dead servers would then linger in the list for longer.
  2. Allow a heartbeat to re-add a missing server to the Servers table. This would require a change to Hangfire itself, so it's a bigger undertaking.
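The idea behind fix #2 can be sketched as a heartbeat that performs an insert-or-update ("upsert") instead of an update-only write. This is a hypothetical model, not Hangfire's storage API: `UpsertRegistry` is an invented name, and the Servers table is assumed to be a map of server IDs to last-heartbeat times.

```python
TIMEOUT = 300  # seconds; assumed 5-minute default


class UpsertRegistry:
    """Hypothetical Servers table where a heartbeat re-adds a missing server."""

    def __init__(self) -> None:
        self.last_heartbeat: dict[str, float] = {}

    def heartbeat(self, server_id: str, now: float) -> bool:
        """Record a heartbeat; return True if the server had to be re-added."""
        readded = server_id not in self.last_heartbeat
        self.last_heartbeat[server_id] = now  # insert-or-update ("upsert")
        return readded

    def remove_timed_out(self, now: float) -> list[str]:
        # Watchdog: drop every server whose last heartbeat is older than TIMEOUT.
        stale = [s for s, t in self.last_heartbeat.items() if now - t > TIMEOUT]
        for s in stale:
            del self.last_heartbeat[s]
        return stale


registry = UpsertRegistry()
registry.heartbeat("server-1", now=0)
removed = registry.remove_timed_out(now=400)       # paused at a breakpoint too long
readded = registry.heartbeat("server-1", now=401)  # first heartbeat after Continue
print(removed, readded, list(registry.last_heartbeat))
```

One design caveat: a real server row presumably carries more than a timestamp (for example its queues and worker count), so an upsert would need to write that data too, which is part of why this is more than a one-line change.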

Bonus information

  1. Calling Thread.Sleep() for more than 5 minutes doesn't trigger it (as expected, since the sleeping thread yields the CPU, giving heartbeats a chance to be sent).
  2. Spinning in a while loop for more than 5 minutes doesn't trigger it (here I assume the OS still switches between threads, so heartbeats are still sent).
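Why neither case stops heartbeats can be illustrated with a small threading sketch: one thread sleeps or busy-spins while a separate heartbeat thread keeps counting beats. This is an analogy, not Hangfire code: `count_heartbeats`, `sleep_blocker` and `spin_blocker` are hypothetical names, the durations are scaled down to fractions of a second, and in CPython it is the interpreter's periodic thread switching (rather than .NET's scheduler) that lets the heartbeat thread run during the busy loop. A debugger breakpoint, which freezes every thread at once, has no equivalent here.

```python
import threading
import time


def count_heartbeats(blocker, duration: float = 0.5, interval: float = 0.05) -> int:
    """Run `blocker` on the calling thread while a separate thread records heartbeats.

    Returns how many heartbeats were recorded while the blocker was running.
    """
    beats = 0
    done = threading.Event()

    def heartbeat_loop() -> None:
        nonlocal beats
        # Event.wait doubles as the pause between beats and the stop signal.
        while not done.wait(interval):
            beats += 1

    hb = threading.Thread(target=heartbeat_loop)
    hb.start()
    blocker(duration)
    done.set()
    hb.join()
    return beats


def sleep_blocker(seconds: float) -> None:
    time.sleep(seconds)  # the sleeping thread yields the CPU


def spin_blocker(seconds: float) -> None:
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:  # busy loop; other threads still get scheduled
        pass


print("sleep:", count_heartbeats(sleep_blocker))
print("spin: ", count_heartbeats(spin_blocker))
```

Both runs record several heartbeats, matching the observation that Thread.Sleep() and a spinning loop don't cause the server to be removed.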

If a server sends no heartbeats for an extended period of time, it's reasonable to remove it from the server list. You, as the person debugging, may know it isn't dead, but the Dashboard has no way of knowing that.

I gather from your #2 comment that hitting Continue in Visual Studio doesn't cause the server to be added back to the list?

Yes, I agree, that is all perfectly reasonable.

Correct, and this is the essence of the issue: the Dashboard is misleading here, because it reports zero servers while jobs are still being processed.
