Hangfire jobs run successfully, but do nothing


#1

I have a strange issue. We have Hangfire deployed to several of our environments. It works 100% perfectly in our Production environment. However, in our UAT environment, when we enqueue a job, the Dashboard reports success, but the job (which should take around 10 minutes to complete) completes “successfully” in milliseconds. When I check for the SQL database changes that the job should make, I can see it has in fact done nothing, despite reporting successful execution.

I know it’s not a code issue. I can point my UAT code-base at the UAT SQL Server and run the whole job through in debug and it enqueues the job correctly and then runs the enqueued job 100% perfectly. So I have a situation where:

  1. a job enqueue works locally in debug against my UAT server.
  2. the same job works in a deployed copy of the application on my Production server
  3. yet that same job reports “success” but does nothing on my UAT server.

It feels likely to be a UAT server config issue, but I don’t know where to start looking on that front. Any suggestions (or maybe it’s not a server config issue)?


#2

Could you show us the code for the job you’re running? Or explain what it does? Is it just running a SQL query?

You haven’t got a try/catch inside your job or something? The only thing I can think of is a try/catch that swallows the exception without rethrowing it, which would hide the error from Hangfire. So, for example, you’ve missed a SqlConnectionString, or the firewall on that SQL server is blocking the IP you’re connecting from. Or it could be a file permission issue if you’re reading or writing any files.
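To show what I mean, here is a minimal sketch of how a swallowed exception can make a job look successful. It doesn’t use Hangfire at all: `DoWork`, `JobThatSwallows`, `JobThatThrows` and `Run` are hypothetical names, and `Run` is only a stand-in for a job runner that, like Hangfire, treats a job as failed when the job method throws.

```csharp
using System;

public static class SwallowedExceptionDemo
{
    // Hypothetical stand-in for the real work (e.g. a SQL update)
    // that fails in one environment only.
    public static void DoWork() =>
        throw new InvalidOperationException("simulated environment failure");

    // Swallows the error: the method returns normally,
    // so whoever called it sees a successful run.
    public static void JobThatSwallows()
    {
        try { DoWork(); }
        catch (Exception) { /* error discarded, nothing logged */ }
    }

    // Lets the error propagate, so the caller can record a failure.
    public static void JobThatThrows() => DoWork();

    // Hypothetical stand-in for a job runner (not Hangfire's API):
    // reports "Succeeded" unless the job throws.
    public static string Run(Action job)
    {
        try { job(); return "Succeeded"; }
        catch (Exception ex) { return "Failed: " + ex.Message; }
    }
}
```

With this sketch, `Run(SwallowedExceptionDemo.JobThatSwallows)` gives "Succeeded" almost immediately even though no work was done, while `Run(SwallowedExceptionDemo.JobThatThrows)` gives "Failed: simulated environment failure".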

What are you running Hangfire as? Windows Service? Website? Or something else?

It’s really hard to say without looking at your code. Hoping this is somewhat useful for you.


#3

Thanks for your reply - I did consider posting code samples but since it’s a copyrighted commercial application I wasn’t sure if my company would frown upon it! I’ll have a word and see if there’s an official policy. However, you’ve given me a really useful hint already - we do have some try/catches in the code that’s being executed. I’m going to start by removing those and see what (if anything) bubbles up. To borrow a phrase, I’ll be back (once I have an answer one way or the other)!


#4

It’s taken a while (a month!) but I do now have an answer on this one. Thanks Spinks90 - your suggestion was a major step in uncovering the issue. I stopped the Hangfire job from failing gracefully and found that it was bailing out in a try/catch. The underlying cause was a TLS handshake failure. This turned out to be a server config problem (I’m waiting to hear from my IT team exactly how they resolved this since it’s not really my area of expertise), but the Hangfire job is now working. And when it doesn’t work, it errors properly, which results in a useful error on the dashboard and the appropriate number of retry attempts.

Thanks again!


#5

catch {} is the bane of our existence.
Definitely make them at least log the error so you can read what happened in logs.
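
Something like this is the minimum I’d want, as a rough sketch. `RunJob`, `work` and `log` are hypothetical names: `work` stands in for the real job body and `log` for whatever logger you actually use. The bare `throw;` rethrows the same exception, so the failure still reaches whatever is running the job.

```csharp
using System;

public static class LogAndRethrowDemo
{
    // Hypothetical job wrapper: 'work' is the real operation,
    // 'log' is a stand-in for a real logging call.
    public static void RunJob(Action work, Action<string> log)
    {
        try
        {
            work();
        }
        catch (Exception ex)
        {
            // Record what happened so it can be read back from the logs.
            log("Job failed: " + ex);

            // Rethrow the same exception (stack trace preserved)
            // so the failure is not hidden from the caller.
            throw;
        }
    }
}
```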