We are running Hangfire as a Windows service that we created. We have a UI that submits a job, which is then handled by this service.
Here is the scenario:
If the Windows service is down and we submit a job from the UI, the job's status changes to pending. If I then look at the Hangfire tables:
The Hangfire.Job table shows the job as scheduled.
The Hangfire.State table also shows it as scheduled, but the reason column says “Retry attempt 1 of 10: Could not load file or assembly 'windows service”
If we then turn our Windows service on:
The reason column in the Hangfire.State table keeps changing:
“Retry attempt 2 of 10”…
“Retry attempt 3 of 10”…
On the Hangfire dashboard, the enqueue time keeps increasing. Sometimes the job runs after a few attempts and succeeds, and sometimes it just gets stuck.
Suggestion:
If your Hangfire instance tends to fail, I’d suggest checking whether it is alive before or after execution. You can do that with a custom filter that implements the IServerFilter interface.
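A minimal sketch of such a filter, assuming Hangfire 1.5 or later (where the context exposes BackgroundJob). The class name and the isAlive delegate are hypothetical; what "alive" means for your service is up to you. This version only logs and does not change the job's state.

```csharp
using System;
using Hangfire;
using Hangfire.Common;
using Hangfire.Server;

// Hypothetical filter name; IServerFilter and JobFilterAttribute are Hangfire types.
public class AliveCheckFilterAttribute : JobFilterAttribute, IServerFilter
{
    // Hypothetical health check supplied by the caller (e.g. "can I reach the database?").
    private readonly Func<bool> _isAlive;

    public AliveCheckFilterAttribute() : this(() => true) { }

    public AliveCheckFilterAttribute(Func<bool> isAlive)
    {
        _isAlive = isAlive;
    }

    // Called on the server just before the job method is invoked.
    public void OnPerforming(PerformingContext filterContext)
    {
        Console.WriteLine(
            $"Job {filterContext.BackgroundJob.Id} starting, alive = {_isAlive()}");
    }

    // Called on the server after the job method returns or throws.
    public void OnPerformed(PerformedContext filterContext)
    {
        if (filterContext.Exception != null)
        {
            Console.WriteLine(
                $"Job {filterContext.BackgroundJob.Id} failed: {filterContext.Exception.Message}");
        }
    }
}
```

To apply it to every job, it could be registered once at service startup with GlobalJobFilters.Filters.Add(new AliveCheckFilterAttribute()); alternatively, the attribute can be placed on an individual job method.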
Question:
After you turn the Hangfire winsvc back on, what kind of errors do you get in the retry attempts? What fails, or is it always the same message: “Could not load file or assembly…”
“checking if it is alive before or after execution”
By this do you mean checking whether the Windows service is stopped or started?
We are just submitting the job from the UI using BackgroundJob.Enqueue.
When I start my Windows service:
It keeps throwing the same message:
“Retry attempt 5 of 10: Could not load file or assembly 'MyService…”
and so on, up to 10.
Also, after the 10th attempt the job changes to failed:
Can not change the state to ‘Enqueued’: target method was not found.
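For context on the two messages above, here is a rough sketch of what a BackgroundJob.Enqueue call from a UI could look like. The class and method names (ReportJob, Run, UiHandler, OnSubmitClicked) are hypothetical, and the sketch assumes job storage has already been configured in the UI process. Enqueue does not execute the method; it stores the job's type name, method name and arguments, and the server later has to resolve that type and method itself, which is where an assembly it cannot load would surface.

```csharp
using System;
using Hangfire;

// Hypothetical job class. In this thread it would live in the assembly
// that the error message reports as missing.
public class ReportJob
{
    public void Run(int reportId)
    {
        Console.WriteLine($"Processing report {reportId}");
    }
}

public static class UiHandler
{
    // Hypothetical button-click handler in the UI application.
    public static string OnSubmitClicked(int reportId)
    {
        // Only the type, method and arguments are serialized into storage here.
        // The Windows service that picks the job up must be able to load
        // the assembly containing ReportJob in order to invoke Run.
        return BackgroundJob.Enqueue<ReportJob>(job => job.Run(reportId));
    }
}
```

The returned string is the job id, which can be matched against the rows in the Hangfire.Job and Hangfire.State tables.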
Sorry, never mind my first question, my mind was somewhere else.
Is the missing assembly referenced by the Hangfire winsvc, or are you loading it dynamically somehow?
Could that DLL be in use by something else that is locking the file, so that it is unavailable to the Hangfire winsvc?
Sorry, I'm doing a lot of guesswork, but I am trying to understand your setup.
The missing assembly is one of the assemblies that make up the Windows service (it performs some get operations against the database). We call Enqueue on Hangfire from a button click event in the UI. I can't think of any other process that would be locking it. Maybe when the first click creates a job and someone else creates another job at the same time, the assembly gets locked and the error is thrown. Can we avoid that with some kind of check, i.e. if a job is already in the processing state, don't start processing another one?
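On the "don't process a second job while one is running" idea: Hangfire ships a DisableConcurrentExecution attribute that takes a lock around a job method, so a second invocation waits up to the given timeout instead of running in parallel. A minimal sketch follows; the class and method names and the timeout and retry values are hypothetical. Whether concurrent jobs are actually the cause of the assembly load error is not established in this thread, so treat this as an answer to the concurrency question only.

```csharp
using System;
using Hangfire;

// Hypothetical job class standing in for the database "get" job described above.
public class DbFetchJob
{
    // Only one execution of this method runs at a time; another worker waits
    // up to 600 seconds for the lock before giving up with an exception.
    [DisableConcurrentExecution(timeoutInSeconds: 600)]
    // Optional: lower the retry count from the default of 10 seen in the State table.
    [AutomaticRetry(Attempts = 3)]
    public void Run(int requestId)
    {
        Console.WriteLine($"Fetching data for request {requestId}");
    }
}
```

The job would still be submitted the same way, for example BackgroundJob.Enqueue<DbFetchJob>(job => job.Run(42)).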