I’m convinced our own code is the problem and is causing the worker to lock up. I was hoping there might be some worker monitoring options to recover stuck workers.
This does make my other question more relevant: do you have recommendations on code that runs when enqueued? Could you make some suggestions?