The idea is to have your job throw an exception when the service it depends on is unavailable. The job then fails, and depending on your `AutomaticRetry` setting, Hangfire either retries it automatically (up to the configured number of attempts) or leaves it in the Failed state, so that once the problem is solved you can requeue it manually from the dashboard.
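To make that flow concrete, here is a simplified Python model of "throw, then retry up to N times". It is not Hangfire (a .NET library) and not its API: `ServiceUnavailable`, `run_with_retries` and `make_flaky_job` are hypothetical names, and `attempts` only plays the role of the `Attempts` property of `AutomaticRetry`. The model retries immediately, whereas Hangfire reschedules the job with a delay between attempts.

```python
# Simplified model of "throw, then retry up to N times". This is NOT
# Hangfire's API; all names here are hypothetical.


class ServiceUnavailable(Exception):
    """Raised by a job when a service it depends on is offline."""


def run_with_retries(job, attempts):
    """Run `job`, retrying up to `attempts` extra times on failure.

    Returns a (final_state, runs) tuple. "Failed" means the retries were
    used up and the job now waits to be requeued by hand.
    """
    runs = 0
    while True:
        runs += 1
        try:
            job()
            return "Succeeded", runs
        except ServiceUnavailable:
            if runs > attempts:
                # Initial run plus all retries have failed.
                return "Failed", runs
            # Hangfire would reschedule with a delay here; this model
            # simply retries immediately.
            continue


def make_flaky_job(failures):
    """Hypothetical job that raises `failures` times, then succeeds."""
    calls = {"count": 0}

    def job():
        calls["count"] += 1
        if calls["count"] <= failures:
            raise ServiceUnavailable("downstream service is offline")

    return job
```

With `attempts=3`, `run_with_retries(make_flaky_job(2), attempts=3)` returns `("Succeeded", 3)`, while `make_flaky_job(5)` ends as `("Failed", 4)`. With `attempts=0` the first failure is final, which corresponds to disabling automatic retries and relying on a manual requeue.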
Having the job sit there waiting for a service to come back online does not sound advisable (speaking in general; I obviously don’t know your specific scenario that well).
In general, I am extremely careful even with automatic retries. I only consider them when I can guarantee that whatever the job does is idempotent (i.e. running the same actions multiple times does not cause issues).
Imagine a job that adds $100 to the salary of every employee in a company (i.e. `set salary = salary + 100`). You run the job, which updates the database, but halfway through the connection to the database server drops. Half the employees have received the increase; the other half have not. Rerunning the job must not apply the $100 increase a second time to the employees already processed in the first run.
Stopping the whole server also seems a bit drastic. If it is a recurring job (rather than a fire-and-forget one) and you don’t want it to enqueue new runs for a while, I believe the advised approach is simply to delete the job and then re-add it once the issue is solved. I do agree that a pause feature would be nice to have. You could extend Hangfire yourself using job filters and `IElectStateFilter`: keep a boolean flag somewhere (e.g. `IsHangfirePaused = true`), check it in `OnStateElection`, and prevent the job from transitioning to the `EnqueuedState` while the flag is set.
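The pause idea can be modeled in a few lines. As far as I know, the real Hangfire hook is a C# class implementing `IElectStateFilter`, whose `OnStateElection(ElectStateContext context)` may replace `context.CandidateState`, registered with `GlobalJobFilters.Filters.Add(...)`. The Python below only mirrors that shape to show the logic; `PauseSwitch`, `PauseFilter`, `elect_state` and the five-minute recheck delay are hypothetical, and swapping the enqueued state for a scheduled one (so the job is re-elected and checked again later) is an assumed design choice rather than something Hangfire prescribes.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class EnqueuedState:
    queue: str = "default"


@dataclass
class ScheduledState:
    delay: timedelta


@dataclass
class ElectStateContext:
    candidate_state: object


@dataclass
class PauseSwitch:
    # Hypothetical stand-in for IsHangfirePaused; in practice this would
    # live somewhere shared, such as a database row or config value.
    is_paused: bool = False


class PauseFilter:
    """Models an IElectStateFilter that holds jobs back while paused."""

    def __init__(self, switch, recheck_after=timedelta(minutes=5)):
        self.switch = switch
        self.recheck_after = recheck_after

    def on_state_election(self, context):
        # Only intercept transitions into the enqueued state.
        if self.switch.is_paused and isinstance(context.candidate_state, EnqueuedState):
            # Assumption: park the job as scheduled so it is re-elected
            # later and checked against the flag again.
            context.candidate_state = ScheduledState(self.recheck_after)


def elect_state(candidate, filters):
    """Let every filter inspect (and possibly replace) the candidate state."""
    context = ElectStateContext(candidate)
    for f in filters:
        f.on_state_election(context)
    return context.candidate_state
```

While `switch.is_paused` is `False`, an `EnqueuedState` candidate passes through unchanged; once it is `True`, the same candidate comes back as `ScheduledState(timedelta(minutes=5))`, and other states are never touched. Re-scheduling instead of deleting means no job is lost while the flag is set, at the cost of jobs being rechecked periodically.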