NuGet Pro server up and down

The private NuGet server for Hangfire Pro is often returning connection resets. Possibly one of the server nodes is down; calls seem to succeed about 50% of the time: https://nuget.hangfire.io/nuget/hangfire-ace/v3/index.json
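For anyone who wants to put a number on the "about 50%" figure, here is a minimal sketch of a probe that requests the feed index repeatedly and reports the share of attempts that got any answer. The helper names (`fetch_ok`, `success_rate`), the attempt count and the 10-second timeout are my own assumptions for illustration, not part of any Hangfire or NuGet tooling. An HTTP error status such as 401 is counted as a success, since the feed is private and the point is only to detect resets and timeouts.

```python
import urllib.error
import urllib.request
from typing import Callable

FEED_URL = "https://nuget.hangfire.io/nuget/hangfire-ace/v3/index.json"


def fetch_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if the server answered at all, False on a reset or timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP status (e.g. 401 on a private feed) still proves the connection worked.
        return True
    except OSError:
        # URLError, ConnectionResetError and socket timeouts are all OSError subclasses.
        return False


def success_rate(url: str, attempts: int = 20,
                 probe: Callable[[str], bool] = fetch_ok) -> float:
    """Call probe(url) `attempts` times and return the fraction that succeeded."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    succeeded = sum(1 for _ in range(attempts) if probe(url))
    return succeeded / attempts


if __name__ == "__main__":
    rate = success_rate(FEED_URL)
    print(f"{rate:.0%} of requests succeeded")
```

The `probe` parameter exists only so the counting logic can be exercised without touching the network; in normal use the default is enough.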

Thanks Cameron, investigating. It looks like something is wrong with the network; the app layer and resources are perfectly fine.

I see that the problems started about 10 hours ago. I got 2 notifications from the external monitoring system about a connection timeout (10 seconds), but I didn’t see any problems with resources (CPU/memory/disk), the server or the software, so I wasn’t able to recognise this as something continuous.

Only after you reported the problem did I check another monitoring tool to see the whole picture, so thank you again for reporting.

There was a pending Windows Update, but the restart didn’t eliminate the issue, and responses were still taking too long to complete. The hosting provider reported no known network-related problems, so I decided to double-check the server itself and tried disabling the monitoring agent software, since sometimes checking that everything is good makes everything worse (looks like software lives in a quantum world).

And it looks like that was the cause, because after disabling the agent everything started to improve, e.g. the metrics started to show more succeeded and fewer failed requests. Now I’m waiting for the end of the hour to finalise the investigation, and then I will try upgrading the monitoring agent software to see whether the problem is gone. This time I will notice early if something goes wrong and disable it again.

I’ve also enabled notifications about degraded performance (I was unaware of this feature, and it was disabled by default). Unfortunately, the high availability setup was destroyed in Mar 2022 after a change of hosting provider that I was required to make, and I will be restoring it gradually.

Updated the monitoring agent 1 hour ago and started it; the metrics currently look fine. Will keep an eye on them.

