OK. So we found a problem in a VM that doesn’t do any network stuff: it failed on our side, but it certainly worked on the other side. How could that be?

The software we worked on contained a bunch of Windows services set to start automatically on Windows startup. Almost all of them initialized heavy infrastructure that was absolutely critical for the whole application. When we started the client, it tried to connect to a web service hosted in one of the Windows services and failed to do so.

After some debugging it turned out that the Windows services depended on each other through RPC calls.

We looked through the whole list and found that one of the services hadn’t started at all. We checked its startup settings: it was set to Automatic, yet it was stopped. We tried to start it manually, and it failed to start.

When we looked into the code, we found that it did most of the heavy lifting right in the OnStart method, which Windows expects to complete quickly. The code inside OnStart used to run quickly… until it was executed on a VM server under the heavy load of several VMs.

The standard behavior for Windows is to kill the service process if the service doesn’t respond in a timely fashion. In other words, the difference in server load revealed the instability of code that used to work fine on clean hardware.
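
One common way around this kind of timeout is to keep the start callback short and push the slow initialization onto a background thread. Below is a minimal, language-neutral sketch of that idea in Python. It is not the code of the service described here and not a real Windows API: `SketchService`, `on_start`, `starts_within` and the deadline value are all hypothetical names and numbers chosen for illustration.

```python
import threading
import time


class SketchService:
    """Hypothetical stand-in for a service class; not a real Windows API."""

    def __init__(self, init_seconds):
        self.init_seconds = init_seconds  # simulated cost of the heavy initialization
        self.ready = threading.Event()    # set once initialization has finished
        self._worker = None

    def _heavy_init(self):
        # Simulates the slow infrastructure setup that used to run inside the start callback.
        time.sleep(self.init_seconds)
        self.ready.set()

    def on_start(self):
        # Only spawn the worker here, so the callback returns almost immediately.
        self._worker = threading.Thread(target=self._heavy_init, daemon=True)
        self._worker.start()


def starts_within(service, deadline_seconds):
    """Return True if on_start() returns before the assumed deadline."""
    began = time.monotonic()
    service.on_start()
    return time.monotonic() - began < deadline_seconds
```

With this shape the start callback takes roughly the same time on an idle machine and on a loaded one, while clients wait on `ready` (or retry) until the heavy part is done. The trade-off is that "started" no longer means "ready", so callers need some way to tell the difference.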

This is quite interesting: it seemed safe to pass along a preconfigured VM to show a preview, but it turned out that even if the VM works well on your side, it can be broken on the other side.