Today almost every project involves some sort of distribution: interaction with a legacy system, an authentication service, etc. Teams tend to solve the distribution problem by applying easily understandable RPC-style technologies. Not that RPC-style communication is broken by definition, but it tends to make some very strong assumptions about the channel and the availability of the remote component.

The implicit assumptions

From the developer's perspective RPC looks very attractive: the remote object looks just as if it were on the same machine - the underlying technology takes responsibility for serializing requests and deserializing return values. Just like calling a local object. To support this model of interaction, the following assumptions are implicitly made:
  1. The remote server is up and running
  2. The remote server is listening
  3. The remote server will return a meaningful answer
  4. The remote server will return the answer in reasonable time
When accessing local objects it's OK to make those assumptions, but in a distributed environment with network interaction involved this type of communication becomes very fragile.

When something goes wrong

Imagine the remote server is restarting because a critical update is being installed, or it has crashed, or a network device in the middle of the channel is misconfigured. In all those cases (and many others) the calling side will throw an exception, either because it cannot create a communication channel or because of a timeout.

Most technologies that back RPC-style communication advise wrapping remote calls in an exception handling block. This makes using RPC a bit more complicated than it seems at first glance.

The next question is what to do with an exception caught this way.

The problems that can occur when calling a remote component can be classified into at least two types: transient (remote server restart, database deadlock, etc.) and permanent (deserialization/versioning issues, misconfigured infrastructure). Transient ones can be solved by making another attempt after some time, while permanent ones will not go away without external intervention (the operations team, etc.).
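
To make the distinction concrete, here is a minimal Python sketch of a retry helper. Everything in it is hypothetical: the two exception classes, the call_with_retry name and its parameters are assumptions for illustration, and a real RPC framework will report failures with its own exception types that you would have to map onto these two categories yourself.

```python
import time


class TransientError(Exception):
    """A failure that may go away on its own (restart, deadlock, timeout)."""


class PermanentError(Exception):
    """A failure that needs external intervention (versioning, misconfiguration)."""


def call_with_retry(remote_call, attempts=3, delay=0.5, sleep=time.sleep):
    """Call remote_call(), retrying only failures that might heal by themselves.

    The delay grows linearly with the attempt number; this is an arbitrary
    choice made for the sketch, not a recommendation.
    """
    for attempt in range(1, attempts + 1):
        try:
            return remote_call()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: let the caller decide what to do
            sleep(delay * attempt)
        # PermanentError is deliberately not caught: retrying cannot help,
        # so it propagates to the caller on the first occurrence.
```

Note that the calling thread stays blocked for the whole time the helper sleeps and retries.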

In most cases retry logic around an RPC call looks awkward and breaks the beauty of the OO-like code that RPC tries to maintain.

In a rich-client app you can at least show the exception to the user to indicate that something went wrong, or immediately retry the request. On the server the exception will most likely be dumped to the log and maybe propagated back to the caller; in some cases the maintenance personnel could be informed, but what to do next is not obvious.

In a server-to-server environment RPC tends to do more harm. Servers usually have a limited pool of threads for serving requests. When no threads are left in the pool, incoming requests are enqueued for later processing. While remote calls are cheap in a dev environment, in production the remote component could answer very slowly (due to a spike in requests, for example), making the calling thread hang for the whole timeout period and decreasing the availability of the caller.
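
The effect is easy to reproduce on a small scale. The Python sketch below is only an illustration: handle_request and serve are hypothetical names, and time.sleep stands in for a blocking remote call. It measures how long a fixed-size pool needs to serve a batch of requests when every request waits on a slow remote component.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(remote_latency):
    # The worker thread is blocked for the whole duration of the remote call.
    time.sleep(remote_latency)  # stands in for a blocking RPC call
    return "done"


def serve(n_requests, pool_size, remote_latency):
    """Return the wall-clock time needed to serve n_requests."""
    started = time.monotonic()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [
            pool.submit(handle_request, remote_latency)
            for _ in range(n_requests)
        ]
        for future in futures:
            future.result()  # requests beyond pool_size wait in the queue
    return time.monotonic() - started
```

With a pool of 2 threads and a remote latency of 0.1 seconds, 4 requests need about two rounds of waiting, because the third and fourth requests cannot start until a thread is released. Replace the latency with a multi-second timeout and the same queue becomes visible to every caller of the server.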

Circuit Breaker

One possible solution to the problem is the Circuit Breaker pattern (as described in Release It!). Circuit Breaker is a component with three states: Closed, Open and Half-Open. CB intercepts all calls to a remote RPC endpoint and tracks their results.

By default CB is in the Closed state, in which it tracks the number of failed requests; if that number becomes greater than some limit, CB switches to the Open state.

In the Open state CB throws a Fail Fast exception even before the thread starts to serialize the request, so the call never reaches the remote endpoint. After some configured time CB switches to the Half-Open state.

In the Half-Open state CB allows the next request to reach the remote endpoint and makes its transition depending on the result of this call. If the call is successful it switches to the Closed state, otherwise it goes back to Open.
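
The three states can be sketched in a few dozen lines of Python. This is a simplified, single-threaded illustration and not the implementation from the book: the class and parameter names (CircuitBreaker, CircuitOpenError, failure_limit, reset_timeout, clock) are made up for this example, resetting the failure counter on every success is an assumption, and a production version would need locking so that only one thread performs the Half-Open trial call.

```python
import time


class CircuitOpenError(Exception):
    """Raised in the Open state instead of calling the remote endpoint."""


class CircuitBreaker:
    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half-open"

    def __init__(self, failure_limit=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_limit = failure_limit  # failures tolerated while Closed
        self.reset_timeout = reset_timeout  # seconds to stay Open
        self.clock = clock                  # injectable for testing
        self.state = self.CLOSED
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.state == self.OPEN:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast: the remote call is not even attempted.
                raise CircuitOpenError("remote endpoint assumed unavailable")
            self.state = self.HALF_OPEN  # timeout elapsed: allow a trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial call reopens the circuit immediately; in the Closed
        # state the circuit opens once failures exceed the limit.
        if self.state == self.HALF_OPEN or self.failures > self.failure_limit:
            self.state = self.OPEN
            self.opened_at = self.clock()

    def _on_success(self):
        # Assumption: any success resets the counter and closes the circuit.
        self.failures = 0
        self.state = self.CLOSED
```

A caller would route every remote call through one shared breaker per endpoint, for example breaker.call(client.get_user, user_id), where client.get_user is a hypothetical RPC stub, and treat CircuitOpenError as "the remote component is down, do not wait for it".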

CB is a sort of throttling mechanism: it doesn't solve the problem completely, but rather mitigates it.

Note: Circuit Breaker usage is not limited to RPC and can be used with any unreliable remote component (File System, DB, whatever).

Conclusion

On reflection, RPC is not as easy as it seems, especially when it comes to exception handling.

In short: use RPC only if the remote component is absolutely needed for the calling component to complete the task. RPC tightly couples the availability of the caller to the availability of the remote components used by it and as such should be used with caution.

Use Circuit Breaker to mitigate scalability issues.