Building Highly Available Systems in Erlang
Joe Armstrong gave an interesting and entertaining talk about the principles behind Erlang. The original slides are available here and the talk will be soon online on the InfoQ website. Especially enjoyed a few bits from his past working at satellites software and Ericsson. Here’s my summary, enjoy.
High availability (or HA) is quite difficult to achieve in practice. 10 nines availability, or 99.99999999% uptime is not that far away in the future and Erlang is designed just to achieve that goal. HA is very close to our everyday life than you expect: the washing machine runs HA software, pace makers, aircraft control systems of course. But even if the effort made is huge, there is always the possibility a cosmic ray hitting the circuit board ad flipping a couple of bits.

http://www.erlang-solutions.com
Something that wasn’t a problem a while ago is internet HA, where “internet” means networked applications. We want to take advantage of this possibility, that is, to have data redundancy for HA on multiple nodes and the computation to happen in any of those nodes. For a washing machine for example, data and computation are in the same place. Availability increases with multiple distributed copies of the same data because it lowers the probability that all data copies are corrupted at the same time.
If we bring the data redundancy strategy to the extreme using as many as possible nodes, there are other problems to solve. Here’s a possible scenario: a system with 10 millions “cores” (machines or CPUs) and 10 copies of the data. How to find those data? The Chord address protocol solves this issue. An md5 of the ip address of all machines is created and sorted. The key id of the stored data is hashed as well and that value is then compared with the values of the hashed IPs. That copy of the data will be stored on the closest node by lexical ordering. A replica can be stored by hashing again the key as many times as I want. The probability of 5 copies of the data failing at the same time is 10 square -10, that is 0.00000000001 or our ten nines availability. Data HA problem solved. But what about the availability of the computation?
The key enablers for HA computation can be summarised in six laws.
Rule #1: isolation
A failure in a module should not affect other modules. Sometimes a dependent module hides a circular dependency you should be aware of. For example if I put my data on both Dropbox and S3 I should be aware that if S3 goes down also Dropbox that depends on it goes down. Two modules of an application, one based on Dropbox and the other on pure S3 are then not isolated.
Rule #2: concurrency
The world is not sequential, is embarrassingly parallel. Modelling such a world with imperative languages is difficult. Also consider that we need a system with at least two computers to make it a non-stop system.
Rule #3: must detect failures
A defect cannot be fixed if the system doesn’t know anything about it. For example, to determine if it was the machine or the communication channel that died. It implies distributed error handling, since at the time you collect data about the error those data must be handled by a non-faulty machine.
Rule #4: failure identification
Implies that you need to collect all required information about the failure in order to do something about it.
Rule #5: live code upgrade
Of course there is no point in stopping a server to fix it when what you’re trying to achieve is HA. The AXE-10 telephone exchange system HA requirement was no more than 10 mins downtime per year.
Rule #6: stable storage
When the system stores data, it is forever. Data is distributed accordingly to avoid data corruption. There shouldn’t be any external backup required.
Just to give some more context, the 6 laws of high available systems aren’t completely new ideas. You can find analogies in old papers, for example Jim Gray from Tandem computers in “Why computer stops and what should be done about it” of 1985. Erlang is not that different than a Tandem architecture written in software. The general principle is that system should be designed for failure and not completely in the direction of preventing failure. For example, I don’t bring with me a defribillator because of a potential heart attack in the middle of the street and I don’t try open heart surgery on myself. What I care is that a team of well instructed doctors can arrive on time equipped with the right tooling to save my life. Emergency doctors are trained to deal with human failure despite a great deal of effort is also put on prevention. Another important takeaway from the paper is to address the failure as fast as possible because the more you wait the more it gets worse.
Software modularity should be achieved through isolation and messages. Messages are very important at the point Alan Key created SmallTalk, a language based on message passing between objects. But unfortunately the part for which SmallTalk is so best known are Objects and Classes. Alan Key was frustrated by this, as he reported on the Squeak mailing list
The 6 laws can be applied as a library or as part of the language itself. A library cannot be that effective if the language doesn’t support the 6 laws natively. If for example rule number #1 “isolation” is implemented in a language that only allows an application to run inside a single process then it becomes programmer responsaibility to orchestrate multiple processes. Of course Erlang was built with these principles in mind. Let’s review Erlang features accordingly to the six rules.
Rule #1: Erlang processes are isolated and cheap. It takes 360k of memory for a new process. It’s more complicated to handle failure when pieces of the same application share same memory space. We are expecting a lot more of miniaturization, up to 1000 cores on a single FPGA which means around 100 millions processes all running on the same chip.
Rule #2: Erlang processes are concurrent. Message passing between processes is concurrent. Every process comes with a mailbox that can be inspected regoularly, but the check doesn’t stop the process from doing other things.
Rule #3: Erlang processes handle failure. The mailbox is not used for error propagation, signals are used. A process can catch the signal and do something about it. If the signal is not trapped the monitoring process will exit as well. At this point becomes essential to incorporate as much possible information for the trapping process to handle the error.
Rule #4: A classic problem in parallel computing is how to stop a cluster of running processes when something goes wrong. This is why you want some of them to be special error handlers using the trap signal mechanism. The Erlang OTP provides sensible defaults to handle errors. Riak, CouchDB, Facebook (the Erlang based chat) all uses mechanism like this one.
Rule #5: Thanks to modules Erlang can be upgraded as it runs. When you invoke a function the module it runs in is by default the last version. If you deploy another version of that module, the current code will starting running the old version unless it is preferred otherwise. When Erlang was invented memory was expensive, so at the beginning only a last version and an old version were available at the same time. Nowadays you can select amonsgst different versions.
Rule #6: Mnesia is the database included with Erlang OTP. That is the primary Erlang offer for data that persist forever. Or you can use one of the many good quality 3rd parties (CouchDB, Riak).
What’s beautiful about enforcing all the six rules is that you get an incidental HA system as a result. Erlang is looking at the HA problem considering failure just a special case of success. Concurrent computation becomes like finding islands of stability and move from one stable state to the other as the system progresses.
2 months ago

.
