Intro & Motivation

Most people probably already know about clustering, where individual Elixir nodes are connected to each other. The Portal tutorial is a good reference that begins to show the possibilities of distributed Elixir. To borrow a simpler, less involved demonstration from Stack Overflow, the basic functionality works like this:

# In terminal 1:
iex --name node1@127.0.0.1 --cookie a_cookie_string

# In terminal 2 (same machine, since both names use 127.0.0.1; across machines, use each machine's own IP):
iex --name node2@127.0.0.1 --cookie a_cookie_string

# To test it, run something like this on node1:
iex(node1@127.0.0.1)1> Node.connect :"node2@127.0.0.1"
true

iex(node1@127.0.0.1)2> print_node_name = fn -> IO.puts Node.self end
#Function<erl_eval.20.80484245>

iex(node1@127.0.0.1)3> Node.spawn(:"node2@127.0.0.1", print_node_name)
node2@127.0.0.1
#PID<7789.49.0>

Auto-Clustering

Unfortunately, without some more magic, nodes don't register and connect to each other automatically. There are quite a few ways to accomplish this, and the best practice isn't clear.

Evaluation

This section describes a few conclusions based on piloting each of the approaches below.

  • Explicit node clustering requires a fixed list of node names, and while it has the advantage of "native" support, it isn't very dynamic for general-purpose clustering. However, this approach is perfect for apps that can be broken down into a few specific components, say "one node running a display loop, one node running an iex shell".
  • Using the discovery library + Consul is probably the most robust option for production, but I haven't tried it. For simple projects, adding a Consul dependency is not ideal, and the same goes for Kubernetes.
  • After a brief period of experimenting with non-Kubernetes swarm, I noticed some very odd behaviour where the CPU/memory usage of the BEAM [1] would climb until it nearly locked up my computer. More mysteriously, this consistently disrupted my internet connection, perhaps because the multicasting was spiraling out of control. I probably didn't have everything set up correctly, but it was annoying to debug and I abandoned this approach.
  • OpenSLP + ex_slp based discovery is also not ideal. In particular, the non-Elixir moving parts are old(ish), and while they're at least stable, it's not clear that the protocol/daemon is actively maintained. I also can't vouch for how well OpenSLP deals with CAP challenges, whereas tools like Consul have explicit multi-datacenter support and are known to work well in the face of network partitions. Nevertheless, I found OpenSLP to be the simplest approach.
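To make the first option concrete, here is a minimal sketch of explicit node clustering, assuming a hypothetical two-component app whose node names (`display@127.0.0.1` and `shell@127.0.0.1`) are hard-coded. It only relies on the standard `Node.connect/1`, which returns `true`, `false`, or `:ignored` when the local node isn't running in distributed mode. The module name `ExplicitCluster` is made up for illustration.

```elixir
defmodule ExplicitCluster do
  @moduledoc """
  Sketch of explicit node clustering: connect to a fixed, hard-coded
  list of node names. The names below are hypothetical.
  """

  # Hypothetical node names, e.g. one node running a display loop and
  # one node running an iex shell.
  @nodes [:"display@127.0.0.1", :"shell@127.0.0.1"]

  @doc """
  Tries to connect to every listed node except the local one.
  Returns {node, result} tuples, where result is what Node.connect/1
  returned: true, false, or :ignored (local node not alive).
  """
  def connect_all(nodes \\ @nodes) do
    nodes
    |> Enum.reject(&(&1 == Node.self()))
    |> Enum.map(fn node -> {node, Node.connect(node)} end)
  end
end
```

Calling `ExplicitCluster.connect_all()` at startup on each node is roughly all a small, fixed topology needs; the cost is editing the list whenever a node name changes, which is exactly why this doesn't scale to general-purpose clustering.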

Winner

I'm mostly interested in quick-and-dirty-yet-flexible clustering for experimental projects. Since the whole clustering problem is normally just a sideshow that keeps me from getting on with the project I had in mind, for now the clear winner is ex_slp. I've used OpenSLP-based discovery in side projects I've written about elsewhere [2].

Implementation

I'd like to eventually cover implementation specifics for the other approaches if I ever dive deep into them, but for now I'll only go into details about ex_slp.

OpenSLP Setup

There's an Ubuntu package for slpd and slptool, and there's also a Docker image that has both. To install and verify the installation on Debian-based systems, use something like the following:

$ sudo apt-get install slpd slptool
$ sudo /etc/init.d/slpd restart
$ slptool --version

You'll want to install the command-line tool regardless, but it's possible to run the daemon via Docker:

# Run slpd via docker and background it
$ docker run -d -p 427:427/tcp -p 427:427/udp  --name openslp vcrhonek/openslp

Code

[gist 580e04631af195edae0c9157f47758fe]
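The gist itself isn't reproduced here. As a rough, hedged illustration of the same idea (not the gist's code and not ex_slp's API), the sketch below shells out to `slptool findsrvs`, parses its one-URL-per-line output, and connects to each node it finds. It assumes each node has registered an SLP URL of the form `service:exslp://<node name>`; the `exslp` service type and the module name `SlpNodeDiscovery` are hypothetical.

```elixir
defmodule SlpNodeDiscovery do
  @moduledoc """
  Sketch of SLP-based node discovery via the slptool CLI.
  ASSUMPTION: nodes registered URLs like "service:exslp://node1@127.0.0.1";
  the "exslp" service type is hypothetical.
  """

  # Hypothetical SLP service type for Elixir nodes.
  @service_type "service:exslp"

  @doc "Runs `slptool findsrvs` and returns discovered node names as atoms."
  def discover do
    # Return [] if slptool is missing or exits with an error.
    with path when is_binary(path) <- System.find_executable("slptool"),
         {output, 0} <- System.cmd(path, ["findsrvs", @service_type]) do
      parse(output)
    else
      _ -> []
    end
  end

  @doc "Parses findsrvs output, one \"URL,lifetime\" entry per line."
  def parse(output) do
    prefix = @service_type <> "://"

    output
    |> String.split("\n", trim: true)
    # Drop the ",lifetime" suffix, keeping only the service URL.
    |> Enum.map(fn line -> line |> String.split(",", parts: 2) |> hd() |> String.trim() end)
    |> Enum.filter(&String.starts_with?(&1, prefix))
    # Note: atoms are never garbage collected; acceptable for a small trusted LAN.
    |> Enum.map(fn url -> url |> String.replace_prefix(prefix, "") |> String.to_atom() end)
  end

  @doc "Discovers nodes and connects to each one other than the local node."
  def connect_discovered do
    for node <- discover(), node != Node.self(), do: {node, Node.connect(node)}
  end
end
```

Running `SlpNodeDiscovery.connect_discovered()` periodically on each node would approximate auto-clustering; whatever registers the URLs in the first place (ex_slp, or `slptool` on each host) is outside this sketch.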

  1. the Erlang VM
  2. see for example the ambient calculus project