Intro & Motivation
Most people probably already know about clustering, wherein individual Elixir nodes can be connected to each other. The portal tutorial is a good reference that begins to show the possibilities for distributed Elixir. To quote a simpler, less involved demonstration from Stack Overflow, the basic functionality works like this:
    # In machine 1:
    iex --name email@example.com --cookie a_cookie_string
    # In machine 2:
    iex --name firstname.lastname@example.org --cookie a_cookie_string
    # To test it, you can do something like this, on machine1:
    iex(node1)1> Node.connect :"email@example.com"
    true
    iex(node1)2> print_node_name = fn -> IO.puts Node.self end
    #Function<erl_eval.20.80484245>
    iex(firstname.lastname@example.org)3> Node.spawn(:"email@example.com", print_node_name)
    node2@localhost
    #PID<7789.49.0>
Unfortunately, without some more magic, nodes don't register and connect to each other automatically. There are quite a few ways to accomplish this, and best practice isn't clear:
- Explicit node clustering, using Erlang flags
- Discovery with Consul, via discovery lib
- Discovery with or without Kubernetes, via swarm lib
- Discovery with OpenSLP, via ex_slp
This section describes a few conclusions based on piloting each of the approaches above.
- Explicit node clustering requires a fixed list of node names. It has the advantage of "native" support, but it isn't very dynamic for general purpose clustering. However, this approach is perfect for apps that can be broken down into a few specific components, say "one node running a display loop, one node running an iex shell".
- Using the discovery lib + Consul is probably the most robust option for production, but I haven't tried it. For simple projects, adding a Consul dependency is not ideal, and the same goes for Kubernetes.
- After a brief period of experimentation with non-Kubernetes swarm, I noticed some very odd behaviour where the CPU/memory usage of beam [1] would climb until it nearly locked up my computer. More mysteriously, this consistently disrupted my internet connection, perhaps because the multicasting was spiraling out of control. I probably didn't have everything set up correctly, but it was annoying to debug and I abandoned this approach.
- OpenSLP + ex_slp based discovery is also not ideal. In particular, the non-Elixir moving parts are old(ish), and while OpenSLP is at least stable, it is not clear that the protocol/daemon is actively maintained. I also can't vouch for how well OpenSLP really deals with CAP challenges, whereas things like Consul have explicit multi-datacenter support and are known to work well in the face of partitioning. Nevertheless, I found OpenSLP to be the simplest approach.
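To make the first bullet concrete, here is a minimal sketch of what "a fixed list of node names" can look like from inside an application. It is only an illustration under stated assumptions: the module name `StaticCluster` and the node names in the usage note are hypothetical, and it calls `Node.connect/1` at runtime instead of using the Erlang flags mentioned above.

```elixir
defmodule StaticCluster do
  @moduledoc """
  Hypothetical helper (not part of any library): tries to connect the
  local node to a fixed, known-in-advance list of node names.
  """

  # Attempts a connection to every listed node except ourselves and
  # returns a map of node name => outcome. Node.connect/1 returns true
  # on success, false on failure, and :ignored when the local node was
  # not started in distributed mode (no --name / --sname).
  def connect_all(nodes) when is_list(nodes) do
    nodes
    |> Enum.reject(&(&1 == Node.self()))
    |> Map.new(fn name -> {name, Node.connect(name)} end)
  end

  # Picks out the names of the nodes that actually connected.
  def connected(results) do
    for {name, true} <- results, do: name
  end
end
```

With every VM started with the same `--cookie`, something like `StaticCluster.connect_all([:"display@localhost", :"shell@localhost"])` could be called at application start. Nodes that aren't up yet simply come back as `false`, so a real setup would need some kind of retry, which is exactly the lack of dynamism described in the first bullet.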
I'm mostly interested in quick-and-dirty-yet-flexible clustering for experimental projects, since the whole clustering problem is normally just a sideshow that's preventing me from getting on with the project I had in mind. For now the clear winner is ex_slp, and I've used OpenSLP based discovery in, for instance, the side projects I've written about elsewhere. [2]
I'd like to eventually cover implementation specifics for the other approaches if I ever dive deep into them, but for now I'll only go into details about ex_slp.
There are Ubuntu packages for slpd and slp-tool, and there's also a Docker image that has both. To install and verify the installation on Debian-based systems, use something like what you see below.
    $ sudo apt-get install slpd slp-tool
    $ sudo /etc/init.d/slpd restart
    $ slptool --version
You'll want to install the command line tool regardless, but it's possible to use the daemon via docker:
    # Run slpd via docker and background it
    $ docker run -d -p 427:427/tcp -p 427:427/udp --name openslp vcrhonek/openslp