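While it is true that Elixir facilitates scalable, distributed services, it is also true that "there ain't no such thing as a free lunch." Elixir scales vertically on a single box, spreading work across all of its cores, but once that box is saturated, then what? There is a big difference between simply running a service on multiple boxes and actually clustering the nodes that live on those boxes. Unclustered boxes are independent copies of the service, typically behind a load balancer. Clustered nodes are connected via distributed Erlang and can message one another and share state directly.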
When scaling across multiple boxes without clustering, there are a few notable concerns. First, how does the service need to scale? Is manual intervention acceptable, or is automation necessary? Second, how quickly does it need to scale up? Finally, how costly is it to scale? Providers that manage more on a team's behalf, and so make development and operations simpler, are typically more expensive up front and more expensive again as the service scales. By contrast, if the investment in time and knowledge can be justified, or has already been made, a more hands-on provider will usually be cheaper.
If a solution requires clustering, that typically rules out any provider without cross-box private networking, most notably Heroku. That said, a provider without private networking is not automatically disqualified: not every service needs clustering to scale.
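To make the networking requirement concrete: clustering means the nodes connect to one another with distributed Erlang, so each box must be able to reach the others directly. As one illustration, the third-party libcluster library can form a cluster from a static list of hosts. The fragment below is a sketch that assumes libcluster is already a dependency; the topology name `app`, the node names, and the private IP addresses are hypothetical placeholders.

```elixir
# config/runtime.exs (sketch; node names and IPs are hypothetical placeholders)
import Config

config :libcluster,
  topologies: [
    app: [
      # Connect to a fixed list of nodes. Each box must be able to reach the
      # others on the EPMD port (4369) and on the distribution ports.
      strategy: Cluster.Strategy.Epmd,
      config: [hosts: [:"app@10.0.1.10", :"app@10.0.1.11"]]
    ]
  ]
```

The topologies are then passed to a `Cluster.Supervisor` child in the application's supervision tree, and every node must share the same Erlang cookie. On a provider without private networking between boxes, these ports would have to be exposed publicly, which is rarely acceptable.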
Hot Code Reloading
Hot code reloading is the ability to update a running release to a different version in place with no downtime. It is incredibly powerful, and it should rarely be used. More specifically, it should only be used when it is imperative to do so. It requires extensive, meticulous planning: if the shape of state anywhere within the system has changed, that state must be pushed through a function that carefully transforms it from old to new (for a GenServer, the `code_change/3` callback). Testing the upgrade deserves as much time and care as testing the system itself.
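As an illustration of the state transformation described above, here is a sketch of a hypothetical `Counter` GenServer whose state changed from a bare integer (version 1) to a map with `:count` and `:updated_at` fields (version 2). It shows only the `code_change/3` side of an upgrade; a real release upgrade also needs appup/relup instructions, which are omitted here.

```elixir
defmodule Counter do
  # Hypothetical module for illustration: version 1 kept a bare integer as
  # state; version 2 (this code) keeps %{count: integer, updated_at: integer | nil}.
  use GenServer

  @vsn 2

  def start_link(initial \\ 0) do
    GenServer.start_link(__MODULE__, initial, name: __MODULE__)
  end

  def increment, do: GenServer.call(__MODULE__, :increment)

  @impl true
  def init(count) when is_integer(count), do: {:ok, %{count: count, updated_at: nil}}

  @impl true
  def handle_call(:increment, _from, %{count: count} = state) do
    new_state = %{state | count: count + 1, updated_at: System.system_time(:second)}
    {:reply, new_state.count, new_state}
  end

  # Downgrade (v2 -> v1): OTP passes {:down, vsn}; drop the timestamp and
  # return to the old bare-integer shape.
  @impl true
  def code_change({:down, _vsn}, %{count: count}, _extra), do: {:ok, count}

  # Upgrade (v1 -> v2): match on the old *shape* of the state rather than the
  # old version term, and wrap the integer in the new map.
  def code_change(_old_vsn, count, _extra) when is_integer(count) do
    {:ok, %{count: count, updated_at: nil}}
  end

  # State already in the new shape passes through unchanged.
  def code_change(_old_vsn, state, _extra), do: {:ok, state}
end
```

Matching on the shape of the state, rather than only on the version number, makes the callback tolerant of state that has already been migrated. Without building a release, the callback can be exercised in IEx by suspending the process with `:sys.suspend/1`, calling `:sys.change_code/4`, and resuming it with `:sys.resume/1`.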
It also runs counter to current best practices for local development and deployment, namely containerization. At a minimum, releases should be built inside a container that mirrors production. Beyond that, with the advent of services like AWS ECS and Kubernetes, it is common to run services inside containers in production. Two of the main benefits are immutable releases and a development environment that closely resembles production. Pushing out a new version via hot code reloading is possible in the world of containers, but it is not recommended. It undermines immutability, makes it harder to know which version is deployed where, and makes it harder to guarantee that an unexpected restart brings the container back up on the correct version of the service rather than the one baked into the image.
Given its inherent complexity and the friction it introduces with containers, it is advisable to steer clear of hot code reloading where possible. In most use cases this is easy, especially when using rolling upgrades or blue/green deploys, which can achieve comparable availability before, during, and after the deployment of a new version.
If a service is the exception, one that must have zero downtime, holds state in application memory, and cannot tolerate having its nodes drained, then hot code reloading may be the right tool for the job. This typically lands teams on AWS EC2 or an equivalent offering from another large cloud provider. In short, what is required is:
- No containers
- A box that a team has effectively full control over
- A large, tried-and-true provider with impressive uptime guarantees and enough surrounding features to future-proof the decision
With those boxes checked, hot code reloading can deliver impressive uptime and quick upgrades, since node draining is not required. Just be prepared for a much more hands-on DevOps experience and more preparation for every release.
When picking a provider, consider the following:
- Size: battle-tested, quality guarantees
- Features: uptime, complementary services, and cross-box private networking
- Complexity: how much DevOps responsibility falls on the team
- Cost: price versus value
Then evaluate them in the context of Elixir, with a focus on:
- State in memory
- Hot Code Reloading
Stateless Phoenix APIs are typically straightforward to host, and most providers will suffice as long as expected traffic and the cost of scaling are accounted for. If a service will be stateful, or starts to need more than a stateless API does (for example, clustering or hot code reloading), then take a longer look at the requirements.
If a team is spinning up more than a couple of services, there are also gains to be had from not hopping between a different host for each service. A host that can handle the simple as well as the complex will be desirable in the long run. No two services, and no two sets of requirements, are the same, but a thorough planning phase can be repeated and can improve efficiency across projects.