The Big Three
Elixir provides powerful built-in tools to solve problems that may otherwise have required outside dependencies like Redis or Kubernetes. However, it is important to remember that Kool-aid can be responsibly enjoyed without gorging on the powdered mix. There are three areas in particular that require care before brazen indulgence:
- State in Memory
- Clustering
- Hot Code Reloading
These are powerful concepts, but they come with costs and tradeoffs that should not be taken lightly.
State in Memory
When managing state in a service, it is not uncommon to reach for something like Redis or Postgres. With Elixir, dependencies like these can sometimes be avoided by making use of the tools that ship with the language and its runtime, such as Agent, GenServer, and ETS.
This allows a team to build with agility, only taking on dependencies when required. However, keeping state in application memory is a tradeoff: latency to interact with the data improves, but if the process holding that state unexpectedly goes away, so does the state. On a host like AWS EC2 this will be rare and typically driven by how often the service encounters crash-inducing bugs. On Heroku, dyno restarts will ensure it happens at least daily.
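The tradeoff can be seen in a few lines. The following is a minimal sketch, not a recommended design: the `:visit_counter` name and the counter itself are made up for illustration, and stopping the Agent by hand stands in for a crash or a host restart.

```elixir
# Hypothetical example: a counter whose only copy lives in process memory.
{:ok, pid} = Agent.start(fn -> 0 end, name: :visit_counter)

Agent.update(:visit_counter, &(&1 + 1))
Agent.update(:visit_counter, &(&1 + 1))
before_stop = Agent.get(:visit_counter, & &1)

# Simulate the process going away. A crash or a host restart has the same
# effect on the state it was holding.
:ok = Agent.stop(pid)

# A replacement process starts from the initial value; the old count is gone.
{:ok, _new_pid} = Agent.start(fn -> 0 end, name: :visit_counter)
after_restart = Agent.get(:visit_counter, & &1)

IO.inspect({before_stop, after_restart})
# prints {2, 0}
```

Reads and writes never leave the VM, which is where the latency win comes from, but nothing outside the process remembers the count.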
Importantly, this does not rule out providers like Heroku as an option. In either case, strong uptime guarantees or not, it is wise to at least plan for important state-holding processes to die because, someday, they will. This truth forces important architectural decisions to be made, decisions that should be made as early as possible (read: before significant development has occurred). Minimally, the following questions should be asked about the state the service will manage.
Is the data relatively:
- Unimportant, such that losing it is tolerable
- Cheap and fast to rebuild
Depending on the particular use case and the answers to those questions, it may be possible to get away with only what Elixir provides, allowing state to be discarded or handled as an exceptional case when processes terminate and rebuilt to a consistent state when they come back up. On the other hand, if important state that is expensive or slow to rebuild is stored in memory and the process holding it goes down, clients of that service are going to have a bad time. If the issue is one of latency rather than avoiding a dependency, Redis is a good choice here and can be used either on its own or as a read-through cache in front of Postgres.
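For state that is cheap and fast to rebuild, one possible shape is a GenServer that repopulates itself every time it starts. This is only a sketch under assumptions: `RebuildableCache` and its `loader` argument are hypothetical names, and the loader stands in for whatever durable source (Postgres, an upstream API) the data can be rebuilt from.

```elixir
defmodule RebuildableCache do
  use GenServer

  # `loader` is a hypothetical zero-arity function that returns a map
  # rebuilt from a durable source.
  def start_link(loader) when is_function(loader, 0) do
    GenServer.start_link(__MODULE__, loader)
  end

  def get(pid, key), do: GenServer.call(pid, {:get, key})

  @impl true
  def init(loader) do
    # Return right away so whoever started this process is not blocked,
    # then rebuild before any other message is handled.
    {:ok, %{loader: loader, data: %{}}, {:continue, :rebuild}}
  end

  @impl true
  def handle_continue(:rebuild, state) do
    {:noreply, %{state | data: state.loader.()}}
  end

  @impl true
  def handle_call({:get, key}, _from, state) do
    {:reply, Map.fetch(state.data, key), state}
  end
end

# Usage with a stand-in loader; a real one would query the durable source.
loader = fn -> %{"plan:free" => 3, "plan:pro" => 50} end
{:ok, cache} = RebuildableCache.start_link(loader)
result = RebuildableCache.get(cache, "plan:pro")
IO.inspect(result)
# prints {:ok, 50}
```

Placed under a supervisor, a process like this comes back in a consistent state after every restart. If the loader is slow or expensive, that is the signal to look at something like Redis instead.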
Clustering
While it is true that Elixir facilitates scalable, distributed services, it is also true that "there ain't no such thing as a free lunch." On a single box, Elixir will scale vertically until a core is fully utilized and then across the remaining cores, but then what? There is a big difference between simply scaling horizontally across multiple boxes and actually clustering the nodes that live on those boxes.
When scaling across multiple boxes without clustering, there are a few notable concerns. First, how does the service need to scale? Is requiring manual intervention acceptable, or is automation necessary? Second, how quickly can the service scale up? Finally, how costly is it to scale? Providers that manage more for you and make development and operations simpler are typically more expensive initially and, in turn, when scaling. If the time and knowledge investment can be justified, or has already been made, a more hands-on provider will typically be cheaper.
If a solution requires clustering, that typically rules out any provider without cross-box private networking, most notably Heroku. However, such providers are not ruled out in every case, because not every service requires clustering to scale.
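To make the networking requirement concrete, here is a rough sketch of what clustering asks of a host. `ClusterSketch` and the peer names are hypothetical; real node names depend on how the release is started, and each box must be able to reach the others directly for any of these calls to succeed.

```elixir
defmodule ClusterSketch do
  # Attempts to connect the local node to each peer and reports the outcome.
  # Node.connect/1 returns true on success, false on failure, and :ignored
  # when the local node was not started as a distributed node.
  def connect_all(peers) when is_list(peers) do
    Map.new(peers, fn peer -> {peer, Node.connect(peer)} end)
  end
end

# Hypothetical peer names on a private network.
peers = [:"app@10.0.0.2", :"app@10.0.0.3"]
results = ClusterSketch.connect_all(peers)

IO.inspect(results)
# The nodes this node is currently connected to.
IO.inspect(Node.list())
```

Scaling across boxes without clustering needs none of this, since each node simply sits behind a load balancer and knows nothing about the others.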
Hot Code Reloading
Hot code reloading is the ability to update a running release to a different version in place with no downtime. This is incredibly powerful, and it should rarely be used. More specifically: it should only be used if it is imperative to do so. It requires extensive, meticulous planning; if the shape of state anywhere within the system has changed, it must be pushed through a function that carefully transforms it from old to new. Testing the upgrade should be given as much time and care as testing the system itself.
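For a GenServer, the function that transforms state from old to new is the `code_change/3` callback. The sketch below assumes a made-up migration: `SessionStore`, its state shapes, and the version numbers are all hypothetical. It calls the callback directly to show the transformation, whereas in a real upgrade the release handler invokes it.

```elixir
defmodule SessionStore do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  @impl true
  def init(:ok), do: {:ok, %{sessions: %{}, version: 2}}

  # Hypothetical migration: version 1 kept state as a bare map of sessions,
  # version 2 wraps that map and adds an explicit :version key.
  @impl true
  def code_change(_old_vsn, %{version: 2} = state, _extra) do
    # Already in the new shape; nothing to do.
    {:ok, state}
  end

  def code_change(_old_vsn, old_sessions, _extra) when is_map(old_sessions) do
    {:ok, %{sessions: old_sessions, version: 2}}
  end
end

# Exercise the transformation in isolation, the way a test for the upgrade might.
old_state = %{"abc" => %{user_id: 1}}
{:ok, new_state} = SessionStore.code_change("1", old_state, [])
IO.inspect(new_state)
```

Every process whose state changes shape needs a transformation like this, plus tests that feed it real old state, which is a large part of why the planning burden is so high.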
Furthermore, it runs counter to trending best practices for local development and deploys, namely containerization. At a minimum, releases should be built within a container that mirrors production. Beyond that, with the advent of services like AWS ECS and Kubernetes, it is common to run services inside containers in production. Two of the main benefits are immutable releases and a development environment that closely resembles production. Pushing a new version out via hot code reloading is possible in the world of containers, but it is not recommended: it cancels out the idea of immutability and makes it harder to know what is deployed where and to ensure that unexpected restarts bring up the correct version of a service.
With its innate complexity, and the friction it introduces with respect to containers, it is advisable to steer clear of hot code reloading if possible. In most use cases this is easy, especially when utilizing concepts like rolling upgrades or blue/green deploys to reach comparable levels of availability before, during, and after deploying a new version.
If a service is exceptional in that zero downtime is imperative, it holds state in application memory, and draining nodes would be untenable, then hot code reloading may be the right tool for the job. Typically this lands teams on AWS EC2 or another big cloud provider's equivalent. Basically, what is required is:
- No containers
- A box that a team has effectively full control over
- A large, tried-and-true provider with impressive uptime guarantees and enough surrounding features to future-proof the decision
With those boxes checked, hot code reloading can facilitate impressive uptimes and quick upgrades, since node draining is not required. Just be prepared for a much more hands-on DevOps experience and more preparation for any given release.
When picking providers consider the following:
- Size: battle-tested, quality guarantees
- Features: uptime, tangential services, and cross-box private networking
- Complexity: how much DevOps responsibility
- Cost: price versus value
Evaluate those within the context of Elixir with a focus on:
- State in Memory
- Clustering
- Hot Code Reloading
Stateless APIs in Elixir with Phoenix are typically straightforward and most providers will suffice, provided traffic expectations and the cost of scaling are accounted for. If a service will be stateful or begins to hit some nebulous level of potential complexity, then a longer look should be taken at requirements. If a team is spinning up more than a couple of services, there are also gains to be had by reducing thrashing between different hosts for each service, meaning a host that can handle the simple as well as the complex will be desirable in the long run. No two services, nor two sets of requirements, are the same, but a thorough planning phase can be repeatable and improve efficiency across projects.