Writing real-time applications is hard, and harder still when they need to be distributed and fault tolerant. Minimizing latency and maximizing throughput are major goals for such applications. Users expect quick response times and a good experience even when an external system fails or traffic spikes.
The legacy approach is to rely on external services, such as databases and queues, to handle concurrent or asynchronous operations. However, this is not viable in many scenarios. Many real-time systems, such as trading or banking applications, cannot afford the long waits this introduces when handling concurrent requests.
Consider Node.js, which is event-driven. It helps performance by avoiding context switches and blocking calls. However, a single Node.js process does not use multiple processors concurrently; doing so requires spawning and coordinating multiple independent processes. This adds overhead and can become a pain point later on.
The Akka Advantage
In the Akka framework, we have a contender that promises to relieve us of the complexities that plagued developers in the past. Akka is a message-driven toolkit for building reliable, distributed applications on the JVM, in Java or Scala. Much of the complexity of thread creation, message handling, race conditions, and synchronization is handled for us by Akka. Along with low-latency message delivery, Akka lets us balance the load and scale up or down. A message-driven architecture provides the overall foundation for Akka-based systems. New functionality can be introduced without the hassle of reengineering the whole system.
Because Akka is built on message passing, it lets us build applications with lower latency than the legacy approach. Its blend of functional programming with actors lets us write code that is easy to comprehend and test, and that handles failures better than traditional JVM systems. Akka's "let it crash" philosophy helps us build applications that heal on their own and keep running.
This blog examines the Akka architecture and explores how it facilitates the development of a distributed, real-time, and fault-tolerant trading application.
Our requirement was to provide stock market trading data to our clients with minimum delay. Even a delay of a millisecond could adversely affect our clients' day trading performance. So we set out to build a system that has no downtime and is elastic and reactive.
What We Built
The system we built consists of multiple applications that work together as a single unit, react to their surroundings, and remain aware of each other's existence. They interact with each other internally to carry out complex tasks.
We chose reactive programming, based on message passing, for the critical components, which allowed them to be decoupled and developed separately. Decoupling was required to isolate components and meet our requirements of resilience and elasticity.
Akka Actors to the Rescue!
The application logic resides in lightweight objects called actors, which send and receive messages. They have a very low memory footprint and are not mapped one-to-one to threads on the VM. In a typical Akka application, it is possible to create millions of actors in a single VM, many of which do nothing until a message is sent to them. Around 2.7 million actors can fit in 1 GB of memory, whereas only 4096 threads can be created on a single VM using the legacy approach.
Everything actors do is asynchronous, so they can send messages without waiting for a reply. We can easily configure an actor's life cycle and failure handling using the supervision strategies that Akka provides.
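To make the idea of asynchronous, fire-and-forget messaging concrete, here is a minimal sketch in plain Java. It is not Akka's API: the class name MiniActor and its methods are our own illustration, and a single-threaded executor stands in for the actor's mailbox.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Plain-Java illustration of the actor idea; this is NOT Akka's API.
final class MiniActor {
    // The single-threaded executor doubles as the mailbox: messages queue up
    // and are processed one at a time, so the state below needs no locks.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private final List<String> received = new ArrayList<>();

    // Fire-and-forget send: enqueue the message and return immediately.
    void tell(String message) {
        mailbox.execute(() -> received.add(message));
    }

    // Ask the actor for a copy of its state. The request is queued like any
    // other message, so it runs after everything sent before it.
    List<String> snapshot() {
        try {
            return mailbox.submit(() -> List.copyOf(received)).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("actor did not answer", e);
        }
    }

    // Stop the actor once the messages already queued have been processed.
    void stop() {
        mailbox.shutdown();
    }

    public static void main(String[] args) {
        MiniActor actor = new MiniActor();
        actor.tell("tick-1"); // returns immediately, no reply expected
        actor.tell("tick-2");
        System.out.println(actor.snapshot());
        actor.stop();
    }
}
```

The sender never blocks on tell, and because only one thread ever touches the actor's state, no synchronization is needed inside the actor. Akka's scheduling is more sophisticated: many actors share a small pool of threads instead of each owning one.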
The trading application we developed has two main workflows: real-time trade rate updates, and order processing for the trades placed by users.
The first workflow delivers up-to-date trade values to each user. Each user is shown different values based on preconfigured margins. We designed a dedicated actor for each user that performs the margin calculations on the trade signals received from the server; the results are then delivered to the user through WebSockets.
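As an illustration of the margin step, the sketch below applies a per-user margin to an incoming price. The actual margin rules are not described here, so everything in it is an assumption: the name MarginCalculator, the margin being expressed in basis points, and the margin being added on top of the market price.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical margin rule for illustration only.
final class MarginCalculator {
    private static final BigDecimal BASIS_POINTS = BigDecimal.valueOf(10_000);

    // Returns the price shown to a user: the market price plus that user's
    // preconfigured margin, expressed in basis points (1 bp = 0.01%).
    static BigDecimal quoteFor(BigDecimal marketPrice, int marginBps) {
        BigDecimal factor = BigDecimal.ONE.add(BigDecimal.valueOf(marginBps).divide(BASIS_POINTS));
        return marketPrice.multiply(factor).setScale(4, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        BigDecimal tick = new BigDecimal("100.00");
        // The same market tick yields a different quote per user.
        System.out.println("user A (25 bps): " + quoteFor(tick, 25));
        System.out.println("user B (10 bps): " + quoteFor(tick, 10));
    }
}
```

In an actor-based design, a calculation like this would run inside the actor's message handler each time a trade signal arrives, with the margin held as part of the actor's state.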
The other workflow requires no dedicated actor for each user. Instead, a new actor is created for each new stock market trade placed by the user, so every trade has its own associated actor within the system. This actor takes care of all the business logic associated with that trade.
It sends a request to the stock market to initialize a new trade. It then waits, listening to the trade signals from the gateway and checking whether the market price has crossed the buy or sell price set by the user. When this threshold is crossed, the actor places the trade. After the gateway confirms the trade, the actor does the necessary bookkeeping and informs the trader. If the actor fails along the way, Akka's supervision can restart it so that it keeps handling signals. Note that Akka's default delivery semantics are at-most-once, so guaranteed delivery needs acknowledgement and retry on top.
The actors we created sit under a parent actor. You might then ask: who is the parent of the first actor in the system? Akka automatically creates that first actor as the root node of the actor system.
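The life cycle of a single trade can be sketched as a small state machine. This is a plain-Java illustration, not our production code or Akka's API. The names are hypothetical, and the trigger rule is an assumption: a buy fires when the market price falls to or below the user's price, and a sell fires when it rises to or above it.

```java
// Plain-Java sketch of the per-trade logic; all names are hypothetical.
final class TradeActorSketch {
    enum State { WAITING, ORDER_SENT, CONFIRMED }
    enum Side { BUY, SELL }

    private final Side side;
    private final double limitPrice;
    private State state = State.WAITING;

    TradeActorSketch(Side side, double limitPrice) {
        this.side = side;
        this.limitPrice = limitPrice;
    }

    // Called for every price signal from the gateway.
    // Returns true when this signal caused the order to be placed.
    boolean onPrice(double marketPrice) {
        if (state != State.WAITING) {
            return false; // the order has already been placed
        }
        boolean crossed = (side == Side.BUY)
                ? marketPrice <= limitPrice
                : marketPrice >= limitPrice;
        if (crossed) {
            state = State.ORDER_SENT; // the request to the gateway would go here
        }
        return crossed;
    }

    // Called when the gateway confirms the trade.
    void onConfirmation() {
        if (state == State.ORDER_SENT) {
            state = State.CONFIRMED; // bookkeeping and trader notification would go here
        }
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        TradeActorSketch trade = new TradeActorSketch(Side.BUY, 100.0);
        for (double price : new double[] {101.2, 100.4, 99.8}) {
            if (trade.onPrice(price)) {
                System.out.println("order placed at " + price);
            }
        }
        trade.onConfirmation();
        System.out.println("final state: " + trade.state());
    }
}
```

Because each trade has its own instance, the state field is never shared between trades, which is what makes one actor per trade easy to reason about.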
[Figure: Akka actors in the Rate workflow (real-time trade rate updates) and the Order workflow (processing the trades placed by the user)]
The actor system we developed consists of a master actor that resides in the trading client module, a cluster manager actor for each instance in the cluster, and worker actors that run the actual business logic and are spawned by the cluster managers.
This structure of loosely coupled Akka actors with adaptive routing gave us a lot of flexibility in scaling the cluster up or down. Our application needed to scale up as users logged in, without slowing down as their number grew. The trades placed by the users also had to be processed without delay. An Akka-based solution was a good fit for this job. When users log out, we scale down by stopping their actors and freeing up the resources.
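The routing idea can be illustrated with a much simpler stand-in: send each message to the worker with the fewest pending messages, and add or remove workers as load changes. This is a plain-Java sketch with hypothetical names. It is not how Akka's routers are implemented, and our real routing decisions were not based on this exact rule.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified least-pending router; a stand-in for adaptive routing.
final class LeastLoadedRouter {
    // pending.get(i) is the number of messages currently queued for worker i.
    private final List<Integer> pending = new ArrayList<>();

    // Scale up: a new worker starts with an empty queue.
    void addWorker() {
        pending.add(0);
    }

    // Scale down: remove the newest worker and return how many messages
    // it still had queued, so the caller can route them again.
    int removeWorker() {
        return pending.remove(pending.size() - 1);
    }

    // Pick the worker with the fewest pending messages (lowest index wins
    // ties) and record one more message against it.
    int route() {
        if (pending.isEmpty()) {
            throw new IllegalStateException("no workers available");
        }
        int best = 0;
        for (int i = 1; i < pending.size(); i++) {
            if (pending.get(i) < pending.get(best)) {
                best = i;
            }
        }
        pending.set(best, pending.get(best) + 1);
        return best;
    }

    // A worker reports that it finished one message.
    void completed(int worker) {
        pending.set(worker, pending.get(worker) - 1);
    }

    public static void main(String[] args) {
        LeastLoadedRouter router = new LeastLoadedRouter();
        router.addWorker();
        router.addWorker();
        for (int i = 0; i < 4; i++) {
            System.out.println("message " + i + " -> worker " + router.route());
        }
        router.addWorker(); // users logged in: scale up
        System.out.println("next message -> worker " + router.route());
    }
}
```

A freshly added worker immediately attracts new messages because its queue is empty, which is the behavior we want when scaling up under load.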
Handling Crashes Using Akka
Akka handles exceptions differently from the legacy approach. The Akka actor life-cycle model allows actors to be suspended and restarted when faults occur. Fault recovery runs in a separate flow, in which actors called supervisors monitor other actors. A supervisor can decide to restart a child actor on certain types of failure and stop it completely on others. Restarts are transparent to the rest of the system: collaborating actors can continue sending messages while the target actor restarts, and messages queued in its mailbox are processed once it is back. By default, the message that triggered the failure is not redelivered.
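The restart-or-stop decision can be sketched in plain Java as follows. This only illustrates the concept; Akka's supervision API looks different. The names are hypothetical, and the rule that bad input leads to a restart while any other failure leads to a stop is an assumption made for the example.

```java
import java.util.function.Supplier;

// Plain-Java illustration of supervision; Akka's real API differs.
final class SupervisionSketch {
    enum Directive { RESTART, STOP }

    interface Child {
        void handle(String message);
    }

    // Hypothetical decider: bad input is survivable, anything else is not.
    static Directive decide(RuntimeException failure) {
        return failure instanceof IllegalArgumentException ? Directive.RESTART : Directive.STOP;
    }

    static final class Supervisor {
        private final Supplier<Child> factory;
        private Child child;
        private int restarts;
        private boolean stopped;

        Supervisor(Supplier<Child> factory) {
            this.factory = factory;
            this.child = factory.get();
        }

        // Senders call this without knowing whether the child was restarted.
        void deliver(String message) {
            if (stopped) {
                return; // a stopped child receives nothing further
            }
            try {
                child.handle(message);
            } catch (RuntimeException failure) {
                if (decide(failure) == Directive.RESTART) {
                    child = factory.get(); // fresh instance; the failed message is not retried
                    restarts++;
                } else {
                    stopped = true;
                }
            }
        }

        int restarts() {
            return restarts;
        }

        boolean isStopped() {
            return stopped;
        }
    }

    public static void main(String[] args) {
        Supervisor supervisor = new Supervisor(() -> message -> {
            if (message.isEmpty()) {
                throw new IllegalArgumentException("empty tick");
            }
            System.out.println("processed " + message);
        });
        supervisor.deliver("tick-1");
        supervisor.deliver("");       // fails, child is replaced
        supervisor.deliver("tick-2"); // handled by the new child
        System.out.println("restarts: " + supervisor.restarts());
    }
}
```

The failure handling lives entirely in the supervisor, so the child's own code contains only the normal path. That separation is the core of the "let it crash" style.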
Message Delivery to the End User
We chose Eclipse Vert.x WebSocket servers to deliver the processed trading signals to the end users because Vert.x is non-blocking and event-driven. This means the server can handle a high level of concurrency using a small number of kernel threads, which translates into strong performance on modest hardware.
Since nothing blocks, an event loop can deliver a huge number of events in a short amount of time. For example, a single event loop can handle many thousands of requests very quickly. By default, Vert.x chooses the number of event loops based on the number of CPU cores. This means a single Vert.x process can scale across the cores of your server, unlike a single Node.js process.
Using Akka, we achieved a latency of less than 10 ms in our local-network tests for delivering trading signals to the end user. The signals arrived at a rate of 600 trade ticks per minute (10 per second), and the system handled them with ease thanks to the thread handling offered by the Akka framework.
Akka definitely has a learning curve, but the ease with which new functionality could be incorporated into the system was phenomenal. Akka will definitely be on our minds when future requirements call for distributed real-time applications.
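The multi-event-loop model can be approximated in plain Java with a group of single-threaded executors. This is an analogy rather than Vert.x code: the class name is hypothetical, pinning connections to loops by modulo is an assumption, and the "write" is just a counter standing in for a non-blocking socket write.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Plain-Java analogy of several event loops; this is not the Vert.x API.
final class EventLoopGroupSketch {
    private final ExecutorService[] loops;
    private final AtomicLong charactersWritten = new AtomicLong();

    EventLoopGroupSketch(int loopCount) {
        loops = new ExecutorService[loopCount];
        for (int i = 0; i < loopCount; i++) {
            loops[i] = Executors.newSingleThreadExecutor(); // one thread per loop
        }
    }

    // Pin each connection to one loop so its events are handled in order.
    int loopFor(int connectionId) {
        return Math.floorMod(connectionId, loops.length);
    }

    // Queue a tick for a connection; the caller never blocks on the write.
    void publish(int connectionId, String tick) {
        loops[loopFor(connectionId)].execute(() -> charactersWritten.addAndGet(tick.length()));
    }

    // Drain every loop and report how many characters were "written".
    long shutdownAndTotal() {
        for (ExecutorService loop : loops) {
            loop.shutdown();
        }
        try {
            for (ExecutorService loop : loops) {
                if (!loop.awaitTermination(5, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("event loop did not drain in time");
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return charactersWritten.get();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        EventLoopGroupSketch group = new EventLoopGroupSketch(cores);
        for (int connection = 0; connection < 10_000; connection++) {
            group.publish(connection, "EURUSD 1.0842");
        }
        System.out.println(group.shutdownAndTotal() + " characters written on " + cores + " loops");
    }
}
```

A handful of threads serve ten thousand connections here because no handler ever waits. The same property is what lets an event-driven server get by with a small number of kernel threads.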