Akka Streams patterns

Stream processing has been around for a while and encompasses a wide range of applications:

  • HTTP servers handling a stream of incoming HTTP requests
  • Message streams: Twitter firehose, user posts, …
  • Time-series messaging: streams from IoT sensors
  • Database querying: a result set contains a stream of records
  • …

More interestingly, reactive streams have gained traction over the past few years. They bring back-pressure into the game so that the destination stream is not flooded by messages from the source stream.
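In a reactive stream the consumer signals how many elements it can take, and the producer may only send that many. The JDK's `java.util.concurrent.Flow` interfaces mirror the Reactive Streams protocol, so the sketch below uses them (not Akka's API) to show back-pressure with nothing but the standard library. The buffer capacity of 4 and the one-element-at-a-time demand are arbitrary choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.SubmissionPublisher;

public class BackPressureDemo {

    // Subscriber that signals demand for one element at a time: the publisher
    // may only deliver as many items as have been requested so far.
    static final class OneAtATimeSubscriber implements Flow.Subscriber<Integer> {
        final List<Integer> received = new ArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override
        public void onSubscribe(Flow.Subscription subscription) {
            this.subscription = subscription;
            subscription.request(1); // initial demand: a single element
        }

        @Override
        public void onNext(Integer item) {
            received.add(item);      // "process" the element
            subscription.request(1); // ask for the next one only once ready
        }

        @Override
        public void onError(Throwable throwable) {
            done.countDown();
        }

        @Override
        public void onComplete() {
            done.countDown();
        }
    }

    static List<Integer> run(int n) {
        OneAtATimeSubscriber subscriber = new OneAtATimeSubscriber();
        // Buffer capacity of 4 per subscriber: once the buffer is full,
        // submit() blocks, which slows the producer down to the consumer's pace.
        try (SubmissionPublisher<Integer> publisher =
                 new SubmissionPublisher<>(ForkJoinPool.commonPool(), 4)) {
            publisher.subscribe(subscriber);
            for (int i = 0; i < n; i++) {
                publisher.submit(i);
            }
        } // close() signals onComplete once the buffered items are delivered
        try {
            subscriber.done.await(); // wait for onComplete (or onError)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return subscriber.received;
    }

    public static void main(String[] args) {
        System.out.println(run(10)); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

Because the consumer never requests more than one element, a fast producer cannot pile up messages: `submit` simply blocks until the consumer catches up.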

This post focuses on Akka Streams, a reactive stream implementation based on Akka actors. Unlike actors, which are untyped, Akka Streams provides type safety at every stage of the stream pipeline and comes with a nice, fluent API. However, the documentation is sometimes lacking or hard to search when you need to implement common patterns. This post tries to cover the most common ones in a clear and concise way.

Akka persistence

The actor model allows us to write complex distributed applications by containing the mutable state inside an actor boundary. However, with Akka this state is not persistent: if the actor dies and then restarts, all its state is lost.

To address this problem, Akka provides the Akka Persistence framework. Akka Persistence is an effective way to persist an actor's state, but its integration needs careful thought as it can greatly impact your application design. It fits nicely with the actor model and distributed system design, but the result looks quite different from a “more classic” application.
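Akka Persistence is built on event sourcing: a persistent actor stores the events that changed its state in a journal and, after a restart, replays them to rebuild that state. The plain-Java sketch below illustrates only that idea. `Journal`, `Counter`, `Incremented` and `recoverAfterRestart` are hypothetical names, the journal is an in-memory list standing in for a real storage backend, and none of this is Akka's API.

```java
import java.util.ArrayList;
import java.util.List;

public class EventSourcedCounter {

    // An event records a fact that happened; events are appended, never updated.
    record Incremented(int delta) {}

    // Hypothetical stand-in for a persistent journal (in Akka Persistence this
    // would be a storage plugin such as Cassandra); here an in-memory list.
    static final class Journal {
        private final List<Incremented> events = new ArrayList<>();

        void append(Incremented event) { events.add(event); }

        List<Incremented> replay() { return List.copyOf(events); }
    }

    // Plays the role of the persistent actor: its state lives only in memory
    // and can always be derived from the events in the journal.
    static final class Counter {
        private final Journal journal;
        private int state = 0;

        Counter(Journal journal) {
            this.journal = journal;
            // Recovery: rebuild the in-memory state by replaying every stored event.
            for (Incremented event : journal.replay()) {
                apply(event);
            }
        }

        // Command handler: persist the event first, then update the state from it.
        void increment(int delta) {
            Incremented event = new Incremented(delta);
            journal.append(event);
            apply(event);
        }

        private void apply(Incremented event) { state += event.delta(); }

        int state() { return state; }
    }

    // Applies the given increments, "crashes" the counter, and returns the
    // state of a fresh instance recovered from the same journal.
    static int recoverAfterRestart(int... deltas) {
        Journal journal = new Journal();
        Counter counter = new Counter(journal);
        for (int delta : deltas) {
            counter.increment(delta);
        }
        counter = null; // the original instance and its in-memory state are gone
        return new Counter(journal).state();
    }

    public static void main(String[] args) {
        System.out.println(recoverAfterRestart(2, 3)); // 5
    }
}
```

The design choice that matters is that state is never written directly: only events are stored, and state is a function of them. This is what lets the actor survive a restart, and also why the journal's storage and replay cost shape the application design.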

In this post I am going to go over the different components of Akka Persistence and see how they influence design choices. I'll also try to cover some of the common pitfalls to avoid when building a distributed application with Akka Persistence.

Although Akka Persistence allows you to plug in various storage backends, in this post I mainly discuss the Cassandra backend.

Kafka streams

Stream computing is one of the hot topics at the moment. It is not just hype: it is a more generic abstraction that unifies classical request/response processing with batch processing.

Stream paradigm

Request/response is a 1-1 scheme: 1 request gives 1 response. Batch processing, on the other hand, is an all-all scheme: all requests are processed at once and all responses are given back together.

Stream processing lies in between: some requests give some responses. Depending on how you configure the stream processing, you end up closer to one end or the other.
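A toy model can make the 1-1 / all-all / some-some distinction concrete. The hypothetical `process` method below groups requests into windows of `windowSize` and emits the responses of each window together: a window of 1 behaves like request/response, a window covering the whole input behaves like batch processing, and anything in between is stream processing. It is only an illustration of the idea, not how Kafka Streams itself is configured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ProcessingSpectrum {

    // Handles requests in windows of `windowSize` and emits the responses of each
    // window together. Each inner list is one emission of responses:
    //   windowSize == 1               -> request/response (1 request, 1 response)
    //   windowSize >= requests.size() -> batch (all requests, all responses at once)
    //   anything in between           -> stream processing (some requests, some responses)
    static <T, R> List<List<R>> process(List<T> requests, int windowSize, Function<T, R> handler) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be >= 1");
        }
        List<List<R>> emissions = new ArrayList<>();
        List<R> current = new ArrayList<>();
        for (T request : requests) {
            current.add(handler.apply(request));
            if (current.size() == windowSize) {
                emissions.add(current); // window full: emit its responses
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            emissions.add(current); // flush the last, partially filled window
        }
        return emissions;
    }

    public static void main(String[] args) {
        List<Integer> requests = List.of(1, 2, 3, 4, 5);
        Function<Integer, Integer> doubler = x -> x * 2;
        System.out.println(process(requests, 1, doubler)); // [[2], [4], [6], [8], [10]]
        System.out.println(process(requests, 2, doubler)); // [[2, 4], [6, 8], [10]]
        System.out.println(process(requests, 5, doubler)); // [[2, 4, 6, 8, 10]]
    }
}
```

The window size is the knob: smaller windows return responses sooner, while larger windows process more requests per emission at the cost of waiting longer for the first response.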

Introduction to Alluxio

Continuing my tour of the Spark ecosystem, today's focus is on Alluxio, a distributed storage system that integrates nicely with many compute engines, including Spark.

What is Alluxio?

The official definition of Alluxio is (or at least that's how one of its authors presents it):

Alluxio is an open source memory speed virtual distributed storage

Let’s see what each of these terms actually means: