Streams processing have been around for a while and encompasses a great number of applications:
HTTP servers handling stream of incoming HTTP requests
Message streams: Twitter hose, user posts, …
Time-series messaging: stream from IoT sensors
Database querying: result set contains a stream of record
Most interestingly reactive streams have gain traction over the past few years. They bring back-pressure into the game in order to avoid having the destination stream over flooded by messages from the source stream.
This post focuses on AkkaStream, a reactive stream implementation based on Akka actors. Unlike actors which are untyped, AkkaStreams provides type safety at every stage of the stream pipeline and also comes with a nice and fluent API. However the documentation is sometimes lacking or not easy to search when someone needs to implement common patterns. This post tries to cover the most common ones in a clear and concise way. Continue reading “Akka Streams patterns”
In this post we’re going to explore how to build a DSL (Domain Specific Language) with a user-friendly syntax while maintaining as much type-safety as possible. We want that any operations that is not allowed by the business rules fail at compile time. This would be really nice as it makes sure that no one writes such forbidden logic (even by mistake).
More over Scala provides really nice syntactic sugar that can make a DSL syntax pretty neat.
Custom-DSL are nice as they provide all the type-safety you need against your data schema. However in this post I will focus only on the Java driver. Why? Because it’s both a simple and decent solution in my opinion.
The bad thing is that you lose any type-safety as all the queries are just plain strings. On the other hand you don’t have to learn a new DSL because your queries are just CQL. Add a thorough test coverage and you have a viable solution.
Moreover the Java driver provides an async API backed by Guava’s futures and it’s not that difficult to turn these futures into Scala futures – which makes a quite natural API in Scala.
We recently deployed in production a distributed system that uses Cassandra as its persistent storage.
Not long after we noticed that there were many warnings about tombstones in Cassandra logs.
WARN [SharedPool-Worker-2] 2017-01-20 16:14:45,153 ReadCommand.java:508 -
Read 5000 live rows and 4771 tombstone cells for query
SELECT * FROM warehouse.locations WHERE token(address) >= token(D3-DJ-21-B-02) LIMIT 5000
We found it quite surprising at first because we’ve only inserted data so far and didn’t expect to see that many tombstones in our database. After asking some people around no one seemed to have a clear explanation on what was going on in Cassandra.
The actor model allows us to write complex distributed applications by containing the mutable state inside an actor boundary. However with Akka this state is not persistent. If the actor dies and then restarts all its state is lost.
To address this problem Akka provides the Akka Persistence framework. Akka Persistence is an effective way to persist an actor state but it’s integration needs to be well thought as it can greatly impact your application design. It fits nicely with the actor model and distributed system design – but is quite different from what a “more classic” application looks like.
In this post I am going to gloss over the different components of Akka Persistence and see how they influence the design choices. I’ll also try to cover some of the common pitfalls to avoid when building a distributed application with Akka Persistence.
Now that we know about Markov chain, let’s focus on a slightly different process: the Markov Decision Process.
This process is quite similar to a Markov chain but adds more concept into it: Actions and Rewards. Having a reward means that it’s possible to learn which action yield the best rewards. This type of learning is also known as reinforcement learning.
Last time we talk about what is a Markov chain. However there is one big limitation:
A Markov chain implies that we can directly observe the state of the process. (e.g the number of people in the queue).
Many times we can only access an indirect representation or noisy measure of the state of the system. (e.g. we know the noisy GPS coordinates of a robot but we want to know it’s real position).
In this post we’re going to focus on the second point and see how to deal with HMMs. In fact HMM can be useful every time that we don’t have direct access to the system state. Let’s take some motivational examples first before we dig into the maths. Continue reading “Hidden Markov Model”
I’ve heard the term “Markov chain” a couple of times but never had a look at what it actually is. Probably because it sounded too impressive and complicated.
It turns out it’s not as bad as it sounds. In fact the whole idea is pretty intuitive and once you get a feeling of how things work it’s much easier to get your head around the mathematics.
Andrey Markov was a Russian mathematician who studied stochastic processes (a stochastic process is a random process) and specially systems that follow a suite of linked events. Markov found interesting results about discrete processes that form a chain. Continue reading “Markov chain”