We recently deployed in production a distributed system that uses Cassandra as its persistent storage.
Not long after we noticed that there were many warnings about tombstones in Cassandra logs.
WARN [SharedPool-Worker-2] 2017-01-20 16:14:45,153 ReadCommand.java:508 -
Read 5000 live rows and 4771 tombstone cells for query
SELECT * FROM warehouse.locations WHERE token(address) >= token(D3-DJ-21-B-02) LIMIT 5000
We found it quite surprising at first because we’ve only inserted data so far and didn’t expect to see that many tombstones in our database. After asking some people around no one seemed to have a clear explanation on what was going on in Cassandra.
The actor model allows us to write complex distributed applications by containing the mutable state inside an actor boundary. However with Akka this state is not persistent. If the actor dies and then restarts all its state is lost.
To address this problem Akka provides the Akka Persistence framework. Akka Persistence is an effective way to persist an actor state but it’s integration needs to be well thought as it can greatly impact your application design. It fits nicely with the actor model and distributed system design – but is quite different from what a “more classic” application looks like.
In this post I am going to gloss over the different components of Akka Persistence and see how they influence the design choices. I’ll also try to cover some of the common pitfalls to avoid when building a distributed application with Akka Persistence.
Now that we know about Markov chain, let’s focus on a slightly different process: the Markov Decision Process.
This process is quite similar to a Markov chain but adds more concept into it: Actions and Rewards. Having a reward means that it’s possible to learn which action yield the best rewards. This type of learning is also known as reinforcement learning.
Last time we talk about what is a Markov chain. However there is one big limitation:
A Markov chain implies that we can directly observe the state of the process. (e.g the number of people in the queue).
Many times we can only access an indirect representation or noisy measure of the state of the system. (e.g. we know the noisy GPS coordinates of a robot but we want to know it’s real position).
In this post we’re going to focus on the second point and see how to deal with HMMs. In fact HMM can be useful every time that we don’t have direct access to the system state. Let’s take some motivational examples first before we dig into the maths. Continue reading “Hidden Markov Model”