Understanding Cassandra tombstones

We recently deployed to production a distributed system that uses Cassandra as its persistent storage.

Not long after, we noticed many warnings about tombstones in the Cassandra logs.

WARN  [SharedPool-Worker-2] 2017-01-20 16:14:45,153 ReadCommand.java:508 - 
Read 5000 live rows and 4771 tombstone cells for query 
SELECT * FROM warehouse.locations WHERE token(address) >= token(D3-DJ-21-B-02) LIMIT 5000 
(see tombstone_warn_threshold)

We found it quite surprising at first: we had only inserted data so far and didn’t expect to see that many tombstones in our database. After asking around, no one seemed to have a clear explanation of what was going on inside Cassandra.

In fact, the main misconception about tombstones is that people associate them only with delete operations. While it’s true that tombstones are generated when data is deleted, that is not the only case, as we shall see. Continue reading “Understanding Cassandra tombstones”
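For example, the following hypothetical CQL sketch (the column names are made up; only the warehouse.locations table comes from the warning above) shows two writes that create tombstones without any DELETE being issued:

-- Binding a column to null writes a tombstone cell for that column,
-- even though nothing was ever deleted.
INSERT INTO warehouse.locations (address, item) VALUES ('D3-DJ-21-B-02', null);

-- Data written with a TTL also turns into tombstones once the TTL expires.
INSERT INTO warehouse.locations (address, item) VALUES ('D3-DJ-21-B-03', 'pallet-42') USING TTL 86400;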

Akka persistence

The actor model allows us to write complex distributed applications by containing mutable state inside an actor boundary. However, with Akka this state is not persistent: if the actor dies and then restarts, all of its state is lost.

To address this problem Akka provides the Akka Persistence framework. Akka Persistence is an effective way to persist an actor’s state, but its integration needs to be thought through carefully as it can greatly impact your application design. It fits nicely with the actor model and distributed system design, but is quite different from what a “more classic” application looks like.
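To make this more concrete, here is a minimal sketch of what persisting an actor’s state can look like with the classic (untyped) PersistentActor API. The CounterActor, its command and event are hypothetical and only illustrate the idea; a journal plugin (e.g. the Cassandra one) still has to be configured for it to run:

import akka.persistence.PersistentActor

// Hypothetical command and event, for illustration only.
final case class Increment(amount: Int)
final case class Incremented(amount: Int)

class CounterActor extends PersistentActor {
  // Stable identifier used to locate this actor's events in the journal.
  override def persistenceId: String = "counter-1"

  private var total = 0

  // On restart, the journal replays past events to rebuild the in-memory state.
  override def receiveRecover: Receive = {
    case Incremented(amount) => total += amount
  }

  // Commands are turned into events that are persisted before the state is updated.
  override def receiveCommand: Receive = {
    case Increment(amount) =>
      persist(Incremented(amount)) { event =>
        total += event.amount
      }
  }
}

Note that it is the events, not the current state, that are persisted and replayed, which is a large part of why the integration impacts the application design.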

In this post I am going to go over the different components of Akka Persistence and see how they influence design choices. I’ll also try to cover some of the common pitfalls to avoid when building a distributed application with Akka Persistence.

Although Akka Persistence allows you to plug in various storage backends, in this post I mainly discuss the Cassandra backend. Continue reading “Akka persistence”

Markov Decision Process

Now that we know about Markov chains, let’s focus on a slightly different process: the Markov Decision Process.

This process is quite similar to a Markov chain but adds two more concepts: actions and rewards. Having rewards means that it’s possible to learn which actions yield the best rewards. This type of learning is also known as reinforcement learning.
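As a reminder (standard notation, not specific to this post), an MDP is usually described by states S, actions A, transition probabilities P, rewards R and a discount factor gamma; solving it optimally amounts to finding the value function that satisfies the Bellman optimality equation:

V^{*}(s) = \max_{a \in A} \Big( R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^{*}(s') \Big)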

In this post we’re going to see what exactly a Markov decision process is and how to solve it optimally. Continue reading “Markov Decision Process”

Hidden Markov Model

Last time we talked about what a Markov chain is. However, there is one big limitation:

A Markov chain implies that we can directly observe the state of the process (e.g. the number of people in the queue).

Many times we can only access an indirect representation or a noisy measurement of the state of the system (e.g. we know the noisy GPS coordinates of a robot but we want to know its real position).
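This is exactly the setting a hidden Markov model captures (standard notation, not from the original post): a hidden state x_t that evolves as a Markov chain, and observations y_t that only depend on the current hidden state, so the joint distribution factorises as:

P(x_{1:T}, y_{1:T}) = P(x_1) \prod_{t=2}^{T} P(x_t \mid x_{t-1}) \prod_{t=1}^{T} P(y_t \mid x_t)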

[Figure: Trellis diagram]

In this post we’re going to focus on the second point and see how to deal with it using HMMs. In fact, an HMM can be useful whenever we don’t have direct access to the system state. Let’s take some motivational examples first before we dig into the maths. Continue reading “Hidden Markov Model”