Protocol Buffers (aka Protobuf) is an efficient and fast way to serialise data into a binary format. It is much more compact than Java serialisation or any text-based format (JSON, XML, CSV, …).
Protobuf is schema based – it needs a description (in a .proto file) of the data structures to be serialised/deserialised.
On the JVM, protoc (the Protobuf compiler) reads the .proto description files and generates corresponding classes.
For Scala there is a very good sbt plugin, ScalaPB, that follows the same process and generates case classes corresponding to the definitions in the .proto files.
The .proto files are an easy way to describe a protocol between 2 components (e.g. services). However there are some cases (e.g. writing to persistent storage) where the .proto definitions are unnecessary and only add complexity. (Who likes to read auto-generated code?)
In such cases it would be much easier to serialise an object directly into protobuf (using its class definition as a schema). After all, this is what the protobuf Java binding does: it serialises (auto-generated) Java classes into the protobuf binary format.
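To make the idea concrete, here is a minimal hand-written sketch of what encoding a class straight into the protobuf wire format involves. It is not PBDirect's implementation: the `Person` class, the `WireSketch` object and the choice of numbering fields by their position are assumptions made for illustration. A library would derive the equivalent of `encode` from the class definition instead of having it written by hand.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.UTF_8

object WireSketch {
  // Hypothetical message type; field numbers are assumed to follow declaration order.
  final case class Person(id: Int, name: String)

  // Base-128 varint: 7 bits per byte, high bit set on every byte except the last.
  def varint(value: Long, out: ByteArrayOutputStream): Unit = {
    var v = value
    while ((v & ~0x7FL) != 0) {
      out.write(((v & 0x7F) | 0x80).toInt)
      v >>>= 7
    }
    out.write(v.toInt)
  }

  def encode(p: Person): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    // Field 1 (id): tag = (1 << 3) | wire type 0 (varint), then the value.
    varint((1 << 3) | 0, out)
    varint(p.id.toLong, out)
    // Field 2 (name): tag = (2 << 3) | wire type 2 (length-delimited), then length and bytes.
    val nameBytes = p.name.getBytes(UTF_8)
    varint((2 << 3) | 2, out)
    varint(nameBytes.length.toLong, out)
    out.write(nameBytes, 0, nameBytes.length)
    out.toByteArray
  }

  // Helper to inspect the result as space-separated hex bytes.
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xFF}%02x").mkString(" ")
}
```

Calling `WireSketch.hex(WireSketch.encode(WireSketch.Person(150, "testing")))` gives `08 96 01 12 07 74 65 73 74 69 6e 67`, which is the standard encoding of a message with an `int32` field 1 and a `string` field 2.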
To that end, let me introduce PBDirect, a Scala library that encodes Scala objects directly into protobuf. Continue reading “PBDirect – Protobuf without the .proto files”
It’s been a while since we last covered a machine learning algorithm. Last time we discussed the Markov Decision Process (or MDP).
Today we’re going to build on the MDP and see how we can generalise it to solve more complex problems.
Reinforcement learning really hit the news back in 2013 when a computer learned how to play a bunch of old Atari games (like Breakout) just by observing the pixels on the screen. Let’s find out how this is possible! Continue reading “Reinforcement learning”
Logging has been around on the JVM for a while now. It all started with Log4J back in 2001. Log4J was the first logging framework and it is still around today (in its version 2). It provides a simple and efficient API (compared to the System.out.println calls that were in use before):
- Get a logger for a class
- Use that logger to log messages
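import org.apache.log4j.{Level, Logger}
// MyClass and theException are assumed to be defined elsewhere
val logger = Logger.getLogger(classOf[MyClass])
logger.log(Level.DEBUG, "I am doing something right now")
logger.error("Oops, something went wrong", theException)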
Today there are a few more frameworks on the JVM but they all provide APIs similar to Log4J's:
- JUL (2002): java.util.logging provides a standardisation of Log4J and of course provides a similar API
- Commons-logging (2002): Apache project providing a façade over Log4J, JUL, … still the same API
- SLF4J (2005): Another façade over Log4J (1&2), JUL, JCL, … not many changes in the API
- Logback (2006): Brings structured logging with an API compatible with (and similar to) SLF4J (and Log4J)
- Log4J2 (2012): Rewrite of Log4J inspired by Log4J and Logback with improved performance. The API does not change much though.
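To illustrate how little the API differs between frameworks, the sketch below applies the same two steps with java.util.logging, which ships with the JDK. The `CollectingHandler` class and the `JulSketch` object are hypothetical helpers that only exist to capture the records in memory; JUL names its levels FINE and SEVERE where Log4J uses DEBUG and ERROR.

```scala
import java.util.logging.{Handler, Level, LogRecord, Logger}
import scala.collection.mutable.ListBuffer

object JulSketch {
  // Hypothetical handler that keeps records in memory instead of printing them.
  final class CollectingHandler extends Handler {
    val records: ListBuffer[LogRecord] = ListBuffer.empty[LogRecord]
    override def publish(record: LogRecord): Unit = { records += record; () }
    override def flush(): Unit = ()
    override def close(): Unit = ()
  }

  def run(): List[(String, String)] = {
    // Step 1: get a logger
    val logger = Logger.getLogger("JulSketch")
    val handler = new CollectingHandler
    logger.setUseParentHandlers(false) // keep the console quiet for this example
    logger.setLevel(Level.ALL)
    logger.addHandler(handler)

    // Step 2: use that logger to log messages
    logger.log(Level.FINE, "I am doing something right now")
    logger.log(Level.SEVERE, "Oops, something went wrong", new RuntimeException("boom"))

    logger.removeHandler(handler)
    handler.records.toList.map(r => (r.getLevel.getName, r.getMessage))
  }
}
```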
As you can see, the logging APIs available on the JVM haven’t changed much over the last 15 years. The most interesting additions are structured logging and the Mapped Diagnostic Context (MDC), as we shall see later.
In this post I am going to look at the current limitations of these APIs and see how we can overcome them while still relying on these frameworks to actually write the logs. Continue reading “Rethinking logging on the JVM with Logoon”
As promised in my previous post we’re going to explore the internals of Fluent and how it uses Shapeless and implicit resolution to transform case classes.
Fluent started as an experiment (and still is). The code is rather small (about 300 lines) and yet I am still impressed by the variety of cases it can handle.
Before working with Shapeless I’d often heard that it is pure magic, and I got the impression that most people (including me) don’t really know how it works. It turns out that the principles used in Shapeless are not really difficult to understand, especially if you read the well-written Type Astronaut’s Guide to Shapeless.
Understanding how Shapeless works doesn’t mean it’s easy to work with. Shapeless makes heavy use of implicits, and working with implicits is hard. Remember that implicit resolution is performed at compile time, so when it fails there is nothing to debug: no log messages, no stack trace. We are just left with rather blunt messages like
could not find implicit value for parameter ...
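A dependency-free sketch shows the mechanism at small scale. It does not use Shapeless: nested pairs ending in `Unit` stand in for an HList, and `Show`, `render` and `ImplicitSketch` are hypothetical names. The compiler assembles the instance for the whole structure by chaining implicit definitions, and if a single link is missing the only feedback is the compile error.

```scala
object ImplicitSketch {
  // A type class describing how to print a value of type A.
  trait Show[A] { def show(a: A): String }

  object Show {
    implicit val showInt: Show[Int] = new Show[Int] {
      def show(a: Int): String = a.toString
    }
    implicit val showString: Show[String] = new Show[String] {
      def show(a: String): String = a
    }
    // Unit plays the role of the empty list (HNil) in this sketch.
    implicit val showUnit: Show[Unit] = new Show[Unit] {
      def show(a: Unit): String = "HNil"
    }
    // Inductive step: an instance for (H, T) exists if instances exist for H and for T.
    implicit def showPair[H, T](implicit h: Show[H], t: Show[T]): Show[(H, T)] =
      new Show[(H, T)] {
        def show(p: (H, T)): String = h.show(p._1) + " :: " + t.show(p._2)
      }
  }

  // The implicit parameter is resolved entirely at compile time.
  def render[A](a: A)(implicit s: Show[A]): String = s.show(a)

  // Resolves showPair -> showInt and showPair -> showString and showUnit.
  val ok: String = render((1, ("a", ())))

  // There is no Show[Double], so uncommenting the next line is rejected at compile time
  // (under Scala 2, with a "could not find implicit value" error):
  // val ko: String = render((1, (2.0, ())))
}
```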
In this post I am going to explain the concepts used in Fluent and the problems I faced during implementation. Hopefully, by the end of the post, you’ll know enough to understand and edit the code (pull requests welcome!). Continue reading “Fluent – A deep dive into Shapeless and implicit resolution”
In Domain Driven Design (DDD) it is recommended to introduce a translation layer (aka anticorruption layer) between 2 bounded contexts. The role of the anticorruption layer is to prevent concepts from leaking from one domain into the other.
This is a sound idea as it keeps the domains isolated from each other ensuring they can evolve independently. After having implemented several anticorruption layers I realised that, although useful, they also introduced a lot of boilerplate code that doesn’t add much value to the business.
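As a hypothetical example of that boilerplate, consider two bounded contexts that each have their own view of an order. The `Sales` and `Shipping` contexts, their classes and the `toShipping` function are invented for illustration; the point is that the translation is mostly field-by-field copying.

```scala
object AclSketch {
  // First bounded context (hypothetical).
  object Sales {
    final case class Address(street: String, city: String, postcode: String)
    final case class Order(id: Long, customerName: String, address: Address, totalCents: Long)
  }

  // Second bounded context (hypothetical), with its own vocabulary.
  object Shipping {
    final case class Destination(street: String, city: String, postcode: String)
    final case class Parcel(orderId: Long, recipient: String, destination: Destination)
  }

  // Hand-written anticorruption layer: nothing but copying and renaming fields.
  def toShipping(order: Sales.Order): Shipping.Parcel =
    Shipping.Parcel(
      orderId = order.id,
      recipient = order.customerName,
      destination = Shipping.Destination(
        street = order.address.street,
        city = order.address.city,
        postcode = order.address.postcode
      )
    )
}
```

Every new field or nested class means another line of this kind, on both sides of the boundary.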
To this end, let me introduce Fluent, a library that aims at getting rid of this boilerplate code by leveraging all the power of Shapeless and its generic programming. Continue reading “Introducing Fluent – the seamless translation layer”
Stream processing has been around for a while and encompasses a great number of applications:
- HTTP servers handling a stream of incoming HTTP requests
- Message streams: Twitter firehose, user posts, …
- Time-series messaging: streams from IoT sensors
- Database querying: a result set contains a stream of records
Most interestingly, reactive streams have gained traction over the past few years. They bring back-pressure into the game in order to avoid having the destination stream flooded by messages from the source stream.
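Back-pressure is easiest to see in code. The sketch below does not use Akka Streams; it uses the Reactive Streams interfaces that ship with the JDK (`java.util.concurrent.Flow`, Java 9+) so that it runs without extra dependencies. `OneAtATime` and `BackPressureSketch` are hypothetical names. The subscriber asks for one element at a time, so the publisher never delivers more than the subscriber has requested.

```scala
import java.util.concurrent.{CountDownLatch, Flow, SubmissionPublisher, TimeUnit}
import scala.collection.mutable.ListBuffer

object BackPressureSketch {
  // A subscriber that signals demand for exactly one element at a time.
  final class OneAtATime extends Flow.Subscriber[Integer] {
    private var subscription: Flow.Subscription = null
    val received: ListBuffer[Int] = ListBuffer.empty[Int]
    val done = new CountDownLatch(1)

    override def onSubscribe(s: Flow.Subscription): Unit = {
      subscription = s
      s.request(1) // initial demand
    }
    override def onNext(item: Integer): Unit = {
      received += item.intValue
      subscription.request(1) // ask for the next element only once this one is handled
    }
    override def onError(t: Throwable): Unit = done.countDown()
    override def onComplete(): Unit = done.countDown()
  }

  def run(): List[Int] = {
    val publisher = new SubmissionPublisher[Integer]()
    val subscriber = new OneAtATime
    publisher.subscribe(subscriber)
    // submit blocks the producer when the subscriber's buffer is full
    (1 to 5).foreach(i => publisher.submit(Integer.valueOf(i)))
    publisher.close() // triggers onComplete once all items are delivered
    subscriber.done.await(5, TimeUnit.SECONDS)
    subscriber.received.toList
  }
}
```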
This post focuses on Akka Streams, a reactive streams implementation based on Akka actors. Unlike actors, which are untyped, Akka Streams provides type safety at every stage of the stream pipeline and also comes with a nice, fluent API. However the documentation is sometimes lacking or hard to search when you need to implement common patterns. This post tries to cover the most common ones in a clear and concise way. Continue reading “Akka Streams patterns”
Cassandra drivers are not just a dumb piece of software that sends CQL strings to a Cassandra node and waits for responses.
They are actually quite smart and are architected in a way that should make your life easier while still attempting to get the most performance out of Cassandra.
In this post I am going to focus on the Java driver and have a quick look at its architecture and at some of the features it offers. Continue reading “The Cassandra Java Driver”
If you followed our previous post on forging a DSL using type classes, you surely noticed that writing type class instances is a rather repetitive task.
In today’s post we’re going to get rid of this by deriving type class instances automatically using Shapeless. We also use this post as an excuse to experiment with Shapeless and try to understand all the “magic” that’s happening. Continue reading “Reducing type class boilerplate with Shapeless”
In this post we’re going to explore how to build a DSL (Domain Specific Language) with a user-friendly syntax while maintaining as much type-safety as possible. We want any operation that is not allowed by the business rules to fail at compile time. This would be really nice as it makes sure that no one writes such forbidden logic (even by mistake).
Moreover Scala provides really nice syntactic sugar that can make a DSL syntax pretty neat.
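As a small taste of the approach, here is a sketch in which a business rule is encoded as a type class. `Price`, `CustomerId`, `Combine` and `plus` are hypothetical names, not the DSL built in the post. Only types with a `Combine` instance get a usable `plus` operation, so adding two customer ids is rejected by the compiler instead of failing at runtime.

```scala
object DslSketch {
  // Plain data, with no behaviour attached.
  final case class Price(cents: Long)
  final case class CustomerId(value: Long)

  // The type class: evidence that two values of type A may be added together.
  trait Combine[A] { def combine(x: A, y: A): A }

  object Combine {
    // Business rule (assumed for this example): prices can be added...
    implicit val combinePrice: Combine[Price] = new Combine[Price] {
      def combine(x: Price, y: Price): Price = Price(x.cents + y.cents)
    }
    // ...and there is deliberately no instance for CustomerId.
  }

  // Syntax: `plus` only compiles when a Combine[A] instance can be found.
  implicit class CombineOps[A](private val self: A) {
    def plus(other: A)(implicit c: Combine[A]): A = c.combine(self, other)
  }

  def basketTotal(a: Price, b: Price): Price = a.plus(b)

  // Rejected at compile time because no Combine[CustomerId] exists:
  // val nonsense = CustomerId(1).plus(CustomerId(2))
}
```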
If you don’t know what type classes are or don’t feel very comfortable with the concept, follow along as we’ll also explore how we can use them to dissociate data and behaviours (always a good practice). Continue reading “Forging a DSL using Scala type classes”
When it comes to accessing Cassandra from Scala there are 2 possible approaches: a custom DSL or the Java driver.
Custom DSLs are nice as they provide all the type-safety you need against your data schema. However in this post I will focus only on the Java driver. Why? Because it’s both a simple and decent solution in my opinion.
The bad thing is that you lose any type-safety as all the queries are just plain strings. On the other hand you don’t have to learn a new DSL because your queries are just CQL. Add thorough test coverage and you have a viable solution.
Moreover the Java driver provides an async API backed by Guava’s futures and it’s not that difficult to turn these futures into Scala futures, which makes for quite a natural API in Scala.
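The bridging pattern is short: register a callback on the Java future and complete a Scala `Promise` from it. The sketch below uses the JDK's `CompletableFuture` as a stand-in for Guava's `ListenableFuture` so that it runs without the driver on the classpath; with Guava the callback would be registered through `Futures.addCallback` instead. `FutureBridgeSketch`, `toScala` and `demo` are hypothetical names.

```scala
import java.util.concurrent.CompletableFuture
import java.util.function.BiConsumer
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object FutureBridgeSketch {
  // Turns a callback-based Java future into a Scala Future.
  // CompletableFuture is a stand-in for Guava's ListenableFuture here.
  def toScala[A](javaFuture: CompletableFuture[A]): Future[A] = {
    val promise = Promise[A]()
    javaFuture.whenComplete(new BiConsumer[A, Throwable] {
      override def accept(value: A, error: Throwable): Unit = {
        // Exactly one of value/error is meaningful: error is null on success.
        if (error != null) promise.failure(error) else promise.success(value)
        ()
      }
    })
    promise.future
  }

  // Usage: the resulting Future composes like any other Scala Future.
  def demo(): String =
    Await.result(toScala(CompletableFuture.completedFuture("row")), 1.second)
}
```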
There are still some shortcomings that you’d better be aware of when consuming a result set but overall I think that it’s still a simple solution that is worth considering. Continue reading “Querying Cassandra from Scala”