As we’ve seen in this comparison with Apache Cassandra, DynamoDB may be a valuable choice for storing data. It’s fully maintained by AWS you just have to configure your instance and you can start using it straight away.
Being one of the Amazon web services DynamoDB offers a JSON/HTTP interface and Amazon provides a set of client for a variety of language. Unfortunately Scala is not one of them but there is a Java client available. Sadly it suffers from big performance issues due to poor design choices.
So what are the alternatives? This is what we’re going to see in this post: How can we get the most out of the official driver and are there better alternatives both in terms of performance or integration with the Scala ecosystem. Continue reading “Querying DynamoDB from Scala”
Of course it means that DynamoDB and Cassandra have a lot in common! (They have the same DNA). However both AWS DynamoDB and Apache Cassandra have evolved quite a lot since this paper was written back in 2007 and there are now some key differences to be aware of when choosing between the two.
Custom-DSL are nice as they provide all the type-safety you need against your data schema. However in this post I will focus only on the Java driver. Why? Because it’s both a simple and decent solution in my opinion.
The bad thing is that you lose any type-safety as all the queries are just plain strings. On the other hand you don’t have to learn a new DSL because your queries are just CQL. Add a thorough test coverage and you have a viable solution.
Moreover the Java driver provides an async API backed by Guava’s futures and it’s not that difficult to turn these futures into Scala futures – which makes a quite natural API in Scala.
We recently deployed in production a distributed system that uses Cassandra as its persistent storage.
Not long after we noticed that there were many warnings about tombstones in Cassandra logs.
WARN [SharedPool-Worker-2] 2017-01-20 16:14:45,153 ReadCommand.java:508 -
Read 5000 live rows and 4771 tombstone cells for query
SELECT * FROM warehouse.locations WHERE token(address) >= token(D3-DJ-21-B-02) LIMIT 5000
We found it quite surprising at first because we’ve only inserted data so far and didn’t expect to see that many tombstones in our database. After asking some people around no one seemed to have a clear explanation on what was going on in Cassandra.
We all know that Cassandra is a distributed database. However there’re situations where one needs to perform an atomic operation and for such cases a consensus must be reached between all the replicas.
To provide the best experiences, we use technologies like cookies to store and/or access device information. Consenting to these technologies will allow us to process data such as browsing behaviour or unique IDs on this site. Not consenting or withdrawing consent, may adversely affect certain features and functions.
The technical storage or access is strictly necessary for the legitimate purpose of enabling the use of a specific service explicitly requested by the subscriber or user, or for the sole purpose of carrying out the transmission of a communication over an electronic communications network.
The technical storage or access is necessary for the legitimate purpose of storing preferences that are not requested by the subscriber or user.
The technical storage or access that is used exclusively for statistical purposes.The technical storage or access that is used exclusively for anonymous statistical purposes. Without a subpoena, voluntary compliance on the part of your Internet Service Provider, or additional records from a third party, information stored or retrieved for this purpose alone cannot usually be used to identify you.
The technical storage or access is required to create user profiles to send advertising, or to track the user on a website or across several websites for similar marketing purposes.