Anshul Mishra

Thoughts on Apache Druid

At Myntra, I got to work on this hot new piece of tech called Druid. It is an OLAP database system. Its main selling point is that it is fast and can handle large volumes of data at the same time. It was built at Metamarkets and later open-sourced. It combines the fast response of a relational database with the ability to process large datasets like a Spark cluster.

Let's answer a few questions about it.

Is it OLAP or OLTP?

In the traditional sense, Druid is an OLAP system. OLAP systems help analysts fire queries on large datasets to gain insights, whereas OLTP systems cater to an application's functionality. A database system handling the queries for managing your salary account would traditionally be called an OLTP system. On the other hand, the database system that your bank's analyst would use to figure out what investment scheme to sell you based on your transaction history would be an OLAP system.
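To make the distinction concrete, here is a small sketch of the two query shapes from the bank example. It uses Python's built-in SQLite purely for illustration (SQLite is not Druid, and neither system is being benchmarked here). The table layout, the sample rows, and the function names are all made up for this example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with a few made-up bank transactions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (account_id INTEGER, category TEXT, amount REAL)"
    )
    rows = [
        (1, "salary", 5000.0),
        (1, "groceries", -200.0),
        (2, "salary", 4000.0),
        (2, "travel", -1500.0),
        (2, "groceries", -300.0),
    ]
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    return conn


def account_balance(conn, account_id):
    """OLTP-style: a narrow lookup that serves one user's request."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM transactions WHERE account_id = ?",
        (account_id,),
    ).fetchone()
    return row[0]


def spend_by_category(conn):
    """OLAP-style: an aggregate over every account, used for analysis."""
    cur = conn.execute(
        "SELECT category, SUM(-amount) FROM transactions "
        "WHERE amount < 0 GROUP BY category ORDER BY category"
    )
    return dict(cur.fetchall())
```

The first function touches only the rows of a single account, which is the kind of query an application fires on behalf of a user. The second scans the whole table and groups it, which is the kind of query an analyst fires and the kind Druid is meant for.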

What makes it fast?

Druid has data nodes and its own format for storing data in units called segments. Segments are immutable and are balanced across the data nodes, which are called historicals. Druid caches data in the memory of historicals for a super fast response time. A query is spread across multiple historicals depending on how the segments are distributed among them.
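The idea of spreading a query across historicals can be sketched as a toy scatter-gather in plain Python. This is not Druid's code or API. The `Segment` and `Historical` classes, the round-robin placement, and the per-day intervals are simplifying assumptions made up for this sketch, and real Druid balances segments and merges results in far more sophisticated ways.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)  # frozen: a segment cannot be changed once built
class Segment:
    interval: str                      # hypothetical: one segment per day
    rows: Tuple[Tuple[str, int], ...]  # (dimension value, metric) pairs


class Historical:
    """Toy data node that holds some segments in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.segments: List[Segment] = []

    def partial_sum(self, intervals: Set[str]) -> Counter:
        """Aggregate only the local segments that the query asks for."""
        totals: Counter = Counter()
        for seg in self.segments:
            if seg.interval in intervals:
                for key, value in seg.rows:
                    totals[key] += value
        return totals


def assign_round_robin(segments: List[Segment], nodes: List[Historical]) -> None:
    """Assumed placement policy: deal segments out to nodes in turn."""
    for i, seg in enumerate(segments):
        nodes[i % len(nodes)].segments.append(seg)


def scatter_gather(nodes: List[Historical], intervals: Set[str]) -> Dict[str, int]:
    """Send the query to every node and merge the partial results."""
    merged: Counter = Counter()
    for node in nodes:
        merged.update(node.partial_sum(intervals))  # Counter.update adds counts
    return dict(merged)


segments = [
    Segment("2020-01-01", (("shoes", 3), ("shirts", 5))),
    Segment("2020-01-02", (("shoes", 2),)),
    Segment("2020-01-03", (("shirts", 4),)),
]
nodes = [Historical("h1"), Historical("h2")]
assign_round_robin(segments, nodes)
result = scatter_gather(nodes, {"2020-01-01", "2020-01-02"})
```

Each node only works on the segments it holds, so the work for one query is split across nodes, and segments outside the requested interval are skipped entirely. In this sketch the nodes are visited one after another; in a real cluster they would work in parallel.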