What ISVs Can Learn from Apps Built on Really Big, Fast-Moving Data

Here's why software developers are turning to Apache Druid to enable new analytics use cases on really big and fast-moving data.


The software developers at Netflix, Twitter, Confluent and Salesforce are doing something really interesting. They’re knocking down the proverbial firewall between conventional BI analytics and transactional applications — bridging these two paradigms to enable new analytics use cases on really big and fast-moving data. They’re building modern analytics applications — apps that play a key role in delivering interactive insights to countless users on trillions of rows of streaming and batch data.

“Analytics, say hello to applications.” But why? And why are software developers at thousands of companies turning to Apache Druid to power these applications when more than 300 databases are available? To understand, let’s zero in on the four kinds of apps being built with Apache Druid:

1 Operational Visibility at Scale

Developers are building applications to improve operational visibility from the massive volume of clicks, logs, and metrics coming out of cloud services and telemetry — examples include digital operations, product analytics, observability, fraud detection, and IoT. These developers are essentially charged with presenting insights from real-time data and closing the gap between the moment an event is created and the moment insight is achieved.

While multiple data systems, including Apache Flink, InfluxDB, and Redis, can analyze events in real time, these technologies aren’t designed to perform online analytical processing (OLAP)-style queries on high-dimensional data at scale.

Developers instead often turn to Apache Druid, just like those at Salesforce who built an analytics app using Druid to monitor their cloud product experience. The app ingests billions to trillions of streaming events daily and lets users query dimensions, filters, and aggregations in any combination against real-time logs to analyze performance, spot trends, and troubleshoot issues, with query results returned in seconds.
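To make that concrete, here is a minimal sketch of the kind of SQL payload such an app might send to Druid’s standard HTTP SQL endpoint (`/druid/v2/sql/`). The datasource and column names (`service_logs`, `service_name`) are hypothetical, not taken from the Salesforce app:

```python
import json

# Druid's standard HTTP SQL API path (POST a JSON body with a "query" key).
DRUID_SQL_ENDPOINT = "/druid/v2/sql/"

def build_slice_query(datasource, dimension, hours=1):
    """Build a JSON payload that counts recent events grouped by one dimension.

    Datasource and dimension names are caller-supplied; the __time column
    is Druid's built-in event timestamp.
    """
    sql = (
        f"SELECT {dimension}, COUNT(*) AS events "
        f"FROM {datasource} "
        f"WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '{hours}' HOUR "
        f"GROUP BY {dimension} ORDER BY events DESC LIMIT 10"
    )
    return json.dumps({"query": sql})

# Hypothetical example: top services by log volume over the last hour.
payload = build_slice_query("service_logs", "service_name")
```

In a real app this payload would be POSTed to a Druid router; swapping the dimension argument is what lets users pivot across any combination of attributes.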

2 Externally-facing Analytics

What do Atlassian, Forescout, and Twitter have in common? Each uses analytics for more than internal decision-making: they all provide their customers with insights.

So the second core use of Apache Druid is powering applications that serve external users. These apps face a different set of technical challenges than internal analytics, with concurrency at the top of the list. The database must scale with user growth, and it must do so economically, without sky-high infrastructure costs.

An interactive experience also demands sub-second query response. Most external users are paying customers who won’t wait happily for queries to process, which makes Druid a convenient go-to.

3 Fast Drill-down Exploration

It isn’t enough for businesses to rely on reports, which do exactly what they promise: report information, whether it’s which product is the top seller or customer demographics. To maximize the value of their data, companies need a way to see why something happened, so they can solve current problems and anticipate recurring ones.

This is the third use: to get to the “why,” developers need a way to slice and dice data and determine root causes instantly. That isn’t hard with a small data set that any modern data warehouse can handle, but it gets far more challenging when the data being analyzed has trillions of rows and is constantly flowing in.

With really big data like this, full table scans take too long, and the usual query-shaping techniques developers use to expedite performance come with pricey trade-offs. Without the right tool, finding the needle in the haystack can be nearly impossible. Fortunately, Apache Druid simplifies investigation and root-cause diagnostics, enabling a truly interactive data experience regardless of the amount of data at stake.
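One way to picture the slice-and-dice workflow is as a loop: each drill-down step adds a filter from the previous result and re-aggregates by a new dimension. A hedged sketch, assuming hypothetical datasource and column names (`clickstream`, `country`, `device_type`):

```python
def drill_down(datasource, dimension, filters=None):
    """Build a Druid SQL string that groups by one dimension under filters.

    Each drill-down step narrows the filters dict and changes the grouping
    dimension; Druid's column-oriented, time-partitioned storage is what
    keeps each step interactive even at trillions of rows.
    """
    where = ["__time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY"]
    for col, val in (filters or {}).items():
        where.append(f"{col} = '{val}'")
    return (
        f"SELECT {dimension}, COUNT(*) AS n FROM {datasource} "
        f"WHERE {' AND '.join(where)} "
        f"GROUP BY {dimension} ORDER BY n DESC"
    )

# Start broad, then narrow to one country to chase a root cause:
step1 = drill_down("clickstream", "country")
step2 = drill_down("clickstream", "device_type", {"country": "US"})
```

The point is the iteration speed: because each step is a fresh ad hoc aggregation rather than a precomputed rollup, the analyst isn’t limited to questions someone anticipated in advance.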

4 Real-time Inference

Fourth but not least: analytics generally brings to mind a synthesized data set depicted in a chart or dashboard, which someone views to decide whether or not to take action. But in some cases, a user only needs to know what the data means, or the decision must be made faster than a person can think.

If the app requires real-time inference on lots of data, Druid is a strong fit. A Druid query result can feed directly into rules engines and machine learning frameworks for automated decisioning. In short, when the optimal answer requires instant query response on high-cardinality, highly dimensional event data at scale, Druid helps developers power inference in apps — whether for recommendations, diagnostics, or automated decisions.
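The pattern above can be sketched in a few lines: an aggregated query result row goes into a rules engine that returns an action instead of a chart. The field names and threshold here are illustrative assumptions, not part of any real Druid API:

```python
# Hedged sketch: wiring an aggregated query result into a simple rules
# engine for automated decisioning. In production, the row would come
# from a sub-second Druid query over the most recent events.
def decide(row, error_rate_threshold=0.05):
    """Return an action for one aggregated result row."""
    rate = row["errors"] / max(row["requests"], 1)  # avoid division by zero
    return "page_oncall" if rate > error_rate_threshold else "ok"

# e.g. a row aggregated over the last minute of events:
action = decide({"errors": 12, "requests": 100})
```

The rules engine here is deliberately trivial; the same shape applies when the consumer is an ML model scoring the row, because what matters is that the query feeding it returns in sub-second time.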


As Vice President of Product Marketing at Imply, David Wang is responsible for the company’s positioning, product messaging, and technical content across Apache Druid and Imply-branded products. Previously, David served in leadership roles at Hewlett Packard Enterprise (HPE), Nimble Storage, and GE Digital. Wang repositioned HPE Storage for cloud data services; drove category creation for predictive flash storage at Nimble; and transformed GE’s municipal lighting business into smart-city analytics applications, leading to its eventual spin-off and acquisition. Wang holds a B.S. in electrical engineering from the Georgia Institute of Technology and an M.B.A. from the University of Southern California.