Why 2023 Will Be All About Streaming Data

Streaming data has moved from niche to normal. Here's what's fueling this shift, positioning 2023 as a huge year for streaming technology.


Streaming data has moved from niche to normal. All top cloud providers have launched a streaming service, and over 80 percent of Fortune 100 companies use the most common streaming platform, Apache Kafka.

What’s fueling this shift toward ubiquity in the world of data engineering, positioning 2023 as a huge year for streaming technology? It’s simple: there’s an ever-increasing demand to deliver events reliably, quickly, and at scale to support applications. As more and more use cases need subsecond updates to internal and external apps alike, it becomes imperative for companies to be able to process and disseminate data in real time.

Businesses are also using data differently today. By analyzing events the moment they’re created, they can instantly compare past and present and make better decisions. This “next evolution” of data intelligence is based on the ability to react to events right as they occur.
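To make "comparison between past and present" concrete, here is a minimal Python sketch that checks each new event against a rolling baseline of recent ones. The class name, window size, threshold, and sample latencies are all hypothetical and chosen only for illustration; a production system would do this inside a stream processor or analytics database rather than in a single process.

```python
from collections import deque
from statistics import mean


class RollingBaseline:
    """Compare each new event value with the mean of the most recent ones."""

    def __init__(self, window_size: int, threshold: float) -> None:
        # Only the last `window_size` values are kept as "the past".
        self.history = deque(maxlen=window_size)
        # An event is flagged when it exceeds threshold * baseline.
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Return True if `value` is a spike relative to past events."""
        is_spike = bool(self.history) and value > self.threshold * mean(self.history)
        # The new event becomes part of the baseline for later events.
        self.history.append(value)
        return is_spike


if __name__ == "__main__":
    detector = RollingBaseline(window_size=5, threshold=2.0)
    latencies_ms = [100, 110, 95, 105, 400, 102]  # made-up sample events
    for latency in latencies_ms:
        print(latency, detector.observe(latency))
```

Each event is evaluated the moment it arrives, against whatever history exists at that instant, which is the essential difference from waiting for a nightly batch job.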

Streaming-Native Analytics

As data streams continue to multiply in 2023, they are ushering in new use cases and requirements for real-time analytics. This is why it’s now critical for data teams to move beyond the traditional batch-oriented stack and embrace a different, streaming-native approach to analytics architecture.

What does this evolution in streaming look like? Fast-moving “data in motion” has replaced static “data at rest” as the 24/7 world of the Internet has displaced daily batch mainframe operations. The result is that data now moves continuously within and between organizations via data systems and apps. Making this paradigm shift (and mindset shift) from fixed to flowing data will help organizations unlock the power of streaming.

Streaming data platforms have become as vital to companies as the central nervous system is to our bodies, since both connect various functions while managing essential operations. Today you can find evolving technologies, like the real-time analytics database Apache Druid, that are “purpose-built” to support systems of data in motion: event databases and stream processors. In Druid’s case, users can query each event that enters the data stream. This works at massive scale while also allowing subsecond queries on a combination of batch and stream data.
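As a rough illustration of what querying freshly ingested events can look like, the Python sketch below builds the JSON body for a Druid SQL request that counts events per minute over a recent window. Druid exposes a SQL API over HTTP at /druid/v2/sql; the datasource name "clickstream" and the helper function are hypothetical, and the sketch only constructs the request rather than sending it.

```python
import json


def build_sql_request(datasource: str, minutes: int) -> dict:
    """Build a JSON body for Druid's SQL API: events per minute, recent window."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    if '"' in datasource:
        raise ValueError("datasource name must not contain double quotes")
    query = (
        "SELECT TIME_FLOOR(__time, 'PT1M') AS \"minute\", COUNT(*) AS \"events\" "
        f"FROM \"{datasource}\" "
        f"WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '{minutes}' MINUTE "
        "GROUP BY 1 ORDER BY 1"
    )
    return {"query": query}


if __name__ == "__main__":
    # "clickstream" is a hypothetical datasource name used for illustration.
    body = build_sql_request("clickstream", minutes=15)
    # This body would be POSTed to /druid/v2/sql on a Druid router or broker.
    print(json.dumps(body, indent=2))
```

The same query shape works whether the rows in the window arrived from a stream seconds ago or were loaded in batch, which is the point of combining the two in one database.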

It’s just the start of a new period where streaming rapidly emerges as the centerpiece of data architecture. But how exactly can you enable subsecond analytics on streaming data and keep it scalable? 

Overcoming “Data in Waiting”

As Kafka users focus on ramping up the next phase of data streaming, a problem has come to light: “data in motion” morphs into “data in waiting” when it’s time to analyze or put those streams to work in a user-facing application. Why? Because of the reliance on an analytics system that was designed not for streaming but for batch data.

The way around this is to use a new type of database designed for streams, hence the creation of Druid. Event streaming platforms such as Kafka, in conjunction with Druid, help turn trillions of events into streams that large numbers of users can query simultaneously. This combination has opened up fresh possibilities, and developers have used it to build new analytics applications.
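For readers who want a feel for how Kafka and Druid are wired together, the Python sketch below assembles a Kafka ingestion supervisor spec, the JSON document Druid uses to read a topic continuously. The field layout follows Druid's Kafka supervisor spec as commonly documented, but it should be checked against the documentation for your Druid version; the topic, datasource, broker address, and timestamp column are hypothetical.

```python
import json


def build_kafka_supervisor_spec(
    datasource: str,
    topic: str,
    bootstrap_servers: str,
    timestamp_column: str = "timestamp",  # hypothetical event field name
    task_count: int = 1,
) -> dict:
    """Assemble a Druid Kafka supervisor spec as a plain dictionary."""
    if task_count < 1:
        raise ValueError("task_count must be at least 1")
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": timestamp_column, "format": "iso"},
                # Let Druid discover dimensions from the incoming JSON events.
                "dimensionsSpec": {"useSchemaDiscovery": True},
                "granularitySpec": {
                    "segmentGranularity": "hour",
                    "queryGranularity": "none",
                    "rollup": False,  # keep every individual event queryable
                },
            },
            "ioConfig": {
                "type": "kafka",
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "useEarliestOffset": True,
                # More reading tasks means more ingestion parallelism.
                "taskCount": task_count,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


if __name__ == "__main__":
    spec = build_kafka_supervisor_spec(
        datasource="clickstream",          # hypothetical
        topic="clickstream-events",        # hypothetical
        bootstrap_servers="kafka-1:9092",  # hypothetical
        task_count=2,
    )
    # This document would be POSTed to /druid/indexer/v1/supervisor.
    print(json.dumps(spec, indent=2))
```

Setting rollup to false keeps each event individually queryable, and raising taskCount is one way to add ingestion parallelism as event volume grows.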

Limitless Scaling

Scalability is another key reason to use Druid for streaming data analytics, since its capacity in this regard is nearly limitless: it can scale to tens of millions of events every second. Druid also avoids resource lag and overprovisioning via dynamic scaling. Data teams can easily add computing power and scale out with zero downtime.

It doesn’t make sense to use a cloud data warehouse to serve batch-oriented and real-time use cases at once, since that runs counter to the goals of a streaming pipeline. In that setup, batch processing slows the whole pipeline down, eroding the value of real-time speed. But with Kafka and Druid working hand in hand, the combined platform can ingest millions of events per second through Kafka while handling hundreds of concurrent analyst queries.

A Bigger Future

The streaming-native approach to analytics may sound very forward-thinking, but it’s already happening. Many organizations are ahead of the curve, recognizing that streaming data is a force multiplier when it comes to moving, analyzing, and sharing data. New use cases and products continue to come to fruition via streaming technology. Companies that embrace this reality will gain a competitive advantage this year and for the foreseeable future.

Druid is not only purpose-built for emerging streaming use cases, but it’s also perfect for an application that relies on the analytics and transactional database worlds alike. Pairing streaming data with Druid, data teams have developed mission-critical apps that make four invaluable things possible: customer-facing analytics, operational visibility, real-time decision-making, and drill-down exploration. This isn’t the end of this trajectory but only the beginning—as this current trend takes root, developers are already working to build a new wave of analytics applications to shape the future.


Julia Brouillette is a Senior Technologist at Imply (www.imply.io), a high-performance real-time analytics database company. Before joining Imply, Julia held various technical and developer-centric roles at Twilio and Aurora Insight, where she helped deliver analytics and Data-as-a-Service products for the next generation of digital communications.