The Apache Cassandra Project recently released v4.0 of Apache Cassandra, the open source, highly performant, distributed big data database management platform.
“A long time coming, Cassandra 4.0 is the most thoroughly tested Cassandra yet,” said Nate McCall, vice president of Apache Cassandra. “The latest version is faster, more scalable, and bolstered with enterprise security features, ready-for-production with unprecedented scale in the cloud.”
As a NoSQL database, Apache Cassandra handles massive amounts of data across load-intensive applications with high availability and no single point of failure. Cassandra’s largest production deployments include Apple (more than 160,000 instances storing over 100 petabytes of data across 1,000-plus clusters), Huawei (more than 30,000 instances across 300-plus clusters), and Netflix (more than 10,000 instances storing 6 petabytes across 100-plus clusters, with over 1 trillion requests per day), among many others. Cassandra originated at Facebook in 2008, entered the Apache Incubator in January 2009, and graduated as an Apache Top-Level Project in February 2010.
Apache Cassandra v4.0
Cassandra v4.0 effortlessly handles unstructured data, with thousands of writes per second. Three years in the making, v4.0 reflects more than 1,000 bug fixes, improvements, and new features that include:
- Increased speed and scalability – streams data up to 5 times faster during scaling operations and delivers up to 25% higher throughput on reads and writes, for a more elastic architecture, particularly in Cloud and Kubernetes deployments.
- Improved consistency—keeps data replicas in sync to optimize incremental repair for faster, more efficient operation and consistency across data replicas.
- Enhanced security and observability—audit logging tracks users’ access and activity with minimal impact to workload performance. New capture and replay enables analysis of production workloads to help ensure regulatory and security compliance with SOX, PCI, GDPR, or other requirements.
- New configuration settings—exposed system metrics and configuration settings provide flexibility for operators to ensure they have easy access to data that optimize deployments.
- Minimized latency—garbage collector pause times are reduced to a few milliseconds with no latency degradation as heap sizes increase.
- Better compression—improved compression efficiency eases unnecessary strain on disk space and improves read performance.
Cassandra 4.0 is community-hardened and tested by Amazon, Apple, DataStax, Instaclustr, iland, Netflix, and others that routinely run clusters as large as 1,000 nodes, with hundreds of real-world use cases and schemas.
The Apache Cassandra community applied several testing and quality assurance (QA) projects and methodologies to deliver the most stable release yet. During the testing and QA period, the community generated reproducible workloads that are as close to real life as possible, while effectively verifying the cluster state against the model without pausing the workload itself.
“In our experience, nothing beats Apache Cassandra for write scaling, and we’re looking forward to the performance and management improvements in the 4.0 release,” said Elliott Sims, senior systems administrator at Backblaze. “We rely on Cassandra to manage over one exabyte of customer data and serve over 50 billion files for our customers across 175 countries so optimizing Cassandra’s capabilities and performance means a lot to us.”
“Since 2016, software engineers at Bloomberg have turned to Apache Cassandra because it’s easy to use, easy to scale, and always available,” said Isaac Reath, Software Engineering Team Lead, NoSQL Infrastructure at Bloomberg. “Today, Cassandra is used to support a variety of our applications, from low-latency storage of intraday financial market data to high-throughput storage for fixed income index publication. We serve up more than 20 billion requests per day on a nearly 1 PB dataset across a fleet of 1,700+ Cassandra nodes.”
“Netflix uses Apache Cassandra heavily to satisfy its ever-growing persistence needs on its mission to entertain the world. We have been experimenting and partially using the 4.0 beta in our environments and its features like Audit Logging and backpressure,” said Vinay Chella, Netflix Engineering Manager and Apache Cassandra Committer. “Apache Cassandra 4.0’s improved performance helps us reduce infrastructure costs. 4.0’s stability and correctness allow us to focus on building higher-level abstractions on top of data store compositions, which results in increased developer velocity and optimized data store access patterns. Apache Cassandra 4.0 is faster, secure, and enterprise-ready; I highly suggest giving it a try in your environments today.”
“Apache Cassandra’s contributors have worked hard to deliver Cassandra 4.0 as the project’s most stable release yet, ready for deployment to production-critical Cloud services,” said Scott Andreas, Apache Cassandra Contributor. “Cassandra 4.0 also brings new features, such as faster host replacements, active data integrity assertions, incremental repair, and better compression. The project’s investment in advanced validation tooling means that Cassandra users can expect a smooth upgrade. Once released, Cassandra 4.0 will also provide a stable foundation for the development of future features and the database’s long-term evolution.”
Apache Cassandra is in use at Activision, Apple, Backblaze, BazaarVoice, Best Buy, Bloomberg Engineering, CERN, Constant Contact, Comcast, DoorDash, eBay, Fidelity, GitHub, Hulu, ING, Instagram, Intuit, Macy’s, Macquarie Bank, Microsoft, McDonald’s, Netflix, New York Times, Monzo, Outbrain, Pearson Education, Sky, Spotify, Target, Uber, Walmart, Yelp, and thousands of other companies that have large, active data sets. In fact, Cassandra is used by 40% of the Fortune 100. Select Apache Cassandra case studies are available at https://cassandra.apache.org/case-studies/
In addition to Cassandra 4.0, the Project also announced a shift to a yearly release cycle, with releases to be supported for a three-year term.
Catch Apache Cassandra in action through presentations from the April 2021 Cassandra World Party https://s.apache.org/jjv2d .
Availability and Oversight
Apache Cassandra software is released under the Apache License v2.0 and is overseen by a volunteer, self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project’s day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Cassandra, visit https://cassandra.apache.org/ and https://twitter.com/cassandra.
About Apache Cassandra
Apache Cassandra is an Open Source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients. Apache Cassandra is used in some of the largest data management deployments in the world, including nearly half of the Fortune 100.
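For readers unfamiliar with masterless replication, the idea of "no single point of failure" can be illustrated with a small sketch. This is not Cassandra's code: it is a simplified token ring in Python that assumes one token per node, uses MD5 as a stand-in hash (Cassandra's default partitioner is Murmur3, and real clusters typically use virtual nodes and datacenter-aware replication strategies), and uses hypothetical node names. Each partition key is stored on the next `replication_factor` nodes clockwise from its position on the ring, so any single node can fail while copies remain available.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position. MD5 is only a stand-in hash here."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Toy masterless ring: every node is a peer, and none is a primary."""

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds node count")
        self.replication_factor = replication_factor
        # Sort nodes by token so the list reads clockwise around the ring.
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str):
        """Return the nodes that hold copies of the given partition key."""
        start = bisect.bisect_left(self._tokens, token(partition_key))
        size = len(self._ring)
        # Walk clockwise from the key's position, wrapping past the end.
        return [
            self._ring[(start + i) % size][1]
            for i in range(self.replication_factor)
        ]

    def live_replicas(self, partition_key: str, down=frozenset()):
        """Return the replicas still reachable when some nodes are down."""
        return [n for n in self.replicas(partition_key) if n not in down]


# Hypothetical five-node cluster with three copies of each partition.
ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
owners = ring.replicas("user:42")
survivors = ring.live_replicas("user:42", down={owners[0]})
print("replicas:", owners)
print("still serving after one failure:", survivors)
```

Because placement depends only on the hash of the key and the node list, any node can compute where data lives and act as coordinator for a request, which is the property the "masterless" description refers to.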