Apache Cassandra 4.0 beta was released August 31, 2020. It is the first major update since 2017 and a significant milestone for the open-source database community. The update focuses on performance and observability, with the goal of making Cassandra the most stable database available for end users who require high availability for mission-critical data.
“Customers will be able to use the release knowing it is production-ready on day one,” says Sankalp Kohli, Apache Cassandra Committer and PMC Member. “In addition to stability, Cassandra 4.0 includes important features like incremental repair, faster host replacements and virtual tables. Cassandra 4.0 also introduces Witness Replicas as an experimental feature, taking the first step in delivering significant cost savings once the feature is ready for production in the future.”
Features of the Update
The Cassandra 4.0 update reflects hundreds of real-world use cases and tested schemas. It is the result of unprecedented cross-industry collaboration, with software, hardware, and QA testing donated by organizations such as Instaclustr, iland, Amazon, and DataStax. The update includes:
- Incremental Repair — Incremental repair in Cassandra 4.0 has a preparation phase in which anticompaction is performed before Merkle tree computation, and candidate SSTables are marked as pending. Full repair no longer involves anticompaction and does not mark SSTables as repaired. According to The Last Pickle, “incremental repair finally received the fix it deserved to make it production-ready for all situations.”
- Faster Scaling Operations — Cassandra streams data between nodes during scaling operations. The Zero Copy Streaming feature in 4.0 makes streaming up to 5x faster, delivering a more elastic architecture in certain environments.
- Virtual Tables — A much-anticipated addition, Virtual Tables in Cassandra 4.0 expose system metrics and configuration settings through the CQL interface rather than JMX. This gives operators the signals they need to keep their deployments healthy.
- Security and Observability — Cassandra 4.0 includes an audit logging feature, new controls that let enterprises authorize data access on a per-data-center basis, and selective exposure of system metrics and configuration settings through Virtual Tables. These updates are intended to support compliance with regulations such as SOX, PCI and GDPR.
- Witness Replicas — This feature is still experimental, and its inclusion in this update comes with the goal of testing that its implementation does not negatively impact non-transient use cases.
- Better Compression — Previous versions of Cassandra performed poorly in compression tests. Significantly improved compression efficiency reduces strain on disk space and improves read performance.
- Validation Tooling — This tooling is intended to give users a smooth transition, ensuring that data moves to the new version without loss and that correct and incorrect data are flagged according to the conditions provided.
- Over 1,000 Bug Fixes — The Apache Cassandra community ran several testing and QA projects and methodologies with the goal of delivering the most stable release yet. Over the testing and QA period, the community identified and resolved over 1,000 bugs, improving performance and observability.
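For operators curious about what the Virtual Tables item in the list above looks like in practice, the following is a minimal Python sketch rather than an official example. It assumes the DataStax Python driver (`cassandra-driver`) is installed and that a Cassandra 4.0 node is reachable at the given address. The helper names `virtual_table_query` and `fetch_settings` are hypothetical and chosen for illustration only. The `system_views` keyspace and its `settings` and `clients` tables are the virtual tables shipped with 4.0.

```python
# Sketch only: the helper names below are illustrative, not part of Cassandra.
VIRTUAL_KEYSPACE = "system_views"  # keyspace that holds Cassandra 4.0 virtual tables


def virtual_table_query(table, columns=("*",)):
    """Build a read-only CQL SELECT against a virtual table."""
    # Reject anything that is not a plain identifier before building the string.
    for name in (table, *columns):
        if name != "*" and not name.isidentifier():
            raise ValueError(f"unexpected identifier: {name!r}")
    return f"SELECT {', '.join(columns)} FROM {VIRTUAL_KEYSPACE}.{table}"


def fetch_settings(contact_point="127.0.0.1"):
    """Read configuration settings from system_views.settings.

    Assumes the DataStax Python driver and a reachable Cassandra 4.0 node.
    """
    # Imported lazily so the query helper above works without the driver.
    from cassandra.cluster import Cluster

    cluster = Cluster([contact_point])
    try:
        session = cluster.connect()
        rows = session.execute(virtual_table_query("settings", ("name", "value")))
        return {row.name: row.value for row in rows}
    finally:
        cluster.shutdown()


if __name__ == "__main__":
    # Printing the query needs no running cluster.
    print(virtual_table_query("settings", ("name", "value")))
```

Virtual tables describe only the node that serves the query, so a result reflects one node's settings, not the whole cluster's. In a multi-node cluster the driver may route the query to a node other than the contact point.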
“Apache Cassandra’s contributors have worked hard to deliver Cassandra 4.0 as the project’s most stable release yet, ready for deployment to production-critical cloud services,” says Scott Andreas, Apache Cassandra Contributor. “The project’s investment in advanced validation tooling means that Cassandra users can expect a smooth upgrade. Cassandra 4.0 will also provide a stable foundation for development of future features and the database’s long-term evolution.”
Cassandra 4.0 beta is available for download here.