e51716ee724cf4950df67eba0393b3f798b7dc00 | Author: jberragan <jberragan@gmail.com>
| 2024-12-06 16:09:12-08:00
CASSANDRA-19962: CEP-44 Kafka integration for Cassandra CDC using Sidecar (#87)
This is the initial commit for CEP-44 to introduce a standalone CDC module into the Analytics project. This module provides the foundation for CDC in the Apache Cassandra Sidecar.
This module provides:
- a standalone Cdc class as the entrypoint for initializing CDC.
- pluggable interfaces for listing and reading commit log segments for a token range, persisting and reading CDC state, providing the Cassandra table schema, and optionally reading values from Cassandra.
- reading and deserializing commit log mutations.
- reconciling and de-duplicating mutations across replicas.
- serializing CDC state into a binary object for persistence.
- a layer for converting Cassandra mutations into a standard consumable format.
Patch by James Berragan, Jyothsna Konisa, Yifan Cai; Reviewed by Dinesh Joshi, Yifan Cai for CASSANDRA-19962
Co-authored-by: James Berragan <jberragan@apple.com>
Co-authored-by: Yifan Cai <ycai@apache.org>
Co-authored-by: Jyothsna Konisa <jkonisa@apple.com>
3bcc5297bd3115ff5949a9295eed6a9ad03fd096 | Author: jberragan <jberragan@gmail.com>
| 2024-10-17 10:02:25-07:00
CASSANDRA-19980: Remove SparkSQL dependency from CassandraBridge so that it can be used independent from Spark (#88)
Patch by James Berragan; Reviewed by Francisco Guerrero, Yifan Cai for CASSANDRA-19980
972535d0f7cd828b7e0e40706adbe8897a436a5d | Author: jberragan <jberragan@gmail.com>
| 2024-09-17 15:52:45-07:00
CASSANDRA-19927: Remove old compression cache and move to using cache of CompressionMetadata (#84)
Deprecate old compression cache and move to using cache of CompressionMetadata, so that:
- we no longer cache an entire byte array on heap
- we cache and re-use the CompressionMetadata object so that only one BigLongArray object is allocated for the chunk offsets
Patch by James Berragan; Reviewed by Yifan Cai, Francisco Guerrero for CASSANDRA-19927
abce3c83fb5db8cc79edb68b247ab25582a0e9a8 | Author: jberragan <jberragan@gmail.com>
| 2024-09-11 12:55:18-07:00
CASSANDRA-19815: Decouple Cassandra types from Spark types so Cassandra types can be used independently from Spark (#71)
Patch by James Berragan; Reviewed by Yifan Cai, Francisco Guerrero for CASSANDRA-19815
bff083e35ebc338daf93c1a3553c590ae1864115 | Author: jberragan <jberragan@gmail.com>
| 2024-09-08 08:13:57-07:00
CASSANDRA-19900: Make the compression cache configurable to reduce heap pressure for large SSTables (#77)
Patch by James Berragan; Reviewed by Francisco Guerrero, Yifan Cai for CASSANDRA-19900
3023a204c8ef16f886bd3dc219f7534b7edbaf2a | Author: jberragan <jberragan@gmail.com>
| 2024-08-03 07:51:52+01:00
CASSANDRA-19807: Improve the core bulk reader test system to match actual and expected rows by concatenating the partition keys with the serialized hex string instead of utf-8 string (#70)
Patch by James Berragan; Reviewed by Francisco Guerrero, Yifan Cai for CASSANDRA-19807
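A minimal sketch of why matching rows on a hex-encoded key can be safer than matching on a UTF-8 decoded string, assuming the keys may contain arbitrary bytes. The class and method names (HexKeyDemo, toHex, toUtf8) are hypothetical and are not taken from the project's test code. Distinct byte sequences that are not valid UTF-8 can decode to the same string, while their hex encodings stay distinct.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the project's actual test utilities.
class HexKeyDemo {
    // Render each byte as two lowercase hex digits, so distinct inputs give distinct strings.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    // Decode bytes as UTF-8; malformed input is replaced with U+FFFD, which loses information.
    static String toUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] a = {(byte) 0xFF};
        byte[] b = {(byte) 0xFE};
        // Both invalid bytes decode to the same replacement character.
        System.out.println(toUtf8(a).equals(toUtf8(b)));
        // The hex forms remain distinguishable.
        System.out.println(toHex(a) + " " + toHex(b));
    }
}
```

With keys compared in hex form, two rows whose partition keys differ only in non-UTF-8 bytes are not mistaken for the same row.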
84d84fe36b0d6e250c3d221c28c40b6925e4c222 | Author: jberragan <jberragan@gmail.com>
| 2024-07-22 13:38:28-07:00
CASSANDRA-19791: Remove other uses of Apache Commons lang for hashcode, equality and random string generation (#67)
Patch by James Berragan; Reviewed by Francisco Guerrero, Yifan Cai for CASSANDRA-19791
458a3630f882ae2b2a9cee272cf85ca7ff42f5cd | Author: jberragan <jberragan@gmail.com>
| 2024-07-17 14:29:21-07:00
CASSANDRA-19778: Split out BufferingInputStream stats into separate i… (#66)
Split BufferingInputStream stats into a separate interface so that class-level generics are not required for the Stats interface
Patch by James Berragan; Reviewed by Bernardo Botella, Francisco Guerrero, Yifan Cai for CASSANDRA-19778
a242b352c28947427a9bfc30295a487017439fd9 | Author: jberragan <jberragan@gmail.com>
| 2024-07-12 14:57:38-07:00
CASSANDRA-19748: Refactoring to introduce new cassandra-analytics-common module with minimal dependencies (#62)
- Add new module cassandra-analytics-common with no dependencies on Spark or Cassandra and minimal standard dependencies (Guava, Jackson, Commons Lang, Kryo, etc.)
- Move standalone classes to cassandra-analytics-common module.
Some additional refactoring and clean up:
- Rename SSTableInputStream -> BufferingInputStream
- Rename SSTableSource -> CassandraFileSource
- Introduce CassandraFile interface to be implemented by SSTable and CommitLog.
- Generalize IStats to work across different CassandraFile types
- Rename methods in StreamScanner to make the API clearer.
- Move ComplexTypeBuffer, ListBuffer, MapBuffer, SetBuffer, UdtBuffer to standalone classes
- Delete unused classes RangeTombstone, ReplicaSet and CollectionElement.
- Remove commons lang as a dependency
- Rename Rid to RowData
Patch by James Berragan; Reviewed by Bernardo Botella, Dinesh Joshi, Francisco Guerrero, Yifan Cai, Yuriy Semchyshyn for CASSANDRA-19748