11 Collaborator |
Yifan Cai , Francisco Guerrero , Dinesh Joshi , Doug Rohrer , Saranya Krishnakumar , Jyothsna Konisa , Bernardo Botella , Yuriy Semchyshyn , jberragan , Yifan Cai; Francisco Guerrero , Francisco Guerrero; Yifan Cai |
13 Patch |
1 Review |
e51716ee724cf4950df67eba0393b3f798b7dc00,
3bcc5297bd3115ff5949a9295eed6a9ad03fd096,
972535d0f7cd828b7e0e40706adbe8897a436a5d,
6710efc212da2f135467fe3e972418ab5b9f5b78,
abce3c83fb5db8cc79edb68b247ab25582a0e9a8,
bff083e35ebc338daf93c1a3553c590ae1864115,
3023a204c8ef16f886bd3dc219f7534b7edbaf2a,
bac08796181979afef4cc518789a380edef500f0,
84d84fe36b0d6e250c3d221c28c40b6925e4c222,
458a3630f882ae2b2a9cee272cf85ca7ff42f5cd,
a242b352c28947427a9bfc30295a487017439fd9,
014db08a79f00ef0d94e6855779e398c9dc689c1,
1633cd9c6c3d88d5c66825fab76a369266509f7e |
a13532272051d4e4608f92d53bdd997103e8ea19 |
e51716ee724cf4950df67eba0393b3f798b7dc00 | Author: jberragan <jberragan@gmail.com>
| 2024-12-06 16:09:12-08:00
CASSANDRA-19962: CEP-44 Kafka integration for Cassandra CDC using Sidecar (#87)
This is the initial commit for CEP-44 to introduce a standalone CDC module into the Analytics project. This module provides the foundation for CDC in the Apache Cassandra Sidecar.
This module provides:
- a standalone Cdc class as the entrypoint for initializing CDC.
- pluggable interfaces for: listing and reading commit log segments for a token range, persisting and reading CDC state, providing the Cassandra table schema, optionally reading values from Cassandra.
- read and deserialize commit log mutations.
- reconcile and de-duplicate mutations across replicas.
- serialize CDC state into a binary object for persistence.
- a layer for converting Cassandra mutations into a standard consumable format.
Patch by James Berragan, Jyothsna Konisa, Yifan Cai; Reviewed by Dinesh Joshi, Yifan Cai for CASSANDRA-19962
Co-authored-by: James Berragan <jberragan@apple.com>
Co-authored-by: Yifan Cai <ycai@apache.org>
Co-authored-by: Jyothsna Konisa <jkonisa@apple.com>
3bcc5297bd3115ff5949a9295eed6a9ad03fd096 | Author: jberragan <jberragan@gmail.com>
| 2024-10-17 10:02:25-07:00
CASSANDRA-19980: Remove SparkSQL dependency from CassandraBridge so that it can be used independent from Spark (#88)
Patch by James Berragan; Reviewed by Francisco Guerrero, Yifan Cai for CASSANDRA-19980
972535d0f7cd828b7e0e40706adbe8897a436a5d | Author: jberragan <jberragan@gmail.com>
| 2024-09-17 15:52:45-07:00
CASSANDRA-19927: Remove old compression cache and move to using cache of CompressionMetadata (#84)
Deprecate old compression cache and move to using cache of CompressionMetadata, so that:
- we no longer cache an entire byte array on heap
- we cache and re-use the CompressionMetadata object so that only one BigLongArray object is allocated for the chunk offsets
Patch by James Berragan; Reviewed by Yifan Cai, Francisco Guerrero for CASSANDRA-19927
abce3c83fb5db8cc79edb68b247ab25582a0e9a8 | Author: jberragan <jberragan@gmail.com>
| 2024-09-11 12:55:18-07:00
CASSANDRA-19815: Decouple Cassandra types from Spark types so Cassandra types can be used independently from Spark (#71)
Patch by James Berragan; Reviewed by Yifan Cai, Francisco Guerrero for CASSANDRA-19815
bff083e35ebc338daf93c1a3553c590ae1864115 | Author: jberragan <jberragan@gmail.com>
| 2024-09-08 08:13:57-07:00
CASSANDRA-19900: Make the compression cache configurable to reduce heap pressure for large SSTables (#77)
Patch by James Berragan; Reviewed by Francisco Guerrero, Yifan Cai for CASSANDRA-19900
3023a204c8ef16f886bd3dc219f7534b7edbaf2a | Author: jberragan <jberragan@gmail.com>
| 2024-08-03 07:51:52+01:00
CASSANDRA-19807: Improve the core bulk reader test system to match actual and expected rows by concatenating the partition keys with the serialized hex string instead of utf-8 string (#70)
Patch by James Berragan; Reviewed by Francisco Guerrero, Yifan Cai for CASSANDRA-19807
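Note on CASSANDRA-19807: a minimal sketch of why a hex-encoded key is a safer match key than a UTF-8 decoded one. This is not the project's test code; the class and method names (RowKeySketch, hexKey, utf8Key) and the ':' separator are hypothetical and exist only for illustration. The assumption is that partition keys are available as raw byte buffers. Decoding arbitrary bytes as UTF-8 maps every malformed sequence to the replacement character, so distinct keys can collapse to the same string, whereas hex encoding preserves every byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of cassandra-analytics.
public final class RowKeySketch {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    private RowKeySketch() {
    }

    // Lossless: every byte becomes exactly two hex characters.
    static String toHex(ByteBuffer buffer) {
        ByteBuffer copy = buffer.duplicate(); // leave the caller's position untouched
        StringBuilder sb = new StringBuilder(copy.remaining() * 2);
        while (copy.hasRemaining()) {
            int b = copy.get() & 0xFF;
            sb.append(HEX[b >>> 4]).append(HEX[b & 0x0F]);
        }
        return sb.toString();
    }

    // Concatenate the hex form of each partition key component.
    // ':' cannot appear in hex output, so component boundaries stay unambiguous.
    static String hexKey(List<ByteBuffer> partitionKeys) {
        StringBuilder sb = new StringBuilder();
        for (ByteBuffer key : partitionKeys) {
            if (sb.length() > 0) {
                sb.append(':');
            }
            sb.append(toHex(key));
        }
        return sb.toString();
    }

    // Lossy: malformed byte sequences are replaced with U+FFFD when decoding.
    static String utf8Key(List<ByteBuffer> partitionKeys) {
        StringBuilder sb = new StringBuilder();
        for (ByteBuffer key : partitionKeys) {
            if (sb.length() > 0) {
                sb.append(':');
            }
            ByteBuffer copy = key.duplicate();
            byte[] bytes = new byte[copy.remaining()];
            copy.get(bytes);
            sb.append(new String(bytes, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }
}
```

With the single-byte keys 0xFF and 0xFE, utf8Key yields the same string for both (each byte decodes to the replacement character), while hexKey yields "ff" and "fe", so expected and actual rows can still be told apart.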
84d84fe36b0d6e250c3d221c28c40b6925e4c222 | Author: jberragan <jberragan@gmail.com>
| 2024-07-22 13:38:28-07:00
CASSANDRA-19791: Remove other uses of Apache Commons lang for hashcode, equality and random string generation (#67)
Patch by James Berragan; Reviewed by Francisco Guerrero, Yifan Cai for CASSANDRA-19791
458a3630f882ae2b2a9cee272cf85ca7ff42f5cd | Author: jberragan <jberragan@gmail.com>
| 2024-07-17 14:29:21-07:00
CASSANDRA-19778: Split out BufferingInputStream stats into separate i… (#66)
Split BufferingInputStream stats into separate interface so class level generics are not required for the Stats interface
Patch by James Berragan; Reviewed by Bernardo Botella, Francisco Guerrero, Yifan Cai for CASSANDRA-19778
a242b352c28947427a9bfc30295a487017439fd9 | Author: jberragan <jberragan@gmail.com>
| 2024-07-12 14:57:38-07:00
CASSANDRA-19748: Refactoring to introduce new cassandra-analytics-common module with minimal dependencies (#62)
- Add new module cassandra-analytics-common with no dependencies on Spark or Cassandra and minimal standard dependencies (Guava, Jackson, Commons Lang, Kryo, etc.)
- Move standalone classes to cassandra-analytics-common module.
Some additional refactoring and clean up:
- Rename SSTableInputStream -> BufferingInputStream
- Rename SSTableSource -> CassandraFileSource
- Introduce CassandraFile interface to be implemented by SSTable and CommitLog.
- Generalize IStats to work across different CassandraFile types
- Rename methods in StreamScanner to make the API clearer.
- Move ComplexTypeBuffer, ListBuffer, MapBuffer, SetBuffer, UdtBuffer to standalone classes
- Delete unused classes RangeTombstone, ReplicaSet and CollectionElement.
- Remove commons lang as a dependency
- Rename Rid to RowData
Patch by James Berragan; Reviewed by Bernardo Botella, Dinesh Joshi, Francisco Guerrero, Yifan Cai, Yuriy Semchyshyn for CASSANDRA-19748
a13532272051d4e4608f92d53bdd997103e8ea19 | Author: Yifan Cai <52585731+yifan-c@users.noreply.github.com>
| 2024-03-05 11:06:36-08:00
CASSANDRA-19452 Use constant reference time during bulk read process (#44)
patch by Yifan Cai; reviewed by Francisco Guerrero, James Berragan for CASSANDRA-19452
014db08a79f00ef0d94e6855779e398c9dc689c1 | Author: James Berragan <jberragan@apple.com>
| 2023-07-19 12:23:07-07:00
CASSANDRA-18683: Add PartitionSizeTableProvider for reading the compressed and uncompressed sizes of all partitions in a table by utilizing the SSTable Index.db files
Patch by James Berragan; Reviewed by Dinesh Joshi, Yifan Cai for CASSANDRA-18683
1633cd9c6c3d88d5c66825fab76a369266509f7e | Author: Dinesh Joshi <djoshi@apache.org>
| 2023-05-19 14:57:47-07:00
CEP-28: Apache Cassandra Analytics
This is the initial commit for the Apache Cassandra Analytics project
where we support bulk reading from and bulk writing to Apache Cassandra
using Spark.
Patch by James Berragan, Doug Rohrer; Reviewed by Dinesh Joshi, Yifan Cai for CASSANDRA-16222
Co-authored-by: James Berragan <jberragan@apple.com>
Co-authored-by: Doug Rohrer <drohrer@apple.com>
Co-authored-by: Saranya Krishnakumar <saranya_k@apple.com>
Co-authored-by: Francisco Guerrero <francisco.guerrero@apple.com>
Co-authored-by: Yifan Cai <ycai@apache.org>
Co-authored-by: Jyothsna Konisa <jkonisa@apple.com>
Co-authored-by: Yuriy Semchyshyn <ysemchyshyn@apple.com>
Co-authored-by: Dinesh Joshi <djoshi@apache.org>