jberragan cassandra-analytics last 3 years


 9 Collaborator
Yifan Cai , Francisco Guerrero , Dinesh Joshi , James Berragan , Yuriy Semchyshyn , Jyothsna Konisa , Yifan Cai; Francisco Guerrero , Bernardo Botella , Francisco Guerrero; Yifan Cai

 10 Patch
e51716ee724cf4950df67eba0393b3f798b7dc00, 3bcc5297bd3115ff5949a9295eed6a9ad03fd096, 972535d0f7cd828b7e0e40706adbe8897a436a5d, abce3c83fb5db8cc79edb68b247ab25582a0e9a8, bff083e35ebc338daf93c1a3553c590ae1864115, 3023a204c8ef16f886bd3dc219f7534b7edbaf2a, bac08796181979afef4cc518789a380edef500f0, 84d84fe36b0d6e250c3d221c28c40b6925e4c222, 458a3630f882ae2b2a9cee272cf85ca7ff42f5cd, a242b352c28947427a9bfc30295a487017439fd9

e51716ee724cf4950df67eba0393b3f798b7dc00 | Author: jberragan <jberragan@gmail.com>
 | 2024-12-06 16:09:12-08:00

    CASSANDRA-19962: CEP-44 Kafka integration for Cassandra CDC using Sidecar (#87)
    
    This is the initial commit for CEP-44 to introduce a standalone CDC module into the Analytics project. This module provides the foundation for CDC in the Apache Cassandra Sidecar.
    
    This module provides:
    - a standalone Cdc class as the entrypoint for initializing CDC.
    - pluggable interfaces for: listing and reading commit log segments for a token range, persisting and reading CDC state, providing the Cassandra table schema, optionally reading values from Cassandra.
    - read and deserialize commit log mutations.
    - reconcile and de-duplicate mutations across replicas.
    - serialize CDC state into a binary object for persistence.
    - a layer for converting Cassandra mutations into a standard consumable format.
    
    Patch by James Berragan, Jyothsna Konisa, Yifan Cai; Reviewed by Dinesh Joshi, Yifan Cai for CASSANDRA-19962
    
    Co-authored-by: James Berragan <jberragan@apple.com>
    Co-authored-by: Yifan Cai <ycai@apache.org>
    Co-authored-by: Jyothsna Konisa <jkonisa@apple.com>
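
The "reconcile and de-duplicate mutations across replicas" item above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name, the digest and replica strings, and the rule that a mutation is published once a minimum number of replicas have reported it. The commit message does not describe the module's actual reconciliation logic.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the CDC module's implementation.
final class ReplicaDedupSketch {
    private final int minReplicas;
    // Replicas that have reported each not-yet-published mutation digest.
    private final Map<String, Set<String>> seenBy = new HashMap<>();
    // Digests that were already published, so later copies are dropped.
    private final Set<String> published = new HashSet<>();

    ReplicaDedupSketch(int minReplicas) {
        this.minReplicas = minReplicas;
    }

    // Returns true exactly once per digest: when enough distinct replicas have reported it.
    boolean offer(String mutationDigest, String replica) {
        if (published.contains(mutationDigest)) {
            return false;
        }
        Set<String> replicas = seenBy.computeIfAbsent(mutationDigest, d -> new HashSet<>());
        replicas.add(replica);
        if (replicas.size() >= minReplicas) {
            seenBy.remove(mutationDigest);
            published.add(mutationDigest);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ReplicaDedupSketch dedup = new ReplicaDedupSketch(2);
        System.out.println(dedup.offer("digest-1", "replica-a"));
        System.out.println(dedup.offer("digest-1", "replica-b"));
        System.out.println(dedup.offer("digest-1", "replica-c"));
    }
}
```

In this sketch the in-memory maps stand in for the persisted CDC state that the commit message mentions; a real consumer would need to bound and persist that state.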

3bcc5297bd3115ff5949a9295eed6a9ad03fd096 | Author: jberragan <jberragan@gmail.com>
 | 2024-10-17 10:02:25-07:00

    CASSANDRA-19980: Remove SparkSQL dependency from CassandraBridge so that it can be used independent from Spark (#88)
    
    Patch by James Berragan; Reviewed by Francisco Guerrero, Yifan Cai for CASSANDRA-19980

972535d0f7cd828b7e0e40706adbe8897a436a5d | Author: jberragan <jberragan@gmail.com>
 | 2024-09-17 15:52:45-07:00

    CASSANDRA-19927: Remove old compression cache and move to using cache of CompressionMetadata (#84)
    
    Deprecate old compression cache and move to using cache of CompressionMetadata, so that:
    - we no longer cache an entire byte array on heap
    - we cache and re-use the CompressionMetadata object so that only one BigLongArray object is allocated for the chunk offsets
    
    Patch by James Berragan; Reviewed by Yifan Cai; Francisco Guerrero for CASSANDRA-19927
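
The idea of caching a parsed metadata object rather than a raw byte array can be sketched as follows. The class names, the file name, and the loader function are hypothetical stand-ins; the sketch only shows the general pattern of loading once per file and handing the same instance to every reader, not the project's CompressionMetadata cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for parsed compression metadata (chunk offsets only).
final class ChunkOffsetsSketch {
    final long[] offsets;

    ChunkOffsetsSketch(long[] offsets) {
        this.offsets = offsets;
    }
}

// Hypothetical cache keyed by file name; not the project's cache class.
final class MetadataCacheSketch {
    private final Map<String, ChunkOffsetsSketch> cache = new ConcurrentHashMap<>();
    // Counts how many times the loader actually ran, for demonstration.
    final AtomicInteger loads = new AtomicInteger();

    // Parses the metadata on the first request and re-uses the same object afterwards.
    ChunkOffsetsSketch get(String fileName, Function<String, long[]> loader) {
        return cache.computeIfAbsent(fileName, name -> {
            loads.incrementAndGet();
            return new ChunkOffsetsSketch(loader.apply(name));
        });
    }

    public static void main(String[] args) {
        MetadataCacheSketch cache = new MetadataCacheSketch();
        Function<String, long[]> loader = name -> new long[] {0L, 65536L, 131072L};
        ChunkOffsetsSketch first = cache.get("example-CompressionInfo.db", loader);
        ChunkOffsetsSketch second = cache.get("example-CompressionInfo.db", loader);
        System.out.println("same instance: " + (first == second));
        System.out.println("loader calls: " + cache.loads.get());
    }
}
```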

abce3c83fb5db8cc79edb68b247ab25582a0e9a8 | Author: jberragan <jberragan@gmail.com>
 | 2024-09-11 12:55:18-07:00

    CASSANDRA-19815: Decouple Cassandra types from Spark types so Cassandra types can be used independently from Spark (#71)
    
    
    Patch by James Berragan; Reviewed by Yifan Cai; Francisco Guerrero for CASSANDRA-19815

bff083e35ebc338daf93c1a3553c590ae1864115 | Author: jberragan <jberragan@gmail.com>
 | 2024-09-08 08:13:57-07:00

    CASSANDRA-19900: Make the compression cache configurable to reduce heap pressure for large SSTables (#77)
    
    
    Patch by James Berragan; Reviewed by Francisco Guerrero; Yifan Cai for CASSANDRA-19900

3023a204c8ef16f886bd3dc219f7534b7edbaf2a | Author: jberragan <jberragan@gmail.com>
 | 2024-08-03 07:51:52+01:00

    CASSANDRA-19807: Improve the core bulk reader test system to match actual and expected rows by concatenating the partition keys with the serialized hex string instead of utf-8 string (#70)
    
    Patch by James Berragan; Reviewed by Francisco Guerrero, Yifan Cai for CASSANDRA-19807
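
A small sketch of why a hex string is a safer match key than a UTF-8 string: serialized key bytes that are not valid UTF-8 decode to the same replacement character, so distinct keys can collide, while hex encoding keeps every byte distinct. The class name, method names, and the ':' separator are assumptions for illustration and are not taken from the test system.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the bulk reader test code.
final class PartitionKeyMatchSketch {
    // Joins the serialized keys as lowercase hex, separated by ':'.
    static String hexKey(byte[]... serializedKeys) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < serializedKeys.length; i++) {
            if (i > 0) {
                sb.append(':');
            }
            for (byte b : serializedKeys[i]) {
                sb.append(String.format("%02x", b & 0xFF));
            }
        }
        return sb.toString();
    }

    // Joins the serialized keys after decoding each one as UTF-8 (lossy for arbitrary bytes).
    static String utf8Key(byte[]... serializedKeys) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < serializedKeys.length; i++) {
            if (i > 0) {
                sb.append(':');
            }
            sb.append(new String(serializedKeys[i], StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] first = {(byte) 0xFF};
        byte[] second = {(byte) 0xFE};
        // Both bytes are invalid UTF-8, so the decoded strings collide.
        System.out.println("utf-8 keys equal: " + utf8Key(first).equals(utf8Key(second)));
        // The hex strings stay distinct.
        System.out.println("hex keys equal: " + hexKey(first).equals(hexKey(second)));
    }
}
```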

bac08796181979afef4cc518789a380edef500f0 | Author: jberragan <jberragan@gmail.com>
 | 2024-07-26 10:25:13-07:00

    CASSANDRA-19793: Split out CassandraTypes into separate module (#68)
    
    
    Patch by James Berragan; Reviewed by Yifan Cai, Francisco Guerrero for CASSANDRA-19793

84d84fe36b0d6e250c3d221c28c40b6925e4c222 | Author: jberragan <jberragan@gmail.com>
 | 2024-07-22 13:38:28-07:00

    CASSANDRA-19791: Remove other uses of Apache Commons lang for hashcode, equality and random string generation (#67)
    
    Patch by James Berragan; Reviewed by Francisco Guerrero, Yifan Cai for CASSANDRA-19791
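
One way to cover hashcode, equality, and random string generation with the JDK alone is sketched below, using java.util.Objects and java.util.Random. The class, its fields, and the helper method are hypothetical; the commit message does not say which replacements the patch actually chose.

```java
import java.util.Objects;
import java.util.Random;

// Hypothetical value class; name and fields are illustrative only.
final class TableKeySketch {
    private static final String ALPHANUMERIC =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    private final String keyspace;
    private final String table;

    TableKeySketch(String keyspace, String table) {
        this.keyspace = keyspace;
        this.table = table;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof TableKeySketch)) {
            return false;
        }
        TableKeySketch that = (TableKeySketch) other;
        // Null-safe field comparison without an external builder class.
        return Objects.equals(keyspace, that.keyspace) && Objects.equals(table, that.table);
    }

    @Override
    public int hashCode() {
        return Objects.hash(keyspace, table);
    }

    // Builds a random alphanumeric string of the requested length.
    static String randomAlphanumeric(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHANUMERIC.charAt(random.nextInt(ALPHANUMERIC.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TableKeySketch a = new TableKeySketch("ks", "tbl");
        TableKeySketch b = new TableKeySketch("ks", "tbl");
        System.out.println("equal: " + a.equals(b));
        System.out.println("random: " + randomAlphanumeric(new Random(), 12));
    }
}
```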

458a3630f882ae2b2a9cee272cf85ca7ff42f5cd | Author: jberragan <jberragan@gmail.com>
 | 2024-07-17 14:29:21-07:00

    CASSANDRA-19778: Split out BufferingInputStream stats into separate i… (#66)
    
    Split BufferingInputStream stats into separate interface so class level generics are not required for the Stats interface
    
    Patch by James Berragan; Reviewed by Bernardo Botella, Francisco Guerrero, Yifan Cai for CASSANDRA-19778
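
The design choice described above can be sketched as follows: the type parameter lives on a small stream-stats interface and on a generic accessor method, so the top-level stats interface itself carries no class-level generics. All interface, class, and method names here are hypothetical and are not the project's API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stream-level stats; only this interface is parameterized by the source type.
interface StreamStatsSketch<T> {
    void bytesRead(T source, int length);

    // No-op default so callers never need a null check.
    static <T> StreamStatsSketch<T> doNothing() {
        return (source, length) -> { };
    }
}

// Hypothetical top-level stats; no class-level type parameter is needed.
interface StatsSketch {
    default void openedFile(String name) {
    }

    // The generic lives on the method, not on the interface.
    default <T> StreamStatsSketch<T> streamStats() {
        return StreamStatsSketch.doNothing();
    }
}

// Example implementation that sums the bytes reported by any stream type.
final class CountingStatsSketch implements StatsSketch {
    final AtomicLong bytes = new AtomicLong();

    @Override
    public <T> StreamStatsSketch<T> streamStats() {
        return (source, length) -> bytes.addAndGet(length);
    }

    public static void main(String[] args) {
        CountingStatsSketch stats = new CountingStatsSketch();
        StreamStatsSketch<String> streamStats = stats.streamStats();
        streamStats.bytesRead("example-Data.db", 4096);
        streamStats.bytesRead("example-Data.db", 1024);
        System.out.println("bytes read: " + stats.bytes.get());
    }
}
```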

a242b352c28947427a9bfc30295a487017439fd9 | Author: jberragan <jberragan@gmail.com>
 | 2024-07-12 14:57:38-07:00

    CASSANDRA-19748: Refactoring to introduce new cassandra-analytics-common module with minimal dependencies (#62)
    
    - Add new module cassandra-analytics-common with no dependencies on Spark or Cassandra and minimal standard dependencies (Guava, Jackson, Commons Lang, Kryo, etc.)
    - Move standalone classes to cassandra-analytics-common module.
    
    Some additional refactoring and clean up:
    - Rename SSTableInputStream -> BufferingInputStream
    - Rename SSTableSource -> CassandraFileSource
    - Introduce CassandraFile interface to be implemented by SSTable and CommitLog.
    - Generalize IStats to work across different CassandraFile types
    - Rename methods in StreamScanner to make the API clearer.
    - Move ComplexTypeBuffer, ListBuffer, MapBuffer, SetBuffer, UdtBuffer to standalone classes
    - Delete unused classes RangeTombstone, ReplicaSet and CollectionElement.
    - Remove commons lang as a dependency
    - Rename Rid to RowData
    
    Patch by James Berragan; Reviewed by Bernardo Botella, Dinesh Joshi, Francisco Guerrero, Yifan Cai, Yuriy Semchyshyn for CASSANDRA-19748