James Berragan cassandra-analytics last 3 years


 9 Collaborator
Yifan Cai , Francisco Guerrero , Dinesh Joshi , jberragan , Yuriy Semchyshyn , Jyothsna Konisa , Yifan Cai; Francisco Guerrero , Bernardo Botella , Francisco Guerrero; Yifan Cai

 13 Patch  1 Review
e51716ee724cf4950df67eba0393b3f798b7dc00, 3bcc5297bd3115ff5949a9295eed6a9ad03fd096, 972535d0f7cd828b7e0e40706adbe8897a436a5d, 6710efc212da2f135467fe3e972418ab5b9f5b78, abce3c83fb5db8cc79edb68b247ab25582a0e9a8, bff083e35ebc338daf93c1a3553c590ae1864115, 3023a204c8ef16f886bd3dc219f7534b7edbaf2a, bac08796181979afef4cc518789a380edef500f0, 84d84fe36b0d6e250c3d221c28c40b6925e4c222, 458a3630f882ae2b2a9cee272cf85ca7ff42f5cd, a242b352c28947427a9bfc30295a487017439fd9, 014db08a79f00ef0d94e6855779e398c9dc689c1, 1633cd9c6c3d88d5c66825fab76a369266509f7e a13532272051d4e4608f92d53bdd997103e8ea19

e51716ee724cf4950df67eba0393b3f798b7dc00 | Author: jberragan <jberragan@gmail.com>
 | 2024-12-06 16:09:12-08:00

    CASSANDRA-19962: CEP-44 Kafka integration for Cassandra CDC using Sidecar (#87)
    
    This is the initial commit for CEP-44 to introduce a standalone CDC module into the Analytics project. This module provides the foundation for CDC in the Apache Cassandra Sidecar.
    
    This module provides:
    - a standalone Cdc class as the entrypoint for initializing CDC.
    - pluggable interfaces for: listing and reading commit log segments for a token range, persisting and reading CDC state, providing the Cassandra table schema, optionally reading values from Cassandra.
    - read and deserialize commit log mutations.
    - reconcile and de-duplicate mutations across replicas.
    - serialize CDC state into a binary object for persistence.
    - a layer for converting Cassandra mutations into a standard consumable format.
    
    Patch by James Berragan, Jyothsna Konisa, Yifan Cai; Reviewed by Dinesh Joshi, Yifan Cai for CASSANDRA-19962
    
    Co-authored-by: James Berragan <jberragan@apple.com>
    Co-authored-by: Yifan Cai <ycai@apache.org>
    Co-authored-by: Jyothsna Konisa <jkonisa@apple.com>

3bcc5297bd3115ff5949a9295eed6a9ad03fd096 | Author: jberragan <jberragan@gmail.com>
 | 2024-10-17 10:02:25-07:00

    CASSANDRA-19980: Remove SparkSQL dependency from CassandraBridge so that it can be used independent from Spark (#88)
    
    Patch by James Berragan; Reviewed by Francisco Guerrero, Yifan Cai for CASSANDRA-19980

972535d0f7cd828b7e0e40706adbe8897a436a5d | Author: jberragan <jberragan@gmail.com>
 | 2024-09-17 15:52:45-07:00

    CASSANDRA-19927: Remove old compression cache and move to using cache of CompressionMetadata (#84)
    
    Deprecate old compression cache and move to using cache of CompressionMetadata, so that:
    - we no longer cache an entire byte array on heap
    - we cache and re-use the CompressionMetadata object so that only one BigLongArray object is allocated for the chunk offsets
    
    Patch by James Berragan; Reviewed by Yifan Cai; Francisco Guerrero for CASSANDRA-19927

6710efc212da2f135467fe3e972418ab5b9f5b78 | Author: James Berragan <jberragan@gmail.com>
 | 2024-09-11 13:36:40-07:00

    Ninja fix for CASSANDRA-19815
    
    Fixes Scala 2.13 build and configuration for CI

abce3c83fb5db8cc79edb68b247ab25582a0e9a8 | Author: jberragan <jberragan@gmail.com>
 | 2024-09-11 12:55:18-07:00

    CASSANDRA-19815: Decouple Cassandra types from Spark types so Cassandra types can be used independently from Spark (#71)
    
    
    Patch by James Berragan; Reviewed by Yifan Cai; Francisco Guerrero for CASSANDRA-19815

bff083e35ebc338daf93c1a3553c590ae1864115 | Author: jberragan <jberragan@gmail.com>
 | 2024-09-08 08:13:57-07:00

    CASSANDRA-19900: Make the compression cache configurable to reduce heap pressure for large SSTables (#77)
    
    
    Patch by James Berragan; Reviewed by Francisco Guerrero; Yifan Cai for CASSANDRA-19900

3023a204c8ef16f886bd3dc219f7534b7edbaf2a | Author: jberragan <jberragan@gmail.com>
 | 2024-08-03 07:51:52+01:00

    CASSANDRA-19807: Improve the core bulk reader test system to match actual and expected rows by concatenating the partition keys with the serialized hex string instead of utf-8 string (#70)
    
    Patch by James Berragan; Reviewed by Francisco Guerrero, Yifan Cai for CASSANDRA-19807

bac08796181979afef4cc518789a380edef500f0 | Author: jberragan <jberragan@gmail.com>
 | 2024-07-26 10:25:13-07:00

    CASSANDRA-19793 Split out CassandraTypes into separate module (#68)
    
    
    Patch by James Berragan; Reviewed by Yifan Cai, Francisco Guerrero for CASSANDRA-19793

84d84fe36b0d6e250c3d221c28c40b6925e4c222 | Author: jberragan <jberragan@gmail.com>
 | 2024-07-22 13:38:28-07:00

    CASSANDRA-19791: Remove other uses of Apache Commons lang for hashcode, equality and random string generation (#67)
    
    Patch by James Berragan; Reviewed by Francisco Guerrero, Yifan Cai for CASSANDRA-19791

458a3630f882ae2b2a9cee272cf85ca7ff42f5cd | Author: jberragan <jberragan@gmail.com>
 | 2024-07-17 14:29:21-07:00

    CASSANDRA-19778: Split out BufferingInputStream stats into separate i… (#66)
    
    Split BufferingInputStream stats into separate interface so class level generics are not required for the Stats interface
    
    Patch by James Berragan; Reviewed by Bernardo Botella, Francisco Guerrero, Yifan Cai for CASSANDRA-19778

a242b352c28947427a9bfc30295a487017439fd9 | Author: jberragan <jberragan@gmail.com>
 | 2024-07-12 14:57:38-07:00

    CASSANDRA-19748: Refactoring to introduce new cassandra-analytics-common module with minimal dependencies (#62)
    
    - Add new module cassandra-analytics-common with no dependencies on Spark or Cassandra and minimal standard dependencies (Guava, Jackson, Commons Lang, Kryo, etc.)
    - Move standalone classes to cassandra-analytics-common module.
    
    Some additional refactoring and clean up:
    - Rename SSTableInputStream -> BufferingInputStream
    - Rename SSTableSource -> CassandraFileSource
    - Introduce CassandraFile interface to be implemented by SSTable and CommitLog.
    - Generalize IStats to work across different CassandraFile types
    - Rename methods in StreamScanner to make the API clearer.
    - Move ComplexTypeBuffer, ListBuffer, MapBuffer, SetBuffer, UdtBuffer to standalone classes
    - Delete unused classes RangeTombstone, ReplicaSet and CollectionElement.
    - Remove commons lang as a dependency
    - Rename Rid to RowData
    
    Patch by James Berragan; Reviewed by Bernardo Botella, Dinesh Joshi, Francisco Guerrero, Yifan Cai, Yuriy Semchyshyn for CASSANDRA-19748

a13532272051d4e4608f92d53bdd997103e8ea19 | Author: Yifan Cai <52585731+yifan-c@users.noreply.github.com>
 | 2024-03-05 11:06:36-08:00

    CASSANDRA-19452 Use constant reference time during bulk read process (#44)
    
    patch by Yifan Cai; reviewed by Francisco Guerrero, James Berragan for CASSANDRA-19452

014db08a79f00ef0d94e6855779e398c9dc689c1 | Author: James Berragan <jberragan@apple.com>
 | 2023-07-19 12:23:07-07:00

    CASSANDRA-18683: Add PartitionSizeTableProvider for reading the compressed and uncompressed sizes of all partitions in a table by utilizing the SSTable Index.db files
    
    Patch by James Berragan; Reviewed by Dinesh Joshi, Yifan Cai for CASSANDRA-18683

1633cd9c6c3d88d5c66825fab76a369266509f7e | Author: Dinesh Joshi <djoshi@apache.org>
 | 2023-05-19 14:57:47-07:00

    CEP-28: Apache Cassandra Analytics
    
    This is the initial commit for the Apache Cassandra Analytics project
    where we support reading and writing bulk data from Apache Cassandra from
    Spark.
    
    Patch by James Berragan, Doug Rohrer; Reviewed by Dinesh Joshi, Yifan Cai for CASSANDRA-16222
    
    Co-authored-by: James Berragan <jberragan@apple.com>
    Co-authored-by: Doug Rohrer <drohrer@apple.com>
    Co-authored-by: Saranya Krishnakumar <saranya_k@apple.com>
    Co-authored-by: Francisco Guerrero <francisco.guerrero@apple.com>
    Co-authored-by: Yifan Cai <ycai@apache.org>
    Co-authored-by: Jyothsna Konisa <jkonisa@apple.com>
    Co-authored-by: Yuriy Semchyshyn <ysemchyshyn@apple.com>
    Co-authored-by: Dinesh Joshi <djoshi@apache.org>