Doug Rohrer cassandra-analytics all time


 6 Collaborator
Yifan Cai , Francisco Guerrero , Dinesh Joshi , Saranya Krishnakumar , Arjun Ashok , Francisco Guerrero Hernandez

 9 Patch  10 Review
aea798dc7e517af520a403d4d86f3bc6bed65092, 690101840d4d8f9c656bb0ca114f6619af80e1cf, c5d6dfd1bc9b682d704d28f77807ba72317b1944, 98baab1b8f0d5d7eb93f8d13db3b0a7a985fb03a, c73c76498b0c2b36705025de6b0b2a7bb38e758b, deebdf97ad01f23550d7d3b42d98c7bf111e2f95, 82b3c0a79c9322142738a4ec2ff7d4d4c0be2370, bd0b41fb82134844a15fbb43126424d96706d08e, 1633cd9c6c3d88d5c66825fab76a369266509f7e
dd536f2e70118cd5d0c319f5be3e54e3d50eb288, a3ca0897c2190c2c18992ca2b7e5255318ff3eba, 6556d251bdddfbef3935da760bcda2b2387a4391, 4fb1e7f47d640353cd57f7a3035c70099049b29c, f123406e458c0112145f37dcd3f8c20ba47c949d, cfe293dadcf7a1d4491591cfd39fc410a8fa52ba, 555e8494d3ca27a7b35aebabb1f669eede20cc53, d75a6bae5abbf80810012a181644f240141014d5, 164243e78f1557a34bc699ebc716b532781d6422, 0aaf5659028dd874c8d666c636f11eae63c429e6

dd536f2e70118cd5d0c319f5be3e54e3d50eb288 | Author: Yifan Cai <ycai@apache.org>
 | 2024-11-15 15:31:14-08:00

    CASSANDRA-20066: Expose detailed bulk write failure message for better insight (#92)
    
    Patch by Yifan Cai; Reviewed by Doug Rohrer, Francisco Guerrero for CASSANDRA-20066

a3ca0897c2190c2c18992ca2b7e5255318ff3eba | Author: Yifan Cai <ycai@apache.org>
 | 2024-11-05 14:22:45-08:00

    CASSANDRA-19994: Add dataTransferApi and TwoPhaseImportCoordinator for coordinated write (#91)
    
    Patch by Yifan Cai; Reviewed by Doug Rohrer, Francisco Guerrero for CASSANDRA-19994

6556d251bdddfbef3935da760bcda2b2387a4391 | Author: Yifan Cai <ycai@apache.org>
 | 2024-10-04 20:58:44-07:00

    CASSANDRA-19981: Fix invalid prefix char produced by BundleNameGenerator (#89)
    
    Patch by Yifan Cai; Reviewed by Doug Rohrer for CASSANDRA-19981

4fb1e7f47d640353cd57f7a3035c70099049b29c | Author: Yifan Cai <ycai@apache.org>
 | 2024-09-18 09:33:21-07:00

    CASSANDRA-19910: Support data partitioning for multiple clusters coordinated write (#80)
    
    Patch by Yifan Cai; Reviewed by Doug Rohrer for CASSANDRA-19910

f123406e458c0112145f37dcd3f8c20ba47c949d | Author: Yifan Cai <ycai@apache.org>
 | 2024-09-11 21:03:47-07:00

    CASSANDRA-19909: Add writer option COORDINATED_WRITE_CONFIG to define coordinated write to multiple Cassandra clusters (#79)
    
    The option specifies the configuration (in JSON) for coordinated write.
    See org.apache.cassandra.spark.bulkwriter.coordinatedwrite.CoordinatedWriteConf.
    When the option is present, SIDECAR_CONTACT_POINTS, SIDECAR_INSTANCES and LOCAL_DC are ignored if they are present.
    
    Patch by Yifan Cai; Reviewed by Doug Rohrer, Francisco Guerrero for CASSANDRA-19909

cfe293dadcf7a1d4491591cfd39fc410a8fa52ba | Author: Yifan Cai <ycai@apache.org>
 | 2024-08-30 11:08:00-07:00

    CASSANDRA-19842: Consistency level check incorrectly passes when majority of the replica set is unavailable for write (#75)
    
    Patch by Yifan Cai; Reviewed by Doug Rohrer, Francisco Guerrero for CASSANDRA-19842
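
The commit title above does not say how the check was implemented. As a rough illustration only, and assuming a QUORUM-style consistency level, the arithmetic such a check has to get right might look like the sketch below. The class and method names are hypothetical and are not taken from the analytics library.

```java
// Illustrative sketch only: hypothetical names, not the library's actual check.
final class QuorumCheckSketch {
    // Number of replicas that must acknowledge a QUORUM write.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // The check must fail when too few replicas remain available to reach quorum.
    static boolean canSatisfyQuorum(int replicationFactor, int unavailableReplicas) {
        int available = replicationFactor - unavailableReplicas;
        return available >= quorum(replicationFactor);
    }
}
```

With a replication factor of 3, one unavailable replica still leaves a quorum of 2, while two unavailable replicas (a majority) must fail the check.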

555e8494d3ca27a7b35aebabb1f669eede20cc53 | Author: Yifan Cai <ycai@apache.org>
 | 2024-08-20 17:33:35-07:00

    CASSANDRA-19836: Fix NPE when writing UDT values (#74)
    
    When UDT field values are set to null, the bulk writer throws NPE
    
    Patch by Yifan Cai; Reviewed by Dinesh Joshi, Doug Rohrer for CASSANDRA-19836

d75a6bae5abbf80810012a181644f240141014d5 | Author: Yifan Cai <ycai@apache.org>
 | 2024-08-14 13:10:15-07:00

    CASSANDRA-19827: Add job_timeout_seconds writer option (#73)
    
    Option to specify the timeout in seconds for bulk write jobs. By default, it is disabled.
    When `JOB_TIMEOUT_SECONDS` is specified, a job exceeding the timeout is:
    - successful when the desired consistency level is met
    - a failure otherwise
    
    Patch by Yifan Cai; Reviewed by Dinesh Joshi, Doug Rohrer for CASSANDRA-19827
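
The timeout semantics described in this commit message can be sketched as a small decision function. This is a minimal sketch under assumptions: the class, enum, and method names are hypothetical, and a null timeout stands in for the "disabled by default" case.

```java
import java.time.Duration;

// Illustrative sketch only: hypothetical names, not the bulk writer's API.
final class JobTimeoutSketch {
    enum Outcome { NOT_TIMED_OUT, SUCCEEDED, FAILED }

    // timeout == null models the option being unset (disabled).
    static Outcome evaluate(Duration elapsed, Duration timeout, boolean consistencyLevelMet) {
        if (timeout == null || elapsed.compareTo(timeout) <= 0) {
            // The timeout is disabled or has not been exceeded yet.
            return Outcome.NOT_TIMED_OUT;
        }
        // Timeout exceeded: the job succeeds only if the desired consistency level is met.
        return consistencyLevelMet ? Outcome.SUCCEEDED : Outcome.FAILED;
    }
}
```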

aea798dc7e517af520a403d4d86f3bc6bed65092 | Author: Yifan Cai <52585731+yifan-c@users.noreply.github.com>
 | 2024-04-22 15:46:08-07:00

    CASSANDRA-19563: Support bulk write via S3 (#53)
    
    This commit adds a configuration (writer) option to pick a transport other than the previously-implemented "direct upload to all sidecars" (now known as the "Direct" transport).  The second transport, now being implemented, is the "S3_COMPAT" transport, which allows the job to upload the generated SSTables to an S3-compatible storage system, and then inform the Cassandra Sidecar that those files are available for download & commit.
    
    Additionally, a plug-in system was added to allow communications between custom transport hooks and the job, so the custom hook can provide updated credentials and out-of-band status updates on S3-related issues.
    
    Co-Authored-By: Yifan Cai <ycai@apache.org>
    Co-Authored-By: Doug Rohrer <drohrer@apple.com>
    Co-Authored-By: Francisco Guerrero <frankgh@apache.org>
    Co-Authored-By: Saranya Krishnakumar <saranya_k@apple.com>
    
    Patch by Yifan Cai, Doug Rohrer, Francisco Guerrero, Saranya Krishnakumar; Reviewed by Francisco Guerrero for CASSANDRA-19563

690101840d4d8f9c656bb0ca114f6619af80e1cf | Author: Francisco Guerrero <frankgh@apache.org>
 | 2024-04-08 14:33:50-07:00

    CASSANDRA-19526: Optionally enable TLS in the server and client for Analytics testing
    
    All integration tests today run without TLS, which is generally fine because they run locally. However,
    it is helpful to be able to start up the sidecar with TLS enabled in the integration test framework so
    that third-party tests could connect via secure connections for testing purposes.
    
    Co-authored-by: Doug Rohrer <drohrer@apple.com>
    Co-authored-by: Francisco Guerrero <frankgh@apache.org>
    
    Patch by Doug Rohrer, Francisco Guerrero; Reviewed by Yifan Cai for CASSANDRA-19526

164243e78f1557a34bc699ebc716b532781d6422 | Author: Arjun Ashok <arjun_ashok@apple.com>
 | 2024-03-22 16:22:44-07:00

    CASSANDRA-19418 - Changes to report additional bulk analytics job stats for instrumentation (#41)
    
    Patch by Arjun Ashok; Reviewed by Doug Rohrer, Yifan Cai, Francisco Guerrero for CASSANDRA-19418

c5d6dfd1bc9b682d704d28f77807ba72317b1944 | Author: Doug Rohrer <drohrer@apple.com>
 | 2024-02-27 22:03:04-05:00

    CASSANDRA-19340 - Support writing UDTs
    
    Patch by Doug Rohrer; Reviewed by Yifan Cai, Francisco Guerrero for CASSANDRA-19340

98baab1b8f0d5d7eb93f8d13db3b0a7a985fb03a | Author: Doug Rohrer <drohrer@apple.com>
 | 2024-02-27 22:03:04-05:00

    Make sure bridge exists

c73c76498b0c2b36705025de6b0b2a7bb38e758b | Author: Doug Rohrer <drohrer@apple.com>
 | 2023-11-20 10:54:46-05:00

    CASSANDRA-19048 - Audit table properties passed through Analytics CqlUtils
    
    The following properties have an effect on the files generated by the
    bulk writer, and therefore need to be retained when cleaning the table
    schema:
    
    bloom_filter_fp_chance
    cdc
    compression
    default_time_to_live
    min_index_interval
    max_index_interval
    
    Additionally, this commit adds tests to make sure all available TTL
    paths, including table default TTLs and constant/per-row options, work
    as designed.
    
    Patch by Doug Rohrer; Reviewed by Francisco Guerrero Hernandez, Yifan Cai,
    Dinesh Joshi for CASSANDRA-19048
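
To illustrate the idea of retaining only the listed properties when cleaning a table schema, here is a minimal sketch. It assumes the properties have already been parsed into a map, which may not match how CqlUtils actually processes the schema; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: hypothetical names; the real CqlUtils may work on the CQL text directly.
final class TablePropertyFilterSketch {
    // Properties named in the commit message as affecting the generated files.
    static final Set<String> RETAINED = new HashSet<>(Arrays.asList(
        "bloom_filter_fp_chance", "cdc", "compression",
        "default_time_to_live", "min_index_interval", "max_index_interval"));

    // Returns a copy of the input containing only the retained properties, in input order.
    static Map<String, String> retain(Map<String, String> tableProperties) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : tableProperties.entrySet()) {
            if (RETAINED.contains(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```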

0aaf5659028dd874c8d666c636f11eae63c429e6 | Author: Arjun Ashok <arjun_ashok@apple.com>
 | 2023-10-09 07:53:40-07:00

    CASSANDRA-18852 - Changes to make bulk writer resilient to cluster resize operations
    
    Patch by Arjun Ashok, Saranya Krishnakumar; Reviewed by Yifan Cai, Francisco Guerrero, Doug Rohrer for CASSANDRA-18852
    
    Co-authored-by: Arjun Ashok <arjun_ashok@apple.com>
    Co-authored-by: Saranya Krishnakumar <saranya_k@apple.com>

82b3c0a79c9322142738a4ec2ff7d4d4c0be2370 | Author: Francisco Guerrero <frankgh@apache.org>
 | 2023-07-25 12:41:10-07:00

    CASSANDRA-18692 Fix bulk writes with Buffered RowBufferMode
    
    When setting Buffered RowBufferMode as part of the `WriterOption`s,
    `org.apache.cassandra.spark.bulkwriter.RecordWriter` ignores that configuration and instead
    uses the batch size to determine when to finalize an SSTable and start writing a new SSTable,
    if more rows are available.
    
    In this commit, we fix `org.apache.cassandra.spark.bulkwriter.RecordWriter#checkBatchSize`
    to take into account the configured `RowBufferMode`. Specifically, for the `UNBUFFERED`
    RowBufferMode we check the batchSize of the SSTable during writes, while for `BUFFERED`
    that check has no effect.
    
    Co-authored-by: Doug Rohrer <doug@therohrers.org>
    
    Patch by Francisco Guerrero, Doug Rohrer; Reviewed by Dinesh Joshi, Yifan Cai for CASSANDRA-18692
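
The behavior described for `checkBatchSize` can be sketched roughly as follows. This is an assumption-laden illustration, not the actual `RecordWriter` code: the method name, its parameters, and the `>=` comparison are all hypothetical.

```java
// Illustrative sketch only: hypothetical names and comparison, not RecordWriter's real code.
final class RowBufferModeSketch {
    enum RowBufferMode { BUFFERED, UNBUFFERED }

    // Decides whether the current SSTable should be finalized and a new one started.
    static boolean shouldFinalizeSSTable(RowBufferMode mode, int rowsWritten, int batchSize) {
        if (mode == RowBufferMode.BUFFERED) {
            // In BUFFERED mode the batch-size check has no effect.
            return false;
        }
        // In UNBUFFERED mode the batch size determines when to roll over to a new SSTable.
        return rowsWritten >= batchSize;
    }
}
```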

deebdf97ad01f23550d7d3b42d98c7bf111e2f95 | Author: Doug Rohrer <drohrer@apple.com>
 | 2023-06-14 13:33:29-04:00

    CASSANDRA-18759: Use in-jvm dtest framework from Sidecar for testing
    
    This commit introduces the use of the in-jvm dtest framework for testing
    Analytics workloads. It can spin up a Cassandra cluster, including the
    necessary Sidecar process, to test writing to and reading from Cassandra
    using the analytics library.
    
    Additional changes made in this commit include
    
    * Use concurrent collections in MockBulkWriterContext (Fixes flaky test StreamSessionConsistencyTest)
    
        The StreamSessionConsistencyTest uses the MockBulkWriterContext, which had not been used in a
        multi-threaded environment before this test was added. Because of this, it would occasionally
        throw ConcurrentModificationExceptions, which would cause the stream test to fail in a
        non-deterministic way. This commit switches the MockBulkWriterContext to concurrent/synchronized
        collections so that it no longer throws these spurious errors.
    
    * Make the StartupValidation system thread-safe by using ThreadLocals
      instead of static collections, and clearing them once validation is
      complete.
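
As a rough illustration of the concurrent-collections change, the sketch below shows a mock context whose state can be updated safely from several threads. The class, fields, and methods are hypothetical stand-ins and do not reflect the real MockBulkWriterContext.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: hypothetical stand-in for a mock used by multi-threaded tests.
final class MockContextSketch {
    // Thread-safe replacements for a plain ArrayList / HashMap.
    // Note: iterating a synchronizedList still requires synchronizing on the list.
    private final List<String> uploads = Collections.synchronizedList(new ArrayList<>());
    private final Map<String, Integer> uploadsPerInstance = new ConcurrentHashMap<>();

    void recordUpload(String instance, String file) {
        uploads.add(file);
        uploadsPerInstance.merge(instance, 1, Integer::sum); // atomic on ConcurrentHashMap
    }

    int uploadCount() {
        return uploads.size();
    }

    int uploadCount(String instance) {
        return uploadsPerInstance.getOrDefault(instance, 0);
    }

    // Drives the mock from several threads at once and waits for them to finish.
    static MockContextSketch runConcurrently(int threads, int uploadsPerThread) {
        MockContextSketch context = new MockContextSketch();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final String instance = "instance-" + t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < uploadsPerThread; i++) {
                    context.recordUpload(instance, instance + "-file-" + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return context;
    }
}
```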
    
    Patch by Doug Rohrer; Reviewed by Dinesh Joshi, Francisco Guerrero, Yifan Cai for CASSANDRA-18759
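
The move from static collections to thread-local state that is cleared after validation might look something like the sketch below. It is a minimal illustration with hypothetical names and is not the actual StartupValidation code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: hypothetical names, not the real StartupValidation system.
final class StartupValidationSketch {
    // Each thread gets its own list instead of sharing one static collection.
    private static final ThreadLocal<List<String>> VALIDATIONS =
        ThreadLocal.withInitial(ArrayList::new);

    static void register(String validationName) {
        VALIDATIONS.get().add(validationName);
    }

    // Returns the validations registered on the calling thread, then clears the thread's state.
    static List<String> runAndClear() {
        try {
            return new ArrayList<>(VALIDATIONS.get());
        } finally {
            VALIDATIONS.remove();
        }
    }
}
```

Clearing the thread-local in a `finally` block keeps state from leaking into later work on pooled threads.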

bd0b41fb82134844a15fbb43126424d96706d08e | Author: Doug Rohrer <drohrer@apple.com>
 | 2023-06-14 13:33:29-04:00

    CASSANDRA-18599 Upgrade to JUnit 5
    
    Patch by Doug Rohrer, Francisco Guerrero; Reviewed by Dinesh Joshi, Yifan Cai for CASSANDRA-18599

1633cd9c6c3d88d5c66825fab76a369266509f7e | Author: Dinesh Joshi <djoshi@apache.org>
 | 2023-05-19 14:57:47-07:00

    CEP-28: Apache Cassandra Analytics
    
    This is the initial commit for the Apache Cassandra Analytics project,
    which supports bulk reading from and bulk writing to Apache Cassandra
    from Spark.
    
    Patch by James Berragan, Doug Rohrer; Reviewed by Dinesh Joshi, Yifan Cai for CASSANDRA-16222
    
    Co-authored-by: James Berragan <jberragan@apple.com>
    Co-authored-by: Doug Rohrer <drohrer@apple.com>
    Co-authored-by: Saranya Krishnakumar <saranya_k@apple.com>
    Co-authored-by: Francisco Guerrero <francisco.guerrero@apple.com>
    Co-authored-by: Yifan Cai <ycai@apache.org>
    Co-authored-by: Jyothsna Konisa <jkonisa@apple.com>
    Co-authored-by: Yuriy Semchyshyn <ysemchyshyn@apple.com>
    Co-authored-by: Dinesh Joshi <djoshi@apache.org>