9 Patch |
10 Review |
aea798dc7e517af520a403d4d86f3bc6bed65092,
690101840d4d8f9c656bb0ca114f6619af80e1cf,
c5d6dfd1bc9b682d704d28f77807ba72317b1944,
98baab1b8f0d5d7eb93f8d13db3b0a7a985fb03a,
c73c76498b0c2b36705025de6b0b2a7bb38e758b,
deebdf97ad01f23550d7d3b42d98c7bf111e2f95,
82b3c0a79c9322142738a4ec2ff7d4d4c0be2370,
bd0b41fb82134844a15fbb43126424d96706d08e,
1633cd9c6c3d88d5c66825fab76a369266509f7e |
dd536f2e70118cd5d0c319f5be3e54e3d50eb288,
a3ca0897c2190c2c18992ca2b7e5255318ff3eba,
6556d251bdddfbef3935da760bcda2b2387a4391,
4fb1e7f47d640353cd57f7a3035c70099049b29c,
f123406e458c0112145f37dcd3f8c20ba47c949d,
cfe293dadcf7a1d4491591cfd39fc410a8fa52ba,
555e8494d3ca27a7b35aebabb1f669eede20cc53,
d75a6bae5abbf80810012a181644f240141014d5,
164243e78f1557a34bc699ebc716b532781d6422,
0aaf5659028dd874c8d666c636f11eae63c429e6 |
dd536f2e70118cd5d0c319f5be3e54e3d50eb288 | Author: Yifan Cai <ycai@apache.org>
| 2024-11-15 15:31:14-08:00
CASSANDRA-20066: Expose detailed bulk write failure message for better insight (#92)
Patch by Yifan Cai; Reviewed by Doug Rohrer, Francisco Guerrero for CASSANDRA-20066
a3ca0897c2190c2c18992ca2b7e5255318ff3eba | Author: Yifan Cai <ycai@apache.org>
| 2024-11-05 14:22:45-08:00
CASSANDRA-19994: Add dataTransferApi and TwoPhaseImportCoordinator for coordinated write (#91)
Patch by Yifan Cai; Reviewed by Doug Rohrer, Francisco Guerrero for CASSANDRA-19994
f123406e458c0112145f37dcd3f8c20ba47c949d | Author: Yifan Cai <ycai@apache.org>
| 2024-09-11 21:03:47-07:00
CASSANDRA-19909: Add writer option COORDINATED_WRITE_CONFIG to define coordinated write to multiple Cassandra clusters (#79)
The option specifies the configuration (in JSON) for coordinated write.
See org.apache.cassandra.spark.bulkwriter.coordinatedwrite.CoordinatedWriteConf.
When the option is present, SIDECAR_CONTACT_POINTS, SIDECAR_INSTANCES, and LOCAL_DC are ignored.
Patch by Yifan Cai; Reviewed by Doug Rohrer, Francisco Guerrero for CASSANDRA-19909
cfe293dadcf7a1d4491591cfd39fc410a8fa52ba | Author: Yifan Cai <ycai@apache.org>
| 2024-08-30 11:08:00-07:00
CASSANDRA-19842: Consistency level check incorrectly passes when majority of the replica set is unavailable for write (#75)
Patch by Yifan Cai; Reviewed by Doug Rohrer, Francisco Guerrero for CASSANDRA-19842
555e8494d3ca27a7b35aebabb1f669eede20cc53 | Author: Yifan Cai <ycai@apache.org>
| 2024-08-20 17:33:35-07:00
CASSANDRA-19836: Fix NPE when writing UDT values (#74)
When UDT field values are set to null, the bulk writer throws an NPE.
Patch by Yifan Cai; Reviewed by Dinesh Joshi, Doug Rohrer for CASSANDRA-19836
d75a6bae5abbf80810012a181644f240141014d5 | Author: Yifan Cai <ycai@apache.org>
| 2024-08-14 13:10:15-07:00
CASSANDRA-19827: Add job_timeout_seconds writer option (#73)
Option to specify the timeout in seconds for bulk write jobs. By default, it is disabled.
When `JOB_TIMEOUT_SECONDS` is specified, a job exceeding the timeout is:
- successful when the desired consistency level is met
- a failure otherwise
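The timeout rule above can be sketched as a small decision function. This is only an illustration of the rule as stated in this message, not the bulk writer's code: the class, enum, and method names are hypothetical, and treating a non-positive timeout as "disabled" is an assumption made for the sketch.

```java
public class JobTimeoutSketch {
    // Hypothetical outcome type, used only for this illustration.
    enum Outcome { NOT_TIMED_OUT, SUCCEEDED, FAILED }

    // Assumption: timeoutSeconds <= 0 models the default "disabled" state.
    static Outcome evaluate(long elapsedSeconds, long timeoutSeconds, boolean consistencyLevelMet) {
        boolean timeoutEnabled = timeoutSeconds > 0;
        if (!timeoutEnabled || elapsedSeconds <= timeoutSeconds) {
            // The timeout rule does not apply yet.
            return Outcome.NOT_TIMED_OUT;
        }
        // The job exceeded the timeout: the result depends only on
        // whether the desired consistency level has been met.
        return consistencyLevelMet ? Outcome.SUCCEEDED : Outcome.FAILED;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(700, 600, true));   // SUCCEEDED
        System.out.println(evaluate(700, 600, false));  // FAILED
        System.out.println(evaluate(700, 0, false));    // NOT_TIMED_OUT
    }
}
```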
Patch by Yifan Cai; Reviewed by Dinesh Joshi, Doug Rohrer for CASSANDRA-19827
aea798dc7e517af520a403d4d86f3bc6bed65092 | Author: Yifan Cai <52585731+yifan-c@users.noreply.github.com>
| 2024-04-22 15:46:08-07:00
CASSANDRA-19563: Support bulk write via S3 (#53)
This commit adds a configuration (writer) option to pick a transport other than the previously-implemented "direct upload to all sidecars" (now known as the "Direct" transport). The second transport, now being implemented, is the "S3_COMPAT" transport, which allows the job to upload the generated SSTables to an S3-compatible storage system, and then inform the Cassandra Sidecar that those files are available for download & commit.
Additionally, a plug-in system was added to allow communications between custom transport hooks and the job, so the custom hook can provide updated credentials and out-of-band status updates on S3-related issues.
Co-Authored-By: Yifan Cai <ycai@apache.org>
Co-Authored-By: Doug Rohrer <drohrer@apple.com>
Co-Authored-By: Francisco Guerrero <frankgh@apache.org>
Co-Authored-By: Saranya Krishnakumar <saranya_k@apple.com>
Patch by Yifan Cai, Doug Rohrer, Francisco Guerrero, Saranya Krishnakumar; Reviewed by Francisco Guerrero for CASSANDRA-19563
690101840d4d8f9c656bb0ca114f6619af80e1cf | Author: Francisco Guerrero <frankgh@apache.org>
| 2024-04-08 14:33:50-07:00
CASSANDRA-19526: Optionally enable TLS in the server and client for Analytics testing
All integration tests today run without TLS, which is generally fine because they run locally. However,
it is helpful to be able to start up the sidecar with TLS enabled in the integration test framework so
that third-party tests could connect via secure connections for testing purposes.
Co-authored-by: Doug Rohrer <drohrer@apple.com>
Co-authored-by: Francisco Guerrero <frankgh@apache.org>
Patch by Doug Rohrer, Francisco Guerrero; Reviewed by Yifan Cai for CASSANDRA-19526
164243e78f1557a34bc699ebc716b532781d6422 | Author: Arjun Ashok <arjun_ashok@apple.com>
| 2024-03-22 16:22:44-07:00
CASSANDRA-19418 - Changes to report additional bulk analytics job stats for instrumentation (#41)
Patch by Arjun Ashok; Reviewed by Doug Rohrer, Yifan Cai, Francisco Guerrero for CASSANDRA-19418
c73c76498b0c2b36705025de6b0b2a7bb38e758b | Author: Doug Rohrer <drohrer@apple.com>
| 2023-11-20 10:54:46-05:00
CASSANDRA-19048 - Audit table properties passed through Analytics CqlUtils
The following properties have an effect on the files generated by the
bulk writer, and therefore need to be retained when cleaning the table
schema:
bloom_filter_fp_chance
cdc
compression
default_time_to_live
min_index_interval
max_index_interval
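As a rough illustration of "retaining" the properties listed above, the sketch below filters a map of table properties down to that list. It is not how CqlUtils is implemented: the class and method names are hypothetical, and representing the table options as a `Map` is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class RetainedTablePropertiesSketch {
    // The properties named in this commit message as affecting generated files.
    static final Set<String> RETAINED = new HashSet<>(Arrays.asList(
            "bloom_filter_fp_chance",
            "cdc",
            "compression",
            "default_time_to_live",
            "min_index_interval",
            "max_index_interval"));

    // Hypothetical helper: keep only the retained properties, preserving input order.
    static Map<String, String> retain(Map<String, String> tableProperties) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : tableProperties.entrySet()) {
            if (RETAINED.contains(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("comment", "'example table'");
        props.put("default_time_to_live", "3600");
        props.put("cdc", "true");
        // Only default_time_to_live and cdc survive the filter.
        System.out.println(retain(props));
    }
}
```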
Additionally, this commit adds tests to make sure all available TTL
paths, including table default TTLs and constant/per-row options, work
as designed.
Patch by Doug Rohrer; Reviewed by Francisco Guerrero Hernandez, Yifan Cai,
Dinesh Joshi for CASSANDRA-19048
0aaf5659028dd874c8d666c636f11eae63c429e6 | Author: Arjun Ashok <arjun_ashok@apple.com>
| 2023-10-09 07:53:40-07:00
CASSANDRA-18852 - Changes to make bulk writer resilient to cluster resize operations
Patch by Arjun Ashok, Saranya Krishnakumar; Reviewed by Yifan Cai, Francisco Guerrero, Doug Rohrer for CASSANDRA-18852
Co-authored-by: Arjun Ashok <arjun_ashok@apple.com>
Co-authored-by: Saranya Krishnakumar <saranya_k@apple.com>
82b3c0a79c9322142738a4ec2ff7d4d4c0be2370 | Author: Francisco Guerrero <frankgh@apache.org>
| 2023-07-25 12:41:10-07:00
CASSANDRA-18692 Fix bulk writes with Buffered RowBufferMode
When Buffered RowBufferMode is set as part of the `WriterOption`s,
`org.apache.cassandra.spark.bulkwriter.RecordWriter` ignores that configuration and instead
uses the batch size to determine when to finalize an SSTable and start writing a new SSTable,
if more rows are available.
In this commit, we fix `org.apache.cassandra.spark.bulkwriter.RecordWriter#checkBatchSize`
to take the configured `RowBufferMode` into account. For the `UNBUFFERED` RowBufferMode, we
check the batchSize of the SSTable during writes; for `BUFFERED`, that check has no
effect.
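A minimal sketch of the mode-aware check, assuming the batch size is expressed as a row count. The enum and method here are stand-ins written for illustration; they are not the actual `RecordWriter#checkBatchSize` code.

```java
public class RowBufferModeSketch {
    // Stand-in for the writer's RowBufferMode option.
    enum RowBufferMode { BUFFERED, UNBUFFERED }

    // Hypothetical helper: decide whether the current SSTable should be
    // finalized so that a new one can be started.
    static boolean shouldFinalizeSSTable(RowBufferMode mode, int rowsInCurrentSSTable, int batchSize) {
        switch (mode) {
            case UNBUFFERED:
                // Only UNBUFFERED mode consults the batch size.
                return rowsInCurrentSSTable >= batchSize;
            case BUFFERED:
            default:
                // In BUFFERED mode the batch-size check has no effect.
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldFinalizeSSTable(RowBufferMode.UNBUFFERED, 1000, 1000)); // true
        System.out.println(shouldFinalizeSSTable(RowBufferMode.BUFFERED, 1000, 1000));   // false
    }
}
```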
Co-authored-by: Doug Rohrer <doug@therohrers.org>
Patch by Francisco Guerrero, Doug Rohrer; Reviewed by Dinesh Joshi, Yifan Cai for CASSANDRA-18692
deebdf97ad01f23550d7d3b42d98c7bf111e2f95 | Author: Doug Rohrer <drohrer@apple.com>
| 2023-06-14 13:33:29-04:00
CASSANDRA-18759: Use in-jvm dtest framework from Sidecar for testing
This commit introduces the use of the in-jvm dtest framework for testing
Analytics workloads. It can spin up a Cassandra cluster, including the
necessary Sidecar process, to test writing to and reading from Cassandra
using the analytics library.
Additional changes made in this commit include:
* Use concurrent collections in MockBulkWriterContext (Fixes flaky test StreamSessionConsistencyTest)
StreamSessionConsistencyTest uses MockBulkWriterContext, which was not used in a
multi-threaded environment before this test was added. As a result, it would occasionally
throw ConcurrentModificationExceptions, causing the stream test to fail in a
non-deterministic way. This commit switches MockBulkWriterContext to concurrent/synchronized
collections to make sure it doesn't throw these spurious errors.
* Make the StartupValidation system thread-safe by using ThreadLocals
instead of static collections, and clearing them once validation is
complete.
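The second change follows a common pattern: per-thread state held in a `ThreadLocal` instead of a shared static collection, and removed once the work is done. The sketch below shows that pattern in isolation. The class and method names are hypothetical and do not reflect the actual StartupValidation code.

```java
import java.util.ArrayList;
import java.util.List;

public class ThreadLocalValidationSketch {
    // Each thread gets its own list, so concurrent validations cannot interfere.
    private static final ThreadLocal<List<String>> FAILURES =
            ThreadLocal.withInitial(ArrayList::new);

    static void record(String failure) {
        FAILURES.get().add(failure);
    }

    // Return this thread's results and clear its state once validation is complete.
    static List<String> finish() {
        try {
            return new ArrayList<>(FAILURES.get());
        } finally {
            FAILURES.remove();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread other = new Thread(() -> record("from other thread"));
        other.start();
        other.join();

        record("from main thread");
        System.out.println(finish()); // [from main thread]
        System.out.println(finish()); // [] because the state was cleared
    }
}
```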
Patch by Doug Rohrer; Reviewed by Dinesh Joshi, Francisco Guerrero, Yifan Cai for CASSANDRA-18759
1633cd9c6c3d88d5c66825fab76a369266509f7e | Author: Dinesh Joshi <djoshi@apache.org>
| 2023-05-19 14:57:47-07:00
CEP-28: Apache Cassandra Analytics
This is the initial commit for the Apache Cassandra Analytics project,
which supports bulk reading from and writing to Apache Cassandra from
Spark.
Patch by James Berragan, Doug Rohrer; Reviewed by Dinesh Joshi, Yifan Cai for CASSANDRA-16222
Co-authored-by: James Berragan <jberragan@apple.com>
Co-authored-by: Doug Rohrer <drohrer@apple.com>
Co-authored-by: Saranya Krishnakumar <saranya_k@apple.com>
Co-authored-by: Francisco Guerrero <francisco.guerrero@apple.com>
Co-authored-by: Yifan Cai <ycai@apache.org>
Co-authored-by: Jyothsna Konisa <jkonisa@apple.com>
Co-authored-by: Yuriy Semchyshyn <ysemchyshyn@apple.com>
Co-authored-by: Dinesh Joshi <djoshi@apache.org>