17 Collaborator |
Brandon Williams, Štefan Miklošovič, Caleb Rackliffe, Andrés de la Peña, Berenguer Blasi, Ekaterina Dimitrova, Mick Semb Wever, David Capwell, Jacek Lewandowski, Branimir Lambov, Zhao Yang, Jonathan Ellis, Maxwell Guo, Piotr Kołaczkowski, Jeremiah Jordan, Jason Rutherglen, Jonathon Ellis |
18 Patch |
5 Review |
7aab61b06357ce0b59977715f82fed1ad24474b4,
7447ee5bddb31ea71a232a44d64dbb7dd0010708,
3b05051f8678c28bc9d93a89123c68f8d0b93b7b,
6a7bef12ecdf59e3a67c81b89c13e3c2bf7e19d8,
a9e6ed37874f2240039086309e7849bea42c07e2,
0e42b77c9735d1124fe0a5766447f29c891cdb5b,
c4d11c4372906ae1dea9e6c31c1136f122e8a1b2,
9697be1131bd8bb2332199000ad55dad12524fd2,
949b760f5516c139591473038917247b1fd7f500,
e45c1092f91edd63591f562b2120ea6a5fd3edd5,
9ce86e0ff8b6344b528a0640f9dafa23f97dd85a,
655a2455ac29395b0a303e6ad7fc4d458b18932d,
2531cb045897d5b771f79039d194a1f679d8629a,
eb208d3561eaf645f74f60b54c71ebe5bfc24c33,
cba3e19ccd81d705ca9f89c0eedab65824e9dd16,
6f125c80420f6d249b5414d886e1b4a93cc34e7f,
e5e0f3a8441503107b1ca2128cf8366e5e44d893,
cde91e56f09d9ebf315c79c9a81b89f70f4eb724 |
1d7bae3697b97e64de2c2b958427ef86a1b17731,
d16e8d3653dce8ed767a040c06dbaabc47a9b474,
83203a14c400ff99cfb2a5b7e655a663ea882c2b,
ebea2ba6ade00a6f156787ca4ee36b2f8eb003ad,
ae537abc6494564d7254a2126465522d86b44c1e |
7aab61b06357ce0b59977715f82fed1ad24474b4 | Author: Mike Adamson <mikea@apache.org>
| 2024-02-26 12:42:53+00:00
Use glove vectors instead of random vectors in vector tests
- avoid randomisation to make tests more consistent
- use heap_buffers for VectorDistributedTest for consistency with other tests
patch by Mike Adamson; reviewed by Ekaterina Dimitrova for CASSANDRA-19185
7447ee5bddb31ea71a232a44d64dbb7dd0010708 | Author: Mike Adamson <madamson@datastax.com>
| 2023-12-21 09:12:58+00:00
Avoid random IndexStreamingFailureTest failures
Change how ByteBuddy injections are handled to avoid ByteBuddy
failures after node restarts
Patch by Mike Adamson; reviewed by Caleb Rackliffe for CASSANDRA-19084
3b05051f8678c28bc9d93a89123c68f8d0b93b7b | Author: Mike Adamson <madamson@datastax.com>
| 2023-12-12 17:14:41+00:00
Simplify segment building in SAI to use single in-memory structure
This removes the RAMStringIndexer for literal indexes and replaces
it with a SegmentTrieBuffer that replaces BlockBalancedTreeRamBuffer
for literal and numeric indexes.
patch by Mike Adamson; reviewed by Andrés de la Peña, Caleb Rackliffe for CASSANDRA-18598
6a7bef12ecdf59e3a67c81b89c13e3c2bf7e19d8 | Author: Mike Adamson <madamson@datastax.com>
| 2023-11-28 10:48:23+00:00
Fix SAI intersection queries
- Fix comparison in PostingListRangeIterator for updating skip token
- Fix binary search in KeyLookup.clusteredSeekToKey
- Added new on-disk component for storing partition sizes by partition ID
patch by Mike Adamson; reviewed by Caleb Rackliffe, Mick Semb Wever for CASSANDRA-19011
a9e6ed37874f2240039086309e7849bea42c07e2 | Author: Mike Adamson <madamson@datastax.com>
| 2023-11-24 15:26:00+00:00
Fix broken indexing tests when using SAI
- This fixes a number of broken tests when the default index is set to SAI
- Composite partition indexes were being filtered prior to row filtering in the
index searcher resulting in incorrect results
- Static and non-static index intersection was failing because static primary keys
were not comparing correctly against non-static primary keys
patch by Mike Adamson; reviewed by Andres de la Peña, Michael Semb Wever for CASSANDRA-19034
0e42b77c9735d1124fe0a5766447f29c891cdb5b | Author: Mike Adamson <madamson@datastax.com>
| 2023-11-10 14:49:41+00:00
Improve code model around IndexContext
- Replace IndexContext with IndexTermType and IndexDefinition
- Move index specific managers, factories and metrics to StorageAttachedIndex
- Refactor Expression to explicitly define indexed and unindexed expressions
patch by Mike Adamson; reviewed by Andres de la Peña, Caleb Rackliffe for CASSANDRA-18166
c4d11c4372906ae1dea9e6c31c1136f122e8a1b2 | Author: Mike Adamson <madamson@datastax.com>
| 2023-10-30 09:46:52+00:00
Fix VectorUpdateDeleteTest for JDK 17
Removed use of reflection and directly set
relevant property to avoid jdk 17 errors
patch by Mike Adamson; reviewed by Stefan Miklosovic, Michael Semb Wever and Andrés de la Peña for CASSANDRA-18715
e45c1092f91edd63591f562b2120ea6a5fd3edd5 | Author: Mike Adamson <madamson@datastax.com>
| 2023-10-04 11:27:50+01:00
Correctly remove Index.Group from IndexRegistry
The Index.Group was being left in the list indexGroups in the SecondaryIndexManager because the incorrect
key was being used to remove it from the map
patch by Mike Adamson; reviewed by Caleb Rackliffe and Zhao Yang for CASSANDRA-18905
Co-authored-by: Zhao Yang <zhaoyangsingapore@gmail.com>
9697be1131bd8bb2332199000ad55dad12524fd2 | Author: Mike Adamson <madamson@datastax.com>
| 2023-09-28 16:54:31+01:00
Fix dtests returning ordering columns that have not been selected
patch by Mike Adamson; reviewed by adelapena, brandonwilliams and
Jeremiah Jordan for CASSANDRA-18892
d16e8d3653dce8ed767a040c06dbaabc47a9b474 | Author: Jacek Lewandowski <lewandowski.jacek@gmail.com>
| 2023-09-18 12:44:08+02:00
Do not create sstable files before registering in txn
Refactoring prevents the situation where some sstable components, like
data or index, are created before the new sstable is registered with
lifecycle transaction, which leads to a problem such that there is
a short time when incomplete sstable components are present. At the same
time, no transaction file is created, which leads to the possibility
that the sstable can be recognized as completed by various
transaction-aware listers.
Patch by Jacek Lewandowski; reviewed by Branimir Lambov, Mike Adamson for CASSANDRA-18737
949b760f5516c139591473038917247b1fd7f500 | Author: Mike Adamson <madamson@datastax.com>
| 2023-08-30 11:51:04+01:00
Add support for a vector search index in SAI
- Adds jbellis/jvector (1.0.2) library for DiskANN based indexes on floating point vectors
- Adds ORDER BY ANN OF capability to do ANN search and order the results by score
patch by Mike Adamson; reviewed by Andrés de la Peña, Jonathon Ellis for CASSANDRA-18715
Co-authored-by: Jonathon Ellis <jbellis@gmail.com>
Co-authored-by: Zhao Yang <zhaoyangsingapore@gmail.com>
9ce86e0ff8b6344b528a0640f9dafa23f97dd85a | Author: Mike Adamson <madamson@datastax.com>
| 2023-08-08 17:07:01+01:00
SAI result retriever is filtering too many rows
This patch fixes a bug in the SegmentMetadata that
was only storing the partition key for min and max
primary keys for a segment. It also contains some
refactoring of the PrimaryKey to remove the deferred
loading of PrimaryKeys by the PrimaryKeyMaps.
Patch by Mike Adamson; reviewed by Caleb Rackliffe and Andrés de la Peña for CASSANDRA-18734
655a2455ac29395b0a303e6ad7fc4d458b18932d | Author: Mike Adamson <madamson@datastax.com>
| 2023-07-28 17:38:20+01:00
Reduce size of per-SSTable index components for SAI
This patch removes the PRIMARY_KEY_TRIE component and adds KeyLookup.Cursor#clusteredSeekToKey() to
search for clustering keys within a partition. To do this a new on-disk component
PARTITION_SIZES has been added that holds the size of each partition in the SSTable.
patch by Mike Adamson; reviewed by Caleb Rackliffe and Andres de la Peña for CASSANDRA-18673
83203a14c400ff99cfb2a5b7e655a663ea882c2b | Author: Caleb Rackliffe <calebrackliffe@gmail.com>
| 2023-07-14 01:44:26-07:00
Importer should build SSTable indexes successfully before making new SSTables readable
- Avoid validation in response to SSTableAddedNotification, as it should already have been done somewhere else
- Change SSTableWriter to prevent commit when a failure is thrown out of an index build
patch by Caleb Rackliffe; reviewed by Mike Adamson and Andres de la Peña for CASSANDRA-18670
ebea2ba6ade00a6f156787ca4ee36b2f8eb003ad | Author: Jonathan Ellis <jbellis@datastax.com>
| 2023-06-26 14:50:01-05:00
Upgrade to lucene-core 9.7.0
Notes on the upgrade path:
- RamIndexOutput is replaced with ResettableByteBuffersIndexOutput, an extension of ByteBuffersIndexOutput, which was the closest thing to a replacement of RamIndexOutput.
- Lucene exposes the code we needed from DirectReaders more or less directly in DirectReader now, so the old copied code has been deleted.
- Lucene changed its data files to be little endian, but to keep its compatibility story simple it retained BE for the header and footer ints. That's the cause of the changes in SAICodecUtils.
- We could gain a bit of performance making our own code natively little endian but that is too big of a change for this patch.
patch by Jonathan Ellis; reviewed by Andrés de la Peña, Caleb Rackliffe, and Mike Adamson for CASSANDRA-18494
6f125c80420f6d249b5414d886e1b4a93cc34e7f | Author: Mike Adamson <madamson@datastax.com>
| 2023-06-12 11:25:17+01:00
Numeric on-disk index write and search
Includes:
- The disk/v1/kdtree package containing the
kdtree writer and reader
- The implementation code to tie these into
the existing read and write paths. The main parts
of this are the NumericIndexWriter and the
NumericIndexSegmentSearcher
- Additional testing for the new code
patch by Mike Adamson; reviewed by Caleb Rackliffe and Andres de la Peña for CASSANDRA-18067
Co-authored-by: Mike Adamson <madamson@datastax.com>
Co-authored-by: Caleb Rackliffe <calebrackliffe@gmail.com>
Co-authored-by: Piotr Kołaczkowski <pkolaczk@gmail.com>
Co-authored-by: Jason Rutherglen <jason.rutherglen@gmail.com>
Co-authored-by: Zhao Yang <zhaoyangsingapore@gmail.com>
cba3e19ccd81d705ca9f89c0eedab65824e9dd16 | Author: Mike Adamson <madamson@datastax.com>
| 2023-05-10 15:05:15+01:00
Query all ranges at once for SAI distributed queries
patch by Mike Adamson; reviewed by Caleb Rackliffe, Andres de la Peña, and Berenguer Blasi for CASSANDRA-18515
eb208d3561eaf645f74f60b54c71ebe5bfc24c33 | Author: Mike Adamson <madamson@datastax.com>
| 2023-05-09 12:29:01+01:00
Add basic text analysis to SAI, including "case_sensitive", "normalize", and "ascii" modes
patch by Mike Adamson; reviewed by Caleb Rackliffe and Andres de la Peña for CASSANDRA-18479
e5e0f3a8441503107b1ca2128cf8366e5e44d893 | Author: Mike Adamson <mikeatdot@gmail.com>
| 2023-04-13 17:23:13+01:00
Literal on-disk index and index write path (#9)
This commit contains the following additions
to SAI:
- The index write path and index building
based around StorageAttachedIndexBuilder
and StorageAttachedIndexWriter
- The on-disk index versioning using the
SSTable Descriptor analog IndexDescriptor
with Version and OnDiskFormat
- The literal on-disk index using the
LiteralIndexWriter
patch by Mike Adamson; reviewed by Caleb Rackliffe and Andres de la Peña for CASSANDRA-18062
Co-authored-by: Mike Adamson <mikeatdot@gmail.com>
Co-authored-by: Caleb Rackliffe <calebrackliffe@gmail.com>
Co-authored-by: Andres de la Peña <a.penya.garcia@gmail.com>
Co-authored-by: Piotr Kołaczkowski <pkolaczk@gmail.com>
Co-authored-by: Jason Rutherglen <jason.rutherglen@gmail.com>
cde91e56f09d9ebf315c79c9a81b89f70f4eb724 | Author: Mike Adamson <madamson@datastax.com>
| 2023-01-19 14:24:46+00:00
In-memory index implementation with query path
This includes the following elements of the Storage Attached Index:
- Memtable-attached indexes backed by an in-memory trie structure for byte-comparable values
- Query path for the in-memory index
- Index status propagation
- Randomized testing for Memtable-attached indexes
patch by Mike Adamson; reviewed by Caleb Rackliffe and Andres de la Peña for CASSANDRA-18058
Co-authored-by: Mike Adamson <madamson@datastax.com>
Co-authored-by: Caleb Rackliffe <calebrackliffe@gmail.com>