@jacek-lewandowski last 3 years


 3 Collaborators
Jacek Lewandowski, Branimir Lambov, @blambov

 1 Patch
b7e1e44a909c3a1d11e9c387db680c74d31b879f

b7e1e44a909c3a1d11e9c387db680c74d31b879f | Author: Jacek Lewandowski <lewandowski.jacek@gmail.com>
 | 2023-03-02 12:46:25+01:00

    SSTable format API
    
    Summary of the changes:
    
    Format, reader and writer
    ---------------------------
    This patch refactors the sstable-related classes to pull the most generic functionality up into top-level entities and push implementation-specific code down into the actual implementations. In particular, the top-level, implementation-agnostic classes and interfaces are SSTableFormat, SSTable, SSTableReader, SSTableWriter, IVerifier, and IScrubber. The rest of the codebase has been reviewed for explicit usages of big-table-format-specific sstable classes and refactored accordingly. SSTable, SSTableReader, and SSTableWriter each have their own builder, and the builders form a hierarchy that follows the same inheritance structure as the readers and writers.
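    
    As an illustration of a builder hierarchy that mirrors the reader hierarchy, here is a minimal sketch using self-typed generics. All class, field, and method names below are hypothetical stand-ins chosen for the example; they are not the actual classes or signatures from this patch.
    
    ```java
    // Hypothetical, simplified stand-ins; not the actual Cassandra classes.
    final class BuilderHierarchySketch
    {
        // Top-level, format-agnostic entity.
        static class Table
        {
            final String descriptor;
            Table(Builder<?, ?> b) { this.descriptor = b.descriptor; }
        }

        // A reader adds reader-only state on top of the generic table.
        static class Reader extends Table
        {
            final boolean online;
            Reader(ReaderBuilder<?, ?> b) { super(b); this.online = b.online; }
        }

        // T is the built type, B the concrete builder type, so that setters
        // return the most specific builder and calls can be chained.
        static abstract class Builder<T extends Table, B extends Builder<T, B>>
        {
            String descriptor;

            @SuppressWarnings("unchecked")
            B self() { return (B) this; }

            B setDescriptor(String descriptor) { this.descriptor = descriptor; return self(); }

            abstract T build();
        }

        // The builder for readers extends the builder for tables,
        // just as Reader extends Table.
        static abstract class ReaderBuilder<T extends Reader, B extends ReaderBuilder<T, B>> extends Builder<T, B>
        {
            boolean online;

            B setOnline(boolean online) { this.online = online; return self(); }
        }

        // A format-specific reader and its builder sit at the bottom of both hierarchies.
        static final class BigReader extends Reader
        {
            BigReader(BigReaderBuilder b) { super(b); }
        }

        static final class BigReaderBuilder extends ReaderBuilder<BigReader, BigReaderBuilder>
        {
            @Override
            BigReader build() { return new BigReader(this); }
        }
    }
    ```
    
    With this shape, `new BigReaderBuilder().setDescriptor(...).setOnline(true).build()` type-checks without casts at the call site, because each setter returns the concrete builder type.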
    
    There are also partial implementations that add support for some features and may or may not be used by the custom implementations. They include:
    - AbstractSSTableFormat - adds an implementation of some initialization methods - in practice, all of the format implementations should extend this class
    - SSTableReaderWithFilter - adds support for a Bloom filter to the reader
    - SortedTableWriter - generic implementation for a writer which writes partitions in the default order to the data file, supports Bloom filter and some index of partitions
    - IndexSummarySupport - interface implemented by the readers using index summaries
    - KeyCacheSupport - interface implemented by the readers using row key cache
    
    Descriptor
    ---------------------------
    Refactored the Descriptor class so that:
    - All paths are created from the base directory File rather than from a String
    - All methods named *filename* that produced full paths were made private; they now return file names rather than paths (the naming was inconsistent)
    - The usages of the `filenameFor` method were refactored to use the `fileFor` method
    - The usages of the `fromFilename` method were refactored to use a `fromFileWithComponent(..., false).left` expression
    In essence, the Descriptor class is no longer working on String-based paths.
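    
    The split between a private name-producing method and a public path-producing method can be sketched as follows. This is a simplified, hypothetical stand-in: it uses java.nio.file.Path in place of the project's own File class, and the fields and the name layout are assumptions made for the example only.
    
    ```java
    import java.nio.file.Path;

    // Hypothetical, simplified stand-in for a descriptor that never handles String paths.
    final class DescriptorSketch
    {
        private final Path directory;   // base directory; all paths are derived from it
        private final String version;
        private final int id;
        private final String format;

        DescriptorSketch(Path directory, String version, int id, String format)
        {
            this.directory = directory;
            this.version = version;
            this.id = id;
            this.format = format;
        }

        // Private: returns a bare file name, never a path.
        private String filenameFor(String component)
        {
            return version + "-" + id + "-" + format + "-" + component;
        }

        // Public: resolves the file name against the base directory.
        Path fileFor(String component)
        {
            return directory.resolve(filenameFor(component));
        }
    }
    ```
    
    Callers only ever see `fileFor`, so there is no way to confuse a file name with a full path.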
    
    Index summaries
    ---------------------------
    Removed the index summary from the generic SSTableReader class and created an interface IndexSummarySupport to be implemented by the readers that need it. Methods in related classes that refer back to the reader were refactored to support just readers of the SSTableReader & IndexSummarySupport type. Therefore, we will no longer need to assume that the generic SSTableReader has anything to do with an index summary.
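    
    The "SSTableReader & IndexSummarySupport" constraint maps onto a Java intersection bound on a type parameter. A minimal sketch, assuming hypothetical simplified types (Reader, SummarySupport, and their methods are invented for the example and are not the real interfaces):
    
    ```java
    // Hypothetical, simplified stand-ins; not the actual Cassandra types.
    final class IndexSummarySupportSketch
    {
        // Generic reader: knows nothing about index summaries.
        static abstract class Reader
        {
            abstract String name();
        }

        // Capability interface implemented only by readers that have a summary.
        interface SummarySupport
        {
            int summarySize();
        }

        // Accepts only readers that also support summaries; the compiler enforces it.
        static <T extends Reader & SummarySupport> String describe(T reader)
        {
            return reader.name() + ":" + reader.summarySize();
        }

        static final class ReaderWithSummary extends Reader implements SummarySupport
        {
            @Override
            String name() { return "with-summary"; }

            @Override
            public int summarySize() { return 128; }
        }

        // describe(new ReaderWithoutSummary()) would not compile.
        static final class ReaderWithoutSummary extends Reader
        {
            @Override
            String name() { return "no-summary"; }
        }
    }
    ```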
    
    A new IndexSummaryComponent class encloses data fields from the index summary file (note that aside from the index summary itself, the file includes the first and last partition of the sstable). The class has been extracted to deal with those fields and have that logic in a single place.
    
    Filter
    ---------------------------
    Refactored IFilter and its serialization - in particular, added the `serialize` method to the IFilter interface and moved loading/saving logic to a separate utility class FilterComponent.
    Extracted the SSTableReaderWithFilter abstract reader extending the generic SSTableReader with filter support.
    Extracted Bloom filter metrics into separate entities so that they can be plugged in only if the implementation uses a filter.
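    
    The idea of a filter that serializes itself, with loading and saving kept in a separate utility, can be sketched as below. Everything here is a hypothetical stand-in: the interface, the on-disk layout, and the one-hash bit-set filter (which is not a real Bloom filter) are assumptions made only to keep the example short.
    
    ```java
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.DataInput;
    import java.io.DataInputStream;
    import java.io.DataOutput;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.util.BitSet;

    // Hypothetical, simplified stand-ins; not the actual IFilter or FilterComponent.
    final class FilterComponentSketch
    {
        interface Filter
        {
            boolean isPresent(int hash);

            // The filter knows how to write itself out.
            void serialize(DataOutput out) throws IOException;
        }

        // Toy filter: a single bit per hash, enough to show the round trip.
        static final class BitSetFilter implements Filter
        {
            private final int size;
            private final BitSet bits;

            BitSetFilter(int size) { this(size, new BitSet(size)); }

            private BitSetFilter(int size, BitSet bits) { this.size = size; this.bits = bits; }

            void add(int hash) { bits.set(Math.floorMod(hash, size)); }

            @Override
            public boolean isPresent(int hash) { return bits.get(Math.floorMod(hash, size)); }

            @Override
            public void serialize(DataOutput out) throws IOException
            {
                byte[] raw = bits.toByteArray();
                out.writeInt(size);
                out.writeInt(raw.length);
                out.write(raw);
            }

            static BitSetFilter deserialize(DataInput in) throws IOException
            {
                int size = in.readInt();
                byte[] raw = new byte[in.readInt()];
                in.readFully(raw);
                return new BitSetFilter(size, BitSet.valueOf(raw));
            }
        }

        // Utility methods: the only place that deals with streams.
        static byte[] save(Filter filter)
        {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes))
            {
                filter.serialize(out);
            }
            catch (IOException e)
            {
                throw new UncheckedIOException(e);
            }
            return bytes.toByteArray();
        }

        static Filter load(byte[] data)
        {
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data)))
            {
                return BitSetFilter.deserialize(in);
            }
            catch (IOException e)
            {
                throw new UncheckedIOException(e);
            }
        }
    }
    ```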
    
    Cache
    ---------------------------
    Refactored CacheService to support different key-cache values. CacheService now supports an arbitrary IRowIndexEntry implementation as a key-cache value. A new version of the auto-saving cache was created ("g") because some information about the type of the serialized row index entry needs to be known before it is deserialized (or skipped). Therefore, the SSTableFormat type ordinal number is stored, which is sufficient because the IRowIndexEntry serializer is specific to the sstable format type.
    Similarly to IndexSummarySupport, a new KeyCacheSupport interface has to be implemented to mark the reader as supporting the key cache. It contains the default implementation of several methods the rest of the system relies on when the key cache is supported.
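    
    To show why a stored format ordinal is enough to deserialize or skip an entry, here is a minimal sketch. The serializer interface, the two toy entry layouts, and the one-byte ordinal are all assumptions made for the example; they do not describe the actual "g" cache file layout.
    
    ```java
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.DataInput;
    import java.io.DataInputStream;
    import java.io.DataOutput;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Hypothetical sketch; not the actual cache serialization code.
    final class KeyCacheEntrySketch
    {
        // One serializer per format; each knows its own entry layout.
        interface EntrySerializer
        {
            void serialize(long position, DataOutput out) throws IOException;
            long deserialize(DataInput in) throws IOException;
            void skip(DataInput in) throws IOException;
        }

        // Toy format 0: the entry is an 8-byte position.
        static final EntrySerializer WIDE = new EntrySerializer()
        {
            public void serialize(long position, DataOutput out) throws IOException { out.writeLong(position); }
            public long deserialize(DataInput in) throws IOException { return in.readLong(); }
            public void skip(DataInput in) throws IOException { in.skipBytes(8); }
        };

        // Toy format 1: the entry is a 4-byte position.
        static final EntrySerializer NARROW = new EntrySerializer()
        {
            public void serialize(long position, DataOutput out) throws IOException { out.writeInt((int) position); }
            public long deserialize(DataInput in) throws IOException { return in.readInt(); }
            public void skip(DataInput in) throws IOException { in.skipBytes(4); }
        };

        // The ordinal is the index into this list.
        static final List<EntrySerializer> BY_ORDINAL = Arrays.asList(WIDE, NARROW);

        // Writes each entry as: ordinal byte, then the format-specific payload.
        static byte[] encode(int[] ordinals, long[] positions)
        {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes))
            {
                for (int i = 0; i < ordinals.length; i++)
                {
                    out.writeByte(ordinals[i]);
                    BY_ORDINAL.get(ordinals[i]).serialize(positions[i], out);
                }
            }
            catch (IOException e)
            {
                throw new UncheckedIOException(e);
            }
            return bytes.toByteArray();
        }

        // Reads entries of the wanted format and skips all others.
        static List<Long> readAll(byte[] data, int wantedOrdinal)
        {
            List<Long> result = new ArrayList<>();
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data)))
            {
                while (in.available() > 0)
                {
                    int ordinal = in.readByte();
                    EntrySerializer serializer = BY_ORDINAL.get(ordinal);
                    if (ordinal == wantedOrdinal)
                        result.add(serializer.deserialize(in));
                    else
                        serializer.skip(in);
                }
            }
            catch (IOException e)
            {
                throw new UncheckedIOException(e);
            }
            return result;
        }
    }
    ```
    
    Without the ordinal prefix, the reader could not know how many bytes an entry of another format occupies, and so could not skip it safely.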
    
    Other changes
    ---------------------------
    - Fixed disabling the chunk cache - the enable(boolean) method in ChunkCache is misleading: it gives the false impression that it can disable the chunk cache once enabled, while in fact it only clears it. Added setFileCacheEnabled to DatabaseDescriptor
    
    - Made WrappingUnfilteredRowIterator an interface
    
    - DataInputStreamPlus extends InputStream - this makes it possible for input stream-based inheritors of DataInputPlus to extend DataInputStreamPlus. It simplifies the code because some callers need an argument that is both a DataInputPlus implementation and an InputStream.
    
    - Table and keyspace metrics were made pluggable - in particular, added the ability for a certain format to register gauges that are specific only to that format and make no sense for others
    
    - Implemented mmapped region extension for compressed data
    
    - Refactored FileHandle so that it is no longer closable
    
    - Implemented WrappingRebufferer
    
    - Introduced the SSTable.Owner interface so that the SSTable implementation does not reference higher-level entities directly. SSTable accepts null as the owner when there is no owner (as is sometimes the case in offline tools), or a mock when needed in tests.
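    
    The owner indirection from the last item can be sketched as follows. The callback method and the class names are hypothetical; the sketch only shows the pattern of a nullable owner that a test can replace with a mock, not the real SSTable.Owner interface.
    
    ```java
    import java.util.Optional;

    // Hypothetical, simplified stand-ins; not the actual SSTable or SSTable.Owner.
    final class OwnerSketch
    {
        // The table only knows this narrow interface, not the higher-level entity behind it.
        interface Owner
        {
            void onCorruption(String tableName); // hypothetical callback
        }

        static final class Table
        {
            private final String name;
            private final Owner owner; // may be null, e.g. in an offline tool

            Table(String name, Owner owner)
            {
                this.name = name;
                this.owner = owner;
            }

            Optional<Owner> owner()
            {
                return Optional.ofNullable(owner);
            }

            // Notifies the owner if there is one; does nothing otherwise.
            void markCorrupted()
            {
                owner().ifPresent(o -> o.onCorruption(name));
            }
        }
    }
    ```
    
    In a test, a lambda or method reference can serve as the mock owner, while passing null models the offline-tool case.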
    
    Individual commits
    ---------------------------
    
    [4a87cd36fe] Fix disabling chunk cache
    [c84c75ccf3] Made WrappingUnfilteredRowIterator an interface
    [253d2b828e] Add getType to SSTableFormat
    [3f169dcc20] Remove getIndexSerializer from SSTableFormat
    [05bae1833b] Pull down rowIndexEntrySerializer field
    [da675f2809] Moved RowIndexEntry
    [673f0c5c39] Reduce usages of RowIndexEntry
    [c72538be91] Refactor CacheService to support for different key cache values
    [54d33ee656] Minor refactoring of ColumnIndex
    [93862df967] Just moved AbstractSSTableIterator to o.a.c.io.sstable.format
    [9e4566a1de] Refactored AbstractSSTableIterator
    [a4e61e80bb] Extracted IScrubber and IVerifier interfaces
    [20f78c7419] Push down implementation of SSTableReader.firstKeyBeyond
    [f2c24e5774] Moved SSTableReader.getSampleIndexesForRanges to IndexSummary
    [b6c3a6c1ea] Moved SSTableReader.getKeySamples implementation to IndexSummary
    [c4b90ebb33] Refactor InstanceTidier so that it is more generic
    [918d5a9e74] Refactor dropping page cache
    [a52fb4d558] Refactor sstable metrics
    [f6d10f930f] NEW (fix up) - DataInputStreamPlus extends InputStream
    [8f6a56d972] Getting rid of index summary in SSTableReader
    [4a918bf725] Removed direct usages of primary index from SSTableReader
    [358fa32602] Refactor KeyIterator so that it is sstable format agnostic
    [14c09d89c2] Remove explicit usage of Components outside of format specific classes
    [feff14e137] Move clone methods implementation from SSTableReader to BigTableReader
    [64e9787b10] Move saveIndexSummary and saveBloomFilter to SSTableReaderBuilder
    [ae71fe6ed8] Moved indexSummary field to BigTableReader and made it private
    [df9fd8c4b9] Moved ifile field to BigTableReader and made it private
    [2be6ea9ecf] Moved static open methods for BigTableReader to the reader factory
    [bc0e55ac48] Minor refactoring around IFilter and its serialization
    [5b95704beb] Minor refactorings around IndexSummary
    [87812335e8] Extracted TOCComponent class to deal with TOC file
    [fdad092a6a] Extracted CompressionInfoComponent class
    [39b47e388d] Extracted StatsComponent as a helper for elements of SSTable metadata
    [cdb55bff47] Fix SSTable.getMinimalKey
    [b99c6d5805] Refactor FileHandle so that it is no longer closable
    [77b7f7ace5] Implement WrappingRebufferer
    [b6868914dd] Add progressPercentage to ProgressInfo
    [7fd4956e5b] Moved copy/rename/hardLink methods from SSTableWriter to SSTable
    [1ccc6bf148] Create generic SSTableBuilder and IOOptions
    [da58a81102] Refactor SSTableReaderBuilder
    [4501ddba1c] Refactor ColumnIndex
    [d4f9e1a64b] Extracted non-big-table-specific functionality from BigTableWriter to SortedTableWriter
    [379525d01e] Refactor BigTableZeroCopyWriter to SSTableZeroCopyWriter as it is not specific to big format
    [8ac37f83bc] Extract EmptySSTableScanner out from BigTableScanner
    [ee6673f1cf] Implement SSTableWriterBuilder
    [bb26629235] Refactor opening early / final
    [a327595015] Refactored SSTableWriter factory
    [16ffd7334b] Extract non-big-format-specific logic from scrubber and verifier
    [75e02db6af] Allow to specify the default SSTableFormat via system property
    [a7b9d0d628] Small fixes around streaming
    [407f977c36] Move guard collection size
    [0529e57d2f] Remove explicit references to big format
    [61509963ec] Unclassified minor changes
    [da28d1af3a] Replaced getCreationTimeFor(Component) with getDataCreationTime()
    [e99c834de6] !!! Reformatting
    [882b7baa5a] Rename SSTableReader.maybePresent and fix its redundant usages
    [b70c983bea] Implement mmapped region extension for compressed data
    [d7ff3970de] Introduce SSTable.Owner interface
    [e9feb9c462] Replaced getCreationTimeFor(Component) with getDataCreationTime()
    [ee8082fb07] Created SSTableFormat.deleteOrphanedComponents
    [e62950fd3d] Refactor metrics further
    [cefa5b3814] Extract key cache support into separate entity
    [dd55101ca1] Extracted SSTableReaderWithFilter
    [510b651824] Implement customizable component types
    [2be512d9fa] Pluggable SSTableFormat by making SSTableFormat.Type not an enum
    [670836b55d] Refactor CRC and digest validators
    [00c91103bc] Extract delete method to delete SSTables and purge row cache entries
    [0819dc9fc2] Extracted trySkipFileCacheBefore(key) to SSTableReader
    [732f841750] Added missing overrides in ForwardingSSTableReader
    [db623218fd] Update DatabaseDescriptorRefTest
    [c018c468e5] Cleanup
    [eafc836242] Add @SuppressWarnings("resource") where needed
    [3b7c911dd6] Documentation
    
    patch by Jacek Lewandowski, reviewed by Branimir Lambov for CASSANDRA-17056
    
    Co-authored-by: @jacek-lewandowski
    Co-authored-by: @blambov