| auto start_iter = _segment->lower_bound(index_key); | ||
| if (start_iter.valid()) { | ||
| // Because previous block may contain this key, so we should set rowid to | ||
| // last block's first row. |
Is this only possible for the duplicate key model?
This applies to all key models.
We store only the short key, so the full key is truncated. If we find an equal short key, the previous block may still contain rows with the same full key.
Now I think it's even possible when the full key is stored. Say the 1st block contains ('aaa', 'baa') and the 2nd block contains ('bab', 'bac'). If we search for key >= 'baa', the short key index returns the 2nd block, but the first matching key resides in the 1st block.
Yes, you are right. The root cause is that this index is a sparse index.
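To make the sparse-index point concrete, here is a minimal sketch of the lookup discussed in this thread. `SparseIndex`, `first_keys`, and `seek_block` are hypothetical names for illustration, not the types used in this patch; the sketch assumes the index holds exactly one entry per block, namely that block's first key.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sparse index: first_keys[i] is the first key of block i.
// Entries are sorted because the blocks themselves are sorted.
struct SparseIndex {
    std::vector<std::string> first_keys;

    // Returns the ordinal of the first block that may contain a key >= target.
    // std::lower_bound finds the first block whose *first* key is >= target,
    // but the block before it can still hold matching keys, so we step back
    // by one block whenever that is possible.
    // Callers are expected to check for an empty index themselves.
    std::size_t seek_block(const std::string& target) const {
        auto it = std::lower_bound(first_keys.begin(), first_keys.end(), target);
        std::size_t pos = static_cast<std::size_t>(it - first_keys.begin());
        return pos > 0 ? pos - 1 : 0;
    }
};
```

With the blocks from the example above, ('aaa', 'baa') and ('bab', 'bac'), the index holds 'aaa' and 'bab'. Searching for 'baa' makes `std::lower_bound` point at the 2nd block, and stepping back yields the 1st block, where the matching key actually lives.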
| std::shared_ptr<TabletSchema> tablet_schema(new TabletSchema()); | ||
| tablet_schema->_num_columns = 4; | ||
| tablet_schema->_num_key_columns = 3; | ||
| tablet_schema->_num_short_key_columns = 2; |
What's the difference between _num_key_columns and _num_short_key_columns? When could they be unequal?
We use only some of the key columns as the index; these form the short key. We don't index all key columns because of memory concerns: we want to load the whole index into memory to accelerate reads.
This patch creates a new format for the short key index. In the original code, the index is stored in a RowCursor-like format, which is not efficient to compare. Now we encode multiple columns into a binary string, and we ensure that this binary string sorts in the same order as the key columns.
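As an illustration of what "sorted the same as the key columns" means, here is a minimal sketch of an order-preserving encoding for a two-column short key. It is not the encoding used in this patch: the function names and the (int32, uint32) column layout are assumptions chosen to keep the example small.

```cpp
#include <cstdint>
#include <string>

// Appends v as 4 big-endian bytes, so byte-wise order matches numeric order.
inline void encode_u32(std::uint32_t v, std::string* out) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        out->push_back(static_cast<char>((v >> shift) & 0xFF));
    }
}

// Flips the sign bit first, so negative values sort before non-negative ones.
inline void encode_i32(std::int32_t v, std::string* out) {
    encode_u32(static_cast<std::uint32_t>(v) ^ 0x80000000u, out);
}

// Hypothetical short key made of two fixed-width columns: (int32, uint32).
// Concatenating the per-column encodings keeps the column-by-column order.
inline std::string encode_short_key(std::int32_t c0, std::uint32_t c1) {
    std::string key;
    encode_i32(c0, &key);
    encode_u32(c1, &key);
    return key;
}
```

Two encoded keys can then be compared with a plain `std::string` comparison, which compares bytes as unsigned values, instead of comparing column by column. Variable-length columns would need extra care (for example a terminator or padding), which this sketch leaves out.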
| private: | ||
| friend class SegmentIterator; | ||
|
|
||
| Status new_column_iterator(uint32_t cid, ColumnIterator** iter); |
| Status new_column_iterator(uint32_t cid, ColumnIterator** iter); | |
| Status _new_column_iterator(uint32_t cid, ColumnIterator** iter); |
The same applies to the following two functions.
This is used by SegmentIterator, so I didn't add the _ prefix.
| } | ||
|
|
||
| SegmentWriter::~SegmentWriter() { | ||
| for (auto writer : _column_writers) { |
| for (auto writer : _column_writers) { | |
| for (auto& writer : _column_writers) { |
For a pointer, I don't think a reference is needed.
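A small sketch of the distinction, assuming the container holds raw pointers as the reply implies. `Writer`, `sum_raw`, and `sum_owned` are hypothetical stand-ins, not code from this patch. Copying a raw pointer in the loop is cheap, whereas a container of `std::unique_ptr` would require a reference because `unique_ptr` cannot be copied.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for ColumnWriter.
struct Writer {
    int id;
};

// Raw pointers: `auto` copies one pointer per iteration, which is cheap.
inline int sum_raw(const std::vector<Writer*>& writers) {
    int total = 0;
    for (auto w : writers) {
        total += w->id;
    }
    return total;
}

// unique_ptr elements are not copyable, so the loop variable must be a
// reference; `for (auto w : writers)` would fail to compile here.
inline int sum_owned(const std::vector<std::unique_ptr<Writer>>& writers) {
    int total = 0;
    for (const auto& w : writers) {
        total += w->id;
    }
    return total;
}
```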
| DCHECK(type_info != nullptr); | ||
|
|
||
| ColumnWriterOptions opts; | ||
| std::unique_ptr<ColumnWriter> writer(new ColumnWriter(opts, type_info, is_nullable, _output_file.get())); |
Can type_info, is_nullable, and the output file be put into ColumnWriterOptions?
ColumnWriterOptions holds optional settings; the writer still works if none of them are set. type_info and is_nullable, however, are what we really need, so I'd like to keep them out of the options.
be/src/olap/storage_engine.h
Outdated
| class Tablet; | ||
| class DataDir; | ||
| class EngineTask; | ||
| class SegmentGroup; |
Why add SegmentGroup here? I think SegmentGroup should not be used here.
Because _gc_files uses it.
_gc_files is useless. Please delete it, along with SegmentGroup, from the storage engine.
We create a new segment format for BetaRowset. The new format merges
the data file and the index file into one file. We also create a new
format for the short key index. In the original code, the index is
stored in a RowCursor-like format, which is not efficient to compare.
Now we encode multiple columns into a binary string, and we ensure
that this binary string sorts in the same order as the key columns.