BitMagic-C++
Bit-vector serialization class.
#include <bmserial.h>
Data Structures | |
struct | bookmark_state |
Bookmark state structure. | |
Public Types | |
typedef BV | bvector_type |
typedef bvector_type::allocator_type | allocator_type |
typedef bvector_type::blocks_manager_type | blocks_manager_type |
typedef bvector_type::statistics | statistics_type |
typedef bvector_type::block_idx_type | block_idx_type |
typedef bvector_type::size_type | size_type |
typedef byte_buffer< allocator_type > | buffer |
typedef bm::bv_ref_vector< BV > | bv_ref_vector_type |
Public Member Functions | |
serializer (const allocator_type &alloc=allocator_type(), bm::word_t *temp_block=0) | |
Constructor. | |
serializer (bm::word_t *temp_block) | |
~serializer () | |
Compression level settings | |
| |
void | set_compression_level (unsigned clevel) BMNOEXCEPT |
Set compression level. | |
unsigned | get_compression_level () const BMNOEXCEPT |
Get compression level (0-5). Default: 5 (recommended). 0 - take as is; 1, 2 - apply lightweight RLE/GAP encodings, limited-depth hierarchical compression, intervals encoding; 3 - variant of 2 with different cut-offs; 4 - delta transforms plus Elias Gamma encoding where possible (legacy); 5 - binary interpolated encoding (Moffat, et al.) | |
Serialization Methods | |
| |
size_type | serialize (const BV &bv, unsigned char *buf, size_t buf_size) |
Bitvector serialization into memory block. | |
void | serialize (const BV &bv, typename serializer< BV >::buffer &buf, const statistics_type *bv_stat=0) |
Bitvector serialization into buffer object (resized automatically) | |
void | optimize_serialize_destroy (BV &bv, typename serializer< BV >::buffer &buf) |
Bitvector serialization into buffer object (resized automatically). The input bit-vector gets optimized and then destroyed; its content is NOT guaranteed after this operation. | |
const size_type * | get_compression_stat () const BMNOEXCEPT |
Return serialization counter vector. | |
void | gap_length_serialization (bool value) BMNOEXCEPT |
Set GAP length serialization (serializes GAP levels of the original vector) | |
void | byte_order_serialization (bool value) BMNOEXCEPT |
Set byte-order serialization (for cross platform compatibility) | |
void | set_bookmarks (bool enable, unsigned bm_interval=256) BMNOEXCEPT |
Add skip-markers to serialization BLOB for faster range decode at the expense of some BLOB size increase. | |
void | set_ref_vectors (const bv_ref_vector_type *ref_vect) |
Attach collection of reference vectors for XOR serialization (no transfer of ownership for the pointer) | |
void | set_curr_ref_idx (size_type ref_idx) BMNOEXCEPT |
Set current index in the reference vector collection (not a row idx or plain idx) | |
void | encode_header (const BV &bv, bm::encoder &enc) BMNOEXCEPT |
Encode serialization header information. | |
void | encode_gap_block (const bm::gap_word_t *gap_block, bm::encoder &enc) |
void | gamma_gap_block (const bm::gap_word_t *gap_block, bm::encoder &enc) BMNOEXCEPT |
void | gamma_gap_array (const bm::gap_word_t *gap_block, unsigned arr_len, bm::encoder &enc, bool inverted=false) BMNOEXCEPT |
Encode GAP block as delta-array with Elias Gamma coder. | |
void | encode_bit_array (const bm::word_t *block, bm::encoder &enc, bool inverted) BMNOEXCEPT |
Encode bit-block as an array of bits. | |
void | gamma_gap_bit_block (const bm::word_t *block, bm::encoder &enc) BMNOEXCEPT |
void | gamma_arr_bit_block (const bm::word_t *block, bm::encoder &enc, bool inverted) BMNOEXCEPT |
void | bienc_arr_bit_block (const bm::word_t *block, bm::encoder &enc, bool inverted) BMNOEXCEPT |
void | bienc_gap_bit_block (const bm::word_t *block, bm::encoder &enc) BMNOEXCEPT |
Encode bit-block as interpolated bit block of gaps. | |
void | interpolated_arr_bit_block (const bm::word_t *block, bm::encoder &enc, bool inverted) BMNOEXCEPT |
void | interpolated_gap_bit_block (const bm::word_t *block, bm::encoder &enc) BMNOEXCEPT |
Encode bit-block as interpolated gap block. | |
void | interpolated_gap_array (const bm::gap_word_t *gap_block, unsigned arr_len, bm::encoder &enc, bool inverted) BMNOEXCEPT |
Encode GAP block as an array with binary interpolated coder. | |
void | interpolated_gap_array_v0 (const bm::gap_word_t *gap_block, unsigned arr_len, bm::encoder &enc, bool inverted) BMNOEXCEPT |
void | interpolated_encode_gap_block (const bm::gap_word_t *gap_block, bm::encoder &enc) BMNOEXCEPT |
void | encode_bit_interval (const bm::word_t *blk, bm::encoder &enc, unsigned size_control) BMNOEXCEPT |
Encode BIT block with repeatable runs of zeroes. | |
void | encode_bit_digest (const bm::word_t *blk, bm::encoder &enc, bm::id64_t d0) BMNOEXCEPT |
Encode bit-block using digest (hierarchical compression) | |
unsigned char | find_gap_best_encoding (const bm::gap_word_t *gap_block) BMNOEXCEPT |
Determine best representation for GAP block based on current set compression level. | |
unsigned char | find_bit_best_encoding (const bm::word_t *block) BMNOEXCEPT |
Determine best representation for a bit-block. | |
unsigned char | find_bit_best_encoding_l5 (const bm::word_t *block) BMNOEXCEPT |
Determine best representation for a bit-block (level 5) | |
void | reset_compression_stats () BMNOEXCEPT |
Reset all accumulated compression statistics. | |
void | reset_models () BMNOEXCEPT |
void | add_model (unsigned char mod, unsigned score) BMNOEXCEPT |
static void | process_bookmark (block_idx_type nb, bookmark_state &bookm, bm::encoder &enc) BMNOEXCEPT |
Check if bookmark needs to be placed and if so, encode it into serialization BLOB. | |
Bit-vector serialization class.
This class converts sparse bit-vectors into a single block of memory ready for file or database storage or network transfer.
An instance of this class can be reused for multiple serializations (but not across threads). Reuse offers some performance advantage because it avoids reallocating temporary memory.
Definition at line 75 of file bmserial.h.
bvector_type::allocator_type bm::serializer< BV >::allocator_type |
Definition at line 79 of file bmserial.h.
bvector_type::block_idx_type bm::serializer< BV >::block_idx_type |
Definition at line 82 of file bmserial.h.
bvector_type::blocks_manager_type bm::serializer< BV >::blocks_manager_type |
Definition at line 80 of file bmserial.h.
byte_buffer<allocator_type> bm::serializer< BV >::buffer |
Definition at line 85 of file bmserial.h.
bm::bv_ref_vector<BV> bm::serializer< BV >::bv_ref_vector_type |
Definition at line 86 of file bmserial.h.
BV bm::serializer< BV >::bvector_type |
Definition at line 78 of file bmserial.h.
bvector_type::size_type bm::serializer< BV >::size_type |
Definition at line 83 of file bmserial.h.
bvector_type::statistics bm::serializer< BV >::statistics_type |
Definition at line 81 of file bmserial.h.
bm::serializer< BV >::serializer | ( | const allocator_type & | alloc = allocator_type(), |
bm::word_t * | temp_block = 0 ) |
Constructor.
alloc | - memory allocator |
temp_block | - temporary block for various operations (if NULL, it is allocated and managed by the serializer class). The temp block is used as scratch memory during serialization; supplying an external temp block avoids unnecessary re-allocations. |
Temp block attached is not owned by the class and NOT deallocated on destruction.
Definition at line 1042 of file bmserial.h.
References bm::gap_max_bits.
bm::serializer< BV >::serializer | ( | bm::word_t * | temp_block | ) |
Definition at line 1071 of file bmserial.h.
References bm::gap_max_bits.
bm::serializer< BV >::~serializer | ( | ) |
Definition at line 1099 of file bmserial.h.
void bm::serializer< BV >::add_model | ( | unsigned char | mod, |
unsigned | score ) | protected |
Definition at line 1487 of file bmserial.h.
References BM_ASSERT.
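void bm::serializer< BV >::bienc_arr_bit_block | ( | const bm::word_t * | block, |
bm::encoder & | enc, | ||
bool | inverted ) | protected |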
Definition at line 1996 of file bmserial.h.
References bm::bit_convert_to_arr(), bm::gap_equiv_len, and bm::gap_max_bits.
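void bm::serializer< BV >::bienc_gap_bit_block | ( | const bm::word_t * | block, |
bm::encoder & | enc ) | protected |
Encode bit-block as interpolated bit block of gaps.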
Definition at line 2025 of file bmserial.h.
References bm::bit_out< TEncoder >::bic_encode_u16(), bm::bie_cut_off, bm::bit_to_gap(), BM_ASSERT, bm::bit_out< TEncoder >::flush(), bm::gap_max_bits, bm::set_block_bitgap_bienc, and bm::set_block_size.
void bm::serializer< BV >::byte_order_serialization | ( | bool | value | ) |
Set byte-order serialization (for cross platform compatibility)
value | - when TRUE, the serialization format includes a byte-order marker |
Definition at line 1132 of file bmserial.h.
Referenced by convert_bv2bvs(), main(), and bm::serialize().
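void bm::serializer< BV >::encode_bit_array | ( | const bm::word_t * | block, |
bm::encoder & | enc, | ||
bool | inverted ) | protected |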
Encode bit-block as an array of bits.
Definition at line 1943 of file bmserial.h.
References bm::bit_convert_to_arr(), bm::gap_max_bits, bm::gap_max_bits_cmrz, bm::set_block_arrbit, and bm::set_block_arrbit_inv.
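The array-of-bits representation can be illustrated with a minimal sketch. It assumes a block of at most 65536 bits, so every position fits in 16 bits; the helper name `bits_to_array` is hypothetical and is not part of BitMagic, and the actual byte layout written by the serializer is not shown here. With `inverted` set, the positions of zero bits are listed instead, which is the shorter form for a dense block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a BitMagic function): list the positions of set
// bits, or of zero bits when `inverted` is true, in a block of 32-bit words.
// Positions are stored as 16-bit values, which assumes <= 65536 bits.
std::vector<std::uint16_t> bits_to_array(const std::vector<std::uint32_t>& block,
                                         bool inverted)
{
    assert(block.size() * 32 <= 65536);
    std::vector<std::uint16_t> out;
    for (std::size_t w = 0; w < block.size(); ++w)
    {
        // Inverting the word turns "list the zeros" into "list the ones".
        std::uint32_t word = inverted ? ~block[w] : block[w];
        for (unsigned b = 0; b < 32; ++b)
            if (word & (1u << b))
                out.push_back(static_cast<std::uint16_t>(w * 32 + b));
    }
    return out;
}
```

The array form pays off only when the listed positions take fewer bytes than the raw block, which is why a best-encoding search compares the candidates first.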
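void bm::serializer< BV >::encode_bit_digest | ( | const bm::word_t * | blk, |
bm::encoder & | enc, | ||
bm::id64_t | d0 ) | protected |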
Encode bit-block using digest (hierarchical compression)
Definition at line 1846 of file bmserial.h.
References bm::bmi_blsi_u64(), bm::bmi_bslr_u64(), bm::set_block_bit, bm::set_block_bit_digest0, bm::set_block_digest_wave_size, bm::set_block_size, and bm::word_bitcount64().
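The idea behind digest-based (hierarchical) compression can be sketched as follows. The layout is an assumption made for illustration: a block of 2048 32-bit words split into 64 sub-blocks ("waves"), with one digest bit per wave. The names `compute_digest` and `pack_by_digest` are hypothetical, and the sketch does not reproduce the serializer's output format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout for illustration: 2048 words per block, 64 waves per block.
constexpr std::size_t kBlockWords = 2048;
constexpr std::size_t kWaveWords  = kBlockWords / 64;

// Bit i of the digest is set when wave i contains at least one non-zero word.
std::uint64_t compute_digest(const std::vector<std::uint32_t>& block)
{
    assert(block.size() == kBlockWords);
    std::uint64_t digest = 0;
    for (std::size_t wave = 0; wave < 64; ++wave)
    {
        std::uint32_t acc = 0;
        for (std::size_t j = 0; j < kWaveWords; ++j)
            acc |= block[wave * kWaveWords + j];
        if (acc)
            digest |= (std::uint64_t(1) << wave);
    }
    return digest;
}

// Keep only the waves flagged in the digest; all-zero waves are skipped.
std::vector<std::uint32_t> pack_by_digest(const std::vector<std::uint32_t>& block,
                                          std::uint64_t digest)
{
    std::vector<std::uint32_t> out;
    for (std::size_t wave = 0; wave < 64; ++wave)
        if (digest & (std::uint64_t(1) << wave))
        {
            auto from = block.begin() + static_cast<std::ptrdiff_t>(wave * kWaveWords);
            out.insert(out.end(), from, from + static_cast<std::ptrdiff_t>(kWaveWords));
        }
    return out;
}
```

A decoder would read the 64-bit digest first and then restore each flagged wave in order, filling the remaining waves with zeroes.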
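void bm::serializer< BV >::encode_bit_interval | ( | const bm::word_t * | blk, |
bm::encoder & | enc, | ||
unsigned | size_control ) | protected |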
Encode BIT block with repeatable runs of zeroes.
Definition at line 1794 of file bmserial.h.
References BM_ASSERT, bm::set_block_bit_0runs, and bm::set_block_size.
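void bm::serializer< BV >::encode_gap_block | ( | const bm::gap_word_t * | gap_block, |
bm::encoder & | enc ) | protected |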
Encode GAP block
Definition at line 1734 of file bmserial.h.
References BM_ASSERT, BM_FALLTHROUGH, bm::gap_convert_to_arr(), bm::gap_equiv_len, bm::encoder::put_16(), bm::encoder::put_8(), bm::set_block_arrgap, bm::set_block_arrgap_bienc, bm::set_block_arrgap_bienc_inv, bm::set_block_arrgap_egamma, bm::set_block_arrgap_egamma_inv, bm::set_block_arrgap_inv, bm::set_block_bit_1bit, bm::set_block_gap, and bm::set_block_gap_bienc.
void bm::serializer< BV >::encode_header | ( | const BV & | bv, |
bm::encoder & | enc ) | protected |
Encode serialization header information.
Definition at line 1168 of file bmserial.h.
References bm::BM_HM_64_BIT, bm::BM_HM_DEFAULT, bm::BM_HM_HXOR, bm::BM_HM_NO_BO, bm::BM_HM_NO_GAPL, bm::BM_HM_RESIZE, bm::globals< T >::byte_order(), bm::gap_levels, and bm::id_max.
unsigned char bm::serializer< BV >::find_bit_best_encoding | ( | const bm::word_t * | block | ) | protected |
Determine best representation for a bit-block.
Definition at line 1584 of file bmserial.h.
References bm::bit_block_change_bc(), bm::bit_block_count(), bm::bit_count_nonzero_size(), BM_ASSERT, bm::calc_block_digest0(), bm::gap_equiv_len, bm::gap_max_bits, bm::gap_max_buff_len, bm::set_block_aone, bm::set_block_arrbit, bm::set_block_arrbit_inv, bm::set_block_arrgap_egamma, bm::set_block_arrgap_egamma_inv, bm::set_block_azero, bm::set_block_bit, bm::set_block_bit_0runs, bm::set_block_bit_1bit, bm::set_block_bit_digest0, bm::set_block_gap_egamma, bm::set_block_size, and bm::word_bitcount64().
unsigned char bm::serializer< BV >::find_bit_best_encoding_l5 | ( | const bm::word_t * | block | ) | protected |
Determine best representation for a bit-block (level 5)
Definition at line 1496 of file bmserial.h.
References bm::bie_cut_off, bm::bit_block_calc_change(), bm::bit_block_change_bc(), bm::bit_block_count(), bm::bit_count_nonzero_size(), BM_ASSERT, bm::calc_block_digest0(), bm::gap_equiv_len, bm::gap_max_bits, bm::gap_max_buff_len, bm::set_block_aone, bm::set_block_arr_bienc, bm::set_block_arr_bienc_inv, bm::set_block_arrbit, bm::set_block_arrbit_inv, bm::set_block_arrgap_bienc, bm::set_block_arrgap_bienc_inv, bm::set_block_azero, bm::set_block_bit, bm::set_block_bit_0runs, bm::set_block_bit_1bit, bm::set_block_bit_digest0, bm::set_block_bitgap_bienc, bm::set_block_gap_bienc, bm::set_block_size, and bm::word_bitcount64().
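unsigned char bm::serializer< BV >::find_gap_best_encoding | ( | const bm::gap_word_t * | gap_block | ) | protected |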
Determine best representation for GAP block based on current set compression level.
Definition at line 1690 of file bmserial.h.
References bm::gap_bit_count_unr(), bm::gap_length(), bm::gap_max_bits, bm::set_block_arrgap, bm::set_block_arrgap_bienc, bm::set_block_arrgap_bienc_inv, bm::set_block_arrgap_egamma, bm::set_block_arrgap_egamma_inv, bm::set_block_arrgap_inv, bm::set_block_bit_1bit, bm::set_block_gap, bm::set_block_gap_bienc, and bm::set_block_gap_egamma.
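void bm::serializer< BV >::gamma_arr_bit_block | ( | const bm::word_t * | block, |
bm::encoder & | enc, | ||
bool | inverted ) | protected |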
Definition at line 1976 of file bmserial.h.
References bm::bit_convert_to_arr(), bm::gap_equiv_len, bm::gap_max_bits, bm::set_block_bit, and bm::set_block_size.
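void bm::serializer< BV >::gamma_gap_array | ( | const bm::gap_word_t * | gap_block, |
unsigned | arr_len, | ||
bm::encoder & | enc, | ||
bool | inverted = false ) | protected |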
Encode GAP block as delta-array with Elias Gamma coder.
Definition at line 1319 of file bmserial.h.
References bm::bit_out< TEncoder >::gamma(), bm::set_block_arrgap, bm::set_block_arrgap_egamma, bm::set_block_arrgap_egamma_inv, and bm::set_block_arrgap_inv.
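Delta coding with an Elias Gamma coder can be illustrated with a short sketch. It emits the code as a string of '0'/'1' characters for readability rather than packing bits, and the convention of coding the first element as value + 1 is an assumption of this example, not a statement about the BitMagic format. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Elias Gamma code of v (v >= 1) as a string of '0'/'1' characters:
// floor(log2 v) zeros followed by the binary form of v, MSB first.
std::string elias_gamma(std::uint32_t v)
{
    assert(v >= 1);
    unsigned nbits = 0;
    for (std::uint32_t t = v; t > 1; t >>= 1)
        ++nbits;                          // nbits = floor(log2 v)
    std::string code(nbits, '0');         // unary length prefix
    for (int i = static_cast<int>(nbits); i >= 0; --i)
        code += ((v >> i) & 1u) ? '1' : '0';
    return code;
}

// Encode a strictly increasing array as gamma-coded deltas.
// Assumption of this sketch: the first element is coded as value + 1,
// because Elias Gamma cannot represent 0.
std::string gamma_delta_encode(const std::vector<std::uint16_t>& arr)
{
    std::string out;
    std::uint32_t prev = 0;
    for (std::size_t i = 0; i < arr.size(); ++i)
    {
        std::uint32_t delta = (i == 0) ? arr[i] + 1u : arr[i] - prev;
        out += elias_gamma(delta);
        prev = arr[i];
    }
    return out;
}
```

Small deltas get short codes (a delta of 1 costs a single bit), so the scheme favors arrays whose neighboring values are close together.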
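void bm::serializer< BV >::gamma_gap_bit_block | ( | const bm::word_t * | block, |
bm::encoder & | enc ) | protected |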
Definition at line 1967 of file bmserial.h.
References bm::bit_to_gap(), BM_ASSERT, and bm::gap_equiv_len.
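void bm::serializer< BV >::gamma_gap_block | ( | const bm::gap_word_t * | gap_block, |
bm::encoder & | enc ) | protected |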
Encode GAP block with Elias Gamma coder
Definition at line 1280 of file bmserial.h.
References bm::for_each_dgap(), bm::gap_length(), bm::set_block_gap, and bm::set_block_gap_egamma.
void bm::serializer< BV >::gap_length_serialization | ( | bool | value | ) |
Set GAP length serialization (serializes GAP levels of the original vector)
value | - when TRUE, the serialized vector includes GAP level parameters |
Definition at line 1126 of file bmserial.h.
Referenced by convert_bv2bvs(), main(), bm::compressed_collection_serializer< CBC >::serialize(), and bm::serialize().
unsigned bm::serializer< BV >::get_compression_level | ( | ) | const | inline |
Get compression level (0-5). Default: 5 (recommended). 0 - take as is; 1, 2 - apply lightweight RLE/GAP encodings, limited-depth hierarchical compression, intervals encoding; 3 - variant of 2 with different cut-offs; 4 - delta transforms plus Elias Gamma encoding where possible (legacy); 5 - binary interpolated encoding (Moffat, et al.)
Recommended: use 3 or 5
Definition at line 130 of file bmserial.h.
const size_type * bm::serializer< BV >::get_compression_stat | ( | ) | const | inline |
Return serialization counter vector.
Definition at line 193 of file bmserial.h.
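void bm::serializer< BV >::interpolated_arr_bit_block | ( | const bm::word_t * | block, |
bm::encoder & | enc, | ||
bool | inverted ) | protected |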
Definition at line 2073 of file bmserial.h.
References bm::bit_out< TEncoder >::bic_encode_u16(), bm::bit_convert_to_arr(), BM_ASSERT, bm::bit_out< TEncoder >::flush(), bm::gap_max_bits, bm::gap_max_bits_cmrz, bm::set_block_arr_bienc, bm::set_block_arr_bienc_inv, and bm::set_block_size.
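void bm::serializer< BV >::interpolated_encode_gap_block | ( | const bm::gap_word_t * | gap_block, |
bm::encoder & | enc ) | protected |
Encode GAP block using the binary interpolated encoder.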
Definition at line 1220 of file bmserial.h.
References bm::bit_out< TEncoder >::bic_encode_u16(), BM_ASSERT, bm::bit_out< TEncoder >::flush(), bm::gap_length(), bm::set_block_gap, bm::set_block_gap_bienc, and bm::set_block_gap_bienc_v2.
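void bm::serializer< BV >::interpolated_gap_array | ( | const bm::gap_word_t * | gap_block, |
unsigned | arr_len, | ||
bm::encoder & | enc, | ||
bool | inverted ) | protected |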
Encode GAP block as an array with binary interpolated coder.
Definition at line 1412 of file bmserial.h.
References bm::bit_out< TEncoder >::bic_encode_u16(), BM_ASSERT, bm::bit_out< TEncoder >::flush(), bm::set_block_arrgap, bm::set_block_arrgap_bienc_inv_v2, bm::set_block_arrgap_bienc_v2, and bm::set_block_arrgap_inv.
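Binary interpolated (interpolative) coding of a sorted array can be sketched as below. This is a textbook-style illustration under stated assumptions: it writes each value with a plain fixed-width binary code into a '0'/'1' string, whereas a production coder may use a more compact centered code, and it does not reproduce the bit layout produced by `bic_encode_u16()`. The names `bits_for`, `bic_encode`, and `bic_decode` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Number of bits needed to write any value in [0, range_size - 1].
unsigned bits_for(std::uint32_t range_size)
{
    unsigned n = 0;
    while ((std::uint64_t(1) << n) < range_size)
        ++n;
    return n;
}

// Encode sorted, strictly increasing arr[first, first + n), whose values are
// known to lie in [lo, hi]: code the middle element, then recurse on halves.
void bic_encode(const std::vector<std::uint32_t>& arr, std::size_t first,
                std::size_t n, std::uint32_t lo, std::uint32_t hi,
                std::string& out)
{
    if (n == 0)
        return;
    std::size_t mid = n / 2;
    std::uint32_t val = arr[first + mid];
    // `mid` elements lie below val and n-1-mid above it: this narrows its range.
    std::uint32_t vlo = lo + static_cast<std::uint32_t>(mid);
    std::uint32_t vhi = hi - static_cast<std::uint32_t>(n - 1 - mid);
    assert(vlo <= val && val <= vhi);
    unsigned nb = bits_for(vhi - vlo + 1);
    for (unsigned i = nb; i-- > 0; )
        out += (((val - vlo) >> i) & 1u) ? '1' : '0';
    bic_encode(arr, first, mid, lo, val - 1, out);
    bic_encode(arr, first + mid + 1, n - 1 - mid, val + 1, hi, out);
}

// Mirror of bic_encode: the decoder derives the same ranges, so no lengths
// or separators need to be stored in the bit stream.
void bic_decode(const std::string& in, std::size_t& pos, std::size_t first,
                std::size_t n, std::uint32_t lo, std::uint32_t hi,
                std::vector<std::uint32_t>& arr)
{
    if (n == 0)
        return;
    std::size_t mid = n / 2;
    std::uint32_t vlo = lo + static_cast<std::uint32_t>(mid);
    std::uint32_t vhi = hi - static_cast<std::uint32_t>(n - 1 - mid);
    unsigned nb = bits_for(vhi - vlo + 1);
    std::uint32_t v = 0;
    for (unsigned i = 0; i < nb; ++i)
        v = (v << 1) | (in[pos++] == '1' ? 1u : 0u);
    std::uint32_t val = vlo + v;
    arr[first + mid] = val;
    bic_decode(in, pos, first, mid, lo, val - 1, arr);
    bic_decode(in, pos, first + mid + 1, n - 1 - mid, val + 1, hi, arr);
}
```

When the surrounding values pin an element to a single possible position, it costs zero bits, which is why this coder does well on clustered data.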
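void bm::serializer< BV >::interpolated_gap_array_v0 | ( | const bm::gap_word_t * | gap_block, |
unsigned | arr_len, | ||
bm::encoder & | enc, | ||
bool | inverted ) | protected |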
Definition at line 1364 of file bmserial.h.
References bm::bit_out< TEncoder >::bic_encode_u16(), BM_ASSERT, bm::bit_out< TEncoder >::flush(), bm::bit_out< TEncoder >::gamma(), bm::set_block_arrgap, bm::set_block_arrgap_bienc, bm::set_block_arrgap_bienc_inv, and bm::set_block_arrgap_inv.
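void bm::serializer< BV >::interpolated_gap_bit_block | ( | const bm::word_t * | block, |
bm::encoder & | enc ) | protected |
Encode bit-block as interpolated gap block.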
Definition at line 2015 of file bmserial.h.
References bm::bit_to_gap(), BM_ASSERT, and bm::gap_max_bits.
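Encoding a bit-block as a gap block first requires converting bits to runs. A minimal sketch of that conversion follows, using a hypothetical `GapRuns` structure (the value of the first run plus the index of the last bit of each run). It illustrates the run-length idea only and is not the in-memory GAP format used by `bm::bit_to_gap()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical run-length ("GAP"-style) form of a bit sequence:
// the value of the first run plus the index of the last bit of every run.
struct GapRuns
{
    bool first_value;
    std::vector<std::uint32_t> run_ends;
};

GapRuns bits_to_gap(const std::vector<bool>& bits)
{
    assert(!bits.empty());
    GapRuns g{bits[0], {}};
    for (std::size_t i = 1; i < bits.size(); ++i)
        if (bits[i] != bits[i - 1])       // a run ended at position i - 1
            g.run_ends.push_back(static_cast<std::uint32_t>(i - 1));
    // The final run always ends at the last bit.
    g.run_ends.push_back(static_cast<std::uint32_t>(bits.size() - 1));
    return g;
}
```

The run ends form a sorted, strictly increasing array, so they are a natural input for a delta or interpolated coder; the representation is compact only when the block has few 0/1 transitions.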
void bm::serializer< BV >::optimize_serialize_destroy | ( | BV & | bv, |
typename serializer< BV >::buffer & | buf ) |
Bitvector serialization into buffer object (resized automatically). The input bit-vector gets optimized and then destroyed; its content is NOT guaranteed after this operation.
Effectively, it moves the data into the buffer.
This operation exists because it is faster to do all three operations in a single pass. This is a destructive serialization!
bv | - input/output bitvector |
buf | - output buffer object |
Definition at line 1925 of file bmserial.h.
References bm::bvector< Alloc >::mem_pool_guard::assign_if_not_set(), and bm::serialize().
Referenced by main().
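static void bm::serializer< BV >::process_bookmark | ( | block_idx_type | nb, |
bookmark_state & | bookm, | ||
bm::encoder & | enc ) | static protected |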
Check if bookmark needs to be placed and if so, encode it into serialization BLOB.
nb | - block idx |
bookm | - bookmark state structure |
enc | - BLOB encoder |
Definition at line 2154 of file bmserial.h.
References BM_ASSERT, bm::set_nb_bookmark16, bm::set_nb_bookmark24, bm::set_nb_bookmark32, bm::set_nb_sync_mark16, bm::set_nb_sync_mark24, bm::set_nb_sync_mark32, bm::set_nb_sync_mark48, bm::set_nb_sync_mark64, and bm::set_nb_sync_mark8.
void bm::serializer< BV >::reset_compression_stats | ( | ) | protected |
Reset all accumulated compression statistics.
Definition at line 1111 of file bmserial.h.
void bm::serializer< BV >::reset_models | ( | ) | inline protected |
Definition at line 333 of file bmserial.h.
void bm::serializer< BV >::serialize | ( | const BV & | bv, |
typename serializer< BV >::buffer & | buf, | ||
const statistics_type * | bv_stat = 0 ) |
Bitvector serialization into buffer object (resized automatically)
bv | - input bitvector |
buf | - output buffer object |
bv_stat | - input (optional) bit-vector statistics object; if NULL, serialize will compute the statistics |
Definition at line 1903 of file bmserial.h.
References BM_ASSERT, bm::bv_statistics::max_serialize_mem, and bm::serialize().
serializer< BV >::size_type bm::serializer< BV >::serialize | ( | const BV & | bv, |
unsigned char * | buf, | ||
size_t | buf_size ) |
Bitvector serialization into memory block.
bv | - input bitvector |
buf | - output buffer (pre-allocated). No range checking is done in this method. It is the responsibility of the caller to allocate a sufficient amount of memory using information from the calc_stat() function. |
buf_size | - size of the output buffer |
Definition at line 2264 of file bmserial.h.
References bm::bit_block_find(), BM_ASSERT, BM_IS_GAP, BM_SER_NEXT_GRP, BMGAP_PTR, bm::check_block_one(), bm::check_block_zero(), FULL_BLOCK_FAKE_ADDR, bm::gap_length(), bm::gap_operation_xor(), bm::get_block_coord(), bm::encoder::put_16(), bm::encoder::put_32(), bm::encoder::put_64(), bm::encoder::put_8(), bm::encoder::put_prefixed_array_32(), bm::set_block_16one, bm::set_block_16zero, bm::set_block_1one, bm::set_block_1zero, bm::set_block_32one, bm::set_block_32zero, bm::set_block_64one, bm::set_block_64zero, bm::set_block_8one, bm::set_block_8zero, bm::set_block_aone, bm::set_block_arr_bienc, bm::set_block_arr_bienc_inv, bm::set_block_arrbit, bm::set_block_arrbit_inv, bm::set_block_arrgap_bienc, bm::set_block_arrgap_bienc_inv, bm::set_block_arrgap_egamma, bm::set_block_arrgap_egamma_inv, bm::set_block_azero, bm::set_block_bit, bm::set_block_bit_0runs, bm::set_block_bit_1bit, bm::set_block_bit_digest0, bm::set_block_bitgap_bienc, bm::set_block_end, bm::set_block_gap_bienc, bm::set_block_gap_egamma, bm::set_block_ref_eq, bm::set_block_size, bm::set_block_xor_gap_ref16, bm::set_block_xor_gap_ref32, bm::set_block_xor_gap_ref8, bm::set_block_xor_ref16, bm::set_block_xor_ref32, bm::set_block_xor_ref8, bm::set_total_blocks, and bm::encoder::size().
Referenced by convert_bv2bvs(), main(), make_BLOB(), bm::compressed_collection_serializer< CBC >::serialize(), and bm::serialize().
void bm::serializer< BV >::set_bookmarks | ( | bool | enable, |
unsigned | bm_interval = 256 ) |
Add skip-markers to serialization BLOB for faster range decode at the expense of some BLOB size increase.
enable | - when TRUE, serialization will add bookmark codes |
bm_interval | - bookmark interval in number of blocks (suggested between 4 and 512). A smaller interval adds more bookmarks to the skip list and therefore increases the BLOB size more. |
Definition at line 1138 of file bmserial.h.
Referenced by main(), and bm::sparse_vector_serializer< SV >::set_bookmarks().
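The size trade-off of the bookmark interval can be estimated with a small model. The placement rule here is an assumption made for illustration (one marker each time at least `interval` blocks have passed since the previous one); the actual rule lives in `process_bookmark()` and may differ. `BookmarkModel`, `bookmark_due`, and `count_bookmarks` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of bookmark placement, not the BitMagic implementation.
struct BookmarkModel
{
    std::uint64_t last_nb;   // block index of the previous marker
    unsigned      interval;  // minimum distance between markers, in blocks
};

// Returns true (and records the position) when a marker is due at block nb.
bool bookmark_due(std::uint64_t nb, BookmarkModel& st)
{
    assert(st.interval > 0);
    if (nb - st.last_nb < st.interval)
        return false;
    st.last_nb = nb;
    return true;
}

// Count the markers emitted over a sequence of consecutive blocks.
std::size_t count_bookmarks(std::uint64_t total_blocks, unsigned interval)
{
    BookmarkModel st{0, interval};
    std::size_t count = 0;
    for (std::uint64_t nb = 0; nb < total_blocks; ++nb)
        if (bookmark_due(nb, st))
            ++count;
    return count;
}
```

Under this model, 1024 blocks yield 3 markers at an interval of 256 and 255 markers at an interval of 4, which shows why small intervals speed up range decoding at a visible cost in BLOB size.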
void bm::serializer< BV >::set_compression_level | ( | unsigned | clevel | ) |
Set compression level.
Higher compression takes more time to process.
clevel | - compression level (0-5) |
Definition at line 1119 of file bmserial.h.
References bm::set_compression_max.
Referenced by convert_bv2bvs(), main(), and make_BLOB().
void bm::serializer< BV >::set_curr_ref_idx | ( | size_type | ref_idx | ) |
Set current index in the reference vector collection (not a row idx or plain idx)
Definition at line 1162 of file bmserial.h.
void bm::serializer< BV >::set_ref_vectors | ( | const bv_ref_vector_type * | ref_vect | ) |
Attach collection of reference vectors for XOR serialization (no transfer of ownership for the pointer)
Definition at line 1153 of file bmserial.h.