|
Lossless Data Compression
|
||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||
|
|
||
Botan provides several lossless data compression algorithms, currently all
via third-party libraries: zlib (including the deflate and gzip formats),
bzip2, and lzma. Support for these must be enabled at build time; you can
check for them using the macros ``BOTAN_HAS_ZLIB``, ``BOTAN_HAS_BZIP2``,
and ``BOTAN_HAS_LZMA``.

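
The build-time check described above can be sketched as follows. This is a
minimal illustration, not part of the library: the function name
``compiled_in_compressors`` is hypothetical, and it assumes that the
``BOTAN_HAS_*`` macros are made visible by including ``botan/build.h``. The
``__has_include`` guard (C++17) only exists so the sketch still compiles when
the Botan headers are not on the include path.

.. code-block:: cpp

   #include <string>
   #include <vector>

   // Assumption: botan/build.h is the header that defines the BOTAN_HAS_*
   // feature macros. If it is not on the include path, no macro is defined.
   #if __has_include(<botan/build.h>)
     #include <botan/build.h>
   #endif

   // Hypothetical helper: names of the compression back ends that were
   // enabled when this translation unit was compiled.
   std::vector<std::string> compiled_in_compressors() {
      std::vector<std::string> names;
   #if defined(BOTAN_HAS_ZLIB)
      names.push_back("zlib");
   #endif
   #if defined(BOTAN_HAS_BZIP2)
      names.push_back("bzip2");
   #endif
   #if defined(BOTAN_HAS_LZMA)
      names.push_back("lzma");
   #endif
      return names;
   }

An empty result means that either Botan was built without any compression
back end or the headers were not found at compile time.
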
.. note::
   You should always compress *before* you encrypt, because encryption hides
   the redundancy that compression needs to find and remove.

Compression is done through the ``Compression_Algorithm`` and
``Decompression_Algorithm`` classes, both defined in ``compression.h``.

Compression and decompression both work in three stages: starting a
message (``start``), continuing to process it (``update``), and
finally completing the stream (``finish``).

.. cpp:class:: Compression_Algorithm

   .. cpp:function:: void start(size_t level)

      Initialize the compression engine. This must be done before calling
      ``update`` or ``finish``. The meaning of the ``level`` parameter varies
      by algorithm, but it generally takes a value between 1 and 9, with
      higher values typically giving better compression at the cost of more
      memory and/or CPU time. The decompressor can always handle input from
      any compressor.

   .. cpp:function:: void update(secure_vector<uint8_t>& buf, \
                                 size_t offset = 0, bool flush = false)

      Compress the material in the in/out parameter ``buf``. The leading
      ``offset`` bytes of ``buf`` are ignored and remain untouched; this can
      be useful for ignoring packet headers. If ``flush`` is true, the
      compression state is flushed, allowing the decompressor to recover the
      entire message up to this point without having to see the rest of the
      compressed stream.

   .. cpp:function:: void finish(secure_vector<uint8_t>& buf, size_t offset = 0)

      Finish compressing a message. The ``buf`` and ``offset`` parameters are
      treated as in ``update``. It is acceptable to call ``start`` followed by
      ``finish`` with the entire message, without any intervening call to
      ``update``.

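
The ``start`` then ``finish`` shortcut described above might look like the
sketch below. The helper name ``compress_once`` and the default level of 6 are
illustrative assumptions, not part of Botan. The ``__has_include`` guard
(C++17) is only there so the sketch compiles without the Botan headers, in
which case the helper reports that nothing is available.

.. code-block:: cpp

   #include <cstddef>
   #include <cstdint>
   #include <optional>
   #include <string>
   #include <vector>

   #if __has_include(<botan/compression.h>)
     #include <botan/compression.h>
     #define EXAMPLE_HAVE_BOTAN 1
   #endif

   // Hypothetical helper: compress a whole message in one call.
   // Returns std::nullopt if Botan or the named algorithm is unavailable.
   std::optional<std::vector<uint8_t>> compress_once(const std::string& algo,
                                                     const std::vector<uint8_t>& input,
                                                     size_t level = 6) {
   #if defined(EXAMPLE_HAVE_BOTAN)
      auto comp = Botan::Compression_Algorithm::create(algo);
      if(!comp) {
         return std::nullopt;  // algorithm not compiled in
      }

      // buf is an in/out parameter: plaintext in, compressed bytes out.
      Botan::secure_vector<uint8_t> buf(input.begin(), input.end());
      comp->start(level);
      comp->finish(buf);  // no intervening update() is needed
      return std::vector<uint8_t>(buf.begin(), buf.end());
   #else
      (void)algo;
      (void)input;
      (void)level;
      return std::nullopt;
   #endif
   }

For larger inputs the same object would instead receive a sequence of
``update`` calls, one per chunk, before the final ``finish``.
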
.. cpp:class:: Decompression_Algorithm

   .. cpp:function:: void start()

      Initialize the decompression engine. This must be done before calling
      ``update`` or ``finish``. No level is provided here; the decompressor
      can accept input generated by any compression parameters.

   .. cpp:function:: void update(secure_vector<uint8_t>& buf, \
                                 size_t offset = 0)

      Decompress the material in the in/out parameter ``buf``. The leading
      ``offset`` bytes of ``buf`` are ignored and remain untouched; this can
      be useful for ignoring packet headers.

      This function may throw if the data seems to be invalid.

   .. cpp:function:: void finish(secure_vector<uint8_t>& buf, size_t offset = 0)

      Finish decompressing a message. The ``buf`` and ``offset`` parameters
      are treated as in ``update``. It is acceptable to call ``start``
      followed by ``finish`` with the entire message, without any intervening
      call to ``update``.

      This function may throw if the data seems to be invalid.

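
As a rough illustration of the three stages on both sides, the sketch below
compresses a message in two chunks (``update`` then ``finish``) and then
decompresses the result in one step. The helper name ``round_trip`` and the
level of 6 are assumptions made for the example. As before, the
``__has_include`` guard (C++17) only lets the sketch compile when the Botan
headers are absent.

.. code-block:: cpp

   #include <cstddef>
   #include <cstdint>
   #include <optional>
   #include <string>

   #if __has_include(<botan/compression.h>)
     #include <botan/compression.h>
     #define EXAMPLE_HAVE_BOTAN 1
   #endif

   // Hypothetical helper: compress msg in two chunks, decompress it again,
   // and return the recovered text (std::nullopt if nothing is available).
   std::optional<std::string> round_trip(const std::string& algo,
                                         const std::string& msg) {
   #if defined(EXAMPLE_HAVE_BOTAN)
      auto comp = Botan::Compression_Algorithm::create(algo);
      auto decomp = Botan::Decompression_Algorithm::create(algo);
      if(!comp || !decomp) {
         return std::nullopt;
      }

      const auto mid = msg.begin() + static_cast<std::ptrdiff_t>(msg.size() / 2);
      Botan::secure_vector<uint8_t> compressed;

      comp->start(6);

      // First chunk: update() may emit little or no output yet.
      Botan::secure_vector<uint8_t> chunk(msg.begin(), mid);
      comp->update(chunk);
      compressed.insert(compressed.end(), chunk.begin(), chunk.end());

      // Last chunk: finish() emits whatever the compressor still holds.
      chunk.assign(mid, msg.end());
      comp->finish(chunk);
      compressed.insert(compressed.end(), chunk.begin(), chunk.end());

      // Decompress in one step; this may throw on invalid data.
      decomp->start();
      decomp->finish(compressed);
      return std::string(compressed.begin(), compressed.end());
   #else
      (void)algo;
      (void)msg;
      return std::nullopt;
   #endif
   }
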
The easiest way to get a compressor or decompressor is via the functions
``Compression_Algorithm::create`` and ``Decompression_Algorithm::create``,
which both accept a string argument naming the algorithm. Accepted values
include ``zlib`` (the zlib format, with header and checksum), ``deflate``
(a raw deflate stream with no header or checksum), ``gzip``, ``bz2``, and
``lzma``. A null pointer will be returned if the algorithm is unavailable.

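
Because ``create`` returns a null pointer for unavailable algorithms, the
names listed above can be probed at run time. The following is a small sketch
under that assumption; ``available_algorithms`` is a hypothetical helper, and
the ``__has_include`` guard (C++17) again only covers the case where the Botan
headers are missing.

.. code-block:: cpp

   #include <string>
   #include <vector>

   #if __has_include(<botan/compression.h>)
     #include <botan/compression.h>
     #define EXAMPLE_HAVE_BOTAN 1
   #endif

   // Hypothetical helper: which of the documented names can be created
   // as both a compressor and a decompressor in this build.
   std::vector<std::string> available_algorithms() {
      std::vector<std::string> found;
   #if defined(EXAMPLE_HAVE_BOTAN)
      for(const char* name : {"zlib", "deflate", "gzip", "bz2", "lzma"}) {
         // A null pointer means the algorithm is not available.
         if(Botan::Compression_Algorithm::create(name) &&
            Botan::Decompression_Algorithm::create(name)) {
            found.push_back(name);
         }
      }
   #endif
      return found;
   }
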
Two older functions for this are:

.. cpp:function:: Compression_Algorithm* make_compressor(std::string type)

.. cpp:function:: Decompression_Algorithm* make_decompressor(std::string type)

Both call the relevant ``create`` function and then ``release`` the
returned ``unique_ptr``, so the caller owns the raw pointer. Avoid these in
new code.

To use a compression algorithm in a ``Pipe``, use the adapter types
``Compression_Filter`` and ``Decompression_Filter`` from ``comp_filter.h``.
The constructors of both filters take a ``std::string`` argument (passed to
``make_compressor`` or ``make_decompressor``); the compression filter also
takes a ``level`` parameter. Finally, both constructors have a parameter
``buf_sz`` which specifies the size of the internal buffer: inputs are
broken into blocks of this size. The default is 4096.

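
A possible use of the two filters in a single ``Pipe`` is sketched below. It
rests on several assumptions: that ``botan/filters.h`` makes both filter types
visible (depending on the version, ``comp_filter.h`` may need to be included
directly), that the filter declarations depend on ``BOTAN_HAS_COMPRESSION``,
and that the filter constructors throw when the algorithm is unavailable. The
helper name ``pipe_round_trip`` is hypothetical.

.. code-block:: cpp

   #include <exception>
   #include <memory>
   #include <optional>
   #include <string>

   // Assumption: filters.h pulls in the compression filters.
   #if __has_include(<botan/filters.h>) && __has_include(<botan/pipe.h>)
     #include <botan/filters.h>
     #include <botan/pipe.h>
     #if defined(BOTAN_HAS_COMPRESSION)
       #define EXAMPLE_HAVE_BOTAN_FILTERS 1
     #endif
   #endif

   // Hypothetical helper: push msg through a compressing filter followed by
   // a decompressing filter and return what comes out the other end.
   std::optional<std::string> pipe_round_trip(const std::string& msg) {
   #if defined(EXAMPLE_HAVE_BOTAN_FILTERS)
      try {
         // Algorithm name and level 6; buf_sz is left at its default.
         auto comp = std::make_unique<Botan::Compression_Filter>("zlib", 6);
         auto decomp = std::make_unique<Botan::Decompression_Filter>("zlib");

         // Pipe takes ownership of the raw filter pointers.
         Botan::Pipe pipe(comp.release(), decomp.release());
         pipe.process_msg(msg);
         return pipe.read_all_as_string(0);
      } catch(const std::exception&) {
         return std::nullopt;  // e.g. zlib support not compiled in
      }
   #else
      (void)msg;
      return std::nullopt;
   #endif
   }

In real use the two filters would normally live in separate pipes, one on the
sending side and one on the receiving side; chaining them here just keeps the
sketch self-checking.
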