Lossless Data Compression
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some lossless data compression algorithms are available in botan, currently all
via third party libraries. These include zlib (including deflate and gzip
formats), bzip2, and lzma. Support for these must be enabled at build time;
you can check for them using the macros ``BOTAN_HAS_ZLIB``, ``BOTAN_HAS_BZIP2``,
and ``BOTAN_HAS_LZMA``.

.. note::
   You should always compress *before* you encrypt, because encryption seeks to
   hide the redundancy that compression is supposed to try to find and remove.

Compression is done through the ``Compression_Algorithm`` and
``Decompression_Algorithm`` classes, both defined in `compression.h`.

Compression and decompression both work in three stages: starting a
message (``start``), continuing to process it (``update``), and then
finally completing processing the stream (``finish``).

.. cpp:class:: Compression_Algorithm

   .. cpp:function:: void start(size_t level)

      Initialize the compression engine. This must be done before calling
      ``update`` or ``finish``. The meaning of the `level` parameter varies
      by algorithm, but it generally takes a value between 1 and 9, with
      higher values typically giving better compression at the cost of more
      memory and/or CPU time consumed by the compression process. The
      decompressor can always handle input from any compressor.

   .. cpp:function:: void update(secure_vector<uint8_t>& buf, \
                                 size_t offset = 0, bool flush = false)

      Compress the material in the in/out parameter ``buf``. The leading
      ``offset`` bytes of ``buf`` are ignored and remain untouched; this can
      be useful for ignoring packet headers. If ``flush`` is true, the
      compression state is flushed, allowing the decompressor to recover the
      entire message up to this point without having to see the rest of the
      compressed stream.

   .. cpp:function:: void finish(secure_vector<uint8_t>& buf, size_t offset = 0)

      Finish compressing a message. The ``buf`` and ``offset`` parameters
      are treated as in ``update``.
      It is acceptable to call ``start`` followed by ``finish`` with the
      entire message, without any intervening call to ``update``.

.. cpp:class:: Decompression_Algorithm

   .. cpp:function:: void start()

      Initialize the decompression engine. This must be done before calling
      ``update`` or ``finish``. No level is provided here; the decompressor
      can accept input generated by any compression parameters.

   .. cpp:function:: void update(secure_vector<uint8_t>& buf, \
                                 size_t offset = 0)

      Decompress the material in the in/out parameter ``buf``. The leading
      ``offset`` bytes of ``buf`` are ignored and remain untouched; this can
      be useful for ignoring packet headers.

      This function may throw if the data seems to be invalid.

   .. cpp:function:: void finish(secure_vector<uint8_t>& buf, size_t offset = 0)

      Finish decompressing a message. The ``buf`` and ``offset`` parameters
      are treated as in ``update``. It is acceptable to call ``start``
      followed by ``finish`` with the entire message, without any
      intervening call to ``update``.

      This function may throw if the data seems to be invalid.

The easiest way to get a compressor is via the functions
``Compression_Algorithm::create`` and ``Decompression_Algorithm::create``.
Both accept a string argument whose values include `zlib` (the zlib format,
with a header and checksum), `deflate` (a raw deflate stream, with no header
or checksum), `gzip`, `bz2`, and `lzma`. A null pointer will be returned if
the algorithm is unavailable.

Two older functions for this are

.. cpp:function:: Compression_Algorithm* make_compressor(std::string type)
.. cpp:function:: Decompression_Algorithm* make_decompressor(std::string type)

which call the relevant ``create`` function and then ``release`` the
returned ``unique_ptr``. Avoid these in new code.

To use a compression algorithm in a `Pipe` use the adapter types
`Compression_Filter` and `Decompression_Filter` from `comp_filter.h`.
The constructors of both filters take a `std::string` argument (passed to
`make_compressor` or `make_decompressor`); the compression filter also takes
a `level` parameter. Finally, both constructors have a parameter `buf_sz`
which specifies the size of the internal buffer that will be used; inputs
will be broken into blocks of this size. The default is 4096.
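
The three-stage ``start`` / ``update`` / ``finish`` protocol described earlier
can be sketched as a small helper that feeds a message through an algorithm
in fixed-size chunks. This is a minimal illustration, not part of the
library: ``process_stream`` is a hypothetical name, and the algorithm and
buffer types are template parameters only so that the sketch compiles
without the Botan headers. With Botan they would be ``Compression_Algorithm``
(or ``Decompression_Algorithm``) and ``secure_vector<uint8_t>``.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: drives any object exposing the start/update/finish
// interface over `input`, feeding it `chunk_size` bytes at a time, and
// returns the concatenated output.
//
// Assumptions: `Algo` stands in for Compression_Algorithm or
// Decompression_Algorithm, and `Buffer` for secure_vector<uint8_t>.
// `start_args` is the compression level for a compressor and is left
// empty for a decompressor.
template <typename Buffer, typename Algo, typename... StartArgs>
Buffer process_stream(Algo& algo,
                      const std::vector<uint8_t>& input,
                      size_t chunk_size,
                      StartArgs... start_args)
{
   Buffer output;

   // Stage 1: start the message.
   algo.start(start_args...);

   size_t pos = 0;

   // Stage 2: pass every chunk except the last through update().
   // buf is in/out: it holds input before the call and output after it.
   while(chunk_size > 0 && input.size() - pos > chunk_size)
   {
      Buffer buf(input.begin() + pos, input.begin() + pos + chunk_size);
      algo.update(buf);
      output.insert(output.end(), buf.begin(), buf.end());
      pos += chunk_size;
   }

   // Stage 3: the remaining bytes (possibly the whole message, possibly
   // none) go to finish(), which completes the stream.
   Buffer last(input.begin() + pos, input.end());
   algo.finish(last);
   output.insert(output.end(), last.begin(), last.end());

   return output;
}
```

With Botan, a call might look like
``process_stream<Botan::secure_vector<uint8_t>>(*comp, input, 4096, 9)``,
where ``comp`` is the pointer returned by
``Compression_Algorithm::create("zlib")`` after checking it is not null.
The decompressing direction would pass a ``Decompression_Algorithm`` and
omit the level. Passing a ``chunk_size`` of 0 sends the entire message
straight to ``finish``, which the text above notes is acceptable.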