300 lines
14 KiB
ReStructuredText
300 lines
14 KiB
ReStructuredText
|
|
||
|
Notes for New Contributors
|
||
|
===================================
|
||
|
|
||
|
Source Code Layout
|
||
|
-------------------------------------------------
|
||
|
|
||
|
Under ``src`` there are directories
|
||
|
|
||
|
* ``lib`` is the library itself, more on that below
|
||
|
* ``cli`` is the command line application ``botan``
|
||
|
* ``tests`` contain what you would expect. Input files go under ``tests/data``.
|
||
|
* ``python/botan3.py`` is the Python ctypes wrapper
|
||
|
* ``bogo_shim`` contains the shim binary and configuration for
|
||
|
`BoringSSL's TLS test suite <https://github.com/google/boringssl/tree/master/ssl/test>`_
|
||
|
* ``fuzzer`` contains fuzz targets for various modules of the library
|
||
|
* ``build-data`` contains files read by the configure script. For
|
||
|
example ``build-data/cc/gcc.txt`` describes various gcc options.
|
||
|
* ``examples`` contains usage examples used in the documentation.
|
||
|
* ``scripts`` contains misc scripts: install, distribution, various
|
||
|
codegen things. Scripts controlling CI go under ``scripts/ci``.
|
||
|
* ``configs`` contains configuration files tools like pylint
|
||
|
* ``editors`` contains configuration files for editors like vscode and emacs
|
||
|
|
||
|
Under ``doc`` one finds the sources of this documentation
|
||
|
|
||
|
Library Layout
|
||
|
----------------------------------------
|
||
|
|
||
|
Under ``src/lib`` are several directories
|
||
|
|
||
|
* ``asn1`` is the DER encoder/decoder
|
||
|
* ``base`` defines some high level types
|
||
|
* ``block`` contains the block cipher implementations
|
||
|
* ``codec`` has hex, base64, base32, base58
|
||
|
* ``compat`` a (partial) compatibility layer for the libsodium API
|
||
|
* ``compression`` has the compression wrappers (zlib, bzip2, lzma)
|
||
|
* ``entropy`` has various entropy sources used by some of the RNGs
|
||
|
* ``ffi`` is the C99 API
|
||
|
* ``filters`` is a filter/pipe API for data transforms
|
||
|
* ``hash`` contains the hash function implementations
|
||
|
* ``kdf`` contains the key derivation functions
|
||
|
* ``mac`` contains the message authentication codes
|
||
|
* ``math`` is the big integer math library. It is divided into three parts:
|
||
|
``mp`` which are the low level algorithms; ``bigint`` which is a C++ wrapper
|
||
|
around ``mp``, and ``numbertheory`` which contains higher level algorithms like
|
||
|
primality testing and exponentiation
|
||
|
* ``misc`` contains odds and ends: format preserving encryption, SRP, threshold
|
||
|
secret sharing, all or nothing transform, and others
|
||
|
* ``modes`` contains block cipher modes (CBC, GCM, etc)
|
||
|
* ``passhash`` contains password hashing algorithms for authentication
|
||
|
* ``pbkdf`` contains password hashing algorithms for key derivation
|
||
|
* ``pk_pad`` contains padding schemes for public key algorithms
|
||
|
* ``prov`` contains bindings to external libraries such as PKCS #11
|
||
|
* ``psk_db`` contains a generic interface for a Pre-Shared-Key database
|
||
|
* ``pubkey`` contains the public key algorithms
|
||
|
* ``rng`` contains the random number generators
|
||
|
* ``stream`` contains the stream ciphers
|
||
|
* ``tls`` contains the TLS implementation
|
||
|
* ``utils`` contains various utility functions and types
|
||
|
* ``x509`` is X.509 certificates, PKCS #10 requests, OCSP
|
||
|
|
||
|
Each of these folders can contain subfolders which are treated as modules if they
|
||
|
contain an ``info.txt`` file. These submodules have an implicit dependency on their
|
||
|
parent module. The chapter :ref:`configure_script` contains more information on
|
||
|
Botan's module architecture.
|
||
|
|
||
|
Sending patches
|
||
|
----------------------------------------
|
||
|
|
||
|
All contributions should be submitted as pull requests via GitHub
|
||
|
(https://github.com/randombit/botan). If you are planning a large
|
||
|
change, open a discussion ticket on github before starting out to make
|
||
|
sure you are on the right path. And once you have something written,
|
||
|
even if it is not complete/ready to go, feel free to open a draft PR
|
||
|
for early review and comment.
|
||
|
|
||
|
If possible please sign your git commits using a PGP key.
|
||
|
See https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work for
|
||
|
instructions on how to set this up.
|
||
|
|
||
|
Depending on what your change is, your PR should probably also include an update
|
||
|
to ``news.rst`` with a note explaining the change. If your change is a
|
||
|
simple bug fix, a one sentence description is perhaps sufficient. If there is an
|
||
|
existing ticket on GitHub with discussion or other information, reference it in
|
||
|
your change note as 'GH #000'.
|
||
|
|
||
|
Update ``doc/credits.txt`` with your information so people know what you did!
|
||
|
|
||
|
If you are interested in contributing but don't know where to start check out
|
||
|
``doc/dev_ref/todo.rst`` for some ideas - these are changes we would almost
|
||
|
certainly accept once they've passed code review.
|
||
|
|
||
|
Also, try building and testing it on whatever hardware you have handy,
|
||
|
especially unusual platforms, or using C++ compilers other than the regularly
|
||
|
tested GCC, Clang, and Visual Studio.
|
||
|
|
||
|
FFI Additions
|
||
|
----------------
|
||
|
|
||
|
If adding a new function declaration to ``ffi.h``, the same PR must also add the
|
||
|
same declaration in the Python binding ``botan3.py``, in addition the new API
|
||
|
functionality must be exposed to Python and a test written in Python.
|
||
|
|
||
|
Git Usage
|
||
|
----------------------------------------
|
||
|
|
||
|
Do *NOT* merge ``master`` into your topic branch, this creates needless commits
|
||
|
and noise in history. Instead, as needed, rebase your branch against master
|
||
|
(``git rebase -i master``) and force push the branch to update the PR. If the
|
||
|
GitHub PR page does not report any merge conflicts and nobody asks you to
|
||
|
rebase, you don't need to rebase.
|
||
|
|
||
|
Try to keep your history clean and use rebase to squash your commits as
|
||
|
needed. If your diff is less than roughly 100 lines, it should probably be a
|
||
|
single commit. Only split commits as needed to help with review/understanding of
|
||
|
the change.
|
||
|
|
||
|
Python
|
||
|
----------------------------------------
|
||
|
|
||
|
Scripts should be in Python 3 whenever possible.
|
||
|
|
||
|
For configure.py (and helper scripts install.py, cleanup.py and build_docs.py)
|
||
|
the target is stock (no modules outside the standard library) CPython 3.x.
|
||
|
Support for PyPy, etc is great when viable (in the sense of not causing problems
|
||
|
for 3.x, and not requiring huge blocks of version dependent code). As running
|
||
|
this program successfully is required for a working build, making it as portable
|
||
|
as possible is considered key.
|
||
|
|
||
|
The python wrapper botan3.py targets CPython 3.x, and latest PyPy. Note that
|
||
|
a single file is used to avoid dealing with any of Python's various crazy module
|
||
|
distribution issues.
|
||
|
|
||
|
For random scripts not typically run by an end-user (codegen, visualization, and
|
||
|
so on) there isn't any need to worry about platform independence. Here it's fine
|
||
|
to depend on any useful modules such as graphviz or matplotlib, regardless if it
|
||
|
is available from a stock CPython install.
|
||
|
|
||
|
Build Tools and Hints
|
||
|
----------------------------------------
|
||
|
|
||
|
If you don't already use it for all your C/C++ development, install ``ccache``
|
||
|
(or on Windows, ``sccache``) right now, and configure a large cache on a fast
|
||
|
disk. It allows for very quick rebuilds by caching the compiler output.
|
||
|
|
||
|
Use ``--enable-sanitizers=`` flag to enable various sanitizer checks. Supported
|
||
|
values including "address" and "undefined" for GCC and Clang. GCC also supports
|
||
|
"iterator" (checked iterators), and Clang supports "memory" (MSan) and
|
||
|
"coverage" (for fuzzing).
|
||
|
|
||
|
On Linux if you have the ``lcov`` and ``gcov`` tools installed, then running
|
||
|
``./src/scripts/ci_build.py coverage`` will produce a coverage enabled build,
|
||
|
run the tests, test the fuzzers against a corpus, and produce an HTML report
|
||
|
of total coverage. This coverage build requires the development headers for
|
||
|
zlib, bzip2, liblzma, TrouSerS (libtspi), and Sqlite3.
|
||
|
|
||
|
Copyright Notice
|
||
|
----------------------------------------
|
||
|
|
||
|
At the top of any new file add a comment with a copyright and a reference to the
|
||
|
license, for example::
|
||
|
|
||
|
/*
|
||
|
* (C) 202x <You>
|
||
|
*
|
||
|
* Botan is released under the Simplified BSD License (see license.txt)
|
||
|
*/
|
||
|
|
||
|
If you are making a substantial or non-trivial change to an existing file, add
|
||
|
or update your own copyright statement at the top of each file.
|
||
|
|
||
|
Style Conventions
|
||
|
----------------------------------------
|
||
|
|
||
|
When writing your code remember the need for it to be easily understood by
|
||
|
reviewers and auditors, both at the time of the patch submission and in the
|
||
|
future.
|
||
|
|
||
|
Avoid complicated template metaprogramming where possible. It has its places but
|
||
|
should be used judiciously.
|
||
|
|
||
|
When designing a new API (for use either by library users or just internally)
|
||
|
try writing out the calling code first. That is, write out some code calling
|
||
|
your idealized API, then just implement that API. This can often help avoid
|
||
|
cut-and-paste by creating the correct abstractions needed to solve the problem
|
||
|
at hand.
|
||
|
|
||
|
The C++11 ``auto`` keyword is very convenient but only use it when the type
|
||
|
truly is obvious (considering also the potential for unexpected integer
|
||
|
conversions and the like, such as an apparent uint8_t being promoted to an int).
|
||
|
|
||
|
Unless there is a specific reason otherwise (eg due to calling some C API which
|
||
|
requires exactly a ``long*`` be provided) integer types should be either
|
||
|
``(u)intXX_t`` or ``size_t``. If the variable is used for integer values of "no
|
||
|
particular size", as in the loop ``for(some_type i = 0; i != 100; ++i)`` then
|
||
|
the type should be ``size_t``. Use one of the specific size integer types only
|
||
|
when there is a algorithmic/protocol reason to use an integer of that size. For
|
||
|
example if a parsing a protocol that uses 16-bit integer fields to encode a
|
||
|
length, naturally one would use ``uint16_t`` there.
|
||
|
|
||
|
If a variable is defined and not modified, declare it ``const``. Some exception
|
||
|
for very short-lived variables, but generally speaking being able to read the
|
||
|
declaration and know it will not be modified is useful.
|
||
|
|
||
|
Use ``override`` annotations whenever overriding a virtual function. If
|
||
|
introducing a new type that is not intended for further derivation, mark it ``final``.
|
||
|
|
||
|
Avoid explicit ``new`` or (especially) explicit ``delete``: use RAII,
|
||
|
``make_unique``, etc.
|
||
|
|
||
|
Use ``m_`` prefix on all member variables.
|
||
|
|
||
|
``clang-format`` is used for all C++ formatting. The configuration is
|
||
|
in ``.clang-format`` in the root directory. You can rerun the
|
||
|
formatter using ``make fmt`` or by invoking the script
|
||
|
``src/scripts/dev_tools/run_clang_format.py``. If the output would be
|
||
|
truly horrible, it is allowed to disable formatting for a specific
|
||
|
area using ``// clang-format off`` annotations.
|
||
|
|
||
|
.. note::
|
||
|
|
||
|
Since the output of clang-format varies from version to version, we
|
||
|
currently require using exactly ``clang-format 15``.
|
||
|
|
||
|
Use braces on both sides of if/else blocks, even if only using a single
|
||
|
statement.
|
||
|
|
||
|
Avoid ``using namespace`` declarations, even inside of single functions. One
|
||
|
allowed exception is ``using namespace std::placeholders`` in functions which
|
||
|
use ``std::bind``. (But, don't use ``std::bind`` - use a lambda instead).
|
||
|
|
||
|
Use ``::`` to explicitly refer to the global namespace (eg, when calling an OS
|
||
|
or external library function like ``::select`` or ``::sqlite3_open``).
|
||
|
|
||
|
Use of External Dependencies
|
||
|
----------------------------------------
|
||
|
|
||
|
Compiler Dependencies
|
||
|
~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
|
||
|
The library should always be as functional as possible when compiled with just
|
||
|
Standard C++20. However, feel free to use the full language.
|
||
|
|
||
|
Use of compiler extensions is fine whenever appropriate; this is typically
|
||
|
restricted to a single file or an internal header. Compiler extensions used
|
||
|
currently include native uint128_t, SIMD intrinsics, inline asm syntax and so
|
||
|
on, so there are some existing examples of appropriate use.
|
||
|
|
||
|
Generally intrinsics or inline asm is preferred over bare assembly to avoid
|
||
|
calling convention issues among different platforms; the improvement in
|
||
|
maintainability is seen as worth any potential performance tradeoff. One risk
|
||
|
with intrinsics is that the compiler might rewrite your clever const-time SIMD
|
||
|
into something with a conditional jump, but code intended to be const-time
|
||
|
should in any case be annotated (using ``CT::poison``) so it can be checked at
|
||
|
runtime with tools.
|
||
|
|
||
|
Operating System Dependencies
|
||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
|
||
|
If you're adding a small OS dependency in some larger piece of code, try to
|
||
|
contain the actual non-portable operations to utils/os_utils.* and then call
|
||
|
them from there.
|
||
|
|
||
|
As a policy, operating systems which are not supported by their original vendor
|
||
|
are not supported by Botan either. Patches that complicate the code in order to
|
||
|
support obsolete operating systems will likely be rejected. In writing OS
|
||
|
specific code, feel free to assume roughly POSIX 2008, or for Windows, Windows 8
|
||
|
/Server 2012 (which are as of this writing the oldest versions still supported
|
||
|
by Microsoft).
|
||
|
|
||
|
Some operating systems, such as OpenBSD, only support the latest release. For
|
||
|
such cases, it's acceptable to add code that requires APIs added in the most
|
||
|
recent release of that OS as soon as the release is available.
|
||
|
|
||
|
Library Dependencies
|
||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
|
||
|
Any external library dependency - even optional ones - is met with as one PR
|
||
|
submitter put it "great skepticism".
|
||
|
|
||
|
At every API boundary there is potential for confusion that does not exist when
|
||
|
the call stack is all contained within the boundary. So the additional API
|
||
|
really needs to pull its weight. For example a simple text parser or such which
|
||
|
can be trivially implemented is not really for consideration. As a rough idea of
|
||
|
the bar, equate the viewed cost of an external dependency as at least 1000
|
||
|
additional lines of code in the library. That is, if the library really does
|
||
|
need this functionality, and it can be done in the library for less than that,
|
||
|
then it makes sense to just write the code. Yup.
|
||
|
|
||
|
Currently the (optional) external dependencies of the library are several
|
||
|
compression libraries (zlib, bzip2, lzma), sqlite3 database, Trousers (TPM
|
||
|
integration), plus various operating system utilities like basic filesystem
|
||
|
operations. These provide major pieces of functionality which seem worth the
|
||
|
trouble of maintaining an integration with.
|
||
|
|
||
|
At this point the most plausible examples of an appropriate new external
|
||
|
dependency are all deeper integrations with system level cryptographic
|
||
|
interfaces (CommonCrypto, CryptoAPI, /dev/crypto, iOS keychain, TPM 2.0, etc)
|