COMP0210: Research Computing with C++

Choosing libraries

Choosing the right library for the job can be tricky. The right choice can supercharge your development, making it easy to write what you want. However, the wrong choice can set you back a significant amount of time and effort if you end up needing to change a library part-way into a project.

Fundamentally there are two questions to ask that will decide if you can use a given library in your project, and a few more that will help you decide if you should use the library.

Can I use the library?

  • Does it provide what I need?
  • Can I legally use it?

Should I use the library?

  • Is the library stable?
  • Is the library fast enough for my needs?
  • Is the library consistently updated?
  • Who develops the library?
  • Is the library well-tested?
  • Is the library high-quality?

For the latter questions, it’s not essential that you answer “yes” to every one, but you should have a good reason to still use the library after saying “no” to any. Choosing a good library can be a tricky business so let’s dive into each of these questions to understand how we can choose the best library for our needs.

Features: Does it provide what I need?

This one should be easy but sometimes isn’t. Take a look at documentation, examples, tutorials, read what other developers have said about the library, and maybe take a look at the code itself. You should have an idea what kind of features the library has and how it can help you.

Software licenses: Can I legally use it?

CAVEAT: This is not legal advice. If in doubt, seek your own legal advice (e.g., UCL Copyright advice).

A software license is a way to grant others permission to use and/or distribute your code. Putting your code on a public website does not give anyone permission to use, copy, translate or distribute it: you still own the copyright of that work, even if you haven’t stated that you own it.

Contrary to popular belief, distributed unlicensed software (not in the public domain) is fully copyright protected, and therefore legally unusable (as no usage rights at all are granted by a license) until it passes into public domain after the copyright term has expired. See Wikipedia for more.

Remember: even if you aren’t distributing code yet, you need to understand the licenses of your dependencies.

Third party licenses

When you distribute your code, the licenses of any libraries you use take effect. For example, a library with license:

  • MIT or BSD: permissive, so you can do what you want with the software you write, including selling it, as long as you keep the library’s copyright and license notice.
  • Apache: basically permissive, but also deals explicitly with contributions from multiple contributors and with patent rights.

Some libraries can affect how you yourself must license your code:

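  • GPL: if you distribute software that uses a GPL library, your work is considered a “derivative work”, so it must also be released under the GPL with its source code, including any changes you made to the library.
  • LGPL: intended for libraries, and lets code under other licenses use the library, provided users can swap in a modified version of it. Dynamic linking makes this straightforward; static linking brings extra obligations (users must be able to relink your program against a modified library), so it is often treated as close to the GPL case.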

However, exactly what counts as a derivative work under the GPL/LGPL is still debated; ultimately it can only be settled in court.

Choosing a license

When you plan to distribute code:

  • Don’t write your own license, unless you have taken legal advice and understand its consequences.
  • Check GitHub’s advice and the Open Source Initiative (OSI) when choosing your own license.
  • Try to pick one of the standard ones for compatibility.

For an in-depth understanding we recommend you read some works about licenses:

Note: once a third party has received your code under a license, the terms of that license continue to apply to that version of the code, even if you later change the license of newer versions.

Stability: Is a library stable?

Some libraries are so new that their public API (interface) is still subject to change. This is usually signalled by the project being in alpha or beta stages, either before an initial 1.0.0 release or before a new major x.0.0 release. Many projects aim to avoid breaking changes in minor releases and reserve them for new major versions. Python itself is a loose example: most code runs unchanged when moving from 3.10 to 3.11 (although deprecated features can be removed in minor releases), whereas moving from Python 2 to Python 3 broke a great deal of code. If you haven’t come across this idea, read about semantic versioning.
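Semantic versioning writes versions as MAJOR.MINOR.PATCH, where only a major bump is allowed to break the public API, and major version 0 signals that anything may change. As a minimal sketch of what that means when you upgrade a dependency, the hypothetical helpers below (Version, parse_version and expected_compatible are illustrative names, not part of any library) decide whether an upgrade should be non-breaking. Pre-release tags such as "-beta.1" are ignored, and treating 0.y.z releases as compatible only within the same minor version is a conservative assumption, not part of the specification.

```cpp
#include <optional>
#include <sstream>
#include <string>

// A minimal MAJOR.MINOR.PATCH version (hypothetical type for illustration).
struct Version {
    int major = 0;
    int minor = 0;
    int patch = 0;
};

// Parse "MAJOR.MINOR.PATCH"; returns std::nullopt if the text does not start
// with that pattern. Any trailing pre-release tag is silently ignored.
std::optional<Version> parse_version(const std::string& text) {
    Version v;
    char dot1 = 0;
    char dot2 = 0;
    std::istringstream in(text);
    if (!(in >> v.major >> dot1 >> v.minor >> dot2 >> v.patch) || dot1 != '.' || dot2 != '.') {
        return std::nullopt;
    }
    return v;
}

// Under semantic versioning, upgrading from `current` to `candidate` should not
// intentionally break your code if the major version is unchanged and the
// candidate is not older. Major version 0 means the API is unstable, so this
// sketch conservatively also requires the minor version to match.
bool expected_compatible(const Version& current, const Version& candidate) {
    if (candidate.major != current.major) {
        return false;
    }
    if (current.major == 0) {
        return candidate.minor == current.minor && candidate.patch >= current.patch;
    }
    if (candidate.minor != current.minor) {
        return candidate.minor > current.minor;
    }
    return candidate.patch >= current.patch;
}
```

With these rules, an upgrade from 3.10.0 to 3.11.2 is expected to be safe, while 2.7.18 to 3.0.0 is not. This only reflects what the version numbers promise; the library’s changelog and your own tests remain the real check.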

When choosing a library to use with your own project, try to use a stable version, i.e. one with a stable interface.

Efficiency: Is the library fast enough?

Good scientific libraries tend to be well-optimised: their algorithms and data structures are designed to maximise performance for the functionality they provide. For performance-critical libraries (like many used in numerical computations) the developers should document the library’s performance, and the documentation is the first place to look. Otherwise, try to find comparisons with other, similar libraries. Try to understand your own needs when looking at performance: optimisation can mean trade-offs between speed, memory efficiency and flexibility. (Recall that our highly flexible std::function integrator was slower than our narrower function pointer version.)
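To make the flexibility/performance trade-off concrete, here is a sketch of two midpoint-rule integrators that differ only in how they accept the integrand. The names integrate_fp and integrate_fn are hypothetical and this is not necessarily the form of the integrator used earlier in the course.

```cpp
#include <functional>

// Hypothetical midpoint-rule integrators for illustration.

// Function-pointer version: accepts plain functions and captureless lambdas
// only. The call target is a simple pointer, which compilers can often
// optimise well, especially when the integrator itself is inlined.
double integrate_fp(double (*f)(double), double a, double b, int n) {
    const double h = (b - a) / n;
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += f(a + (i + 0.5) * h);  // evaluate at the midpoint of interval i
    }
    return sum * h;
}

// std::function version: accepts any callable, including lambdas that capture
// state, but every call goes through type erasure, which is harder to inline.
double integrate_fn(const std::function<double(double)>& f, double a, double b, int n) {
    const double h = (b - a) / n;
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += f(a + (i + 0.5) * h);  // evaluate at the midpoint of interval i
    }
    return sum * h;
}
```

For example, integrate_fn([k](double x) { return k * x; }, 0.0, 1.0, 100) compiles, whereas passing the same capturing lambda to integrate_fp does not. How large the speed difference is depends on your compiler and optimisation flags, so measure it rather than assume it.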

Many popular libraries have been researched, developed, and maintained over many years, with intense focus placed on the correctness and performance of their algorithms. As such, you are unlikely to beat a mature library’s performance with a like-for-like custom algorithm, but sometimes custom code can be faster due to a trade-off between flexibility and performance. (Libraries, in order to be useful to large numbers of people, often provide fairly general methods, which can sometimes be improved upon using detailed knowledge of your precise problem.) If you have already used the library but think you might be able to beat its performance:

  1. test the performance of the library’s implementation
  2. write some unit tests using the library’s implementation
  3. write your custom implementation
  4. modify the unit tests to test your implementation
  5. test the performance of your custom code and compare to the library performance

By testing correctness and performance on both the library and your custom code, you can understand whether it’s worth it to commit to either.
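The five steps above can be sketched for a familiar case: std::sort as the “library” implementation and a hand-written insertion sort as the challenger. The names custom_sort, sorts_correctly, time_sort_ms and random_data are illustrative assumptions. A single timed run is noisy, so a real comparison would repeat each measurement and try several input sizes.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Step 3: a hypothetical custom implementation (insertion sort).
void custom_sort(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        int key = v[i];
        std::size_t j = i;
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];  // shift larger elements one place right
            --j;
        }
        v[j] = key;
    }
}

// Steps 2 and 4: the same correctness check, usable for either implementation.
// `data` is taken by value so the caller's input is not modified.
template <typename SortFn>
bool sorts_correctly(SortFn sort_fn, std::vector<int> data) {
    std::vector<int> expected = data;
    std::sort(expected.begin(), expected.end());  // library result as reference
    sort_fn(data);
    return data == expected;
}

// Steps 1 and 5: time one run on a copy of the input, in milliseconds.
template <typename SortFn>
double time_sort_ms(SortFn sort_fn, std::vector<int> data) {
    const auto start = std::chrono::steady_clock::now();
    sort_fn(data);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Reproducible random input, so both implementations see identical data.
std::vector<int> random_data(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(-1000, 1000);
    std::vector<int> v(n);
    for (int& x : v) {
        x = dist(gen);
    }
    return v;
}
```

With a library wrapper such as auto library_sort = [](std::vector<int>& v) { std::sort(v.begin(), v.end()); };, you can pass either library_sort or custom_sort to both helpers. Since insertion sort is O(n²), you would expect std::sort to win for large inputs; the comparison may be closer for very small ones.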

Up-to-date: Is the library regularly updated?

Libraries that are not regularly maintained can “rot”, that is:

  • bugs don’t get fixed
  • bugs in dependencies don’t get fixed
  • new language features break the library
  • newer, safer language features don’t get introduced
  • advances in packaging make it more difficult to install

In general, we want to avoid these issues, so consider these questions when deciding if the library is suitably up-to-date:

  • When was the last release?
  • Is there a sensible versioning scheme (e.g. semantic versioning)?
  • Is a changelog provided with each new release?
  • Is there a suitable release schedule?
  • Is the code developed in the open (e.g., on GitHub)?
    • How often are there commits?

You should develop your own intuition for what you consider “suitably up-to-date” but here are some heuristics:

  • If a library has been updated within the last year, it’s probably good.
  • If a library is very small, it probably doesn’t need many updates, so longer gaps between releases are fine.
  • If a library is very old and hasn’t been introducing new features (like some numerical libraries) and has been very well-used, there may simply be fewer bugs left to deal with, so a release over ten years ago is still probably okay (but might not be as well optimised on recent hardware).

Ownership: Who develops the library?

Libraries must be developed by someone: if there is no community or company responsible for a library’s development, it is considered abandoned and should probably be avoided. Consider some of the following questions:

  • Is the library obviously developed by a person, community, company or other organisation?
  • If a company:
    • How easy is it to report bugs?
    • Is it still open source?
    • What happens if the company decides to make the library closed-source?
  • If a person or community:
    • Is the library popular?
    • Are there many contributors to the project?
    • Are issues dealt with sensibly?
    • Is it easy to reach out to the developers?

Correctness: Is the library well-tested?

  • Are there many unit tests, do they run, do they pass?
    • Are they run automatically? Continuous integration (CI) is a good practice in which code is built and tested automatically whenever it is updated; services such as GitHub Actions can do this.
  • Does the library depend on other libraries?
  • Are the build tools common?
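Automated testing works because CI jobs generally treat a non-zero exit status as a failed step, and CTest treats a test executable’s non-zero exit code as a failing test. When browsing a library’s repository, this is the kind of thing to look for. A minimal sketch of such a test suite, here checking a few standard library functions, might look like the following; check and run_tests are hypothetical names, and real projects usually use a test framework such as Catch2 or GoogleTest instead.

```cpp
#include <cmath>
#include <iostream>
#include <numeric>
#include <string>

// Report a failed check without aborting, so one run lists every failure.
// Returns 1 on failure and 0 on success, so results can be summed.
int check(bool condition, const std::string& name) {
    if (!condition) {
        std::cerr << "FAILED: " << name << '\n';
        return 1;
    }
    return 0;
}

// Hypothetical test suite: returns the number of failed checks.
int run_tests() {
    int failures = 0;
    failures += check(std::gcd(12, 18) == 6, "gcd of 12 and 18");
    failures += check(std::lcm(4, 6) == 12, "lcm of 4 and 6");
    failures += check(std::abs(std::sqrt(2.0) * std::sqrt(2.0) - 2.0) < 1e-12,
                      "sqrt(2) squared is close to 2");
    return failures;
}
```

A test executable’s main could then end with return run_tests() == 0 ? 0 : 1; so that CTest, or a CI job that runs it, reports the failure automatically.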

Quality: Is the library of high quality?

Beyond the things we’ve already discussed, there are a few more minor points that signal whether a library is developed well. The lack of any of these things doesn’t mean a library is bad, it may just be more difficult to use or update. Consider:

  • Documentation: does it exist? is it good?
  • Number of ToDos: do they keep track of bugs to fix and future features to implement?
  • Dependencies: does it offer a clear list of dependencies? Are they trusted?
  • Data Structures: is it clear how to import/export data or images to use later? Do you understand what you need to put in and what you will get out when you use the library’s functions? Do you know which functions have side effects (e.g. in-place updates) and what they are?
  • Clear API: can you write a convenient wrapper? Is it clear how to use the library’s features?
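Side effects and wrappers go together: std::sort, for instance, sorts the container you pass in place. A thin wrapper of your own can make such behaviour explicit and expose only what your code needs, so that switching library later only touches the wrapper. The sketch below uses the standard library as the “library”; sorted_copy and median are hypothetical helper names chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// std::sort modifies its argument in place. This wrapper makes the behaviour
// explicit: `values` is taken by value (a copy), sorted, and returned, so the
// caller's data is never changed.
std::vector<double> sorted_copy(std::vector<double> values) {
    std::sort(values.begin(), values.end());
    return values;
}

// A wrapper exposing only what our code needs. If we later switch to another
// library, only this function has to change.
double median(std::vector<double> values) {
    if (values.empty()) {
        throw std::invalid_argument("median of an empty vector");
    }
    const auto mid = values.begin() + static_cast<std::ptrdiff_t>(values.size() / 2);
    // nth_element reorders our copy in place: *mid ends up holding the value it
    // would have after a full sort, with no larger elements before it.
    std::nth_element(values.begin(), mid, values.end());
    const double upper = *mid;
    if (values.size() % 2 == 1) {
        return upper;
    }
    // Even size: the lower middle value is the largest element before mid.
    const double lower = *std::max_element(values.begin(), mid);
    return 0.5 * (lower + upper);
}
```

Because both functions take their argument by value, callers can rely on their own vectors being untouched, which is exactly the kind of guarantee worth checking for in a library’s API.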

Libraries you should be using

While you should be asking yourself the above questions to understand how a library can help you, there are some groups of libraries you should consider first:

  • Standard library
    • very well-tested
    • very well-documented
    • very well-used
    • constantly developed
    • all compiler vendors are required to provide it so there are no dependencies to install!
  • Vendor-provided libraries - provided by Intel/Nvidia/AMD/etc
    • well-tested
    • (often) well-documented
    • usually best performance for a particular architecture
    • but not always open-source
  • Well-known libraries - Boost, FFTW, Eigen, Vulkan, etc
    • well-tested
    • well-used
    • (often) well-documented
    • typically well optimised
    • strong communities
    • you may find that discussion with your particular research communities will help lead you to appropriate library choices