Estimated Reading Time: 1 hour
C++ Standard Library
The C++ standard library is a collection of data-structures and methods which must be provided with any standard-compliant implementation of C++. As a result, using the standard library is portable across different systems and compilers, and does not require downloading and linking additional libraries (which will be a topic we cover in a later week). It does, however, require the use of header files, as we’ll see in a moment.
The C++ language standard is always evolving, with the most recent version of the standard being C++23. Sometimes there may be requirements to work with a specific C++ standard (more often, a minimum standard), and so it can be important to check whether a feature that you want to use is available to you in the version of C++ that you will be using.
In this course the language features that we will make use of will be compatible with C++14 onwards, though we will use C++17 as our standard. This week we will go over some of the most commonly used components of the standard library, but this will only scratch the surface and becoming familiar with the breadth of the library takes time and practice.
Input / Output
Input and output (often abbreviated to IO or I/O) to the terminal or to a file is important for interacting with users and retrieving the results of programs. It can also be useful for some debugging purposes, although you should use a debugger where you can (more on that later in this course). The standard library has some useful headers for this; we’ll make heavy use of <iostream>
to write to, or receive input from, the terminal.
Output
#include <iostream>
int main()
{
std::cout << "Hello World!" << std::endl;
return 0;
}
This short program prints Hello World!
to the terminal.
-
We use
std::
to indicate that a function is part of thestd
(standard) namespace, which prevents named functions, classes, and variables in the library from clashing with our own. -
std::cout
is a stream: this is an output buffer than can take a variety of types including strings, integers, and floating point numbers, and turns them into output. -
std::endl
is a special marker for the line end in the output stream.
Input
#include <iostream>
#include <string>
int main()
{
std::string name;
std::cout << "What is your name?" << std::endl;
std::cin >> name;
std::cout << "Hello " << name << "!" << std::endl;
return 0;
}
cin
is an input buffer.- With
cin
the arrows go the other way, because the information passes from the input buffer into thename
variable. - There are other kinds of input buffers, like reading from files.
Containers
Containers are an important part of the C++ standard library; these allow us to keep collections of objects such as lists (vector
, array
), sets (set
), or maps of key-value pairs (map
, unordered_map
), among others. These are some of the most common classes that you will use in C++ programming, so it is a good idea to familiarise yourself with them. We’ll discuss vector
as an example here as it probably the most commonly used, but see the section “Using C++ Documentation” for more information on how to learn about the other kinds of containers.
At minimum, you should be comfortable using vector
, array
, and map
.
vector<T>
is used to hold list of elements of typeT
of dynamic size (whereT
is some arbitrary type).- It requires
#include <vector>
.
- It requires
array<T>
is used to hold a list of elements of typeT
of fixed size. The size must be determined at compile time.- It requires
#include <array>
- Since the size is known at compile time, it can provide a guarantee that array sizes are always compatible with what you expect during runtime. This can be useful for ensuring that lists of data match up in size, or that a vector has the same dimensions as the vector space it is supposed to be in.
- Arrays can be more efficient than vectors because you don’t need to do manual size checking, and the compiler may be able to make optimisations if the size is known at compile time.
- It requires
map<T1, T2>
is used to hold key-value pairs with keys of typeT1
and values of typeT2
. (This is known in some languages as a Dictionary.) Maps are dynamic in size; key-value pairs can be added and removed. You can check if a given key is present in amap
, and you can also iterate over all elements in a map and access both the key and the value.- It requires
#include <map>
.
- It requires
- The fact that all three of these types are iterable means that we can apply some of the same programming techniques to all of them, because we can loop over their elements.
Vector
The vector class is defined in the <vector>
header. It is used when you want to keep an ordered list of elements.
Declaring a Vector:
#include <vector>
int main()
{
std::vector<int> fibbonacciList = {1, 1, 2, 3, 5, 8, 13};
return 0;
}
- We must include
<vector>
to use the vector class. - When we declare a vector we also must place the type of object that the vector holds in angle brackets. So a vector of integers is declared
std::vector<int>
, and a vector of doubles is declaredstd::vector<double>
. - We can declare an empty list using
std::vector<int> myIntList
, or we can use the curly-brace notation above to give some initial values.
Adding or Removing Elements
Vectors are dynamically sized, which means we can add or remove elements. Here are some example of adding in and removing particular elements:
std::vector<int> v = {1,2,3};
v.push_back(4); // Add an element to the end of our vector
v.pop_back(); // Remove last element from vector
v.insert(v.end() - 1, 99); // Insert an element before the last element of the vector
v.erase(v.begin() + 1); // Remove the second element of a vector
v.end()
andv.begin()
return iterators, special types which can be used to iterate over containers. The methodsbegin()
andend()
return iterators to the start and end of a container respectively, and iterators can be incremented using regular arithmetic. These can also be used for looping e.g.for(vector<int>::iterator it = myNums.begin(); it != myNums.end(); it++)
. You can useauto
to infer the iterator type in a loop declaration to make it more succinct, e.g.for(auto it = myNums.begin(); it != myNums.end(); it++)
.- The memory for a
vector
is assigned in a contiguous block. This helps with performance because the vector class can find the memory location of a given element using pointer arithmetic. - A
vector
doesn’t have a fixed size, but it does have a certain amount of memory allocated to it. If you add enough elements to outgrow this size, it will have to allocate a new block of memory of a larger size, and copy the elements over to this new memory in order to increase the size and still remain contiguous.- The cost of this operation scales linearly with the size of your vector.
- You cannot necessarily predict when this will happen as the amount of memory allocated to a vector by default will be compiler dependent.
- If you know how much space you will need for a vector, or can place an appropriate limit on it, then you can use the
.reserve(int)
method to reserve space for for a given number of elements, e.g.v.reserve(20)
will allocate enough memory tov
to store 20 elements.
Accessing Elements
We can also access elements of a vector in a couple of different ways.
#include <vector>
int main()
{
std::vector<int> fibbonacciList = {1, 1, 2, 3, 5, 8, 13};
int element0 = fibbonacciList.at(0);
int element3 = fibbonacciList[3];
return 0;
}
- Indexing of vectors (and other array like containers) always starts at 0.
- We can use the
.at(i)
method on a vector to get the value at that position in the vector. Ifi
is outside of the range of the vector then this method will throw an error, which can be handled by your program. (See next week when we discuss exceptions.) - We can also access a vector using square brackets:
myVector[i]
. This is substantially faster than usingmyVector.at(i)
, but does no bounds checking. Ifi
is outside of the size of vector then the behaviour is undefined: this means the standard does not define what the program should do in this case, and the result will be compiler dependent. Undefined behaviour should always be avoided. Although[]
is the most common way to access a vector, it is important to only do so if you can be sure that you will not go outside of the bounds of the vector. - Accessing and modifying vector elements using
[]
has high performance which is competitive with traditional C-arrays (particularly with compiler optimisations turned on), which makesvector
an excellent general purpose class for lists. - Accessing a vector element allows us to change its value just like an ordinary variable e.g.
myVector[3] = 5
sets element3
ofmyVector
equal to5
.
Looping Over Vectors
One of the most important operations we perform on vectors (and other containers) is to loop over the elements of a vector. As mentioned above, we can do this using an iterator:
for(auto it = myVector.begin(); it != myVector.end(); it++)
{
std::cout << *it << std::endl;
*it += 5; // Adds 5 to the element of the vector
}
it
is an iterator which starts at the beginning of the vector and is incremented until it reaches the end of the vector. Iterating this way means that we know we’ll never be out of range for our vector!- Here we can use
auto
to make code more concise, but the type of an iterator for a vector isvector<type>::iterator
for whatevertype
you hold in your vector (e.g.vector<int>::iterator
). Iterators for other containers are defined similarly. - In order to get the value held from an iterator you need to use the dereference operator (*) like with a pointer. So we need to use
*it
in order to print out or reassign that value.
Iterators can be a little cumbersome to use, but there are luckily are other ways of doing the same thing.
We can use range based loops, since the vector object keeps track of the number of elements within it. We could do this e.g. to print out all the elements of a vector.
#include <vector>
int main()
{
std::vector<int> fibbonacciList = {1, 1, 2, 3, 5, 8, 13};
for(auto num : fibbonacciList)
{
std::cout << num << std::endl;
}
return 0;
}
- Declaring a range based loop we use the syntax:
for(type element : list){ loop code }
. - Here we have used the
auto
keyword to ask C++ to infer the type for us; this can be useful when we want to avoid writing out lots of types explicitly! - This will iterate through all the elements of the list, in order. For each loop iteration,
num
will be initialised with the data for that element in the list. num
is initialised the same way as function arguments, so by default the elements of the list are passed by value i.e. a copy of the element is made and stored that in the variablenum
. This means any changes tonum
in the loop code are not reflected in the vector itself.- If you want to avoid copy overheads or make changes to your vector, you should make this variable a reference using the
&
operator just like when passing by reference to a function. In this case we would writefor(auto &num : fibbonacciList)
and our loop code would receive a reference to each element of the list rather than a copy. - Using a reference to avoid copy overheads can leave your container vulnerable to changes from the code inside the loop!
- You can learn more about these issues and how to navigate them in the notes on passing by value and passing by reference.
- If you want to avoid copy overheads or make changes to your vector, you should make this variable a reference using the
- Range based loops can be used with other containers which are iterable, like
array
andmap
.
Traditional for
loops with integral types can also be useful for iterating through vectors when you need to keep track of the index of an element, for example when assigning a vector a value that depends on its index or when working with multiple vectors at once.
#include <vector>
int main()
{
std::vector<int> a = {1, 2, 3, 4};
std::vector<int> b = {5, 6, 7, 8};
std::vector<int> c(4); // create a vector with size 4
for(size_t i = 0; i < 4; i++)
{
c.at(i) = a.at(i) + b.at(i);
}
return 0;
}
size_t
is an unsigned (cannot be negative) integral numerical type which is often used for the sizes of objects.- The max value of
size_t
is large enough for the largest objects that can be stored in memory. - If you know that you won’t have too many elements you can use other integral types like
int
oruint
.
- The max value of
- In this case we have to manually make sure that we don’t go out of bounds for our vectors.
- You can use things like the size method (
myVector.size()
, which returns asize_t
) as the upper bound to prevent going outside of the range of the vector. - If you are using multiple vectors you should make sure that they are the same size, or that you don’t exceed the bounds of the smaller vector.
- It’s usually easier and safer to use range based loops (or iterators) if you don’t need explicit indexing information.
- You can use things like the size method (
Aside: Namespaces, using
, and typedef
You may have noticed that our code samples so far have had to use std::
very frequently! The std
namespace keeps standard names from clashing with our own function, variable, or class definitions, but can also cumbersome when certain types or functions are used very frequently. In some code you may also find nested namespaces and classes, leading to multiple ::
prefixes, which can make code difficult to read. There are two common remedies to this issue.
The using
Keyword
We can make all of a namespace available to use directly by using the following syntax:
using namespace std;
- Now all the names in
std
are available to use without usingstd::
in front, so we can just typecout
,endl
,cin
etc. - This is generally bad because
std
is very big and contains lots of names! The whole point of the namespace is to avoid accidental clashes of names.- It is a particularly bad idea to place such
using
statements inside header files, since any files which include this header will also end up using this namespace. Try to limit the scope ofusing
statements so that they make your code clearer, but don’t affect other parts of your code.
- It is a particularly bad idea to place such
A better alternative to using an entire namespace is to just use specific names from that namespace e.g.:
using std::cout;
using std::endl;
- Now
cout
andendl
are available to use without typingstd::
, but the rest of thestd
namespace will still require it. - Each name needs to be declared separately with another
using
keyword which can lead to a lot ofusing
statements.
The typedef
Keyword
Types in C++ can grow to be quite long and complicated for certain kinds of objects, especially when namespaces are also involved. Furthermore, we often want type names to be more indicative of what we actually want to use them for: the name IntMatrix
is more helpful than std::vector< std::vector <int>>
! We can assign an alias for any type by using the typedef
keyword like so:
typedef std::vector< std::vector<int> > = IntMatrix;
- We can now declare variables to be of type
IntMatrix
, or use it as a return type or parameter type in functions. - It is identical to
std::vector< std::vector<int>>
; the compiler will replace instances ofIntMatrix
with the real type when it comes to compilation. It is just to make the code easier to read and write.
Algorithm
The <algorithm>
library is an important part of modern C++ code. It contains a variety of commonly used algorithms, many of which operate on containers. These include sorting, searching, counting, maxima, minima, and merge. If there is some functionality that you need then it is usually worth checking if there is a standard library implementation for it already, before trying to implement it yourself or seeking third party libraries. This will save you time on implementation and testing, and keep your code portable by minimising external dependencies.
The functionality in algorithm
can be made much more flexible and interesting by combining with other functions, as many of the functions in algorithm
take functions as arguments. One way to do this is to write a function and pass a reference to it. Imagine that we want to count the number of even numbers in a vector of integers. Using count_if
from the <algorithm>
library we could write:
bool isEven(int x)
{
return x % 2 == 0;
}
int main()
{
vector<int> myNums = {1, 6, 5, 8, 3, 5, 4, 2, 8, 9, 9, 7, 6};
int numEvens = std::count_if(myNums.begin(), myNums.end(), &isEven)
std::cout << "num evens = " << numEvens << std::endl;
return 0;
}
- Note that here we give
count_if
iterators for the beginning and end of the vector. - The third argument given to
count_if
is a pointer to a function which takes an integerx
and returnstrue
ifx
is divisible by two.- You can also store a function pointer in a variable like so:
auto fnPtr = &isEven
. - The explicit type declaration of a function pointer looks like this:
bool (*fnPtr) (int) = &isEven
i.e. a function pointer (fnPtr
) can be dereferenced (*fnPtr
) to a function that takes anint
and return abool
. If you do need to declare a variable as a function pointer then it’s usually easier to useauto
! - The
()
operator for evaluating a function like object is also defined for function pointers, so you can evaluate this function pointer usingfnPtr(i)
or(*fnPtr)(i)
.
- You can also store a function pointer in a variable like so:
- In general the third argument can be anything which takes an argument of the correct type and returns a boolean.
- In this case we take an
int
because our vector is of typevector<int>
. - So if our third argument is called
F
, then for any integeri
we must be able to writeF(i)
and have it evaluate to abool
. - This kind of object is called a functor, which in C++ just means that the
()
operator is defined. (The term functor has very different meanings in other programming languages and areas of mathematics!)
- In this case we take an
count_if
applies the function to each element of the vector and adds to the count only if it evaluates to true.
Often the pieces of code that we want to pass to <algorithm>
functions like this are small and not re-used in other parts of our code. Separately declaring small functions like this can also make our code less readable because the implementation for the function is stored somewhere else compared to where it is being used (note we have to define isEven
outside of main
). Next we’ll look at a way of defining functions that leads to fewer function definitions cluttering up our code!
Anonymous Functions
The code above using count_if
works fine, but if we have many such algorithm calls, which employ different tests, we can end up with code that has a lot of function definitions that are only used once. We can express this more succinctly using anonymous functions, also known as lambda expressions. These are function objects which can be defined without the usual C++ function declaration, and passed around like ordinary objects (including being copied). Compare the following code with the sample above:
int main()
{
vector<int> myNums = {1, 6, 5, 8, 3, 5, 4, 2, 8, 9, 9, 7, 6};
int numEvens = std::count_if(myNums.begin(), myNums.end(), [](int x){return x%2 == 0;})
std::cout << "num evens = " << numEvens << std::endl;
return 0;
}
- Our function is now defined in-line, inside the arguments list for
count_if
, instead of being declared separately elsewhere. - We’ll explain the syntax below, but note the similarities between the expression
[](int x){return x%2 == 0;}
and our definition ofisEven
.
This code achieves the same thing as previous example using function pointers, but there is no named function passed to count_if
. Instead, we have an anonymous function - an object that we have created for the purposes of providing an argument to count_if
. We can also use lambda expressions to create named objects, like so:
auto isEven = [](int x){return x%2 == 0;};
- The object
isEven
can now be passed to functions, copied, overwritten, and so on. - If creating a named variable for a lambda expression, always use
auto
. Lambda functions cannot be typed unless the square brackets[]
are empty (in which case they can be typed in the same was as function pointers), so there is no explicit type that you can give when declaring them.
Lambda expressions make our coding style more flexible, because these functions can be defined anywhere that an ordinary variable can be (inside functions or loops, inside argument lists etc.) and can be passed around easily. We’ll also see below that the []
give us functionality that we don’t normally have access to with regular functions which makes lambda expressions very powerful.
Syntax for Anonymous Functions
The lambda expression syntax can look a little confusing at first, but becomes simpler if we understand what the three different kinds of brackets are for:
[]
Square brackets are for variables to be captured from the environment. These are variables which you want to be available in your function, but are not explicit function arguments. This list can include any variables which are currently in scope, and variables can be passed by value or by reference.- For example, you might want to apply a lambda function to each element of a vector in turn (in which case the vector elements are the function arguments) but also take into account another variable in the environment that will be the same for each function application. There is no analogue for this in ordinary function definitions, so it is not unusual for these brackets to remain empty! We will see in a moment how we can make use of variable capture to make our functions richer.
()
Round brackets are for function arguments. These are passed into the anonymous function just like arguments are passed to any other function, and can be passed by value or by reference.{}
Curly braces are used to contain the function execution code. This may refer to the variables passed as arguments or captured from the environment. It can involve multiple lines / statements, separated by semicolons (;
) just like normal code, although lambda expressions are usually used for short code fragments. If there is no return statement then the return type isvoid
, as usual.- The brackets are always in this order:
[] () {}
.
Declaring anonymous functions is therefore almost identical to regular functions except that we don’t need to give it a name, and we have a set of square brackets []
at the front!
If you want to write an explicit return type you can also use the slightly longer syntax: [...] (...) {...} -> returnType {...}
as in the following example for our isEven
function:
auto isEven = [](int x) -> bool {return x%2 == 0;};
- This can be useful for other programmers reading your code because it makes the return type clear.
- This can also help prevent bugs because if the return from the function body (in
{}
) doesn’t match the return type given it will give a compiler error.
We can see from our previous example the use of the ()
and {}
brackets to define our function argument (int x)
and our function body {return x%2 == 0;}
. So far our variable capture []
has been empty, so let’s modify our function to make use of this feature. Say we want to count the number of elements in my list, divisible by some number n
, which we won’t know ahead of time. We can write
int n = getSomeNumber(); // Don't know the number n at compile time
int numDivisibleN = std::count_if(myNums.begin(), myNums.end(), [n](int x){return x%n == 0;});
std::cout << "Number divisible by n = " << numDivisibleN << std::endl;
- We use variable capture
[n]
to get the divisorn
from the the environment and bind it to our function. Now when we call the function on each element (the argumentx
in round brackets) we divide each element byn
and get the remainder! - This kind of behaviour is hard to replicate using function references, so lambda expressions are extremely useful for this kind of code.
Using C++ documentation
You will often find when programming, especially in a language with such an expansive standard library, that there are things that you need to look up. There are a large number of classes and functions available to C++ programmers, many of which may be new to you or require refreshing at various points.
Two common sites for C++ reference are:
Both are extremely useful and give thorough information about how to use classes and methods according to the specification in the C++ standard. Take for example the pages on the vector
class: https://cplusplus.com/reference/vector/vector/ or https://en.cppreference.com/w/cpp/container/vector. Each tells us:
- which header the class is found in
- a summary of the class and how it is stored in memory
- information on performance of basic operations
- a list of member functions and other information about the class (with links to detailed information and code samples)
- a list of non member functions which operate on the class
You can also use these reference sites too look up libraries that make up part of the C++ standard. For example, https://cplusplus.com/reference/algorithm/ contains information about what is contained in the <algorithm>
library that we briefly discussed above, with links to more detailed information about the functions available and the container types on which they operate (https://cplusplus.com/reference/stl/).
You should familiarise yourself with these resources to make the best use of C++ in the future. When writing research code, try to get into the habit of checking whether the C++ standard, or common libraries, already implement the functionality that you need satisfactorily before trying to implement your own. Online searches and forums such as StackOverflow can be a useful resource for this as well, but remember to check the official specification if you do find a class or function that you think will be useful to you, to be sure that you have accurate information about it!