Types in C++
In this section we will cover some core ideas that you will need to understand to program effectively in C++ and similar languages. We assume that you have some programming experience, though not necessarily in C++, and we expect Python to be the language most readers know. As a result, a few things may need to be explained before we start writing C++ code.
Type Systems
Type systems are an enormous topic, but we should understand a little bit about types in order to know how to program in different languages. Almost every high-level programming language is typed in some way.
- Types denote what kind of data a variable represents. All data in a computer is just `1`s and `0`s, so we need to know how to interpret data in order to use it. Types can make this process easier, since we don’t have to manually remember what each memory location is supposed to be representing and enforce that it is treated appropriately.
  - Consider for example how to print some data to the screen. The same sequence of bits at a memory location has a very different meaning if it’s an integer or a series of characters.
- An important property in many languages is type safety. This is a property of programs that essentially tells us that we are never using data meant to represent one type in a place where data of another type is expected. For example, we are never passing a string to a function which is meant to manipulate integers. This eliminates a large class of bugs!
- Programming languages may be statically or dynamically typed.
- In a statically typed language the type of a variable is decided at the declaration and it cannot be changed. To change a variable from one type to another requires another variable to be declared and an appropriate conversion to be defined. This is how typing works in C++, C, Rust, and many others.
- In a dynamically typed language the type of a variable can change throughout the program run-time. (Remember that a variable is a handle, generally for some data which resides in memory.) Instead of applying a type to the variable, the data itself is tagged with a type. A variable can have its data changed, including its type, for example from an `int` to a `string`. This is how typing works in a language like Python, JavaScript, or Julia.
C++ Types
Types in C++ are static, and type correctness is checked at compile time. This means that if your program compiles correctly, you will not encounter type errors at runtime.*
Although dynamically typed programming languages like Python will prevent poorly typed statements from executing at runtime, there is no way to know whether your program contains type errors until you crash into one. Part of the problem is that functions can return different types depending on the input or program state, meaning that you can’t necessarily be sure that the thing that you think is calculating an integer is definitely going to give you an integer every time. Variables may also have their type changed by side effects after being passed to a function in a dynamically typed language.
- These problems are not uncommon: conversion of a variable from an integer type to a floating point type under some circumstances is easy to do in Python. Floating point and integer types are interchangeable in most Python code but behave differently (integer arithmetic is exact while floating point is approximate, for example) and you may not discover the conversion has happened until you try to use the variable somewhere that a float cannot be used, such as indexing an array. Because the conversion is silent and valid in a dynamically typed language, it can be extremely hard to find where the conversion happened in a large program, as the problem could originate a long way away from where the type error gets raised!
Type systems can be leveraged to ensure many kinds of safety properties in programs because information can be built into custom types. Examples of this might be ensuring at compile time that matrices in matrix multiplications are compatible (recall that an $X \times Y$ matrix can be multiplied by a $Y \times Z$ matrix), or that physical dimensions are consistent (e.g. a velocity has units $\text{Length} \times \text{Time}^{-1}$, so $v = \frac{d}{t}$ is a valid expression but $v = \frac{d^2}{t^2}$ is not).
* Technically C++, like a number of other languages, has some features which are not type-safe. It is possible in C++ to subvert the type system by using some low level memory operations, but there is almost never a reason to do this so you’re unlikely to see this in practice and you shouldn’t do it in your own code. Attempting to do this kind of manipulation usually results in undefined behaviour, so you won’t necessarily even be able to predict what your program will do unless you know exactly how your compiler turns your source into machine code!
C++ Types and Declaring Variables
In C++ when declaring variables we do so by first declaring the type, then the name of the variable, and then its value. For example:
int x = 5;
- This declares a variable `x` of type `int` with value `5`.
- Some types have default initial values, which would mean that you don’t have to supply the value explicitly.
- Some types can be declared uninitialised, which means that the memory for that variable is reserved but not initialised. It will contain whatever bits were already there! It’s a good idea to initialise variables explicitly.
Types in C++ can sometimes be verbose or complicated, and it is sometimes easier to read and write code which makes use of the `auto` keyword. This keyword tells the C++ compiler to deduce the type for us. This is called type inference and was made a feature of C++ in C++11, so will be absent in older code.
auto x = 5;
auto y = std::abs(-13.4);
- `auto` can usually deduce the type from the result of an expression by looking at the types of all the variables and functions within it.
  - For example here it interprets `5` as an integer and therefore deduces that `x` is an `int`.
  - It will deduce that `y` is a `double`, since `-13.4` must be a floating point type (`double` by default) and `std::abs` returns a `double` when given a `double`.
  - Be especially careful when values can be ambiguous. Here `5` is being assigned as an `int`, but `5` is also a valid `float` or `double`. If you want a specific type in cases like this you should always specify it explicitly.
- `auto` doesn’t always work: the compiler must be able to deduce the type from contextual information.
  - You cannot declare an uninitialised variable with `auto`, e.g. `auto z;` will lead to a compiler error as it won’t know what type is intended for `z`, even if `z` is later assigned.
- In C++11, `auto` cannot deduce the return type or parameter types of a function, so you must declare these types explicitly. (Later standards relax this: C++14 can deduce a return type declared as `auto`, and C++20 accepts `auto` parameters. Explicit types are still usually clearer in a function’s interface.)
  - It’s generally a good idea therefore to know what the types of variables in your code are, even if you choose to use the `auto` keyword! This will make writing your own functions, and knowing what functions you can pass your data to, much easier.
- In an IDE like VSCode you can inspect the type of a variable by hovering the mouse over that variable. If you’ve used `auto` it will be able to tell you what type has been deduced for it.
- Bear in mind that `auto` can make your code more concise, but can also make your code harder to understand. Sometimes it’s better to write your type explicitly so that people reading your code can immediately understand what the types of your variables are.
Defining Custom Types
Custom types are an important feature in typed languages in order to be able to represent and manipulate more complex data in a type-safe way. In C++ the most common way to define a new type is to declare a `class` (or equivalently a `struct`). Classes are a common feature of Object Oriented Programming, which is a popular approach to programming in C++. (Some examples of other languages with classes for object oriented programming are C#, Java, and Python.) We’ll discuss the design and use of classes in the next section, so for now let the following suffice:
- A class is a custom data type which is defined by the programmer. It can contain any number of variables and functions.
- Once it is defined it can be used like any other type, e.g. it can be accepted as an argument in, or returned from, a function. Type safety rules still apply.
- Classes give us a way of defining sub-types which are substitutable. For example we can define a `Shape` type, and then have `Circle` and `Square` sub-types which are accepted by the type system anywhere where a `Shape` type is accepted. This makes our type system more flexible and expressive. We discuss classes in detail in a later section of this week’s notes.
We will focus overwhelmingly on classes as our means of defining custom types, but for those who are interested there are two further ways of declaring custom types in C++:
- `enum`: This stands for enumeration. An `enum` is a type which can take one of a finite set of values (i.e. the values are enumerable). Each of these values must have a name, for example let’s say we want a `Colour` enum which can take the values `red`, `green`, and `blue`. We can declare a new `enum` called `Colour` in two ways:
  - `enum Colour {red, green, blue};`. This kind of enum implicitly converts the values `red`, `green`, and `blue` to `0`, `1`, and `2` respectively, so a `Colour` can be used wherever an `int` is expected. (Converting an `int` back to a `Colour` requires an explicit cast.)
    - Because this type of `enum` converts implicitly to `int`, it can be used to e.g. index an array. This can be useful when you want to efficiently store data based on categorisations. For example, say you have data about some population, split up by gender and age group. By turning your gender categories and age groups into enums, you can then store your data as a matrix which is indexed like `data[gender][age_group]`.
    - For this kind of enum we can just reference these values using the names `red`, `green`, and `blue`.
  - `enum class Colour {red, green, blue};`. This kind of enum (called an `enum class`) cannot be used interchangeably with `int`, and therefore `Colour` can only be used in places that are explicitly expecting a `Colour` type. We usually want to use an `enum class` so that we don’t accidentally mix it up with integer types!
    - This cannot be used to index arrays without an explicit cast (because it is not an `int`), but it can be used as a key in `map` types. `std::map` and `std::unordered_map` provide C++ equivalents to Python’s dictionary type.
    - In order to use these values we have to also include the class name, so we have to write `Colour::red`, `Colour::green`, or `Colour::blue`.
- `union`: Union types are types which represent a value which is one of a finite set of types. A `union` is declared with a list of members of different types, for example `union IntOrString { int i; std::string s; };` can store an `int` or a `std::string`. When a variable of type `IntOrString` is declared, it is only allocated enough memory to store one of its members at a time, so it cannot store both `i` and `s` at the same time. The programmer needs to manually keep track of which type is present, often using an auxiliary variable, in order to safely use union types. (A member with a non-trivial constructor or destructor, such as `std::string`, must also be constructed and destroyed by hand.) Given this additional difficulty, I wouldn’t recommend using union types without a very strong reason.
Microsoft has excellent, and accessible, resources on `enum` and `union` types if you are interested in learning more about them.
N.B. C++17 onwards also has a special class called `std::variant` which is designed to replace `union` types in a more type-safe way, because the `variant` can be checked to see which type it is currently holding. (That said, checking which type the variant has is still rather clunky, and you have to check for each type manually so if there are many cases it can be easy to miss one and the compiler will not warn you!) Ultimately, union / variant types are not terribly common in C++ code in practice, although some languages (especially functional languages like ML and Haskell) handle these concepts much more naturally. If you’re interested in this kind of approach to types, I recommend reading up on algebraic datatypes.