Estimated Reading Time: 15 minutes
Pass by Value and Pass by Reference
Variables are handles for data, which needs to be stored in memory somewhere so that it can be read and modified.
In C++ we can find out the memory address of a given variable using the &
operator, as in the following code snippet:
#include <iostream>
int main()
{
int x = 15;
std::cout << "x is stored at address " << &x << " and has value " << x << std::endl;
x += 1; // Increment the value of x by 1.
std::cout << "x is stored at address " << &x << " and has value " << x << std::endl;
return 0;
}
- Declaring
int x = 15
tells the compiler that we will need to store some data which is to be interpreted as an integer with an initial value of 15. The compiler then knows:- How much memory to allocate for the variable (for an
int
this is normally 4 bytes). - How the variable can be used in the remainder of the program (see the notes on types for more information).
- To set the data at that memory location to be equal to
15
at this point in the program. - From now on, in this scope,
x
will refer to the value stored at this memory location, and changes tox
will lead to changes to the data stored there.
- How much memory to allocate for the variable (for an
&x
gives us the address in memory where the value forx
is stored. An address is a unique (usually 8 byte) numeric value for each memory location where we can store data. Changes to the value ofx
do not change its address: the address of a variable cannot be changed.- The compiler will select an unoccupied memory address to store the variable in for you.
The output of this program is
x is stored at address 0x7ffe13999ed4 and has value 15
x is stored at address 0x7ffe13999ed4 and has value 16
- The value of the address is given in hexadecimal, which is prefixed by
0x
when printed. - The address will be system dependent but note that the address does not change, but the value does.
Given this, we might wonder what it means to “pass a variable to a function”. Does the function need the value that the variable represents, or the location where the variable is stored?
There are broadly two approaches to giving functions access to variables from the calling scope:
- The function has a copy of the variable’s value in a new variable, independent from the original. This means read/write operations to this variable in the function operate on a separate memory location which does not affect the original. Once we return from the function scope, the original variable is unchanged because the function did not have access to the memory location where it is stored.
- The function has a reference to the variable, which is to say it is given the memory location where the variable is stored. Any read/write operations are made to this same location, and therefore when we leave the function scope any changes made to the variable inside the function will still remain.
Programming languages vary in their approaches to passing variables, but C++ gives us the choice when we define a function to pass each variable in either way: the first is called pass by value and the second is pass by reference. This choice significantly alters both the behaviour and the performance of a function in ways that we shall detail in the following sections.
Pass by Value
Pass by Value means that we copy the value of the variable we want to pass into the function, and the function works on this copy and leaves the original alone. Any changes that the function makes to the variable will not affect the value of that variable once you leave the function’s scope. In C++, passing by value is the default, so to pass by value you simply write the type and name of the variable in the function parameters in the usual way.
int add(int a, int b)
{
return (a + b);
}
This function can be safer, but is not time or memory efficient if variables are complex or large in size as the values need to be copied to new memory locations. You should only use pass by value for large pieces of data if you need an explicit copy made to work on and change locally in the function body but you cannot allow the function to change the original.
We can see this call by value in action explicitly in the following code example by checking the address and values of variables:
#include <iostream>
using namespace std;
int add(int a, int b)
{
cout << "In add function, before adding." << endl;
cout << "a is stored at address " << &a << " with value " << a << endl;
cout << "b is stored at address " << &b << " with value " << b << endl;
a = (a + b);
cout << "In add function, after adding." << endl;
cout << "a is stored at address " << &a << " with value " << a << endl;
cout << "b is stored at address " << &b << " with value " << b << endl;
return a;
}
int main()
{
int x = 12;
int y = 9;
cout << "Before add function." << endl;
cout << "x is stored at address " << &x << " with value " << x << endl;
cout << "y is stored at address " << &y << " with value " << y << endl;
int z = add(x, y);
cout << "After add function." << endl;
cout << "x is stored at address " << &x << " with value " << x << endl;
cout << "y is stored at address " << &y << " with value " << y << endl;
cout << "z is stored at address " << &z << " with value " << z << endl;
return 0;
}
Outputs:
Before add function.
x is stored at address 0x7ffee8b8b32c with value 12
y is stored at address 0x7ffee8b8b330 with value 9
In add function, before adding.
a is stored at address 0x7ffee8b8b30c with value 12
b is stored at address 0x7ffee8b8b308 with value 9
In add function, after adding.
a is stored at address 0x7ffee8b8b30c with value 21
b is stored at address 0x7ffee8b8b308 with value 9
After add function.
x is stored at address 0x7ffee8b8b32c with value 12
y is stored at address 0x7ffee8b8b330 with value 9
z is stored at address 0x7ffee8b8b334 with value 21
- Note that the variables
x
andy
are passed to theadd
function to serve as argumentsa
andb
respectively. - In the function we can see that
a
andb
are variables which are stored a separate memory locations fromx
andy
. When we modifya = a + b
the value stored fora
changes, butx
andy
do not because their memory locations were untouched.
Pass by Reference
Pass by Reference means that we tell the function where the original variable has been stored in memory, and we allow the function to work directly with that original variable. This has two major consequences:
- We only pass a memory address – usually 8 bytes in a 64-bit system – to the function, so there is no additional memory allocated to copy the object.
- The original variable can be changed by the function, and so any changes that happen within the function are retained after we leave the function’s scope.
We indicate that we want a reference to a variable using the &
symbol after the type of the argument in the function signature. The function below will take a reference to an integer and increment that integer by one. Because we have changed the value stored at that memory location, once we leave this function the variable that we passed to it will retain this increased value.
void increment(int &x)
{
x = x + 1;
}
- Even though we have passed
x
by reference,x
is just an ordinaryint
variable; this is only telling the compiler to give the function access to the original memory address instead of copying it to a new one. So inside the function we can just usex
as normal.- A reference just means an alias (i.e. a new name) for an existing variable, and therefore has the same type as the existing variable.
- It can be a bit confusing that the notation for a reference is the same as the address operator (
&
). These two uses of&
come up in different contexts though:- When it is used in a type context, i.e. follows a typename, it means a reference to a variable of that type.
- When it is used in an expression which evaluates to a value, e.g.
cout << &var << endl
, then it is the address operator and it evaluates to the memory address of the variable to which it is affixed.
Passing by reference can save significant time and memory by avoiding making needless copies of variables, but at the cost of making variables potentially vulnerable to being changed by a function. This can make it harder for someone using the function to reason about the program, and what the value of the variables they pass in will be once the function has finished.
We can once again illustrate this explicitly with addresses by re-using our example above, but changing the add
function to take a
and b
by reference (int &a
and int &b
):
#include <iostream>
using namespace std;
int add(int &a, int &b)
{
cout << "In add function, before adding." << endl;
cout << "a is stored at address " << &a << " with value " << a << endl;
cout << "b is stored at address " << &b << " with value " << b << endl;
a = (a + b);
cout << "In add function, after adding." << endl;
cout << "a is stored at address " << &a << " with value " << a << endl;
cout << "b is stored at address " << &b << " with value " << b << endl;
return a;
}
int main()
{
int x = 12;
int y = 9;
cout << "Before add function." << endl;
cout << "x is stored at address " << &x << " with value " << x << endl;
cout << "y is stored at address " << &y << " with value " << y << endl;
int z = add(x, y);
cout << "After add function." << endl;
cout << "x is stored at address " << &x << " with value " << x << endl;
cout << "y is stored at address " << &y << " with value " << y << endl;
cout << "z is stored at address " << &z << " with value " << z << endl;
return 0;
}
Yielding:
Before add function.
x is stored at address 0x7ffdd308656c with value 12
y is stored at address 0x7ffdd3086570 with value 9
In add function, before adding.
a is stored at address 0x7ffdd308656c with value 12
b is stored at address 0x7ffdd3086570 with value 9
In add function, after adding.
a is stored at address 0x7ffdd308656c with value 21
b is stored at address 0x7ffdd3086570 with value 9
After add function.
x is stored at address 0x7ffdd308656c with value 21
y is stored at address 0x7ffdd3086570 with value 9
z is stored at address 0x7ffdd3086574 with value 21
- Note that now
a
has the same address asx
, andb
has the same address asy
. - When
a
is updated, the value at its memory location is changed, so after theadd
function call we can see that the value ofx
has changed as well.
Using const
in Pass By Reference
We can retain the performance advantages of pass by reference and still protect our variables from changes by passing a const
reference.
void constRefExample(int const &x)
{
return 5*x;
}
The declaration int const &x
means that x
is a reference (&
) to a constant (const
) integer (int
). This means that the integer value cannot be changed within this function, and so any attempt to change the value of x
in the function will lead to a compiler error.
Try writing a function where you pass an argument by const reference and try to modify it inside the function. Take note of what the compiler error looks like!
Return Values
When we use a return
statement in a function, we are also passing by value, although a copy of the variable is not necessarily always made. As with inputs to a function, this can be a performance issue if large output data ends up being copied. There are however a few things to note about the efficiency of return
statements:
- Objects are copied using their copy constructor, a special function in their class definition which defines how to create a new object and copy the current object’s data. (In many cases this can be automatically created by the compiler.)
- Some objects also have a move constructor defined, in which the data is not explicitly copied, but a new object takes control of the data. We’ll return to this idea when we talk about pointers later in the course. (The move constructor may also be automatically created by the compiler.)
- Normally when an variable goes out of scope its memory is freed and can be reallocated to new variables. If we have a local variable in the function scope that we want to return, we can’t just give the address of the data (return by reference) because when the function returns the variable will go out of scope and that memory is freed.
- Although return types can be references, e.g.
int& someFunction()
, you have to be absolutely certain that the memory you are referencing will remain in scope. This could be e.g. a global variable, or a member of a class for an object which continues to exist. It should never be a variable created locally in that function scope. Don’t use reference return types unless you are really confident that you know what you are doing!
- Although return types can be references, e.g.
- For classes with a move constructor a local object can be returned without making a copy, since the compiler knows that the object is about to be destroyed as soon as the function returns, and can therefore have its data transferred instead. (This is why this optimisation can be used when returning a value but not when passing an object to a function by value: when passing an object to a function the original object will continue to exist.)
- The compiler will use a move constructor when available if the object is deemed large enough for the move to be more efficient than a copy, and a copy constructor when not. Therefore, you may find that returning values is more performant than you expect from the size of the data-structure.
We can see this move or copy behaviour for return values in the following code example:
#include <iostream>
using namespace std;
class Obj
{
public:
int a, b, c, d, e, f, g, h;
Obj(int a, int b, int c, int d, int e, int f, int g, int h) : a(a), b(b), c(c), d(d), e(e), f(f), g(g), h(h) {}
};
Obj makeObj()
{
Obj myObj(1, 2, 3, 4, 5, 6, 7, 8);
cout << "In makeObj, myObj is at address " << &myObj << endl;
cout << "myObj.a is at " << &myObj.a << endl;
cout << "myObj.b is at " << &myObj.b << endl;
return myObj;
}
int makeInt()
{
int x = 5;
cout << "In makeInt, x is at address " << &x << endl;
return x;
}
int main()
{
Obj newObj = makeObj();
cout << "Outside the function, newObj is at address " << &newObj << endl;
cout << "newObj.a is at " << &newObj.a << endl;
cout << "newObj.b is at " << &newObj.b << endl;
int y = makeInt();
cout << "Outside the function, y is at address " << &y << endl;
return 0;
}
Which yields the output:
In makeObj, myObj is at address 0x7ffdb4b6e0c0
myObj.a is at 0x7ffdb4b6e0c0
myObj.b is at 0x7ffdb4b6e0c4
Outside the function, newObj is at address 0x7ffdb4b6e0c0
newObj.a is at 0x7ffdb4b6e0c0
newObj.b is at 0x7ffdb4b6e0c4
In makeInt, x is at address 0x7ffdb4b6e094
Outside the function, y is at address 0x7ffdb4b6e0bc
Obj
is a large data type which contains 8int
values. (You’ll see how to define these custom data types, called classes, in section 2.)- In
Obj
there is no explicit copy or move constructor, these are implicitly filled in by the compiler for simple types like this. - Because
Obj
is large, themakeObj
function returns the object using move. We can see thatmyObj
andnewObj
have the same address. The control of this memory is moved frommyObj
tonewObj
, so whenmyObj
goes out of scope and is destroyed the memory remains active and under the control of the new variable. makeInt
is identical in structure, but only returns a singleint
. There’s no move defined forint
because it is already so small; we can see thatx
andy
have different addresses. Some small objects will also be copied instead of moved.
Note that not all types can be moved, and not all types can be copied. In these cases, we can use references arguments as outputs.
Mutable References as Outputs
If a return statement has significant overheads, it may be avoided using references. Let’s assume we have a large data class ImmovableData
with no move constructor.
ImmoveableData GenerateData(const int &a)
{
ImmovableData D;
for(int i = 0; i < a; i++)
{
// Do some data generation
...
}
return D;
}
int main()
{
ImmovableData data = GenerateData(100000);
return 0;
}
- This code will create a large data-structure the function call, and then copy that structure when the function returns and place the result in the variable
data
. The original data-structure is then deleted.
Instead of declaring a variable and setting it equal to the return value of a function, we can instead declare the variable in the calling scope, and then pass it into the function by reference.
void GenerateInPlace(const int &a, ImmovableData &v)
{
for(int i = 0; i < b; i++)
{
//Do some data generation
...
}
}
int main()
{
ImmovableData data;
makeListInPlace(100000, data);
return 0;
}
- In this case, only one data-structure is made. Its data is updated in the function, but it never has to be copied. Once we exit the function, the changes to
data
have persisted and we can use the values that we have assigned to it.
Which should I use?
- Passing small types like
int
orfloat
by value is fine, as they are the same size as a reference. - Passing by value is also fine if you need a copy of the argument to work on in the function body without affecting its value outside the function.
- Pass larger arguments (> 8 bytes) by
const
reference if you can. - Pass by (non const) reference if you need to work on a variable in place i.e. the function should change the value of the argument itself.
- Avoid
return
with large immovable data-structures for the same reason. These should be passed in and out by reference as function arguments.
Further things worth noting:
- If passing by reference, you can only pass literals (values like numbers and strings which are not assigned to a variable) if using a
const
reference. Consider two function signaturesrefAdd(int &a, int &b)
andconstRefAdd(const int &a, const int &b)
: we can callconstRefAdd(5, 12)
just fine, but if we callrefAdd(5, 12)
we will get an error. - Never use a
return
statement to return a reference (or a pointer) to a local variable e.g.return &x;
as the local variable will be destroyed when we leave the function scope. This will lead to a segmentation fault (memory error). - You can return by reference a variable which is not local to that function’s scope, for example a member function in a class may return a member variable of that class by reference, since when the function ends the object and its data will still exist. However, you must be sure that you will not keep the reference to the data for longer than the object’s lifetime; if the object passes out of scope and you continue to try to use the reference then you will have a memory error.