C++ Object Lifecycle
“I need to see,
the truth other men cannot see,
to be things that others can't be!
Give me courage to go where no angel will go!
And I will go!
I need to know!”
— Jekyll & Hyde
Prologue
If you are a C++ developer, or have some experience with C++, have you ever stopped coding and stared at the monitor, trying to decide which kind of parameter to use? To be specific: pass by value, by const reference, or by r-value reference?
void Func(Object obj);
I know why you stop here: you worry about the constructor overhead. That's a good instinct, but don't waste time on it every time.
So in this article, I'll break down the constructor and destructor calls for each of these parameter types, so that you can confidently choose the best combination for your code.
Instrument Utilities
Subject class
Before we start, we need a way to visualize an object's lifecycle. This is simple: just put an output statement in each of the special member functions. So here is our Object class.
class Object
We also add a _data field for the parameterized constructor. To mimic a real move constructor and move assignment operator, we set it to -1 to represent a "moved" object, which shouldn't be used anymore.
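Since only the first line of the listing is visible here, the following is a minimal sketch of what such an instrumented class could look like. The event strings, the `g_events` log and the `Report` helper are my own assumptions for illustration, not the original code; only the `_data` field and the -1 marker come from the description above.

```cpp
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event log: every special member function appends one entry
// here and prints it, so the call sequence can be inspected afterwards.
static std::vector<std::string> g_events;

static void Report(const std::string& what) {
    g_events.push_back(what);
    std::cout << what << '\n';
}

class Object {
public:
    Object() : _data(0) { Report("default ctor"); }
    explicit Object(int data) : _data(data) { Report("param ctor"); }
    Object(const Object& other) : _data(other._data) { Report("copy ctor"); }
    Object(Object&& other) : _data(other._data) {
        other._data = -1;  // mark the source as "moved"
        Report("move ctor");
    }
    Object& operator=(const Object& other) {
        _data = other._data;
        Report("copy assign");
        return *this;
    }
    Object& operator=(Object&& other) {
        if (this != &other) {
            _data = other._data;
            other._data = -1;  // mark the source as "moved"
        }
        Report("move assign");
        return *this;
    }
    // A moved-from object is reported separately when it dies.
    ~Object() { Report(_data == -1 ? "dtor (moved)" : "dtor"); }

    int Data() const { return _data; }

private:
    int _data;
};
```

Recording the events in a vector as well as printing them is just a convenience, so that a sequence can be compared programmatically instead of by eye.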
Scope indicator
The basic thing to know is that in C++, a local object lives in its enclosing scope and is guaranteed to be destroyed when execution reaches the end of that scope. So we can use this as a trick to display the span of a scope.
I first learnt this trick from Scott Meyers' Effective C++, where he mentioned a way to measure the execution time of a scope.
So here it is; I call it Fence. To tell scopes and function calls apart, I use a scope flag, so it may look a little verbose.
class Fence
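A rough sketch of such a scope marker might look like this. The opening markers mirror the ones that appear in the outputs in this article (`==========` for a scope, `>>>>>` for a function); the closing markers, the `Emit` helper and the `g_lines` buffer are assumptions of mine.

```cpp
#include <iostream>
#include <string>
#include <utility>
#include <vector>

static std::vector<std::string> g_lines;  // hypothetical trace buffer

static void Emit(const std::string& line) {
    g_lines.push_back(line);
    std::cout << line << '\n';
}

// RAII marker: emits one line when constructed and another when the
// enclosing scope (or function) ends and the destructor runs.
class Fence {
public:
    Fence(std::string name, bool scope) : _name(std::move(name)), _scope(scope) {
        Emit((_scope ? "========== " : ">>>>> ") + _name);
    }
    ~Fence() {
        // The closing markers are assumed; pick whatever reads well.
        Emit((_scope ? "---------- " : "<<<<< ") + _name);
    }
    Fence(const Fence&) = delete;
    Fence& operator=(const Fence&) = delete;

private:
    std::string _name;
    bool _scope;  // true: a plain scope, false: a function call
};

void Work() {
    Fence fence("Work", false);  // covers the whole function body
    Emit("working");
}
```

Creating a `Fence` with the scope flag set inside a pair of braces and calling `Work()` from there shows the function's markers nested between the scope's markers.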
Then, to simplify the use of this class, we can wrap it in macros. To indicate a scope, we can use the following macro.
1 |
|
To indicate a function, we can use a compiler-provided macro to avoid writing the function name twice. And since the fence covers the whole function body, we don't need to introduce a nested scope.
1 |
|
Besides, here is another macro that adds a blank line to the output.
1 |
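The macro bodies did not survive in this copy, so here is only a guess at how the three helpers could be written. `BEGIN_SCOPE` is the name used in the tests later on; `END_SCOPE`, `FUNCTION_FENCE` and `NEW_LINE` are hypothetical names, and the small `Fence` stand-in with its `Emit` helper is an assumption as well.

```cpp
#include <iostream>
#include <string>
#include <vector>

static std::vector<std::string> g_lines;  // hypothetical trace buffer

static void Emit(const std::string& line) {
    g_lines.push_back(line);
    std::cout << line << '\n';
}

// Minimal stand-in for the Fence class described above.
struct Fence {
    std::string name;
    bool scope;
    Fence(const std::string& n, bool s) : name(n), scope(s) {
        Emit((scope ? "========== " : ">>>>> ") + name);
    }
    ~Fence() { Emit((scope ? "---------- " : "<<<<< ") + name); }
};

// BEGIN_SCOPE opens a nested block, so its fence is destroyed at END_SCOPE.
#define BEGIN_SCOPE(name) { Fence scope_fence_((name), true);
#define END_SCOPE() }
// __func__ is the standard spelling; MSVC also provides __FUNCTION__.
#define FUNCTION_FENCE() Fence function_fence_(__func__, false)
// Adds a blank line to the output.
#define NEW_LINE() Emit("")

void Demo() {
    FUNCTION_FENCE();        // no nested scope needed: lives until Demo returns
    BEGIN_SCOPE("Create");
    Emit("body");
    END_SCOPE();
    NEW_LINE();
}
```

Calling `Demo()` emits the function marker first, then the scope markers around "body", a blank line, and finally the closing function marker.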
Lifecycle Breakdown
I put no comments in the code, so that you can think about the result first. 🤔 All experiments were done with MSVC in Visual Studio 2022. The Debug and Release profiles produce the same output.
Object creation
First, let's see how C++ handles object creation. We create these objects to exercise every constructor and assignment operator.
BEGIN_SCOPE("Create");
Running this, we’ll have the following output.
========== Create
It’s a bit long, so I’ll break it down one by one.
First, there is no doubt that Object obj(10); calls the parameterized constructor. Then, for Object obj2 = Object(15);, the = also invokes a constructor rather than the assignment operator, because the statement is a variable definition. The following Object obj3(obj2); and Object obj4(std::move(obj3)); obviously call the copy and move constructors.
A constructor can be seen as a special function call, and I'll talk about parameter passing in functions shortly.
Then, for a regular assignment statement, the corresponding assignment operator is called. And here comes the overhead. If you assign a temporary value to a variable, e.g. obj = Object(25), an extra object is created and the move assignment operator is invoked. Finally, assigning a variable, or a moved variable, to another variable only invokes the copy or move assignment operator, as expected.
So for this part, we can conclude that only assigning a temporary value causes a little overhead. Although a temporary in an initialization can be optimized away, the compiler cannot do the same in an assignment: the target object already exists, so the temporary has to be created before the operator can run. I think that's what a temporary value is meant to be. However, it invokes the move assignment, so it has little impact if you have a good "move".
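To make the sequence concrete, here is a small self-contained sketch that replays the statements discussed above. The `Object` stand-in, the event strings and the `g_events` log are assumptions for illustration, and the expected events in the comments assume C++17 (or a compiler that elides the temporary in `Object obj2 = Object(15);`).

```cpp
#include <iostream>
#include <string>
#include <utility>
#include <vector>

static std::vector<std::string> g_events;  // hypothetical event log

static void Report(const std::string& e) {
    g_events.push_back(e);
    std::cout << e << '\n';
}

struct Object {
    int data;
    explicit Object(int d) : data(d) { Report("param ctor"); }
    Object(const Object& o) : data(o.data) { Report("copy ctor"); }
    Object(Object&& o) : data(o.data) { o.data = -1; Report("move ctor"); }
    Object& operator=(const Object& o) {
        data = o.data;
        Report("copy assign");
        return *this;
    }
    Object& operator=(Object&& o) {
        data = o.data;
        o.data = -1;
        Report("move assign");
        return *this;
    }
    ~Object() { Report("dtor"); }
};

void CreateDemo() {
    Object obj(10);                // param ctor
    Object obj2 = Object(15);      // param ctor only: this is an initialization
    Object obj3(obj2);             // copy ctor
    Object obj4(std::move(obj3));  // move ctor
    obj = Object(25);              // param ctor (temporary), move assign, dtor
    obj = obj2;                    // copy assign
    obj = std::move(obj4);         // move assign
}                                  // four destructors, in reverse order
```

The only statement that costs an extra object is `obj = Object(25);`, where the temporary is constructed, moved from, and destroyed within the same statement.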
Return Value Optimization
There is a special case in object creation called return value optimization (RVO). It eliminates the redundant copy of a named or temporary object when it is used as the return value. That is, the object is created directly in the caller's storage. For example, we have the following function that returns an object.
Object CreateA()
And we can write the test.
BEGIN_SCOPE("Return Value Optimization");
The output is as follows.
========== Return Value Optimization
The best situation for RVO is when there is only one candidate object to return, as in CreateA here. We can see that literally no extra constructor is invoked for Object obj = CreateA();, and obj = CreateA(); only calls the move assignment operator. This is what we wish for.
However, our program may get more complex. In CreateB, there are two candidates for the return value, so RVO cannot be applied perfectly, but the compiler can still fall back to a single move constructor call to initialize the object in the caller. We can do better by placing obj and obj2 in their own if-else branches, so that perfect RVO can be applied.
One thing to notice is that when RVO is applied, the object is constructed in the callee and destructed in the caller.
The condition for perfect RVO is (I guess) that the declaration of each returned object does not dominate a return statement that returns another object. That way there is no conflict in deciding its location in the caller's scope.
What does "dominate" mean, then? Put simply, if A dominates B, then every execution path to B must pass through A first.
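The three shapes discussed above can be sketched as follows. The function names, the `Counts` tally and the `Object` stand-in are hypothetical. Eliding a named return value is optional for the compiler, so the exact number of moves may differ between compilers and versions; what I am confident about is that none of these functions needs a copy.

```cpp
#include <iostream>

struct Counts {
    int ctor = 0;
    int copy = 0;
    int move = 0;
};
static Counts g_counts;  // hypothetical global tally

struct Object {
    int data;
    explicit Object(int d) : data(d) { ++g_counts.ctor; }
    Object(const Object& o) : data(o.data) { ++g_counts.copy; }
    Object(Object&& o) : data(o.data) { o.data = -1; ++g_counts.move; }
};

// One named candidate: compilers can usually build it in the caller's storage.
Object MakeSingle() {
    Object obj(1);
    return obj;
}

// Two candidates alive at the same time: the return falls back to a move.
Object MakeEither(bool first) {
    Object obj(1);
    Object obj2(2);
    if (first) return obj;
    return obj2;
}

// Each candidate lives in its own branch, so neither declaration
// dominates the other's return statement.
Object MakeScoped(bool first) {
    if (first) {
        Object obj(1);
        return obj;
    } else {
        Object obj2(2);
        return obj2;
    }
}

void PrintCounts(const char* label) {
    std::cout << label << ": ctor=" << g_counts.ctor << " copy=" << g_counts.copy
              << " move=" << g_counts.move << '\n';
}
```

Resetting `g_counts` before each call and printing it afterwards with `PrintCounts` is an easy way to compare how your own compiler treats the three variants.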
If a meticulous reader, you are, then you may ask, why use this verbose if-else instead of a ternary operator? Good question. We can test that.
Object CreateB(int option)
Surprisingly (or not, if you already know this), it results in a copy constructor call instead of a move!
>>>>> CreateB
Why? Because the compiler is not that aggressive. RVO and the automatic move on return only apply when the return statement names a single local object, but the ternary operator turns the operand into an expression (to be more specific, an l-value). So even though the two forms are semantically equivalent, the compiler takes the conservative choice and copies instead of moving.
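Here is a minimal sketch of the effect, with hypothetical function names and counters. It also shows one possible way to get the move back while keeping the ternary, namely casting the result with `std::move`; that fix is my own addition, not something taken from the tests above.

```cpp
#include <utility>

static int g_copies = 0;  // hypothetical counters
static int g_moves = 0;

struct Object {
    int data;
    explicit Object(int d) : data(d) {}
    Object(const Object& o) : data(o.data) { ++g_copies; }
    Object(Object&& o) : data(o.data) { o.data = -1; ++g_moves; }
};

// The operand is an l-value expression, not the name of a local variable,
// so the return value is copy-constructed.
Object PickWithTernary(bool first) {
    Object obj(1);
    Object obj2(2);
    return first ? obj : obj2;
}

// Casting the selected object to an r-value makes the return a move again.
Object PickWithMove(bool first) {
    Object obj(1);
    Object obj2(2);
    return std::move(first ? obj : obj2);
}
```

Note that `std::move` is only appropriate here because both objects are locals that die right after the return; the plain if-else form remains the simplest way to let the compiler do the right thing on its own.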
Parameter passing
Then, let's see how C++ prepares function arguments. To better understand this, you may want a quick look at the stack frame in C/C++.
The arguments are placed on top of the caller's stack, so that the callee can find them without knowing the caller's stack layout. This implies that when passing arguments, we are initializing them in the outgoing args segment. So the arguments actually live in the caller's scope.
Pass by value
To see it in action, let’s define a simple function with a value parameter and test it.
void Pass(Object object)
The output will be as follows.
========== Pass by value
First, Object obj; calls the default constructor to initialize the object in the local variable segment. When calling Pass(obj), the object is first copied to the outgoing args segment, which invokes the copy constructor. Similarly, we can use std::move to invoke the move constructor instead. And finally, we can pass a temporary object, which constructs the argument in place, with no extra copy or move.
Are we getting it? It is the same as what we discussed about object initialization in Object creation.
So we can conclude that pass by value may not be a good choice if we often pass large objects, and that's why modern IDEs suggest using a const reference instead.
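The three calls described above can be replayed with a short sketch like this one. The `Object` stand-in and its event strings are assumptions, and the comment on the last call assumes C++17 (or a compiler that elides the temporary).

```cpp
#include <iostream>
#include <string>
#include <utility>
#include <vector>

static std::vector<std::string> g_events;  // hypothetical event log

static void Report(const std::string& e) {
    g_events.push_back(e);
    std::cout << e << '\n';
}

struct Object {
    int data = 0;
    Object() { Report("default ctor"); }
    Object(const Object& o) : data(o.data) { Report("copy ctor"); }
    Object(Object&& o) : data(o.data) { o.data = -1; Report("move ctor"); }
    ~Object() { Report("dtor"); }
};

// The parameter is a separate object that is initialized from the argument.
void Pass(Object /*object*/) { Report("in Pass"); }

void PassDemo() {
    Object obj;            // default ctor
    Pass(obj);             // copy ctor, in Pass, dtor
    Pass(std::move(obj));  // move ctor, in Pass, dtor
    Pass(Object());        // default ctor (built in place), in Pass, dtor
}                          // dtor for obj
```

Every call creates and destroys exactly one parameter object; the only difference is which constructor initializes it.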
Pass by (const) reference
As we know, a reference is syntactic sugar for a pointer. Passing a reference is actually passing a pointer, so there is practically no overhead, which is why we like it.
void PassCopy(const Object& object)
The output is as follows.
========== Pass Copy
We can see that when using a reference, no extra copy or move is needed; only the reference is passed. That is why we prefer references for large objects. To avoid accidentally modifying the argument, we can add const to it.
Note that a plain (non-const) reference cannot accept a constant argument or a temporary value, but a const reference can bind to anything.
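The binding rules can be checked with a tiny sketch. `ReadConst`, `ReadMutable` and the counters are hypothetical names; the two commented-out calls are the ones standard C++ rejects (MSVC may accept the second one as a language extension unless conformance mode is on).

```cpp
static int g_copies = 0;  // hypothetical counters
static int g_moves = 0;

struct Object {
    int data;
    explicit Object(int d) : data(d) {}
    Object(const Object& o) : data(o.data) { ++g_copies; }
    Object(Object&& o) : data(o.data) { o.data = -1; ++g_moves; }
};

int ReadConst(const Object& object) { return object.data; }
int ReadMutable(Object& object) { return object.data; }

int BindingDemo() {
    Object obj(1);
    const Object fixed(2);
    int sum = 0;
    sum += ReadConst(obj);        // non-const l-value
    sum += ReadConst(fixed);      // const l-value
    sum += ReadConst(Object(3));  // temporary
    sum += ReadMutable(obj);      // only a non-const l-value is accepted
    // ReadMutable(fixed);        // error: would drop const
    // ReadMutable(Object(3));    // error: cannot bind a temporary
    return sum;
}
```

None of these calls copies or moves anything; the counters stay at zero.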
Pass by R-value reference
It is rare, but let's not omit it. Passing by R-value reference requires that the argument is an R-value. Duh
void PassMove(Object&& object)
Notice that an R-value reference parameter does not accept an L-value, so we must explicitly move our L-value to match the type.
========== Pass Move
Since std::move works as a type cast and no actual assignment happens, our obj is not really moved.
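A short sketch makes the "just a cast" point visible. `StealMove` is a hypothetical counterpart that I added for contrast: it actually move-constructs a local from the parameter, and only then does the caller's object end up in the moved state.

```cpp
#include <utility>

static int g_moves = 0;  // hypothetical counter

struct Object {
    int data;
    explicit Object(int d) : data(d) {}
    Object(const Object& o) : data(o.data) {}
    Object(Object&& o) : data(o.data) { o.data = -1; ++g_moves; }
};

// Only binds a reference to the argument; nothing is moved.
int PassMove(Object&& object) { return object.data; }

// Here the move really happens, inside the function body.
int StealMove(Object&& object) {
    Object local(std::move(object));
    return local.data;
}
```

After `PassMove(std::move(obj))` the caller's `obj` still holds its value; after `StealMove(std::move(obj))` its data is -1.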
Parameter consuming
Sometimes, especially in a constructor, we need to copy the argument to initialize certain members. In this case, which type of parameter has lower overhead? Instead of only calling Print, we may have an initialization or assignment.
void Consume(Object object)
The argument part is clear from what we've just discussed; the only difference is that we now have an additional copy or move constructor call. Of course, passing by value and then copying is a terrible choice, so which one should we use? L-value reference or R-value reference?
But we cannot use an R-value reference, as it cannot bind to an L-value, so does that mean an L-value reference is our only choice? The answer is no. If you use ReSharper, you might have seen such a suggestion.
This problem can be demonstrated by the following functions: pass by reference, versus pass by value and then move.
void ConsumeCopy(const Object& object)
To tell which one is better, we can run a little test. There are only two kinds of arguments, L-value and R-value, so we call each function once with each kind to see the overhead.
BEGIN_SCOPE("Pass by Reference");
The result is not that surprising: pass by reference produces less output, and thus seems to be the better choice.
========== Reference
It may be a little long, so let's summarize it. Excluding Object obj;, we have the following statistics. Since the destructor of a moved object has less to clean up, I count it separately.
Operation | Pass by reference | Pass by value then move | Difference |
---|---|---|---|
(Parameterized) Constructor | 1 | 1 | 0 |
Copy Constructor | 2 | 1 | -1 |
Move Constructor | 0 | 2 | +2 |
Destructor | 3 | 2 | -1 |
Destructor (moved) | 0 | 2 | +2 |
If both kinds of arguments are passed with roughly equal probability, the overhead depends on the relative cost of copy and move. If copy and move cost the same, then pass by reference is clearly better. You may choose the other only if move is much cheaper than copy.
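The numbers in the table can be reproduced with a counting sketch like the one below. `ConsumeCopy` is the name used above; `ConsumeMove`, `Measure`, the `Counts` struct and the `Object` stand-in are hypothetical, and the result for the temporary argument assumes C++17 (or a compiler that elides it).

```cpp
#include <utility>

struct Counts {
    int ctor = 0;
    int copy = 0;
    int move = 0;
    int dtor = 0;
    int dtor_moved = 0;
};
static Counts g_counts;  // hypothetical global tally

struct Object {
    int data;
    explicit Object(int d) : data(d) { ++g_counts.ctor; }
    Object(const Object& o) : data(o.data) { ++g_counts.copy; }
    Object(Object&& o) : data(o.data) { o.data = -1; ++g_counts.move; }
    ~Object() {
        if (data == -1) ++g_counts.dtor_moved;  // moved objects counted separately
        else ++g_counts.dtor;
    }
};

// Pass by const reference, then copy into the kept object.
void ConsumeCopy(const Object& object) { Object kept(object); }

// Pass by value, then move the parameter into the kept object.
void ConsumeMove(Object object) { Object kept(std::move(object)); }

// Calls `consume` once with an l-value and once with a temporary.
template <typename Consume>
Counts Measure(Consume consume) {
    Object obj(1);
    g_counts = Counts();  // exclude obj's own construction, as in the table
    consume(obj);
    consume(Object(2));
    return g_counts;      // snapshot taken before obj is destroyed
}
```

`Measure(ConsumeCopy)` and `Measure(ConsumeMove)` should give the two columns of the table: 1/2/0/3/0 for the reference version and 1/1/2/2/2 for the value-then-move version.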
However, if the argument is always (or most of the time) a temporary value, you may need to reconsider your choice. For example, if you use a string as the name of an object, then this string will most likely come from a temporary value.
Operation | Pass by reference | Pass by value then move | Difference |
---|---|---|---|
(Parameterized) Constructor | 1 | 1 | 0 |
Copy Constructor | 1 | 0 | -1 |
Move Constructor | 0 | 1 | +1 |
Destructor | 2 | 1 | -1 |
Destructor (moved) | 0 | 1 | +1 |
In this case, passing by value and then moving can be the better choice.
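For the string-as-name example, the idea can be sketched with a constructor that takes the string by value and moves it into the member. `Widget` is a hypothetical class name used only for illustration.

```cpp
#include <string>
#include <utility>

class Widget {
public:
    // By value, then move: a temporary argument costs one move of the
    // string, while an l-value argument costs one copy plus one move.
    explicit Widget(std::string name) : _name(std::move(name)) {}

    const std::string& Name() const { return _name; }

private:
    std::string _name;
};
```

With `Widget w("player");` the string is built once as the parameter and then moved into `_name`, so its characters are never copied a second time. A caller that still needs its own string afterwards simply passes it as an l-value and keeps it intact.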
Objects in Array
There is one thing we missed: arrays. What about the objects in an array? For this, we can also write a simple test.
BEGIN_SCOPE("Array");
By running it, we’ll have the following output.
========== Array
We can see that an array calls the default constructor on each element, and so does the new operator. Correspondingly, their destructors are also called. This introduces a problem: you have to provide a default constructor or explicitly initialize every element on creation. Of course there is a workaround, which is to use malloc and free. But in this case you also lose the constructor and destructor calls.
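A small counting sketch of these cases follows. The function names and counters are hypothetical. The last function shows what the malloc workaround involves if you still want the object's lifecycle: placement new to run the constructor and an explicit destructor call before free; this part is my own illustration, not taken from the test above.

```cpp
#include <cstdlib>
#include <new>

static int g_ctors = 0;  // hypothetical counters
static int g_dtors = 0;

struct Object {
    int data;
    Object() : data(0) { ++g_ctors; }
    explicit Object(int d) : data(d) { ++g_ctors; }
    ~Object() { ++g_dtors; }
};

void ArrayDemo() {
    Object stack_array[3];                      // 3 default ctors
    Object* heap_array = new Object[2];         // 2 default ctors
    delete[] heap_array;                        // 2 dtors
    Object listed[2] = {Object(1), Object(2)};  // explicit init of every element
}                                               // 5 dtors for the two local arrays

int RawStorageDemo() {
    void* raw = std::malloc(sizeof(Object));  // no constructor runs here
    if (raw == nullptr) return -1;
    Object* object = new (raw) Object(7);     // placement new runs the ctor
    int value = object->data;
    object->~Object();                        // destructor must be called by hand
    std::free(raw);                           // free never calls a destructor
    return value;
}
```

In `ArrayDemo` the constructor and destructor counts end up equal (seven each), whereas with raw malloc storage nothing is counted unless you construct and destroy the object explicitly.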
Epilogue
I have long wished to take a deep look at how C++ handles objects, and now this is the day, when I send all my doubts and daemons on their way, …
Anyway, it helped me understand the object lifecycle in C++, and I hope this post can help you too. ᓚᘏᗢ