Saying that C++ has simple rules for variable initialization is probably quite risky :) For example, you can read Initialization in C++ is Bonkers : r/cpp to see a vibrant discussion about this topic.
But let’s try with just a small part of variables: static variables.
How are they initialized? What happens before main() (*)?
(*) Warning: implementation-dependent, see the explanations later in the post.
Intro
Have a look at the following code, where I use a global variable t (a nice and descriptive name... right? :)):
class Test
{
public:
    Test() {}
public:
    int _a;
};
Test t; // <<
int main()
{
    return t._a;
}
What is the value of t._a in main()?
Is the constructor of Test even called?
Let’s run the debugger!
Debugging
I’ll be using Visual Studio 2017 to run my apps. Although the initialization phase is implementation-dependent, runtime systems share a lot of ideas in order to conform to the standard.
I set a breakpoint at the start of Test::Test() and this is the call stack I got:
test_static.exe!Test::Test() Line 12
test_static.exe!`dynamic initializer for '_t''() Line 20
ucrtbased.dll!_initterm(void(*)() * first, void(*)() * last) Line 22
test_static.exe!__scrt_common_main_seh() Line 251
test_static.exe!__scrt_common_main() Line 326
test_static.exe!mainCRTStartup() Line 17
Wow… the runtime invokes a few functions before main() kicks in!
The debugger stopped in a place called dynamic initializer for '_t''(). What’s more, the member variable _a was already set to 0.
Let’s look at the steps:
Our global variable t is not constant-initialized, because according to the standard (constant initialization @cppreference) it would need to have the form:
static T& ref = constexpr;
static T object = constexpr;
So the following things happen:
For all other non-local static and thread-local variables, Zero initialization takes place.
And then:
After all static initialization is completed, dynamic initialization of non-local variables occurs…
In other words: the runtime initializes our variables to zero and then it invokes the dynamic part.
Zero initialization
I’ve found this short and concise summary of Zero Initialization @MSDN:
- Numeric variables are initialized to 0 (or 0.0, or 0.0000000000, etc.).
- Char variables are initialized to ‘\0’.
- Pointers are initialized to nullptr.
- Arrays, POD classes, structs, and unions have their members initialized to a zero value.
Our object t is a class instance, so the compiler will initialize its members to zero.
What’s more, global variables might be put into the BSS segment of the program, which means that they don’t take any space on disk. The whole BSS segment is represented only by its length (the sum of the sizes of all variables placed there). The section is then cleared at startup (something like memset(bssStart, 0, bssLen)).
For example, looking at the asm output from my code, it looks like MSVC put the t variable in _BSS:
_BSS SEGMENT
?t@@3VTest@@A DD 01H DUP (?) ; t
_BSS ENDS
You can read more @cppreference - zero initialization
Dynamic initialization
From the standard 6.6.2 Static initialization “basic.start.static”, N4659, Draft
Together, zero-initialization and constant initialization are called static initialization; all other initialization is dynamic initialization.
In MSVC, a pointer to each dynamic initializer is placed into an array of function pointers:
// internal_shared.h
typedef void (__cdecl* _PVFV)(void);
// First C++ Initializer
extern _CRTALLOC(".CRT$XCA") _PVFV __xc_a[];
// Last C++ Initializer
extern _CRTALLOC(".CRT$XCZ") _PVFV __xc_z[];
And later, a function called _initterm invokes those functions:
_initterm(__xc_a, __xc_z);
_initterm just calls every function, assuming it’s not null:
extern "C" void __cdecl _initterm(_PVFV* const first,
                                  _PVFV* const last)
{
    for (_PVFV* it = first; it != last; ++it)
    {
        if (*it == nullptr)
            continue;
        (**it)();
    }
}
If any of the initializers throws an exception, std::terminate() is called.
The dynamic initializer for t will call its constructor. This is exactly what I’ve seen in the debugger.
On Linux
According to Linux x86 Program Start Up and Global Constructors and Destructors in C++:
There’s a function __do_global_ctors_aux that calls all “constructors” (it’s for C, but should be similar for C++ apps). This function calls constructors that are specified in the .ctors section of the ELF image.
As I mentioned, the details differ from MSVC, but the idea of function pointers to constructors is the same. At some point before main() the runtime must call those constructors.
Implementation Dependent
Although non-local variables will usually be initialized before main() starts, it's not guaranteed by the standard. So if your code works on one platform, it doesn't mean it will work with some other compiler, or even another version of the same compiler...
From: C++ draft: basic.start.dynamic#4:
It is implementation-defined whether the dynamic initialization of a non-local non-inline variable with static storage duration is sequenced before the first statement of main or is deferred. If it is deferred, it strongly happens before any non-initialization odr-use of any non-inline function or non-inline variable defined in the same translation unit as the variable to be initialized.
Storage and Linkage
So far I’ve used one global variable, but it wasn’t even marked as static. So what is a ‘static’ variable?
Colloquially, a static variable is a variable whose lifetime is the entire run of the program. Such a variable is initialized before main() and destroyed after.
In the C++ Standard 6.7.1 Static storage duration “basic.stc.static”, N4659, Draft:
All variables which do not have dynamic storage duration, do not have thread storage duration, and are not local have static storage duration. The storage for these entities shall last for the duration of the program
As you see, for non-local variables, you don’t have to apply the static keyword to end up with a static variable.
We have a few options when declaring a static variable. We can distinguish them by storage and linkage:
- Storage:
  - automatic - Default for variables in a scope.
  - static - The lifetime is bound to the program.
  - thread - The object is allocated when the thread begins and deallocated when the thread ends.
  - dynamic - Per request, using dynamic memory allocation functions.
- Linkage:
  - no linkage - The name can be referred to only from the scope it is in.
  - external - The name can be referred to from scopes in other translation units (or even from other languages).
  - internal - The name can be referred to from all scopes in the current translation unit.
By default, if I write int i; outside of main() (or any other function), this will be a variable with static storage duration and external linkage.
Here’s a short summary:
int i; // static storage, external linkage
static int t; // static storage, internal linkage
namespace {
    int j; // static storage, internal linkage
}
const int ci = 100; // static storage, internal linkage
int main()
{
}
Although we usually think of static variables as globals, it’s not always the case. By using namespaces or putting statics in a class, you can effectively hide them and make them available only where required.
Static variables in a class
You can apply static to a data member of a class:
class MyClass
{
public:
    ...
private:
    static int s_Important;
};
// later in cpp file (note the class-name qualification):
int MyClass::s_Important = 0;
s_Important has static storage duration, and it’s a single value shared by all objects of the class. It has external linkage, assuming the class also has external linkage.
Before C++17, each static class data member had to be defined in some cpp file (apart from static const integers…). Now you can use inline variables:
class MyClass
{
public:
    ...
private:
    // declare and define in one place!
    // since C++17
    inline static int s_Important = 0;
};
As I mentioned earlier, with classes (or namespaces) you can hide static variables, so they are not “globals”.
Static variables in functions
There’s also another special case that we should cover: statics in a function/scope:
void Foo()
{
    static bool bEnable = true;
    if (bEnable)
    {
        // ...
    }
}
From cppreference: storage duration
Static variables declared at block scope are initialized the first time control passes through their declaration (unless their initialization is zero- or constant-initialization, which can be performed before the block is first entered). On all further calls, the declaration is skipped.
For example, sometimes I like to use static bEnable variables in my debugging sessions (not in production!). Since the variable is shared across all function invocations, I can switch it back and forth between true and false. That way the variable can enable or disable some block of code: let’s say a new implementation vs the old one. I can then easily observe the effects - without recompiling the code.
Wrap up
Although globals/statics sound easy, I found it very hard to prepare this post. Storage, linkage, various conditions and rules.
I was happy to see the code behind the initialization, so it’s more clear how it’s all done.
A few points to remember:
- a static variable’s lifetime is bound to the program’s lifetime. It’s usually created before main() and destroyed after it.
- a static variable might be visible internally (internal linkage) or externally (external linkage)
- at the start, static variables are zero-initialized, and then dynamic initialization happens
- Still... be careful, as Static initializers will murder your family :)
Ah… wait… but what about initialization and destruction order of such variables?
Let’s leave this topic for another time :)
For now, you can read about static in static libraries: Static Variables Initialization in a Static Library, Example.