Bartek's coding blog

C++17 Features


In my last C++ summary (for 2016) I wrote that the draft for C++17 is the most important thing that happened. I’ve put a list of features of the new standard, and today I’d like to expand the topic so we can learn some more details.

Intro

Work in Progress! I am happy to see your help with the list! :)

9 of the 39 descriptions are still missing, plus all of the library changes!

Updated: This post was updated at 8am, 13th January 2017.

If you have code examples, better explanations or any ideas, let me know! I am happy to update the current post so that it has some real value for others.

The plan is to have a list of features with some basic explanation, little example (if possible) and some additional resources, plus a note about availability in compilers. Probably, most of the features might require separate articles or even whole chapters in books, so the list here will be only a jump start.

See this github repo: github/fenbf/cpp17features. Add a pull request to update the content.

The list comes from the following resources:

And one of the most important resources: Working Draft, Standard for Programming Language C++.

Language Features

New auto rules for direct-list-initialization

N3922

GCC: 5.0 | Clang: 3.8 | MSVC: 14.0

Fixes some cases with auto type deduction. The full background can be found in Auto and braced-init-lists, by Ville Voutilainen.

It fixes the problem of deducing std::initializer_list like:

auto x = foo(); // copy-initialization
auto x{foo};    // direct-initialization, initializes an initializer_list
int x = foo();  // copy-initialization
int x{foo};     // direct-initialization

And for direct initialization, the new rules are:

  • For a braced-init-list with only a single element, auto deduction will deduce from that entry;
  • For a braced-init-list with more than one element, auto deduction will be ill-formed.

Basically, auto x { 1 }; will now be deduced as int; before, it was deduced as an initializer list.
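A quick compile-time check of the new rules (a minimal sketch; the variable names are just for illustration):

```cpp
#include <initializer_list>
#include <type_traits>

auto a{1};       // C++17: deduced as int (previously: initializer_list<int>)
auto b = {1};    // copy-initialization: still std::initializer_list<int>
// auto c{1, 2}; // ill-formed in C++17: more than one element

static_assert(std::is_same_v<decltype(a), int>);
static_assert(std::is_same_v<decltype(b), std::initializer_list<int>>);
```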

static_assert with no message

N3928

GCC: 6.0 | Clang: 2.5 | MSVC: 15.0 preview 5

Self-explanatory. It allows you to provide just the condition, without passing a message; the version with a message is still available. This makes it compatible with other asserts like BOOST_STATIC_ASSERT (which never took a message).
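Both forms side by side (a trivial sketch):

```cpp
// C++17: the message is now optional.
static_assert(sizeof(int) >= 4, "int must be at least 32 bits");
static_assert(sizeof(int) >= 4); // same check, no message
```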

typename in a template template parameter

N4051

GCC: 5.0 | Clang: 3.5 | MSVC: 14.0

Allows you to use typename instead of class when declaring a template template parameter. Normal type parameters can use them interchangeably, but template template parameters were restricted to class, so this change unifies these forms somewhat.

template<template<typename...> typename Container>
//                             ^^^^^^^^ used to be invalid
struct foo;

foo<std::vector> my_foo;

Removing trigraphs

N4086

GCC: 5.1 | Clang: 3.5 | MSVC: Yes

Removes ??=, ??(, ??>, …

It makes compiler implementations a bit simpler; see MSDN: Trigraphs.

Nested namespace definition

N4230

GCC: 6.0 | Clang: 3.6 | MSVC: 14.3

Allows you to write:

namespace A::B::C {
//…
}

Rather than:

namespace A {
namespace B {
namespace C {
//…
}
}
}

Attributes for namespaces and enumerators

N4266

GCC: 4.9 (namespaces) / 6.0 (enums) | Clang: 3.4 | MSVC: 14.0

Permits attributes on enumerators and namespaces. More details in N4196.

enum E {
    foobar = 0,
    foobat [[deprecated]] = foobar
};

E e = foobat; // Emits warning

namespace [[deprecated]] old_stuff {
    void legacy();
}

old_stuff::legacy(); // Emits warning

u8 character literals

N4267

GCC: 6.0 | Clang: 3.6 | MSVC: 14.0

UTF-8 character literal, e.g. u8'a'. Such a literal has type char and a value equal to the ISO 10646 code point value of its c-char, provided that the code point is representable with a single UTF-8 code unit. If the c-char is not in the Basic Latin or C0 Controls Unicode block, the program is ill-formed.

In other words, the compiler will report an error if the character cannot fit in a single UTF-8 code unit (the ASCII range).
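A couple of compile-time checks (assuming an ASCII-compatible execution character set):

```cpp
// In C++17 a u8 character literal has type char (char8_t only arrives in C++20).
static_assert(sizeof(u8'a') == 1); // always a single UTF-8 code unit
static_assert(u8'A' == 0x41);      // the ISO 10646 code point for 'A'
```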

Allow constant evaluation for all non-type template arguments

N4268

GCC: 6.0 | Clang: 3.6 | MSVC: not yet

Allows non-type template arguments to be evaluated as arbitrary constant expressions; previously, pointer arguments essentially had to use the &name form.
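A sketch of what the relaxation enables (type names are made up; the constexpr-pointer pattern follows the example style of the paper):

```cpp
#include <type_traits>

template<int* P> struct PtrHolder {};

int global;

// C++14 required the literal '&name' syntax as the template argument;
// C++17 accepts any suitable converted constant expression.
constexpr int* ptr = &global;

PtrHolder<&global> a; // always worked
PtrHolder<ptr> b;     // OK since C++17 (N4268)

// Same pointer value, so both name the same specialization:
static_assert(std::is_same_v<decltype(a), decltype(b)>);
```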

Fold Expressions

N4295

GCC: 6.0 | Clang: 3.6 | MSVC: not yet

More background here in P0036
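In short, a fold expression applies a binary operator across all elements of a parameter pack. A minimal sketch (the function name is just for illustration):

```cpp
// Binary right fold with an init value: expands to a1 + (a2 + (... + 0)).
template<typename... Args>
constexpr auto sum(Args... args) {
    return (args + ... + 0);
}

static_assert(sum() == 0); // empty pack folds to the init value
static_assert(sum(1, 2, 3) == 6);
```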

Remove Deprecated Use of the register Keyword

P0001R1

GCC: 7.0 | Clang: 3.8 | MSVC: not yet

The register keyword was deprecated in the 2011 C++ standard. C++17 cleans up the standard, so the keyword is now removed.

Remove Deprecated operator++(bool)

P0002R1

GCC: 7.0 | Clang: 3.8 | MSVC: not yet

The ++ operator for bool was deprecated in the original 1998 C++ standard, and it is past time to remove it formally.

Removing Deprecated Exception Specifications from C++17

P0003R5

GCC: 7.0 | Clang: 4.0 | MSVC: not yet

Dynamic exception specifications were deprecated in C++11. This paper formally proposes removing the feature from C++17, while retaining the (still) deprecated throw() specification strictly as an alias for noexcept(true).

Make exception specifications part of the type system

P0012R1

GCC: 7.0 | Clang: 4.0 | MSVC: not yet

Previously, the exception specification of a function was not part of the function's type; now it is.

We’ll get an error in the case:

void (*p)() throw(int);
void (**pp)() noexcept = &p; // error: cannot convert to pointer to noexcept function

Aggregate initialization of classes with base classes

P0017R1

GCC: 7.0 | Clang: 3.9 | MSVC: not yet

Previously, if a class was derived from some other type, you couldn't use aggregate initialization. Now the restriction is removed.

struct base { int a1, a2; };
struct derived : base { int b1; };

derived d1{{1, 2}, 3}; // full explicit initialization
derived d2{{}, 1};     // the base is value-initialized

To sum up, from the standard:

An aggregate is an array or a class with:
* no user-provided constructors (including those inherited from a base class),
* no private or protected non-static data members (Clause 11),
* no base classes (Clause 10) and // removed now!
* no virtual functions (10.3), and
* no virtual, private or protected base classes (10.1).

Lambda capture of *this

P0018R3

GCC: 7.0 | Clang: 3.9 | MSVC: not yet

The this pointer is implicitly captured by lambdas inside member functions, and member variables are always accessed through it.

Example:

struct S {
    int x;
    void f() {
        // The following lambda captures are currently identical
        auto a = [&]() { x = 42; }; // OK: transformed to (*this).x
        auto b = [=]() { x = 43; }; // OK: transformed to (*this).x
        a();
        assert(x == 42);
        b();
        assert(x == 43);
    }
};

Now you can use *this when declaring a lambda, for example auto b = [=, *this]() { x = 43; };. That way this is captured by value. Note that the form [&, this] is redundant, but accepted for compatibility with ISO C++14.

Capturing by value might be especially important for asynchronous invocation and parallel processing.

Using attribute namespaces without repetition

P0028R4

GCC: 7.0 | Clang: 3.9 | MSVC: not yet

Other name for this feature was “Using non-standard attributes” in P0028R3 and PDF: P0028R2 (rationale, examples).

Simplifies the case where you want to use multiple attributes, like:

void f() {
    [[rpr::kernel, rpr::target(cpu, gpu)]] // 'rpr' is repeated
    do_task();
}

Proposed change:

void f() {
    [[using rpr: kernel, target(cpu, gpu)]]
    do_task();
}

That simplification might help when building tools that automatically translate such annotated code into different programming models.

Dynamic memory allocation for over-aligned data

P0035R4

GCC: 7.0 | Clang: 4.0 | MSVC: not yet

In the following example:

class alignas(16) float4 {
    float f[4];
};
float4* p = new float4[1000];

C++11/14 did not specify any mechanism by which over-aligned data can be dynamically allocated correctly (i.e. respecting the alignment of the data). In the example above, not only is an implementation of C++ not required to allocate properly-aligned memory for the array, for practical purposes it is very nearly required to do the allocation incorrectly.

C++17 fixes that hole by introducing additional memory allocation functions that use align parameter:

void* operator new(std::size_t, std::align_val_t);
void* operator new[](std::size_t, std::align_val_t);
void operator delete(void*, std::align_val_t);
void operator delete[](void*, std::align_val_t);
void operator delete(void*, std::size_t, std::align_val_t);
void operator delete[](void*, std::size_t, std::align_val_t);
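With that in place, any over-aligned type gets correctly aligned storage from plain new; a hypothetical check (names made up):

```cpp
#include <cstdint>

struct alignas(64) CacheLine {
    float data[16];
};

// In C++17, 'new CacheLine[4]' goes through
// operator new[](std::size_t, std::align_val_t),
// so the returned pointer honors the 64-byte alignment.
bool allocation_is_aligned() {
    CacheLine* p = new CacheLine[4];
    bool ok = reinterpret_cast<std::uintptr_t>(p) % alignof(CacheLine) == 0;
    delete[] p;
    return ok;
}
```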

Unary fold expressions and empty parameter packs

P0036R0

GCC: 6.0 | Clang: 3.9 | MSVC: not yet

Clarifies unary fold expressions over an empty parameter pack: folding && yields true, || yields false, and the comma operator yields void; for other operators an empty pack is ill-formed.
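A sketch of those rules (the function names are just for illustration):

```cpp
// Unary left folds; with an empty pack, && yields true and || yields false.
template<typename... Args>
constexpr bool all_of(Args... args) {
    return (... && args);
}

template<typename... Args>
constexpr bool any_of(Args... args) {
    return (... || args);
}

static_assert(all_of());  // empty pack: true
static_assert(!any_of()); // empty pack: false
static_assert(all_of(true, true));
static_assert(any_of(false, true));
```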

__has_include in preprocessor conditionals

P0061R1

GCC: 5.0 | Clang: yes | MSVC: not yet

This feature allows a C++ program to directly, reliably and portably determine whether or not a library header is available for inclusion.

Example: This demonstrates a way to use a library optional facility only if it is available.

#if __has_include(<optional>)
# include <optional>
# define have_optional 1
#elif __has_include(<experimental/optional>)
# include <experimental/optional>
# define have_optional 1
# define experimental_optional 1
#else
# define have_optional 0
#endif

Template argument deduction for class templates

P0091R3

GCC: 7.0 | Clang: not yet | MSVC: not yet

Before C++17, template deduction worked for functions but not for classes.
For instance, the following code was legal:

void f(std::pair<int, char>);

f(std::make_pair(42, 'z'));

because std::make_pair is a template function.
But the following wasn’t:

void f(std::pair<int, char>);

f(std::pair(42, 'z'));

although it is semantically equivalent. This was not legal because std::pair is a template class, and template classes could not apply type deduction in their initialization.

So before C++17 one has to write out the types explicitly, even though this does not add any new information:

void f(std::pair<int, char>);

f(std::pair<int, char>(42, 'z'));

This is fixed in C++17 where template class constructors can deduce type parameters. The syntax for constructing such template classes is therefore consistent with the syntax for constructing non-template classes.

C++17 also brings deduction guides: a class author can tell the compiler how to map constructor arguments onto the class's template parameters.
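A hypothetical sketch of a user-defined deduction guide (the Box type and the guide are made up for illustration):

```cpp
#include <string>
#include <type_traits>

template<typename T>
struct Box {
    T value;
    Box(T v) : value(v) {}
};

// User-defined guide: a C string should produce Box<std::string>,
// not Box<const char*>.
Box(const char*) -> Box<std::string>;

static_assert(std::is_same_v<decltype(Box(42)), Box<int>>);
static_assert(std::is_same_v<decltype(Box("hi")), Box<std::string>>);
```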

Non-type template parameters with auto type

P0127R2

GCC: 7.0 | Clang: 4.0 | MSVC: not yet

Allows writing template<auto Value>: the type of the non-type parameter is deduced from the template argument, much like auto for variables.
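A minimal sketch (the class name is made up):

```cpp
#include <type_traits>

template<auto Value>
struct constant {
    static constexpr auto value = Value;
};

static_assert(constant<42>::value == 42);   // Value deduced as int
static_assert(constant<'a'>::value == 'a'); // Value deduced as char
static_assert(std::is_same_v<decltype(constant<'a'>::value), const char>);
```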

Guaranteed copy elision

P0135R1

GCC: 7.0 | Clang: 4.0 | MSVC: not yet

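The headline consequence: a function can return a type with no copy and no move constructor, because in C++17 a returned prvalue initializes the destination object directly. A minimal sketch (names made up):

```cpp
struct Pinned {
    int value;
    constexpr explicit Pinned(int v) : value(v) {}
    Pinned(const Pinned&) = delete; // not copyable
    Pinned(Pinned&&) = delete;      // not movable
};

// Ill-formed before C++17; fine now: no copy or move ever happens.
constexpr Pinned make_pinned() {
    return Pinned(7);
}

static_assert(make_pinned().value == 7);
```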
New specification for inheriting constructors (DR1941 et al)

P0136R1

GCC: 7.0 | Clang: 3.9 | MSVC: not yet

Reworks how using Base::Base; is specified: inherited constructors are no longer notionally copied into the derived class, which fixes a number of defect reports around default arguments, templates, and member initialization.
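The day-to-day usage is unchanged; a minimal sketch of inheriting constructors (names made up):

```cpp
struct Base {
    int value;
    constexpr explicit Base(int v) : value(v) {}
};

struct Derived : Base {
    using Base::Base; // inherit Base's constructors
    int extra = 7;    // default member initializers still apply
};

constexpr Derived d(5);
static_assert(d.value == 5 && d.extra == 7);
```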

Direct-list-initialization of enumerations

P0138R2

GCC: 7.0 | Clang: 3.9 | MSVC: not yet

Allows initializing an enum class that has a fixed underlying type:

enum class Handle : uint32_t { Invalid = 0 };
Handle h{42}; // OK

This makes it easy to create 'strong types' that are convenient to use…

Stricter expression evaluation order

P0145R3

GCC: 7.0 | Clang: 4.0 | MSVC: not yet

Guarantees more of the evaluation order: among other things, in an assignment the right-hand side is evaluated before the left-hand side, and the postfix expression of a function call is evaluated before the call's arguments.
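One newly guaranteed order: in an assignment E1 = E2, the right-hand side is evaluated first. A commonly cited sketch (the function name is made up):

```cpp
#include <map>

// C++17: m.size() is evaluated before m[0] creates the element,
// so the stored value is guaranteed to be 0 (unspecified before C++17).
int ordered_assignment() {
    std::map<int, int> m;
    m[0] = m.size();
    return m[0];
}
```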

constexpr lambda expressions

P0170R1

GCC: 7.0 | Clang: not yet | MSVC: not yet

Lambdas can now be used in constant expressions: a lambda is implicitly constexpr when its body meets the constexpr requirements, and you can also spell constexpr explicitly in the lambda declaration.
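A minimal sketch:

```cpp
// In C++17 this closure is usable in constant expressions.
constexpr auto square = [](int n) { return n * n; };

static_assert(square(5) == 25);

// Usable wherever a constant expression is required:
constexpr int side = square(4);
int board[side]; // array of 16 ints
```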

A 5 min episode of Jason Turner’s C++ Weekly about constexpr lambdas

Differing begin and end types in range-based for

P0184R0

GCC: 6.0 | Clang: 3.6 | MSVC: 15.0 Preview 5

Changing the definition of range based for from:

{
    auto&& __range = for-range-initializer;
    for (auto __begin = begin-expr,
              __end = end-expr;
         __begin != __end;
         ++__begin) {
        for-range-declaration = *__begin;
        statement
    }
}

Into:

{
    auto&& __range = for-range-initializer;
    auto __begin = begin-expr;
    auto __end = end-expr;
    for (; __begin != __end; ++__begin) {
        for-range-declaration = *__begin;
        statement
    }
}

The types of __begin and __end might now be different; only the comparison operator is required. This little change gives Range TS users a better experience.

[[fallthrough]] attribute

P0188R1

GCC: 7.0 | Clang: 3.9 | MSVC: 15.0 Preview 4

Indicates that a fallthrough in a switch statement is intentional and a warning should not be issued for it. More details in P0068R0.

switch (c) {
case 'a':
    f(); // Warning emitted, fallthrough is perhaps a programmer error
case 'b':
    g();
    [[fallthrough]]; // Warning suppressed, fallthrough is intentional
case 'c':
    h();
}

[[nodiscard]] attribute

P0189R1

GCC: 7.0 | Clang: 3.9 | MSVC: not yet

[[nodiscard]] is used to stress that the return value of a function is not to be discarded, on pain of a compiler warning. More details in P0068R0.

[[nodiscard]] int foo();
void bar() {
    foo(); // Warning emitted, return value of a nodiscard function is discarded
}

This attribute can also be applied to types in order to mark all functions which return that type as [[nodiscard]]:

struct [[nodiscard]] DoNotThrowMeAway {};
DoNotThrowMeAway i_promise();
void oops() {
    i_promise(); // Warning emitted, return value of a nodiscard function is discarded
}

[A 4 min video about [[nodiscard]] in Jason Turner’s C++ Weekly](https://www.youtube.com/watch?v=l_5PF3GQLKc)

[[maybe_unused]] attribute

P0212R1

GCC: 7.0 | Clang: 3.9 | MSVC: not yet

Suppresses compiler warnings about unused entities when they are declared with [[maybe_unused]]. More details in P0068R0.

static void impl1() { ... }                  // Compilers may warn about this
[[maybe_unused]] static void impl2() { ... } // Warning suppressed

void foo() {
    int x = 42;                  // Compilers may warn about this
    [[maybe_unused]] int y = 42; // Warning suppressed
}

[A 3 min video about [[maybe_unused]] in Jason Turner’s C++ Weekly](https://www.youtube.com/watch?v=WSPmNL9834U)

Ignore unknown attributes

P0283R2

GCC: Yes | Clang: 3.9 | MSVC: not yet

Clarifies that implementations should ignore any attribute namespaces which they do not support, as this used to be unspecified. More details in P0283R1.

// Compilers which don't support MyCompilerSpecificNamespace will ignore this attribute
[[MyCompilerSpecificNamespace::do_special_thing]]
void foo();

Pack expansions in using-declarations

P0195R2

GCC: 7.0 | Clang: 4.0 | MSVC: not yet

Allows you to inject names with using-declarations from all types in a parameter pack.

In order to expose operator() from all base classes in a variadic template, we used to have to resort to recursion:

template<typename T, typename... Ts>
struct Overloader : T, Overloader<Ts...> {
    using T::operator();
    using Overloader<Ts...>::operator();
    // […]
};

template<typename T>
struct Overloader<T> : T {
    using T::operator();
};

Now we can simply expand the parameter pack in the using-declaration:

template<typename... Ts>
struct Overloader : Ts... {
    using Ts::operator()...;
    // […]
};

Decomposition declarations

P0217R3

GCC: 7.0 | Clang: 4.0 | MSVC: not yet

Helps when using tuples as a return type. It will automatically create variables and tie them. More details in P0144R0. Was originally called “structured bindings”.

For example:

std::tie(a, b, c) = tuple; // a, b, c need to be declared first

Now we can write:

auto [a, b, c] = tuple;

Such expressions also work on structs, pairs, and arrays.
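A sketch for the struct and array cases (names made up):

```cpp
struct Point { int x; int y; };

int manhattan() {
    Point p{3, 4};
    auto [x, y] = p;   // binds to p.x and p.y
    int arr[2] = {10, 20};
    auto [a, b] = arr; // binds to arr[0] and arr[1]
    return x + y + a + b;
}
```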

Hexadecimal floating-point literals

P0245R1

GCC: 3.0 | Clang: Yes | MSVC: not yet

Allows expressing some special floating-point values exactly; for example, the smallest normal IEEE-754 single-precision value is readily written as 0x1.0p-126.
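The format is 0x, a hexadecimal mantissa, then p and a decimal power of two. A few sanity checks:

```cpp
// 0x1p4   = 1.0 * 2^4        = 16.0
// 0x1.8p1 = (1 + 8/16) * 2^1 = 3.0
// 0x1p-2  = 2^-2             = 0.25
static_assert(0x1p4 == 16.0);
static_assert(0x1.8p1 == 3.0);
static_assert(0x1p-2 == 0.25);
```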

init-statements for if and switch

P0305R1

GCC: 7.0 | Clang: 3.9 | MSVC: not yet

New versions of the if and switch statements for C++: if (init; condition) and switch (init; condition).

This should simplify the code. For example, previously you had to write:

{
    auto val = GetValue();
    if (val)
        // on success
    else
        // on false...
}

Note that val needs a separate scope here; without the extra braces, it would 'leak' into the enclosing scope.

Now you can write:

if (auto val = GetValue(); val)
    // on success
else
    // on false...

val is visible only inside the if and else statements, so it doesn’t ‘leak’.

Inline variables

P0386R2

GCC: 7.0 | Clang: 3.9 | MSVC: not yet

Previously only methods/functions could be specified as inline; now you can do the same with variables, inside a header file.

A variable declared inline has the same semantics as a function declared inline: it can be defined, identically, in multiple translation units, must be defined in every translation unit in which it is used, and the behavior of the program is as if there is exactly one variable.

struct MyClass
{
    static const int sValue;
};

inline int const MyClass::sValue = 777;

Or even:

struct MyClass
{
    inline static const int sValue = 777;
};

DR: Matching of template template-arguments excludes compatible templates

P0522R0

GCC: 7.0 | Clang: 4.0 | MSVC: not yet

This feature resolves Core issue CWG 150.

From the paper:

This paper allows a template template-parameter to bind to a template argument whenever the template parameter is at least as specialized as the template argument. This implies that any template argument list that can legitimately be applied to the template template-parameter is also applicable to the argument template.

Example:

template<template<int> class> void FI();
template<template<auto> class> void FA();
template<auto> struct SA { /* ... */ };
template<int> struct SI { /* ... */ };
FI<SA>(); // OK; error before this paper
FA<SI>(); // error

template<template<typename> class> void FD();
template<typename, typename = int> struct SD { /* ... */ };
FD<SD>(); // OK; error before this paper (CWG 150)

(Some useful example needed, since it’s a bit vague to me).

std::uncaught_exceptions()

N4259

GCC: 6.0 | Clang: 3.7 | MSVC: 14.0

More background in the original paper: PDF: N4152 and GOTW issue 47: Uncaught Exceptions.

The function returns the number of uncaught exception objects in the current thread.

This might be useful when implementing proper scope guards that also work during stack unwinding.

A type that wants to know whether its destructor is being run to unwind this object can query uncaught_exceptions
in its constructor and store the result, then query uncaught_exceptions again in its destructor; if the result is different,
then this destructor is being invoked as part of stack unwinding due to a new exception that was thrown later than the object’s construction

The above quote comes from PDF: N4152.
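A sketch of that pattern (the type and function names are made up):

```cpp
#include <exception>

// Compare the count at construction with the count at destruction:
// if it grew, this object is being destroyed by stack unwinding.
struct UnwindProbe {
    bool* unwinding_out;
    int count_at_entry = std::uncaught_exceptions();
    ~UnwindProbe() {
        *unwinding_out = std::uncaught_exceptions() > count_at_entry;
    }
};

bool destroyed_by_unwinding() {
    bool result = false;
    try {
        UnwindProbe probe{&result};
        throw 42; // probe's destructor runs during unwinding
    } catch (...) {
    }
    return result; // true
}

bool unwinding_detected_on_normal_exit() {
    bool result = false;
    {
        UnwindProbe probe{&result}; // destroyed normally
    }
    return result; // false
}
```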

constexpr if-statements

P0292R2

GCC: 7.0 | Clang: 3.9 | MSVC: not yet

The static-if for C++! This allows you to discard branches of an if statement at compile-time based on a constant expression condition.

if constexpr (cond)
    statement1; // Discarded if cond is false
else
    statement2; // Discarded if cond is true

This removes a lot of the necessity for tag dispatching and SFINAE:

SFINAE

template<typename T, std::enable_if_t<std::is_arithmetic<T>{}>* = nullptr>
auto get_value(T t) { /*...*/ }

template<typename T, std::enable_if_t<!std::is_arithmetic<T>{}>* = nullptr>
auto get_value(T t) { /*...*/ }

Tag dispatching

template<typename T>
auto get_value(T t, std::true_type) { /*...*/ }

template<typename T>
auto get_value(T t, std::false_type) { /*...*/ }

template<typename T>
auto get_value(T t) {
    return get_value(t, std::is_arithmetic<T>{});
}

if constexpr

template<typename T>
auto get_value(T t) {
    if constexpr (std::is_arithmetic_v<T>) {
        //...
    } else {
        //...
    }
}

Library Features

To get more details about the library implementation, I suggest these links:

This section only mentions some of the most important parts of the library changes; it would be impractical to go into the details of every little change.

Merged: The Library Fundamentals 1 TS (most parts)

P0220R1

We get the following items:

The wording for those components comes from Library Fundamentals V2, to ensure it includes the latest corrections.

Merged: The Parallelism TS, a.k.a. "Parallel STL"

P0024R2

Merged: File System TS

P0218R1

Merged: The Mathematical Special Functions IS

PDF - WG21 P0226R1

Improving std::pair and std::tuple

N4387

std::shared_mutex (untimed)

N4508

Variant

P0088R2

Splicing Maps and Sets

P0083R2

From Herb Sutter, Oulu trip report:

You will now be able to directly move internal nodes from one node-based container directly into another container of the same type. Why is that important? Because it guarantees no memory allocation overhead, no copying of keys or values, and even no exceptions if the container’s comparison function doesn’t throw.
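A sketch of the new extract/insert node-handle interface (the function name is made up):

```cpp
#include <map>
#include <string>
#include <utility>

// Move one element between maps without copying the key or the value
// and without allocating; the node is simply re-linked.
std::map<int, std::string> splice_one() {
    std::map<int, std::string> src{{1, "one"}, {2, "two"}};
    std::map<int, std::string> dst;

    auto node = src.extract(1);  // detach the node from src
    node.key() = 10;             // the key is even mutable while detached
    dst.insert(std::move(node)); // re-attach into dst

    return dst; // contains {10, "one"}
}
```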

Contributors

This is a place for you to be mentioned!

Contributors:

Summary


Const, Move and RVO


C++ is a surprising language. Sometimes simple things are not that simple in practice. Last time I argued that in function bodies const should be used most of the time. But two cases were missed: when moving and when returning a value.

Does const influence move and RVO?

Intro

Just to recall, we're talking here about using const for variables inside function bodies; not about const for a return type, const input parameters, or const methods. For example:

Z foo(T t, X x)
{
    const Y y = superFunc(t, x);
    const Z z = compute(y);
    return z;
}

In the code above it’s best when y and z are declared as constant.

So what’s the problem then?

First of all, you cannot move from an object that is marked as const.

Another potential problem is when a compiler is trying to use (Named) Return Value Optimization (NRVO or RVO). Can it work when the variable to be elided is constant?

I got the following comment from u/sumo952:

Expert #1: “Put const on every variable that does not change. It’s good practice, prevents you from mistakes (changing a variable you intended to be const), and if you’re lucky, the compiler might be able to optimize better.”
Expert #2: “You cannot move from a variable marked as const, and instead the copy-constructor/assignment will be invoked more often. So spraying const-glitter all over your variables may do you more harm than good.”
Great! Now I got two contradictory expert opinions. And sorry, “Know what you’re doing” doesn’t help.

Let’s try to think about better advice. But first, we need to understand what’s the problem with move and RVO.

Move semantics

Move semantics (see this great post for more: C++ Rvalue References Explained by Thomas Becker) enables us to implement a more efficient way of copying large objects. While value types need to be copied byte by byte anyway, types like containers and resource handles can sometimes be copied by stealing.

For instance, when you want to ‘move’ from one vector to another instead of copying all the data, you can just exchange pointers to the memory allocated on the heap.

A move operation cannot always be invoked; it works on r-value references, objects that are usually temporary, so it's safe to steal from them.

Here’s some explicit code for move:

a = std::move(b);
// b is now in a valid, but 'empty' state!

In the simple code snippet above if the object a has a move assignment operator (or a move constructor depending on the situation), we can steal resources from b.

When b is marked as const, then instead of an r-value reference we get a const r-value reference. This type cannot be passed to the move operations, so a standard copy constructor or assignment operator will be invoked instead. No performance gain!

Note, that there are const r-values in the language, but their use is rather exotic, see this post for more info if needed: What are const rvalue references good for? and also in CppCon 2014: Stephan Lavavej talk.
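To see the fallback in action, a hypothetical check: "moving" from a const object silently invokes the copy constructor, leaving the source intact.

```cpp
#include <string>
#include <utility>

// std::move(src) yields 'const std::string&&', which cannot bind to the
// move constructor's 'std::string&&' parameter, so the copy constructor
// is selected and src keeps its contents.
bool const_move_falls_back_to_copy() {
    const std::string src(100, 'x');
    std::string dst = std::move(src); // actually a copy!
    return src.size() == 100 && dst.size() == 100;
}
```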

OK… but is this really a huge problem for us?

Temporary objects

First of all, most of the time move semantics works on temporary objects, so you won’t even see them. Even if you have some constant objects, the result of some function invocation (like a binary operator) might be something else, and usually not const.

const T a = foo();
const T b = bar();
const T c = a + b; // the result of + is a temporary object
// the return type of operator+ is usually not marked const
// BTW: such code is also subject to RVO... read on...

So, in a typical situation, constness of the objects won’t affect move semantics.

Explicit moves

Another case is when you want to move something explicitly. In other words, you take a variable that is an l-value and want to treat it as if it were an r-value.

The Core Guidelines mention that we usually shouldn't call std::move explicitly:

ES.56: Write std::move() only when you need to explicitly move an object to another scope

And when you really need such an operation, I assume you know what you're doing! Using const here is not a good idea, so I agree that my advice can be altered a bit in that context.

Returning a value

In the case when copy elision cannot be applied the compiler will try to use a move assignment operator or a move constructor if possible. If those aren’t available, then we have to perform a standard copy.

For example:

MyType ProduceType(int a)
{
    MyType t;
    t.mVal = a;
    return t;
}

MyType ProduceTypeWithConst(int a)
{
    const MyType t = ProduceType(a);
    return t;
}

MyType t;
t = ProduceTypeWithConst(1);

What's the expected output here? For sure, two objects need to be created: t and one object inside the functions. But when returning from ProduceTypeWithConst, the compiler will try to invoke a move if possible.

MyType()
MyType()
operator=(MyType&& v)
~MyType()
~MyType()

As you can see marking the return object as const didn’t cause any problems to perform a move. It would be a problem only when the function returned a const MyType, but it returns MyType so we’re safe here.

So all in all, I don’t see a huge problem with move semantics.

Let's now move on to another topic: RVO…

Return Value Optimization

RVO is an optimization performed by most compilers (and mandatory in C++17!). When possible, the compiler won't create an additional copy of the temporary returned object.

MyType ProduceType()
{
    MyType rt;
    // ...
    return rt;
}

MyType t = ProduceType(); // (N)RVO

The canonical C++ would do something like this in the code above:

  • construct rt
  • copy rt to a temporary object that will be returned
  • copy that temporary object into t

But the compiler can elide those copies and just initialize t once.

You can read more about (N)RVO in the articles from FluentCpp and Undefined Behaviour.

Returning const

What happens if your object is const? Like:

MyType ProduceTypeWithConst(int a)
{
    const MyType t = ProduceType(a);
    return t;
}

MyType t = ProduceTypeWithConst(1);

Can RVO be applied here? The answer is Yes.

It appears that const doesn't do any harm here. A problem might arise only when RVO cannot be invoked: then the next choice is move semantics. But we already covered that in the section above.

The slightly altered advice

In function bodies:
Use const whenever possible. Exceptions:
* Assuming the type is movable: if you want to explicitly move such a variable, adding const will block move semantics.

Still, if you're unsure and you're working with some larger objects (that have move enabled), it's best to measure, measure, measure.

Some more guidelines:

Core Guidelines, F.20:

The argument for adding const to a return value is that it prevents (very rare) accidental access to a temporary. The argument against is that it prevents (very frequent) use of move semantics.

Summary

While initially, I was concerned about some negative effects of using const in the case of move and RVO, I think it’s not that serious. Most of the time the compiler can elide copies and properly manage temporary objects.

You can play with the code here: @coliru.

  • Did I miss something?
  • In what situations are you afraid to put const?

How To Stay Sane with Modern C++


Complex C++

Have you seen my recent blog post with the list of C++17 features? Probably this is not the best measurement, but it got only ~30% of the average read (while other articles might get 70% or even 90%). I'm not complaining; the list is crazy and probably contains too many details of the new standard.

Another stat: C++ standard page count: from 879 pages for C++98/03 to 1586 for C++17 (draft)!

Do you need to learn all of that stuff to write good code?
How to stay sane in the C++ world today?

Intro

You probably know that C++ is a complex language. As I've found, there's even a whole Wikipedia page about criticism of C++. Modern C++ adds even more stuff to the package!

Here’s the full data about the page count in the specs that I’ve mentioned before:

Page count of C++ specs

It looks like C++17 is almost ~80% ‘larger’ than C++98/03. You can complain about added complexity and that it’s hard to learn all of those things. But is this so terrible? What can you do about the whole situation?

This post was motivated by some stories I've recently found:

First, let’s see some problems that you might bump into in C++.

Some Problems

Just to name a few:

Too slow pace

In 2016, as I wrote in my summary post, we got the draft for C++17. While it's great that we get a new standard every three years, a lot of developers complained that the new standard is not what everyone was waiting for.

A lot of features (concepts, modules, ranges, coroutines, …) were not accepted, and we need to wait at least three more years to get them into the spec.

So, for some features, the pace of standardization is very slow.

As a positive aspect, I can only say that most of the mentioned features are already implemented, at least in the experimental form. So you can play with them and be prepared for the final version. Take a look at GCC concepts, modules in VS or Clang, Ranges, etc.

Too fast pace

As usual, we have two contradicting opinions here. Although for some people the pace is too slow, for others it's hard to keep up with the changes.

You've just learned C++11… and now you need to update your knowledge with C++14, and C++17 is already on the way. Three years is not that short a time, but bear in mind that compiler conformance, company policies, and team guidelines might move at a different pace.

Do your companies update to the most modern C++ version immediately?

Confusion / Complexity

Just read that comment:

CallMeDonk

I love c++. It’s my go to language, but you have to admit its ‘hodge podge’ implementation of value types is bizarre at best. Most programmers including me prefer simple well-defined language constructs over bizarre and over complicated grammar. I’m not a computer language lawyer. I’m a programmer.

Is C++ clear in every part? Probably not…

Move semantics

The principle of move semantics is quite clear: instead of copying, just try to exchange the pointers to the allocated memory, and you should get a nice performance boost. But the devil is in the details.

I don't write a lot of generic code, so fortunately I don't have to think about move semantics all the time. But I was quite confused when I bumped into move and const - see my last article on that. I don't believe every C++ developer will understand the rules here. Especially since you now need to remember the six special member functions the compiler can generate: default constructor, destructor, copy constructor, move constructor, copy assignment operator, and move assignment operator.

Rvalues/xvalues/prvalues… myValues, fooValues

The last ones are made up… but still having all of the value categories is overwhelming!

Previously you just had to know lvalue vs rvalue, now it’s a bit more subtle.

Still, the question is whether you need to know it by heart on a daily basis.

Some good comments:

c0r3ntin

It is complicated, but not on a daily basis.
Can this value be addressed ? can it be copied ? can it be moved ? Should it be moved ?
There is very few situation where you want to actively to be very specific and need a full understanding. ( templated library writing, hot paths, etc).
Most of the time C++ is not more complicated than java or something. Sadly this is lost on most people. C++ may be the most complex language out there but you can write very good code without caring about the specific.
BigObject o = getBigObject();

Initialization

18 ways now! - Initialization in C++ is bonkers and the r/cpp thread

Template deduction

I was quite lost when I saw all the changes for C++17; there are so many details about template deduction!

Although you might not write template code all the time (unless you're a library developer), it's good to remember that the deduction rules now also apply to auto type deduction, since we have the AAA (Almost Always Auto) rule.

Fortunately, the rules are getting a bit easier with C++17, where we get things like template<auto>, the fix for auto i { 0 }, typename in template template parameters, etc…

Other areas?

What are your main problems with the language?

So far, we’ve discussed some problems, so how to live with them… and possibly how to solve them?

How to stay sane

There’s no perfect programming language; every one of them has some issues. Here are my suggestions on how to cope with the problems of Modern C++.

Stay positive, the language is evolving

No one wants to write code using old tools. We’ve already seen a lot of complaints about old C++ before C++11. It took almost 13 years (counting from the major C++98, not including the minor C++03) to come up with the next major version: C++11. Now we can be happy that we’re back on track, and every three years there will be some changes. At the end of the day, you cannot say that your language is dead and old.

And the tools as well!

Thanks to Clang and also improved development speed in other platforms, we get tools like:

While it’s not as great as for other languages (especially Java-based or .NET-based ones), it’s getting better and better. Bear in mind that because of the complex C++ grammar it’s very hard to implement tools that analyse the code on the fly.

Try to stay up to date

The C++ community is very much alive. There are many blogs, books, conferences… and there’s even a chance a local community exists in your city!

For a start, I suggest going to isocpp.org, the central place for all of the events/news/articles. Then you might check Meeting C++ and info about local C++ groups. There’s also reddit/cpp where more and more good stuff is posted all the time.

And remember about books like:

You can also take a look at my recent C++17 Lang Ref Card!

Too much details? Just don’t open the hood

One of the reasons C++ has so much power is that it allows you to implement code very close to the metal. You have control over all of the details, memory layout, performance optimizations, etc, etc… At the same time, such abilities increase the complexity of the language.

Still, if you don’t need to go that far, you can stay at a relatively higher level of abstraction.

Use what you need

C++ is a multi-paradigm language; you can use it in many different ways. Recently, I’ve read an interesting comment saying that a C++ programmer might for years do very well without touching advanced stuff like template metaprogramming or even exceptions. This heavily depends on the code style of the project.

Even companies like Google limit the features of C++; for example, they don’t use exceptions.

This is a bit of repetition, but if you’re not a library developer you might never run into trouble with custom move operators or move constructors. Similarly, advanced metaprogramming stuff might also not be a crucial part of your code.

Incremental change

If you start from scratch or have a small code base, then going to C++11/14 should be relatively easy. What about a million-line code base, started 20 years (or more!) ago?

Just do it step by step.

At least for the new code, you should start using Modern C++. Moreover, by applying “The Boy Scout Rule” you can improve surrounding code that you touch.

This will probably result in some mixed code, but still, it’s better than staying with the legacy style only.

Last resort: your old code will still compile!

One of the reasons why the C++ specs are getting larger and larger is that the language is backward compatible. The committee usually introduces new features but rarely removes old stuff. So… your code can still compile. If you don’t want to move on and use newer things, you can stay with your current style.

From time to time you’ll get warnings about deprecated or removed features (like auto_ptr in C++17), but even in that case, you can switch the compiler to target an older C++ standard.

Summary

This article is partially a rant, partially a ‘glorification.’ I try to see the bad sides of the language and its evolution process and some positive signs as well.

While we can complain about the complexity, pace of changes, etc… I think we cannot say that the language is dead. That’s a good thing.

I don’t think you have to rapidly chase the new features and immediately rewrite your existing code. Just try to stay up to date with the news, use the features that really improve your work, and gradually your code should get more ‘modern’ (however that can be defined - see the Meeting C++ article on that).

  • What’s your approach when adopting new features from C++11/14/17/1z ?
  • What’s your main problem with C++?
  • Do you use modern C++ in your work?

Modernize: Sink Functions

$
0
0

Passing Ownership of a resource to functions

One of the guidelines from Modern C++ is to avoid using raw new and delete. Instead, you should use a smart pointer, a container or another RAII object. Today I’d like to focus on so-called ‘sink functions’ that take ownership of input parameters. How can we modernize code around such calls?

Intro

Briefly: a sink function is a function that takes ownership of an input pointer. Such a function is now responsible for the allocated memory/resource. It can pass the ownership further or manage it on its own.

Here are links with more definitions: Cpp Wiki: Move Constructor,
Herb Sutter: smart pointer parameters, Simplify C++: Move Semantics

Here’s a little diagram:

sink function

As shown in the picture: Foo() creates a resource - ptr - and passes it to Bar(); then it’s transferred to DoStuff(), which (hopefully) finalizes and destroys it.

In legacy code, you could probably see code similar to:

void Foo()
{
    MyType* pMyObject = new MyType(); // << !

    // change pMyObject somehow...

    HandleMyType(pMyObject); // transfer ownership
}

// transfer ownership of pMyObject
void HandleMyType(MyType* pMyObject)
{
    // handle the pointer...

    delete pMyObject; // finalize...
}

Foo() - is a source type function.
HandleMyType() - is a sink type function.

To be more specific, we’re talking here about parameters that are usually movable only and not copyable - like pointers. If you pass a value type like an int param, then there’s no need to pass ownership.

Also, usually we’re talking about allocated memory, but it can be any kind of resource: like a file handle, network connection, DB connection, some unique state, etc.
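The same ownership idea works for non-memory resources. A minimal sketch (my own, not from the article): unique_ptr holding a FILE* with fclose as a custom deleter, so the handle is closed on every exit path.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>

// unique_ptr can own more than memory: here a FILE* closed by fclose
// via a custom deleter (names are mine, for illustration only).
using FileHandle = std::unique_ptr<std::FILE, decltype(&std::fclose)>;

FileHandle openFile(const char* path, const char* mode)
{
    return FileHandle(std::fopen(path, mode), &std::fclose);
}
```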

The above code is, as usual, very simple. In a real example, the ownership can be passed multiple levels down the call stack hierarchy. The pointer might even be stored in some object that lives much longer than the ‘creation’ point. We could invent lots of examples here. See the picture below that shows many layers where the resource might be processed and eventually released.

Sink functions

Ok… you’ve seen code like that, but is it safe and modern? Definitely not!

The problem

In the above code, you can clearly see a ‘contract’ between one function and another: who is the owner of the resource/pointer.

Can you violate this contract?

Yes… and it’s very simple!

You can easily forget to delete allocated memory, forget about the ownership.

While it’s relatively easy to spot such a problem in a short example like the above, it might be a pain in real code! You probably know what I mean here. There’s a high chance you’ve already tracked bugs/leaks like that :)

The problem is that the contract is almost “verbal” only, or “comment” only. The compiler cannot help you here.

So what if someone doesn’t free the memory?

What if the Foo() method needs to exit early (because of some error)?
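The early-exit problem can be sketched in a few lines (names hypothetical, not from the original code):

```cpp
#include <cassert>
#include <memory>

struct MyType { int value = 0; };

// With a raw owning pointer every early exit is a leak, unless you
// remember to delete on each and every path:
bool fooRaw(bool error)
{
    MyType* p = new MyType();
    if (error) {
        delete p;        // easy to forget!
        return false;
    }
    delete p;
    return true;
}

// With unique_ptr no path can leak:
bool fooSafe(bool error)
{
    auto p = std::make_unique<MyType>();
    if (error)
        return false;    // p deleted automatically
    return true;
}
```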

Modernize

To fix all of the above problems we need to use one powerful mechanism: RAII. Particularly in the form of unique_ptr.

Use unique_ptr

So how about the improved version:

void FooU()
{
    auto pMyObject = std::make_unique<MyType>(); // <<

    // change pMyObject somehow...

    HandleMyTypeU(std::move(pMyObject));
}

void HandleMyTypeU(std::unique_ptr<MyType> pMyObject)
{
    // handle the pointer...
}

A bit better?

Can you violate the contract now?

It’s quite hard!

Do you get help from the compiler?

Yes! It will report compile time errors!

Can you leak the memory?

Not easily!

Is the code more expressive and clean?

Yes!

So many benefits from just using a simple pointer wrapper! Also, from the performance point of view, you don’t lose anything, because unique_ptr is just a tiny wrapper around a raw pointer and has the same size as the raw pointer.

Since unique_ptr is a movable-only type (not copyable), you’ll get a compile-time error if you just try to copy the pointer. You need to move the pointer explicitly and thus pass the ownership.
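A quick sketch of that compile-time contract (names are mine):

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct MyType { int value = 0; };

void sink(std::unique_ptr<MyType> p) { /* takes ownership of p */ }

// Copying is rejected by the compiler; moving transfers ownership
// and leaves the source empty.
bool demo()
{
    auto ptr = std::make_unique<MyType>();
    // sink(ptr);          // error: unique_ptr is not copyable
    sink(std::move(ptr));  // OK: ownership transferred explicitly
    return ptr == nullptr; // a moved-from unique_ptr is guaranteed null
}
```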

BTW: a small improvement: there’s auto used in the example. That way we don’t need to write:

unique_ptr<MyType> pPtr = make_unique<MyType>();

And it follows “AAA” rule.

Partial solution

There’s also a partial solution to our original problem. Sometimes you cannot change the sink function - this might happen when you’re using some third party library. What you can do is at least to be safer at ‘your’ side of the code.

Basically, I would still create a unique_ptr, but since you cannot pass it to the old sink function, you need to release it and pass it as a raw pointer.

void FooUP()
{
    auto pMyObject = std::make_unique<MyType>(); // <<

    // change pMyObject somehow...

    if (condition) // the pointer will be deleted automatically here!
        return;

    HandleMyType(pMyObject.release());
}

As you see, the code uses the release() method to give up ownership of the pointer. It also returns the raw pointer so we can use it.

What do you get here?

Possibly not that much, but at least when your function needs to return early (before passing the pointer), you can be sure the memory won’t leak.

Also, please bear in mind that in real code the ownership can be passed through multiple levels of the call stack:

void Foo()
{
    // ...
    FooInner(ptr);
}

void FooInner(MyType* ptr)
{
    // ...
    FooX_Inner(ptr);
}

void FooX_Inner(MyType* ptr)
{
    // ...
    FooLib_Inner(ptr); // cannot use unique_ptr here!
}

// ...

// ...

Let’s assume that FooLib_Inner cannot use unique_ptr. Still, I believe there’s a point in modernizing Foo and FooX_Inner. The code will be safer, and maybe at some point you’ll be able to improve the library code as well - maybe the library will get updated and support unique_ptr.

Summary

Play with the code here: @coliru

This was a quick and straightforward post. I believe that by reducing the number of raw new and delete calls you can end up with much safer code. One way of doing this is to use unique_ptr when passing to sink type functions.

I hope it helps.

What are your strategies for limiting usage of new/delete?

On Toggle Parameters

$
0
0

On Toggle parameters

I got very interested in one topic that recently appeared on Andrzej’s Blog: Toggles in functions. I thought it might be worth expressing my opinion in a separate blog post.
Please take a look.

Intro

As Andrzej wrote in the article the whole point is how to improve the code around functions like:

RenderGlyphs(true, false, true, false);

We’d like not only to have more expressive code but to have safer code. What if you mix two parameters and change their order? The compiler won’t help you much!

We could add comments:

RenderGlyphs(glyphs,
             /*useCache*/true,
             /*deferred*/false,
             /*optimize*/true,
             /*finalRender*/false);

And although the above code is a bit more readable, still we don’t get any more safety.

So what can we do more?

Ideas

Here are some ideas that you can use to make such code better:

Small Enums

In theory we could write the following declarations:

enum class UseCacheFlag    { True, False };
enum class DeferredFlag    { True, False };
enum class OptimizeFlag    { True, False };
enum class FinalRenderFlag { True, False };

// and call like:
RenderGlyphs(glyphs,
             UseCacheFlag::True,
             DeferredFlag::False,
             OptimizeFlag::True,
             FinalRenderFlag::False);

Using enums is a great approach but has some disadvantages:

  • A lot of additional names required!
    • Maybe we could reuse some types; should we have some common flags defined in the project? How do we organize those types?
  • Values are not directly convertible to bool, so you have to compare against Flag::True explicitly inside the function body.

The required explicit comparison was the reason Andrzej wrote his own little library that creates toggles with conversion to bool.
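The explicit comparison he means looks roughly like this (a sketch with my own names, not Andrzej's library):

```cpp
#include <cassert>
#include <string>

enum class UseCacheFlag { True, False };

// Inside the function the flag cannot be used as a bool directly;
// an explicit comparison against the enumerator is required.
std::string render(UseCacheFlag useCache)
{
    // if (useCache) { ... }           // error: no conversion to bool
    if (useCache == UseCacheFlag::True)
        return "cached";
    return "fresh";
}
```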

Initially, I thought it’s disappointing that we don’t have direct support from the language. But after a while, I changed my mind. The explicit comparison is not that hard to write, so maybe it would be overkill to include it in the language spec? Introducing explicit casts might even cause some problems.

Still, I am not quite happy with the need to write so many tiny enums… And since I am lazy, I probably won’t apply such a rule to all of my code :)

Param Structure

If you have several parameters (like 4 or 5, depending on the context), why not wrap them into a separate structure?

struct RenderGlyphsParam
{
    bool useCache;
    bool deferred;
    bool optimize;
    bool finalRender;
};
void RenderGlyphs(Glyphs& glyphs, const RenderGlyphsParam& renderParam);

// the call:
RenderGlyphs(glyphs,
             {/*useCache*/true,
              /*deferred*/false,
              /*optimize*/true,
              /*finalRender*/false});

OK… this didn’t help much! You get additional code to manage, and the caller uses almost the same code.

So why could it help?

  • It moves the problem to the other place. You could apply strong types to individual members of the structure.
  • If you need to add more parameters then you can just extend the structure.
  • Especially useful if more functions can share such param structure.

BTW: you could also put the glyphs variable in RenderGlyphsParam; it’s left out here only for the sake of the example.

Eliminate

We could try to fix the syntax and use clever techniques. But what about using a simpler method? What if we provide more functions and just eliminate the parameter?

It’s ok to have one or maybe two toggle parameters, but if you have more maybe it means a function tries to do too much?

In our simple example we could try:

RenderGlyphsDeferred(glyphs,
                     /*useCache*/true,
                     /*optimize*/true);
RenderGlyphsForFinalRender(glyphs,
                           /*useCache*/true,
                           /*optimize*/true);

We can make the change for parameters that are mutually exclusive. In our example, deferred rendering cannot happen together with the final render.

You might have some internal function RenderGlyphsInternal that would still take those toggle parameters (if you really cannot separate the code). But at least such internal code will be hidden from the public API. You can refactor that internal function later if possible.

So I think it’s good to look at the function declaration and review if there are mutually exclusive parameters. Maybe the function is doing too much? If yes, then cut it into several smaller functions.

After writing this section I’ve noticed a tip from Martin Fowler on Flag Arguments. In the text he also tries to avoid toggles.

You can also read Robert C. Martin’s Clean Code Tip #12: Eliminate Boolean Arguments, and more in his book Clean Code: A Handbook of Agile Software Craftsmanship.

What’s in future C++2z?

There’s a paper: Designated Initialization, P0329R0 that might go into C++20.

Basically, you could use a similar approach as in C99 and name the arguments that you pass to a function:

copy(.from = a, .to = b);

There’s even a Clang implementation already, see this: Uniform designated initializers and arguments for C++.
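The related part that did land (in C++20) is designated initializers for aggregates; combined with a param structure it gives a similarly self-documenting call site. A sketch under that assumption (struct and names are mine):

```cpp
#include <cassert>

struct CopyParams {
    int from = 0;
    int to = 0;
};

// With C++20 designated initializers the call site names its arguments:
int copyRange(const CopyParams& p) { return p.to - p.from; }
```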

Stronger Types

Using small enums or structures is part of a more general topic of using Stronger Types. Similar problems might appear when you have several ints as parameters, or strings…

You can read more about:

One example

Recently, I had a chance to apply some ideas of enum/stronger types to my code. Here’s a rough outline:

// functions:
bool CreateContainer(Container* pContainer, bool* pOutWasReused);

void Process(Container* pContainer, bool bWasReused);

// usage
bool bWasReused = false;
if (!CreateContainer(&myContainer, &bWasReused))
    return false;

Process(&myContainer, bWasReused);

Briefly: we create a container, and we process it. But the container might be reused (some pool, reusing existing object, etc., some internal logic).

I thought that it doesn’t look nice. We use an output flag, which is then passed as an input flag to some other function.

What’s more, we pass a pointer, so some additional validation should happen. Also, output parameters are discouraged in modern C++, so it’s not good to have them anyway.

How can we do better?

Let’s use enums!

enum class ContainerCreateInfo { Err, Created, Reused };
ContainerCreateInfo CreateContainer(Container* pContainer);

void Process(Container* pContainer, ContainerCreateInfo createInfo);

// usage
auto createInfo = CreateContainer(&myContainer);
if (createInfo == ContainerCreateInfo::Err)
    return false;

Process(&myContainer, createInfo);

Isn’t it better?

There is no output-via-pointer stuff here; we have a strong type for the ‘toggle’ parameter.

Also, if you need to pass some more information in that CreateInfo enum, you can just add one more enum value and process it in the proper places; the function prototypes don’t have to change.

Of course, in the implementation you have to compare against enum values (not just cast to bool), but it is not difficult, just a bit more verbose.
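A sketch of such an implementation (the enum is repeated here to stay self-contained): adding a new enumerator later only touches the places that care about it, not the function prototypes.

```cpp
#include <cassert>
#include <string>

enum class ContainerCreateInfo { Err, Created, Reused };

// The implementation compares against enum values explicitly:
std::string describe(ContainerCreateInfo info)
{
    switch (info) {
    case ContainerCreateInfo::Created: return "created";
    case ContainerCreateInfo::Reused:  return "reused";
    default:                           return "error";
    }
}
```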

Summary

By reading the original article from Andrzej and these additional few words from me, I hope you got the idea of toggle type parameters. They are not totally wrong, and it’s probably impossible to avoid them entirely. Still, it’s better to review your design when you want to add a third or fourth parameter in a row :)
Maybe you can reduce the number of toggles/flags and have more expressive code?

Do you try to refactor toggle parameters?
Do you use strong types in your code?

C++ Jobs and Predictions

$
0
0

C++ jobs and predictions

There are probably billions of lines of code written in C++ already. New code is being written every day. But will this trend continue? Will you be able to find a C++ job in five years?
Let's have a quick look.

The Story

This post was motivated by a recent video from J. Sonmez, you can see it here: Does C++ Have a Future?

Briefly, John explained that although he loves/loved C++, he thinks that if you’re just starting out you shouldn’t invest much in C++. It’s good to know C++ (since it gives you a lot of knowledge about the underlying hardware, native code, etc.), but still, from a career point of view, there are better options at the moment.

To be clear, he also mentioned that if you’re a C++ guy already, there’s nothing to worry about because there will still be jobs for you. Even in 50 years there might be some C++ code lying around somewhere :)

My view

In my opinion, it’s not that bad with C++! Or at least I hope so :)

Where C++ is used?

First of all, let’s look at where C++ is used. Just by looking at Bjarne Stroustrup’s page on C++ applications we can see that there are a lot of apps out there!

Adding my ideas, I could write that C++ is used almost everywhere:

  • Computer games, game engines,
  • Audio libraries,
  • CAD/3D - like Autodesk, Maya, 3d studio max, Blender, etc
  • Document editors (Adobe products, Xara, Office)
  • Flight planning: Amadeus, Sabre
  • large scale e-commerce at Amazon
  • Google - various projects (search, Chromium browser, …)
  • Operating systems: lots of languages usually used, but C++ and C are used for the core parts.
  • Drivers
  • Financial: Bloomberg for example, HFT (High-Frequency Trading) platforms
  • Science: like Cern or NASA
  • Compilers
  • Programming tools
  • Communication protocols, systems (like from Ericsson)
  • Facebook
  • HP (like Java core)
  • Intel
  • plus a recent r/cpp discussion: Why use cpp other than performance?

Ok, we could list and list companies and products here for a long time.

Basically, from low-level systems and drivers, to whole operating systems, game engines, games, high-perf trading, scientific computation, flight planning, document editors…

As you can see, C++ is not only in backend/perf code but also in UI - the full system stack.

Also, in the mentioned systems, C++ might not be the only language used. Sometimes it’s used in 99% of the code, in others maybe only 20%.

We also have to remember about legacy code that’s already there. Someone has to maintain it and add new features.

Any advantages?

What are the main benefits:

  • Modern C++ - feels like a new language. Have a look at C++11/14/17 and the future - C++20 will bring even more great stuff.
  • RAII - without garbage collector, you can clean your objects nicely!
  • Performance and memory efficiency - you can optimize down to the hardware level.
    • BTW: I’ve seen a good quote about perf: “C++ does not give you performance, it gives you control over performance” (Chandler Carruth)
  • Native, close to the metal, but still expressive and relatively clean. You can open the hood if you like, or stay and use higher level abstractions.
  • Multi-paradigm - you’re not forced to use only OOP, you can mix different styles depending on the needs.
  • Deterministic, well defined
  • Templates
  • Portability - there’s a C++ compiler probably for every platform!
  • Integration - you can bind it with other languages, systems. For example easy to use with C#/Java for backend/perf code.
  • Tools are getting better and better.
    • Especially thanks to many clang based tools!

And of course we have some little disadvantages, but let’s forget about them today :)
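The RAII point above in a nutshell - a minimal sketch of my own: the destructor releases the resource on every exit path, no garbage collector needed.

```cpp
#include <cassert>

// A toy RAII "resource": the constructor acquires, the destructor
// releases, automatically and deterministically.
struct Counter {
    static inline int alive = 0;  // C++17 inline variable
    Counter()  { ++alive; }
    ~Counter() { --alive; }
};

int useCounter()
{
    Counter c;              // acquired here
    return Counter::alive;  // 1 while c is alive
}                           // released here, even if exceptions fly
```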

The language itself is growing, the community is amazing, and new language standards appear regularly (you can take a look at my recent article: How To Stay Sane with Modern C++).

Also, look at the use of C++ and its popularity (e.g., Tiobe’s 3rd place for a long time). Assuming you prefer system, “lower level” areas, I think that learning/starting with C++ is a good idea. Even if you go in another direction, the knowledge of C++ is universal and will give you a huge boost when learning other languages (like C#, Java, Go…).

Anything better that C++?

Can you find some better languages/platforms?

Yes. Especially if you want to build web applications it’s better to learn JavaScript or Ruby. Also, C# and Java are big players and should give you a lot of options and available jobs.

In the field of native languages we have D, Rust, Go… so it’s good to have an eye on them as well!

What to learn then?

Should you learn only C++ and nothing else (especially when you start)? Definitely NOT!

When you’re just starting, it’s better to have more choices. You should try several areas and after some time pick your favorite. You can learn C++ but also C# and see where it’s easier to create a UI (hmm… I wonder what’s the answer here :)). But at the same time, you can try Python or JavaScript and compare the performance of some low-level code (if that’s possible in Python/JS…).

For some areas, C++ will be better; for some it won’t. Moreover, it often happens that an application is built using several languages/technologies. So if you know more, you can help in more areas of the development.

But I am a C++ guru already!

Ok, but what if the job market shrinks? It’s not good to put all eggs in one basket, so I would also suggest learning something new from time to time.

It’s easier said than done, but learning something new - a new lib, language, paradigm, etc. - might have huge benefits for your existing code.

Job market?

Ok, let’s be practical now: What’s the job market for C++?
For example, in Cracow I have lots of options in Banking, Telecom, Flight planning, Gamedev, Enterprise Systems, Cars, Embedded… plus you can also find remote jobs like me :)

To be honest, I think it’s been quite stable over the last seven years. Maybe it’s even growing a bit, since more tech companies have come to Cracow recently.

But let’s see what’s opinion from others:

Clearly, it’s not that awesome - mostly stable - and you probably have more options in C#, Java, JavaScript… however, it’s not decreasing super fast.

Summary

C++ is a solid language and continues to be so.

I am not saying that C++ is fantastic and you should abandon everything and just stick to it. If you’re just starting it’s good to have more options and learn different things. Even if you’re a C++ guy already, it’s vital to learn something new and improve.

Still, C++ jobs won't disappear overnight. I expect the job market to stay stable, with a possibility of slowly shrinking over the years. But if you like this area you'll be able to find a C++ job anyway. I hope C++20 will add another good reason to stick with C++ and even move from other languages... but we need to wait a few years to see that happen.

Just in case you're learning about upcoming C++17, you can grab my one-page RefCard for the language features: link here.

  • Let me know what's your opinion about the future of C++.
  • Do you worry about the job market for this language?
  • How does it look like in your area?
  • What other languages are you learning now?

Windows File Tests

$
0
0

Transform a file on Windows

You want to transform one file into another, input into output. Which API will you choose on Windows? WinAPI? C++ streams or good old stdio?

Last year in September I looked at four ways of processing a file on Windows. Also, I did some performance tests.
The whole project description was recently published in Visual Studio Magazine.

The idea was relatively straightforward: I’d like to transform data from one file and write it into another file. The transformation method wasn’t important (it could be just a copy or encryption). I was interested in how you can achieve that using these APIs: C++ streams, C stdio, basic WinAPI and WinAPI memory-mapped files.

After I had built the whole processing code, I was able to test the performance. Which API was the fastest? What do you think? Which was the easiest to use?

The articles:

And there’s also GitHub repo with all sources: fenbf/WinFileTests

In the future it might be worth trying multiple threads to see what the benefits are.

Please have a look at the articles and let me know what you think.

C++18 Next Year!

$
0
0

C++18 Next Year

I have great news! During the last meeting in Kona, the committee not only made the final notes on the C++17 standard! There’s also groundbreaking news: behind the curtains they planned C++18!

Intro

As it appears, the C++ Committee finally understood that C++17 doesn’t contain the features everyone wanted. At the moment it’s impossible to change the standard - since it was already sent to the final ISO balloting… but during the last meeting in Kona (March), they decided to put all their efforts into the preparation of C++18!

C++18 will basically contain all the favourite features that programmers expected. So what will we get:

  • Modules!
  • Concepts
  • Ranges
  • Co-routines
  • Contracts
  • Possibly transactional memory

That’s a really huge list of features!

In fact, most of the features are already available in compilers as experimental features. So all we have to do during this year is to iron out the differences in the implementations, agree on the final scope, do the final wording… and wrap it all up in the C++18 Standard.

Just to make it clear:

Modules

There are already two implementations that seem to work well: one from Clang and one from Microsoft.

You can play with them here: Clang, Visual Studio .

Concepts

As we know, concepts (concepts-lite to be correct) are already available in GCC: link here.

A few days ago Gabriel Dos Reis announced - “Concepts are ready”!. See this PDF: P0606R0, Concepts Are Ready.

They are published as: ISO/IEC TS 19217:2015
Information technology – Programming languages – C++ Extensions for concepts
.

Ranges

Containers Redesigned!

Already working implementation can be found on github, from Microsoft: link here. Works since Visual Studio 2015 update 3.

Co-routines

Coroutines in Visual Studio: link here.

Contracts

Current proposal can be found here - P0542R0.

How to write preconditions and postconditions for functions.

Transactional memory

It’s already published as C++ extension: ISO/IEC TS 19841:2015
Technical Specification for C++ Extensions for Transactional Memory

So we just have to merge it into C++18.

Summary

I was really excited when I first heard that information! I’ve noticed that people complained that C++17 is not a major release and that a lot of great features won’t be present. With C++18 we can fix this issue! Instead of waiting another three years (for C++20), we’ll get all the best features in just one year from now.

  • What do you think about C++18?
  • Will the committee complete that on time?
  • What features would you like to see in C++18?

C++18, Why not?

$
0
0

C++18 experiments

As you might have already noticed, I made a little joke on Saturday, which was April Fools’ Day. I got the courage to announce C++18 for next year! :)
While it was a bit of fun, I didn’t expect much traffic (as it was Saturday). Still, my stats show that a lot of people clicked and viewed the post. Thanks!

Today I’d like to continue the topic: why not actually have C++18?

The story

Here are some of the comments:

NOPI: I know it’s a april fools joke but gosh I could wish that were actually true.

or

mps1729: Please, let this not be an April fool’s day post! I can dream, right?

and

sail0rm00n: i was really excited then i looked at the date :(

My fake news wasn’t that far-fetched, as most of the features are very close to being accepted by the committee. Some of the features are already completed!

So, in fact, next year you can almost start using C++18: just take an experimental feature and play with it. Of course, you can start even now… no need to wait another year :)

The features

Let’s recall what the features of my C++18 were:

Modules

There are already two implementations that seem to work well: one from clang and one from Microsoft.

You can play with them here: Clang, Visual Studio .

Concepts

As we know, concepts (concepts-lite to be correct) are already available in GCC: link here.

A few days ago Gabriel Dos Reis announced - “Concepts are ready”!. See this PDF: P0606R0, Concepts Are Ready.

They are published as: ISO/IEC TS 19217:2015
Information technology – Programming languages – C++ Extensions for concepts
.

Ranges

Already working implementation can be found on Github, from Microsoft: link here. Works since Visual Studio 2015 update 3.

Jonathan Boccara recently wrote an excellent introduction to Ranges, so you might want to have a look: Ranges: the STL to the Next Level - Fluent C++.

And also please follow one of the Ranges author: Eric Niebler.

Co-routines

Coroutines in Visual Studio: link here.

James McNellis has a lot of talks about co-routines so check this out: CppCon 2016: “Introduction to C++ Coroutines”

Contracts

Current proposal can be found here - P0542R0.

How to write preconditions and postconditions for functions.

Transactional memory

It’s already published as C++ extension: ISO/IEC TS 19841:2015
Technical Specification for C++ Extensions for Transactional Memory

So we just have to merge it into C++18.

Transactional memory - cppreference.com
TransactionalMemory - GCC Wiki

Summary

Which one is your favourite? Let’s answer the quick survey:

While we can complain about the lack of features in C++17, there’s also another option: since the features are almost done, why not use them? I doubt your production code can be immediately upgraded to C++20 (when it’s out); a transition period is required. By experimenting you’ll get two things at least: you’ll learn something new, and second: you’ll understand if a given feature could work in your project.

Beautiful code: final_act from GSL

$
0
0

Code function

Sometimes there’s a need to invoke a special action at the end of a scope: it could be resource-releasing code, a flag set, a code guard, begin/end function calls, etc. Recently, I found a beautiful utility that helps in such cases.
Let’s meet gsl::final_act/finally.

Intro

Imagine we have the following code:

void addExtraNodes();
void removeExtraNodes();

bool Scanner::scanNodes()
{
    // code...
    addExtraNodes();

    // code...
    removeExtraNodes();
    return true;
}

We have a bunch of objects that scanNodes scans (global or shared container), but then we need to add some extra nodes to check. We want to preserve the initial container state, so at the end, we’re required to remove those additional nodes.

Of course, the design of the whole scan code could be much better so that we work on a copy of the container and adding or removing extra stuff would not be a problem. But there are places, especially in legacy code, where you work on some global container, and special care needs to be taken when changing it. A lot of bugs can happen when you modify a state, and someone expects a different state of the shared container.

My code seems to be working as expected… right? I call removeExtraNodes at the end of the function.

But what if there are multiple returns from scanNodes? It’s simple: we need to add multiple calls to removeExtraNodes. Ok….

What if there are some exceptions thrown? Then we also need to call our cleanup function before we throw…

So it appears we need to call removeExtraNodes not only before the last return!

Help needed

Let’s look at the C++ Core Guidelines. They suggest doing the following thing:

E.19: Use a final_action object to express cleanup if no suitable resource handle is available

The guideline says that we should strive for a better design, but still, it’s better than goto; exit approach, or doing nothing.

Ok… but what’s the solution here:

bool Scanner::scanNodes()
{
    // code...
    addExtraNodes();
    auto _ = finally([] { removeExtraNodes(); });

    // code...

    return true;
}

What happened here?

All I did was to wrap the call to removeExtraNodes in a special object that will call a given callable object in its destructor. This is exactly what we need!

Where can we find that magical finally() code?

Just see Guideline Support Library/gsl_util.h.

Under the hood

The code is short, so I can even paste it here:

template <class F>
class final_act
{
public:
    explicit final_act(F f) noexcept
        : f_(std::move(f)), invoke_(true) {}

    final_act(final_act&& other) noexcept
        : f_(std::move(other.f_)),
          invoke_(other.invoke_)
    {
        other.invoke_ = false;
    }

    final_act(const final_act&) = delete;
    final_act& operator=(const final_act&) = delete;

    ~final_act() noexcept
    {
        if (invoke_) f_();
    }

private:
    F f_;
    bool invoke_;
};

Isn’t that beautiful?!

The above class takes a callable object - f_ - and then it will call it when it’s about to be destroyed. So even if your code returns early or throws an exception, your cleanup code is guaranteed to be invoked.

To work nicely with move semantics, there has to be an additional boolean member invoke_. This will guarantee that we won’t call the code for temporary objects. See this commit for more information if needed:
Final_act copy/move semantics is wrong
.

Additionally, to make our life easier, we have function helpers that create the objects:

template<class F>
inline final_act<F> finally(const F& f) noexcept
{
return final_act<F>(f);
}

template<class F>
inline final_act<F> finally(F&& f) noexcept
{
return final_act<F>(std::forward<F>(f));
}

So all in all, we can use finally() function in the client code. Maybe that could change in C++17 as we’ll get Template argument deduction for class templates.

What’s nice about this code?

  • Clean, simple code
  • Expressive, no comments needed
  • Does one thing only
  • It’s generic, so works on anything that’s callable
  • Modern C++: supports move semantics and noexcept

Where could it be used?

Just to be clear: don’t use finally approach too often! With the proper design, your objects shouldn’t work on a global state and take benefit from RAII as much as possible. Still, there are situations where finally is nice to use:

  • begin/end functions - where you’re required to call end after something started. As in our example.
  • flag setters. You have a shared flag, and you set it to a new state, but you have to reset it to the old state when you’re done.
  • resources without RAII support. The guideline shows an example with malloc/free. If you cannot wrap it in an RAII object (for example by using smart pointers and custom deleters), final_act might work.
  • safely closing the connection - as another example for resource clean-up in fact.

Do you see other places where final_act can help?
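To make the flag-setter case concrete, here’s a small sketch. It uses a minimal stand-in for gsl::finally (so the snippet is self-contained) and a hypothetical shared flag gProcessingEnabled; both names are mine, not from GSL:

```cpp
// Minimal stand-in for gsl::finally so this sketch is self-contained.
template <class F>
struct RestoreGuard
{
    F f_;
    ~RestoreGuard() { f_(); } // runs on every exit path from the scope
};

// Hypothetical shared flag, as in the "flag setters" bullet above.
bool gProcessingEnabled = true;

int countDisabledRuns(int n)
{
    const bool oldFlag = gProcessingEnabled;
    gProcessingEnabled = false; // temporarily change the shared state
    auto restore = [oldFlag] { gProcessingEnabled = oldFlag; };
    RestoreGuard<decltype(restore)> guard{restore};

    int count = 0;
    for (int i = 0; i < n; ++i)
        if (!gProcessingEnabled)
            ++count;
    return count; // the flag is restored no matter how we leave
}
```

Whether the function returns here, returns earlier, or throws, the destructor of the guard puts the old value back.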

You can also look at this list: C++ List of ScopeGuard, which appeared some time ago on Reddit (thread here).

Summary

final_act/finally is a beautiful and well-designed tool that can help with a dirty job of cleaning stuff. In your code, you should go for a better approach to clean things/resources, but if that’s not possible final_act is a great solution.

Do you use similar classes to clean things in your code?

final_act - follow-up


Last time I wrote about the final_act utility, and it seems I’ve opened a bit bigger box than I’d previously assumed. Let’s continue with the topic and try to understand some of the problems that were mentioned in the comments.

Intro

Let’s recall the case from last time:

I want to call a custom cleanup code at the end of the scope, and I want to be sure it’s invoked.

bool Scanner::scanNodes()
{
    // code...
    addExtraNodes();
    auto _ = finally([] { removeExtraNodes(); });

    // code...

    return true;
}

I’ve used finally() from GSL that internally works on final_act object.

The most important thing!

OK, I know… I made a typo in the title of my original post! :)
I tried it several times, sent newsletter with the proper name… but the post was wrong :)

GSL -> Guideline Support Library, not GLS -> Guideline Library Support

Important use case

Last time I forgot to mention one huge case where all of those scope_exit/final_act stuff might be utilized.

I mean: transactions. That’s a general term for all of the actions that should be reverted when something fails. If you copied 95% of a file and got an error, you cannot leave such a possibly corrupted file; you have to remove it and maybe start again. If you connected to a database and want to write some records, you assume the operation is atomic. I think this idea was ‘hidden’ somewhere in my examples, but it should be more exposed.

So whenever you’re dealing with code that has to be atomic, and transactional, such code constructs might be helpful. Sometimes you can wrap it in a RAII; often explicit code needs to be used.

No exceptions

First of all, my initial assumption was to use final_act in an environment where there are not many exceptions. For example, a lot of legacy code doesn’t use exceptions. Also Google C++ coding guideline doesn’t prefer exceptions (for practical reasons). This is a strong assumption, I know, maybe I did this automatically :)

Without exception handling around, we need to take care only of early returns. In that context, final_act works as expected.

With exceptions

OK… so what are the problems with exceptions then? final_act will work in most cases, so don’t just drop it whenever you have a code with exceptions… but we need to carefully look at some delicate parts here.

First thing: final act is noexcept

As explained many times in the comments in the GSL repo (for example here and in other issues):

And from Final_act can lead to program termination if the final act throws an exception:

Final_act should be noexcept. It is conceptually just a handy way for the user to conjure up a destructor, and destructors should be noexcept. If something it invokes happens to throw, then the program will terminate.

In other words, you should write the code that will be called with the same assumptions as other destructor code… so don’t throw anything there. That might be a little limitation when you want to call some ‘normal’ code, not just some clean-up stuff (on the other hand, maybe that would be a bad design after all?).

I’ve just noticed a really great explanation of why destructors shouldn’t throw:

from isocpp.org/faq

Write a message to a log-file. Terminate the process. Or call Aunt Tilda. But do not throw an exception!

Throwing from ctor or copy ctor

There’s a long-standing bug in the current implementation:

throwing copy and move constructors cause final_act to not execute the action · Issue #283 · Microsoft/GSL

How to workaround the bug?

We’re looking at this code:

explicit final_act(F f) noexcept
    : f_(std::move(f))
    , invoke_(true)
{
}

final_act(final_act&& other) noexcept
    : f_(std::move(other.f_))
    , invoke_(other.invoke_)
{
    other.invoke_ = false;
}

And especially those f_(std::move(other.f_)) calls.

The problem will occur if we raise an exception from the move/copy constructor. As I see this, it can happen only with custom move code that we have for the callable object. We should be safe when we use only lambdas as in:

auto _ = finally([] { removeExtraNodes(); });

Since lambdas (update: with no params) will have default code that won’t throw.

So maybe it’s not a major limitation?

update: I missed one thing. Take a look at the example provided in the comment at r/cpp. An exception can also be thrown from a copy/move constructor of some argument of the lambda object (since lambdas are ‘internally’ represented as functor objects, and their captures are members of that functor). Still, this is probably quite a rare case.
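To illustrate the mechanism (not the exact GSL code), here’s a hedged sketch: a hypothetical type ThrowOnCopy whose copy constructor throws; a lambda that captures it by value then has a copy constructor that can throw too, so a guard built from a copy of such a closure never gets to exist and its action is lost:

```cpp
#include <stdexcept>

// Hypothetical type whose copy constructor throws; a lambda that captures
// it by value gets a copy constructor that can throw as well.
struct ThrowOnCopy
{
    ThrowOnCopy() = default;
    ThrowOnCopy(ThrowOnCopy&&) noexcept = default;
    ThrowOnCopy(const ThrowOnCopy&) { throw std::runtime_error("copy failed"); }
};

bool cleanupRan = false;

// Returns whether the cleanup action ran when copying the closure threw.
bool copyLosesAction()
{
    cleanupRan = false;
    try
    {
        auto action = [t = ThrowOnCopy{}] { cleanupRan = true; (void)&t; };
        auto copied = action; // copying the closure copies t -> throws;
                              // a guard constructed from this copy would
                              // never exist, so its action never runs
        copied();
    }
    catch (const std::runtime_error&)
    {
    }
    return cleanupRan;
}
```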

Still, if you plan to use some advanced/custom callable functors, with special move code then it might be good to take something different than final_act.

Other solutions

To be honest, I also assumed that since final_act is proposed in Core Guidelines, then it’s the best choice that we have in Modern C++! But apparently we have some other possibilities:

The talk

First of all please watch this:

CppCon 2015: Andrei Alexandrescu “Declarative Control Flow”

The paper

And read that:

PDF, P0052R3 - Generic Scope Guard and RAII Wrapper for the Standard Library

Roughly, the plan is to have (C++20?) a set of tools:

  • std::scope_exit
  • std::scope_success
  • std::scope_fail

scope_exit is meant to be a general-purpose scope guard that calls its exit function when a scope is exited. The class templates scope_fail and scope_success share scope_exit’s interface; only the situation when the exit function is called differs. These latter two class templates memorize the value of uncaught_exceptions() on construction; scope_fail calls the exit function on destruction only when uncaught_exceptions() at that time returns a greater value, while scope_success calls it when uncaught_exceptions() on destruction returns the same or a lesser value.

This assumes uncaught_exceptions() returns an int, not just a bool.
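The scope_fail idea described above can be sketched in a few lines on top of C++17’s std::uncaught_exceptions(). The names here are mine, not the proposed standard ones:

```cpp
#include <exception>
#include <stdexcept>
#include <utility>

// Rough sketch of scope_fail from P0052: run the action only when the
// scope is left because of an exception.
template <class F>
class ScopeFailSketch
{
public:
    explicit ScopeFailSketch(F f)
        : f_(std::move(f)), exceptionsOnEnter_(std::uncaught_exceptions()) {}

    ~ScopeFailSketch()
    {
        // more in-flight exceptions than on entry => this scope failed
        if (std::uncaught_exceptions() > exceptionsOnEnter_)
            f_();
    }

private:
    F f_;
    int exceptionsOnEnter_; // note: an int, not just a bool
};

int failCount = 0;

void doWork(bool doThrow)
{
    ScopeFailSketch<void (*)()> guard{+[] { ++failCount; }};
    if (doThrow)
        throw std::runtime_error("failed");
}
```

doWork(false) leaves failCount untouched; doWork(true) bumps it during stack unwinding.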

folly/ScopeGuard.h

There’s already working code:

folly/ScopeGuard.h - master

D Language

In D we have built-in support for such structures:

scope(exit) removeExtraNodes();

see here for some examples Dlang: Exception Safety

Copy elision

The existing code works now and doesn’t rely on Guaranteed Copy Elision that we’ll have in C++17. In order to support this, they had to introduce that special bool member.

See discussion in Final_act copy/move semantics is wrong

Summary

As it appears, final_act is a simple utility that should work well in cases where your exit code doesn’t throw exceptions (and also doesn’t throw from copy/move constructors!). Still, if you need some more advanced solutions, you might want to wait for the general std::scope_exit/_success/_fail utilities.

One of the most important use cases is whenever we need a transactional approach to some actions: when we’re required to call some clean-up code after they succeeded or failed.

Meta-blogging opinion: the beauty of blogging is that often you write about one topic and you unravel (for yourself) whole new areas. That way blogging is a great way of learning things!

BTW: as a homework you can write a macro FINALLY that wraps the creation of the auto variable and makes sure we have a different name for that variable - so that you might have several final blocks in a function/scope.
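The homework above can be sketched like this. It’s one possible take, assuming C++17 (for class template argument deduction); the `##`/`__LINE__` trick generates a unique variable name per line, so several FINALLY blocks can coexist in one scope:

```cpp
#include <utility>

// A guard that runs a callable at scope exit.
template <class F>
struct final_block
{
    explicit final_block(F f) : f_(std::move(f)) {}
    ~final_block() { f_(); }
    F f_;
};

// Two-level concatenation so __LINE__ expands before pasting.
#define FINALLY_CAT2(a, b) a##b
#define FINALLY_CAT(a, b) FINALLY_CAT2(a, b)
#define FINALLY(code) \
    final_block FINALLY_CAT(finalBlock_, __LINE__){[&] { code; }}

int order = 0;
int firstRan = 0;
int secondRan = 0;

void twoBlocks()
{
    FINALLY(firstRan = ++order);
    FINALLY(secondRan = ++order);
    // guards are destroyed in reverse order:
    // the second block runs before the first
}
```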

Packing Bools, Performance tests


Imagine you have an array of booleans (or an array of ‘conditions’), and you want to pack it - so you use only one bit per boolean. How to do it? Let’s do some experiments!

Motivation

I started writing this post because I came across a similar problem during my work some time ago. The code in one area of our system packed boolean results of a condition into bits. I wondered if I could optimize that process. This ‘algorithm’ is not rocket science, but as usual, it opened a whole box of details and interesting solutions. So I decided to share it with my readers.

To illustrate the problem, we might think about an image in greyscale. We want to generate another image that has only two colors: white or black; we use a threshold value to distinguish between white and black color from the input image.

outputColor[x][y] = inputColor[x][y] > Threshold;

The input has some integer range (like 0…255), but the output is boolean: true/false.

Like here, image thresholding:

Image thresholding

Then we want to pack those boolean values into bits so that we save a lot of memory. If bool is implemented as an 8-bit unsigned char, then we can save 7/8 of the memory!

For example, instead of using 128kb for 256x512 greyscale image, we can now use 16kb only.

256 X 512 = 131072 (bytes) = 128kb
131072/8 = 16384 (bytes) = 16kb

Should be simple to code… right?

The algorithm

To make things clear let’s make some initial assumptions:

  • input:
    • array of integer values
    • length of the array: N
    • threshold value
  • output:
    • array of BYTES of the length M
    • M - number of bytes needed to write N bits
    • i-th bit of the array is set when inputArray[i] > threshold.

Brief pseudo code

for i = 0...N-1
byte = pack (input[i] > threshold,
input[i+1] > threshold,
...,
input[i+7] > threshold)
output[i/8] = byte
i+=8

// handle case where N not divisible by 8

Alternatively, we might remove the threshold value and just take input array of booleans (so there won’t be any need to make comparisons).
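The pseudo code above can be written directly as a compact C++ function. This is a straightforward (not optimized) sketch, not the benchmark code from the repo: the i-th bit of the output is set when input[i] > threshold, with bit 0 holding the first value:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack boolean results of (value > threshold) into bits, 8 per byte.
std::vector<uint8_t> packBools(const std::vector<int>& input, int threshold)
{
    // (N + 7) / 8 bytes: the last byte may be only partially filled
    std::vector<uint8_t> output((input.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < input.size(); ++i)
        if (input[i] > threshold)
            output[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    return output;
}
```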

Drawbacks of packing

Please note that I only focused on the ‘packing’ part. With the packed format you save memory, but there are more instructions to unpack a value. Sometimes this additional processing might cause the slow-down of the whole process! Always measure measure measure because each case might be different!

This problem is similar to compression algorithms, although packing is usually much faster process. As always, there’s a conflict between the storage and the computation power (Space–time tradeoff).

The benchmark

I want to compare several implementations:

  • std::bitset
  • std::vector of bools
  • one ‘manual’ version
  • second ‘manual’ version

Plus, the next time we’ll also add parallel options…

For the benchmarking library, I decided to use Celero. You can find more details about using it in my post about Benchmarking Libs for C++.

With Celero there’s an easy way to express different options for the benchmark. So for example, I’d like to run my code against different sizes of the input array: like 100k, 200k, … Also, there’s a clean way to provide setUp/tearDown methods that will be invoked before each run.

The base fixture provides input array:

inputValues.reset(new int[N]);
referenceValues.reset(new bool[N]);
arrayLength = N;

// Standard mersenne_twister_engine seeded with 0, constant
std::mt19937 gen(0);
std::uniform_int_distribution<> dist(0, 255);

// set every byte
for (int64_t i = 0; i < experimentValue; ++i)
{
    inputValues[i] = dist(gen);
    referenceValues[i] = inputValues[i] > ThresholdValue;
}

std::bitset<N>

OK, this version will be really simple, take a look:

for (int64_t i = 0; i < arrayLength; ++i)
    outputBitset.set(i, inputValues[i] > ThresholdValue);

The only drawback of using bitset is that it requires compile time N constant. Also, bitset is implementation specific, so we’re not sure how the memory is laid out internally. I would reject this version from the final production code, but it’s useful as a baseline.

For example, here’s the fixture for this baseline benchmark:

class StdBitsetFixture : public CompressBoolsFixture
{
public:
    virtual void tearDown()
    {
        for (int64_t i = 0; i < arrayLength; ++i)
            Checker(outputBitset[i], referenceValues[i], i);
    }

    std::bitset<MAX_ARRAY_LEN> outputBitset;
};

In tearDown we check our generated values with the reference - Checker just checks the values and prints if something is not equal.

std::vector<bool>

Another simple code. But this time vector is more useful, as it’s dynamic and the code is still super simple.

for (int64_t i = 0; i < arrayLength; ++i)
    outputVector[i] = inputValues[i] > ThresholdValue;

And the fixture:

class StdVectorFixture : public CompressBoolsFixture
{
public:
    virtual void setUp(int64_t experimentValue) override
    {
        CompressBoolsFixture::setUp(experimentValue);

        outputVector.resize(experimentValue);
    }

    virtual void tearDown()
    {
        for (int64_t i = 0; i < arrayLength; ++i)
            Checker(outputVector[i], referenceValues[i], i);
    }

    std::vector<bool> outputVector;
};

This time, we generate the vector dynamically using experimentValue (N - the size of the array).

Still, vector<bool> might not be a good choice for the production code; see 17.1.1 Do not use std::vector<bool> | High Integrity C++ Coding Standard.

Manual version

The first two versions were just to start with something, let’s now create some ‘real’ manual code :)

I mean ‘manual’ since all the memory management will be done by that code. Also, there won’t be any abstraction layer to set/get bits.

The setup looks like this:

virtual void setUp(int64_t experimentValue) override
{
    CompressBoolsFixture::setUp(experimentValue);
    numBytes = (experimentValue + 7) / 8;
    numFullBytes = (experimentValue) / 8;
    outputValues.reset(new uint8_t[numBytes]);
}

outputValues is just a unique_ptr to an array of uint8_t. We have N/8 full bytes, and also there’s one at the end that might be partially filled.

The first case will use just one variable to build the byte. When this byte is complete (8 bits are stored), we can save it in the output array:

uint8_t OutByte = 0;
int shiftCounter = 0;

auto pInputData = inputValues.get();
auto pOutputByte = outputValues.get();

for (int64_t i = 0; i < arrayLength; ++i)
{
    if (*pInputData > ThresholdValue)
        OutByte |= (1 << shiftCounter);

    pInputData++;
    shiftCounter++;

    if (shiftCounter > 7)
    {
        *pOutputByte++ = OutByte;
        OutByte = 0;
        shiftCounter = 0;
    }
}

// our byte might be incomplete, so we need to handle this:
if (arrayLength & 7)
    *pOutputByte++ = OutByte;

Improvement

The first manual version has a little drawback. As you see, there’s only one value used when doing all the computation. This is quite inefficient as there’s little use of instruction pipelining.

So I came up with the following idea:

uint8_t Bits[8] = { 0 };
const int64_t lenDivBy8 = (arrayLength / 8) * 8;

auto pInputData = inputValues.get();
auto pOutputByte = outputValues.get();

for (int64_t i = 0; i < lenDivBy8; i += 8)
{
    Bits[0] = pInputData[0] > ThresholdValue ? 0x01 : 0;
    Bits[1] = pInputData[1] > ThresholdValue ? 0x02 : 0;
    Bits[2] = pInputData[2] > ThresholdValue ? 0x04 : 0;
    Bits[3] = pInputData[3] > ThresholdValue ? 0x08 : 0;
    Bits[4] = pInputData[4] > ThresholdValue ? 0x10 : 0;
    Bits[5] = pInputData[5] > ThresholdValue ? 0x20 : 0;
    Bits[6] = pInputData[6] > ThresholdValue ? 0x40 : 0;
    Bits[7] = pInputData[7] > ThresholdValue ? 0x80 : 0;

    *pOutputByte++ = Bits[0] | Bits[1] | Bits[2] | Bits[3] |
                     Bits[4] | Bits[5] | Bits[6] | Bits[7];
    pInputData += 8;
}
if (arrayLength & 7)
{
    auto RestW = arrayLength & 7;
    memset(Bits, 0, 8);
    for (long long i = 0; i < RestW; ++i)
    {
        Bits[i] = *pInputData > ThresholdValue ? 1 << i : 0;
        pInputData++;
    }
    *pOutputByte++ = Bits[0] | Bits[1] | Bits[2] | Bits[3] |
                     Bits[4] | Bits[5] | Bits[6] | Bits[7];
}

What happened here?

Instead of working on one variable I used eight different variables where we store the result of the condition. However, there’s still a problem when doing that large OR. For now, I don’t know how to improve it. Maybe you know some tricks? (without using SIMD instructions…)

Results

Was I right with this approach of using more variables? Let’s see some evidence!

Intel i7 4720HQ, 12GB Ram, 512 SSD, Windows 10.

performance results, Celero, packing bools

The optimized version (using separate variables) is roughly 5x faster than bitset and almost 3.5x faster than the first manual version!

The chart:

performance results, Celero, packing bools, chart

Summary

The experiment was not so hard… so far! It was a good exercise for writing benchmarks.

What can we see from the results? The first manual version was still better than std::bitset, but worse than std::vector. Without looking at traces or profiles, I optimized it by using more variables to compute the conditions. That way there was less data dependency, and the CPU could perform better.

Next time I’ll try to parallelize the code. How about using more threads or vector instructions? For example, I’ve found a really interesting instruction called: _mm_movemask_epi8… See you next week.

code on github: fenbf/celeroTest/celeroCompressBools.cpp

Packing bools, Parallel and More


Let’s continue with the topic of packing boolean arrays into bits. Last time I showed a basic, single-threaded version of this ‘super’ advanced algorithm. By using more independent variables, we could speed things up and go even faster than the no-packing version! We’ve also used std::vector and std::bitset. Today I’d like to look at making the task parallel.

Read the first part here: Packing Bools, Performance tests

Recall

Just to recall, there’s an array of values and a threshold value. We want to test input values against that Threshold and store boolean condition results into bits.

Brief pseudo code

for i = 0...N-1
byte = pack (input[i] > threshold,
input[i+1] > threshold,
...,
input[i+7] > threshold)
output[i/8] = byte
i+=8

// handle case where N not divisible by 8

In other words, we want to pack boolean results:

true, false, true, false, true, false, true, true

into full byte

11010101

where the first value corresponds to the first bit of the byte.

Simd, SSE2

The improved version of the solution uses eight separate values to store the result of the comparison and then it’s packed into one byte. But with SIMD we could do even more. There’s a way to pack 16 values at once using only SSE2 instructions. Can this be faster?

The core part of this approach is to use _mm_movemask_epi8. As we can read here:

int _mm_movemask_epi8 (__m128i a)

Create mask from the most significant bit of each 8-bit element in a,
and store the result in dst.

Since the comparison instructions set value 0xFF or 0, the above code is perfect to do the packing.

So the code can look like this:

auto in16Values = _mm_set_epi8(/* load 16 values */);
auto cmpRes = _mm_cmpgt_epi8(in16Values, sseThresholds);
// cmpRes will store 0xFF or 0 for each comparison result
auto packed = _mm_movemask_epi8(cmpRes);
*((uint16_t*)pOutputByte) = static_cast<uint16_t>(packed);

packed will be a 16-bit mask composed from the most significant bit of each 8-bit element in cmpRes. So this is exactly what we need.

The problem

Unfortunately, there’s a little problem. _mm_cmpgt_epi8 compares only signed byte values, so we need to do more work to support unsigned version.

There wouldn’t be any problem if we compared with the equality operator, but for greater than it’s not an option.

You can read more about missing SSE instruction in this article: A few missing SSE intrinsics BTW: Thanks @malcompl for letting me know on Twitter.

Implementation

Maybe it will be unfair, but to solve the signed/unsigned problem I just added conversion code that subtracts 128 from the input values (and the threshold). That conversion is not counted in the measurement.
In the end, you’ll see the reason for doing this.
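To see why subtracting 128 works, here is the same trick in scalar form (my illustration, not the benchmark code): flipping the top bit of a byte (which equals subtracting 128 modulo 256) maps the unsigned ordering onto the signed ordering, so a signed compare like _mm_cmpgt_epi8 gives the right answer:

```cpp
#include <cstdint>

// Map an unsigned byte into "signed order": v ^ 0x80 == (v - 128) mod 256.
int8_t toSignedOrder(uint8_t v)
{
    return static_cast<int8_t>(v ^ 0x80);
}

// Same result as a > b on the original unsigned values,
// but computed with a signed comparison.
bool unsignedGreater(uint8_t a, uint8_t b)
{
    return toSignedOrder(a) > toSignedOrder(b);
}
```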

Auto vectorization

What about auto-vectorization? Maybe I am a terrible programmer, but it seems that most of my loops are hard to make vectorized. You can try and enable auto-vectorization in Visual Studio. But every time I do this I get almost zero success and no vectorized loops. See MSDN Auto-Parallelization and Auto-Vectorization. Maybe it’s better in GCC/Clang?

Threading with OpenMP

So far the code was single-threaded. We should be leveraging all available cores on our machines. Even in typical user devices, there are two or more cores (sometimes plus hyper-threading).

I don’t want to create a sophisticated task-queue worker system, so I got one idea: what about OpenMP? Our problem is quite simple, and what’s most important: we can perform packing in a highly parallel manner, as there are almost no conflicts between packed bytes.

Visual Studio offers a simple switch that enables OpenMP 2.0. As far as I can see, GCC offers almost the newest version (4.5), and Clang allows using OpenMP 3.1.

BTW: why does VS only offer OpenMP 2.0… why can’t we go higher? Other people complained; see this thread: Add support for OpenMP 4.5 to VC++ – Visual Studio

If you want to have a quick intro about OpenMP, I suggest this resource: Guide into OpenMP: Easy multithreading programming for C++.

Basically, OpenMP offers a fork-join model of computation:

Fork join model, wikipedia

The picture comes from wikipedia.

Our problem is perfect for such scenario. Theoretically, we could spread one thread per byte! So each byte packing would get its own thread. OK, maybe it’s not the best option as the overhead of thread switching would be much heavier than the computation itself, but I hope you get what I meant here.

What’s great about OpenMP is that it will handle all the hard part of threads management. All we have to do is to mark the parallel region and rewrite code in a way it’s easy to be run on separate threads.

So our version with OpenMP uses the following code:

#pragma omp parallel for private(Bits)
for (int i = 0; i < numFullBytes; ++i)
{
    auto pInputData = inputValues.get() + i * 8;
    Bits[0] = pInputData[0] > Threshold ? 0x01 : 0;
    Bits[1] = pInputData[1] > Threshold ? 0x02 : 0;
    Bits[2] = pInputData[2] > Threshold ? 0x04 : 0;
    Bits[3] = pInputData[3] > Threshold ? 0x08 : 0;
    Bits[4] = pInputData[4] > Threshold ? 0x10 : 0;
    Bits[5] = pInputData[5] > Threshold ? 0x20 : 0;
    Bits[6] = pInputData[6] > Threshold ? 0x40 : 0;
    Bits[7] = pInputData[7] > Threshold ? 0x80 : 0;

    outputValues.get()[i] = Bits[0] | Bits[1] | Bits[2] | Bits[3] |
                            Bits[4] | Bits[5] | Bits[6] | Bits[7];
}
// and then the part for handling the last not full byte...

All I had to do was to reorganize the code a bit - starting from my not-dependent version. Now each loop iteration works on one byte and 8 input values. We have a private section - Bits - that will be separate for each thread.

OpenMP will try to spread the work across available worker threads. Usually, it will be the number of cores. For example my machine has 4 cores with HT, so OpenMP reports 8 in my case (using omp_get_max_threads()).

Not bad for just one line of code, right?

OK, so I have probably 8 worker threads available… will my initial code perform 8x faster? Probably not, as we need to count additional API/Library overhead. But 2x or even more might easily happen.

Packed struct

David Mott made a comment, where he suggested using packed structs.

Why should we manually perform bit operations? Maybe we can force the compiler and get some help? Why not :)

struct bool8 
{
uint8_t val0 :1;
uint8_t val1 :1;
uint8_t val2 :1;
uint8_t val3 :1;
uint8_t val4 :1;
uint8_t val5 :1;
uint8_t val6 :1;
uint8_t val7 :1;
};

The processing code is much cleaner now:

for (int64_t j = 0; j < lenDivBy8; j += 8)
{
    out.val0 = pInputData[0] > ThresholdValue;
    out.val1 = pInputData[1] > ThresholdValue;
    out.val2 = pInputData[2] > ThresholdValue;
    out.val3 = pInputData[3] > ThresholdValue;
    out.val4 = pInputData[4] > ThresholdValue;
    out.val5 = pInputData[5] > ThresholdValue;
    out.val6 = pInputData[6] > ThresholdValue;
    out.val7 = pInputData[7] > ThresholdValue;

    *pOutputByte++ = out;
    pInputData += 8;
}

The OR operation is completely hidden now (maybe even not needed as the compiler can do its magic).

The case for the last byte is not as clean, but also not that bad:

if (arrayLength & 7)
{
    auto RestW = arrayLength & 7;
    out = { 0, 0, 0, 0, 0, 0, 0, 0 };
    if (RestW > 6) out.val6 = pInputData[6] > ThresholdValue;
    if (RestW > 5) out.val5 = pInputData[5] > ThresholdValue;
    if (RestW > 4) out.val4 = pInputData[4] > ThresholdValue;
    if (RestW > 3) out.val3 = pInputData[3] > ThresholdValue;
    if (RestW > 2) out.val2 = pInputData[2] > ThresholdValue;
    if (RestW > 1) out.val1 = pInputData[1] > ThresholdValue;
    if (RestW > 0) out.val0 = pInputData[0] > ThresholdValue;
    *pOutputByte++ = out;
}

We could also use union to provide array access for bits.
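Such a union could look like the sketch below. Note this is my illustration: reading the inactive member of a union is technically undefined behavior in standard C++, though the major compilers document it as working:

```cpp
#include <cstdint>

// The same byte viewed either as eight 1-bit fields or as a raw uint8_t.
union Bool8
{
    struct Bits
    {
        uint8_t val0 : 1;
        uint8_t val1 : 1;
        uint8_t val2 : 1;
        uint8_t val3 : 1;
        uint8_t val4 : 1;
        uint8_t val5 : 1;
        uint8_t val6 : 1;
        uint8_t val7 : 1;
    } bits;
    uint8_t raw;
};

// Pack two flags via the bit-field view, read them back as a byte.
uint8_t packTwo(bool a, bool b)
{
    Bool8 out{};
    out.bits.val0 = a;
    out.bits.val1 = b;
    return out.raw; // note: bit-field layout is implementation-defined
}
```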

Results

Ok, here’s the final run, with all versions:

packing bools, all results

And the chart for all:

packing bools, all results, chart

Chart for versions performing better than no packing

packing bools, fast versions, chart

  • OpenMP is a great way to make things faster, we get around 2.5…3x better performance (although I have 8 available system threads…)
  • Using packed structs is a really good option: the bit-playing code is hidden, and the compiler is responsible for packing things. And it performs almost the same as the manual version - even faster for larger sets of data.
  • My SIMD version wasn’t perfect, but I was still hoping for more gains. It operates on 16 input values at once (as opposed to 8 values in other versions). But the perf was slower. I am not an expert in SIMD stuff, so maybe there’s a way to improve it?

Other solutions

Summary

Ufff… all done :)

What could we learn from the benchmarks?

  • We can save some space by going into bit mode and at the same time the performance of packing might be faster than ‘no packing’ version.
  • The Standard Library containers like vector of bools or bitset don’t perform well; it’s better to create manual versions adjusted to a particular need.
  • Letting the compiler do the hard work (in our case, the bit setting) is quite a good alternative.
  • If the task is highly parallel, make sure you use all options to make things faster: reduce dependency of variables (also temp vars), use SIMD if possible, or use threading libraries.
  • As always measure measure measure as your case might be different.

I hope you enjoyed those tests. The problem was simple, but there are many ways we can explore the topic. And that’s only the tip of an iceberg when it comes to packing/compressing bitmaps.

Code on github: fenbf/celeroTest/celeroCompressBools.cpp

Curious case of branch performance


When doing my last performance tests for bool packing, I got strange results sometimes. It appeared that one constant generated different results than the other. Why was that? Let’s have a quick look at branching performance.

The problem

Just to recall (first part, second part): I wanted to pack eight booleans (results of a condition) into one byte, 1 bit per condition result. The problem is relatively simple, but depending on the solution you might write code that’s 5x…8x slower than the other version.

Let’s take a simple version that uses std::vector<bool>:

static const int ThresholdValue = X;
std::unique_ptr<int[]> inputValues = PrepareInputValues();
std::vector<bool> outputValues;

outputValues.resize(experimentValue);

// start timer
{
    for (size_t i = 0; i < experimentValue; ++i)
        outputValues[i] = inputValues[i] > ThresholdValue;
}
// end timer
// end timer

And see the results:

std::vector<bool> performance

The chart shows timings for 100 samples taken from running the code, vector size (experimentValue) is 1mln.

Do you know what the difference between the above results is?

It’s only X - the value of ThresholdValue!

If it’s 254, then you get the yellow performance; if it’s 127, then you get those green, blue squares. The generated code is the same, so why do we see the difference? The same code can run even 4x slower!

So maybe vector implementation is wrong?

Let’s use a (not optimal) manual version:

uint8_t OutByte = 0;
int shiftCounter = 0;

for (int i = 0; i < experimentValue; ++i)
{
    if (*pInputData > Threshold)
        OutByte |= (1 << shiftCounter);

    pInputData++;
    shiftCounter++;

    if (shiftCounter > 7)
    {
        *pOutputByte++ = OutByte;
        OutByte = 0;
        shiftCounter = 0;
    }
}

And the results:

Again, when running with Threshold=127, you get the top output, while Threshold=254 returns the bottom one.

OK, but some versions of the algorithm didn’t expose this problem.

For example, the optimized version that packed 8 values at ‘once’:

uint8_t Bits[8] = { 0 };
const int64_t lenDivBy8 = (experimentValue / 8) * 8;

for (int64_t j = 0; j < lenDivBy8; j += 8)
{
    Bits[0] = pInputData[0] > Threshold ? 0x01 : 0;
    Bits[1] = pInputData[1] > Threshold ? 0x02 : 0;
    Bits[2] = pInputData[2] > Threshold ? 0x04 : 0;
    Bits[3] = pInputData[3] > Threshold ? 0x08 : 0;
    Bits[4] = pInputData[4] > Threshold ? 0x10 : 0;
    Bits[5] = pInputData[5] > Threshold ? 0x20 : 0;
    Bits[6] = pInputData[6] > Threshold ? 0x40 : 0;
    Bits[7] = pInputData[7] > Threshold ? 0x80 : 0;

    *pOutputByte++ = Bits[0] | Bits[1] | Bits[2] | Bits[3] |
                     Bits[4] | Bits[5] | Bits[6] | Bits[7];
    pInputData += 8;
}

The samples are not lining up perfectly, and there are some outliers, but still, the two runs are very similar.

And also the baseline (no packing at all, just saving each result into a plain byte array):

std::unique_ptr<uint8_t[]> outputValues(new uint8_t[experimentValue]);

// start timer
{
    for (size_t i = 0; i < experimentValue; ++i)
        outputValues[i] = inputValues[i] > ThresholdValue;
}
// end timer

This time, Threshold=254 is slower… but still not by much, only a few percent. Not 3x…4x as with the first two cases.

What’s the reason for those results?

The test data

So far I haven’t explained how my input data is generated. Let’s unveil that.

The input values simulate greyscale values, and they range from 0 up to 255. The threshold is also in the same range.

The data is generated randomly:

std::mt19937 gen(0);
std::uniform_int_distribution<> dist(0, 255);

for (size_t i = 0; i < experimentValue; ++i)
    inputValues[i] = dist(gen);

Branching

As you might have already discovered, the problem lies in the branch (mis)predictions. When the Threshold value is large, there’s little chance input values will generate TRUE. While for Threshold = 127, each value has a 50% chance (and it’s still a random pattern).

Here’s a great experiment that shows some problems with branching: Fast and slow if-statements: branch prediction in modern processors @igoro.com. And also Branch predictor - Wikipedia.

Plus read more in The Software Optimization Cookbook: High Performance Recipes for IA-32 Platforms, 2nd Edition

For a large threshold value, most of my code falls into FALSE cases, and thus no additional instructions are executed. The CPU sees this in its branch history and can predict the next operations. With a random 50% pattern, the CPU cannot choose the road effectively, so there are many mispredictions.
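A minimal sketch of the effect (function and variable names are illustrative, not from the original benchmark): the loop below is identical for both runs; only the threshold constant differs, which changes how predictable the branch is.

```cpp
#include <cstddef>
#include <vector>

// Count values above the threshold using a branch. With random 0..255
// input, threshold 127 makes the branch ~50% taken (hard to predict),
// while threshold 254 makes it almost never taken (easy to predict).
std::size_t CountAbove(const std::vector<int>& input, int threshold)
{
    std::size_t count = 0;
    for (int v : input)
    {
        if (v > threshold) // the branch the predictor has to learn
            ++count;
    }
    return count;
}
```

Timing `CountAbove(data, 127)` against `CountAbove(data, 254)` on a large random vector is where the prediction gap shows up.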

Unfortunately, I don’t have tools to measure those exact numbers, but for me, it’s a rather clear situation. Maybe you can measure the data? Let me know!

But why didn’t the other code - the optimized version - show the effect? Why does it run similarly, no matter what the constant is?

Details

Let’s look at the generated assembly: play @ godbolt.org.

Optimized version (From MSVC)

$LL4@Foo:
    cmp   DWORD PTR [ecx-8], 128          ; 00000080H
    lea   edi, DWORD PTR [edi+1]
    lea   ecx, DWORD PTR [ecx+32]
    setg  BYTE PTR _Bits$2$[esp+8]
    cmp   DWORD PTR [ecx-36], 128         ; 00000080H
    setle al
    dec   al
    and   al, 2
    cmp   DWORD PTR [ecx-32], 128         ; 00000080H
    mov   BYTE PTR _Bits$1$[esp+8], al
    setle bh
    dec   bh
    and   bh, 4
    cmp   DWORD PTR [ecx-28], 128         ; 00000080H
    setle dh
    dec   dh
    and   dh, 8
    cmp   DWORD PTR [ecx-24], 128         ; 00000080H
    setle ah
    dec   ah
    and   ah, 16                          ; 00000010H
    cmp   DWORD PTR [ecx-20], 128         ; 00000080H
    setle bl
    dec   bl
    and   bl, 32                          ; 00000020H
    cmp   DWORD PTR [ecx-16], 128         ; 00000080H
    setle al
    dec   al
    and   al, 64                          ; 00000040H
    cmp   DWORD PTR [ecx-12], 128         ; 00000080H
    setle dl
    dec   dl
    and   dl, 128                         ; 00000080H
    or    dl, al
    or    dl, bl
    or    dl, ah
    or    dl, dh
    or    dl, bh
    or    dl, BYTE PTR _Bits$2$[esp+8]
    or    dl, BYTE PTR _Bits$1$[esp+8]
    mov   BYTE PTR [edi-1], dl
    sub   esi, 1
    jne   $LL4@Foo
    pop   esi
    pop   ebx

And for the first manual version: https://godbolt.org/g/csLeHe

    mov   edi, DWORD PTR _len$[esp+4]
    test  edi, edi
    jle   SHORT $LN3@Foo
$LL4@Foo:
    cmp   DWORD PTR [edx], 128            ; 00000080H
    jle   SHORT $LN5@Foo
    movzx ecx, cl
    bts   ecx, eax
$LN5@Foo:
    inc   eax
    add   edx, 4
    cmp   eax, 7
    jle   SHORT $LN2@Foo
    mov   BYTE PTR [esi], cl
    inc   esi
    xor   cl, cl
    xor   eax, eax
$LN2@Foo:
    sub   edi, 1
    jne   SHORT $LL4@Foo
$LN3@Foo:
    pop   edi
    pop   esi
    ret   0

As we can see, the optimized version doesn’t use branching. It uses the setCC instruction, but this is not a real branch. Strangely, GCC doesn’t use this approach and emits branches instead, so its code could possibly be slower.

SETcc - sets the destination register to 0 if the condition is not met and to 1 if the condition is met.

See Branch and Loop Reorganization to Prevent Mispredicts | Intel® Software


See also this explanation for avoiding branches: x86 Disassembly/Branches wikibooks

So, if I am correct, this is why the optimized version doesn’t show any effects of branch misprediction.
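The same idea can be written at the C++ level so the compiler has an easy time emitting setcc instead of a jump: the comparison result itself (a bool converting to 0 or 1) is shifted into place. A hedged sketch, with illustrative names:

```cpp
#include <cstdint>

// Pack 8 values into one byte without a data-dependent jump: the bool
// result of (in[i] > threshold) converts to 0 or 1, which the compiler
// can materialize with setcc and shift into position - no branch to
// mispredict, regardless of the input distribution.
uint8_t PackByteBranchless(const int* in, int threshold)
{
    unsigned byte = 0;
    for (int i = 0; i < 8; ++i)
        byte |= static_cast<unsigned>(in[i] > threshold) << i;
    return static_cast<uint8_t>(byte);
}
```

Note this sketch packs bit 0 from the first element (the opposite bit order to the MSVC listing above); only the branch-free shape is the point here.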

The first, non-optimal version of the code contains two jumps in the loop, so that’s why we can experience the drop in performance.

Still, bear in mind that conditional moves are not always better than branches. For example, read more details at Krister Walfridsson’s blog: The cost of conditional moves and branches.

Summary

Things to remember:

  • Doing performance benchmarks is a really delicate thing.
  • Look not only at the code but also at the test data used - a different distribution might give completely different results.
  • Eliminate branches as it might give a huge performance boost!

Charts made with the Nonius library; see more in my micro-benchmarking library blog post.

A question to you:

  • How do you reduce branches in your perf critical code?

Please stop with performance optimizations!


Stop with performance optimization

As you might notice from reading this blog, I love doing performance optimizations. Let’s take some algorithm or some part of the app, understand it and then improve, so it works 5x… or 100x faster! Doesn’t that sound awesome?

I hope that you answered “Yes” to the question in the introduction. Doing optimizations is cool and fun… and it’s like a game: how far can you go, how much can you win?

On the other hand, not everything can be fun and easy. Sometimes we must stop and not optimize further.

Let’s have a look…

Are you really optimizing?

There are lots of optimizations manuals that will give you tips and guides on how to write faster code. It’s relatively easy just to pick up some code and start applying the tricks.

You see a branch? Ok, here are some tips to reduce branching.

Is the object too large? OK, let’s see how to squeeze things a bit.

Too many allocations? OK, let’s use some memory pool.

I am not saying the tips are bad, far from that. But sometimes that will only give you a few percent of improvement.

A real optimization is often much harder than applying five random tricks.

First of all, you should understand the system/module/algorithm. Maybe you can eliminate some code completely? Maybe you can use some better algorithm with optimal complexity? Or maybe you can do things in some other way?

Ideally, you should start from the top: understand the system and then go down, doing optimizations layer by layer. It would be bad to spend a week optimizing code in the lowest layer, only for someone to notice that half of the module could be removed entirely (together with your changes, of course).

Is it the right place?

Are you sure optimizing that part of the system will really make things faster?

If you optimize a routine from 1 sec to 0.1 sec that’s 10x improvement. Great!

But, if the whole system takes 100 sec and the routine is called only once, you only improved a part that is responsible for 1% of the work… was it worth doing?

To optimize things correctly, you should find the hot spots in the app. Measure first, see how the system performs, and then pick the real problems.
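The smallest possible "measure first" harness can look like the sketch below (names are illustrative; a real investigation needs a proper profiler and many repeated samples, not a single wall-clock measurement):

```cpp
#include <chrono>

// Time a single call in milliseconds. Use it to compare the routine
// you want to optimize against the whole system's runtime before
// deciding whether the optimization is worth doing at all.
template <typename Func>
long long MeasureMs(Func&& func)
{
    const auto start = std::chrono::steady_clock::now();
    func();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start)
        .count();
}
```

If `MeasureMs(routine)` is 1% of `MeasureMs(wholeRun)`, even a 10x speedup of the routine barely moves the total.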

Do you try to measure the system, or just use your intuition?

Do the real task

Optimizing code is a fun game, but the job needs to be done. Not everything has to run as fast as possible. A feature should work. It’s better to have two working features than one half-finished feature that (potentially) runs super fast. Who cares…

Rendering engines need to be fast; it’s their nature. But what about simple GUI actions?

In theory, there should be a plan for optimizations, and it should be written in the spec. If the feature is critical/important, then the spec should mention that you should optimize up to some desired level.

Do you plan the optimization in your projects?

It’s a delicate thing

Doing right benchmarks, finding hotspots and improving the code might be really tough. So many factors can influence the results. Often, you can look at the wrong data and be misled. Some tricks will work in your case, but others might even degrade the perf. Also, if you go down to CPU-instruction-level optimizations, be prepared to do a lot of testing - other platforms might show different results.

So many times my performance tests showed different results than I expected. One time I thought that I was simply using data that causes instruction dependency, while the slowdown came more from the branching. In real apps the problems might be even harder to measure. You think that one system is causing the trouble, while it’s really hidden effects on a different ‘side’ of the app.


Do you like perf optimization topics? Signup to my newsletter for more.


Root of Evil

Optimized code is also often perceived as more complex. With all of the crazy asm instructions, SIMD, code duplication, loop unrolling and that kind of creative stuff. Still, I believe that fast code can also be clean code - for example through code simplification, code removal and using optimal algorithms. The parts that really need the special tricks can be extra commented, so at least people can understand what’s going on.

You might also avoid premature optimization and read more here: StackExchange: Is premature optimization really the root of all evil?

Premature optimization is the root of all evil – Donald Knuth

There’s also a nice post from Arne Mertz on Simple and Clean Code vs. Performance.

Wrap up

The performance game is fun. There are so many things you can learn and experiment with, and you can be happy that you’ve beaten the CPU. Still, it’s good to remember to stop at some point. To get work done, it’s better to leave some cool algorithm in a ‘good enough’ state and move on to other tasks. Sometimes you even have to stop because there’s no sense in putting more effort into a particular area.

However that sounds, when playing the optimization game don’t forget about the fun/creative part. Just recall from time to time that with more understanding of the whole system, you can beat the CPU even more.

What are your thoughts on doing optimizations? Do you apply random tricks or have some plan? Do you have some ‘policy’ in the company regarding optimizations? Do you have performance tests for your apps?


Enhancing Visual Studio with Visual Assist


Visual Studio and Visual Assist

What does your typical coding session in Visual Studio look like?
What’s the first thing you do when you’re about to start coding?

Yes… let’s check Gmail, Youtube, Reddit, etc… :)

OK, please be more professional!

So, let’s assume my Visual Studio (2013, 2015 or 2017) is already started. What to do next?

A quick session

Once I’ve finally checked all the Internet status of the World, I can finally switch to Visual Studio and just look at the code. Today I am about to write a new feature to my FileTest project. There’s one interface called IFileTransformer, and I’d like to add a new derived class.

BTW: FileTest project is a real thing, take a look here - on my github page if you like.

Visual Studio didn’t open the file where the interface is located, so what to do? Let’s search for the symbol.

Shift + Alt + S and Visual Assist opens a nice window (Find Symbol):

I can immediately go to the location of the interface now and recall what it does. This search symbol window is one of my essential tools; I can search almost everything with it. Of course, it works even if I remember just a part of the name.

There are some existing implementations of IFileTransformer, so it’s better to check what they are. I can, of course, skim the file (fortunately all the code is in one header, but in a larger project derived classes are usually spread over multiple files). With VA you can just click on the type and use a powerful feature called Go To Related, which shows all the derived classes of our interface.

Let’s now go to the last derived class - MappedWinFileTransformer. We can simplify our life for now and just copy and paste it as a new implementation. Let’s call it SuperFileTransformer.

class SupaFileTransformer : public IFileTransformer
{
public:
    using IFileTransformer::IFileTransformer;

    virtual bool Process(TProcessFunc func) override;
};

Great… I have the class declaration in place, but I have to implement the main method Process. It’s another case where VA can help:

I can select the method and use VA to create an empty implementation. Most of the time it also places the code in the correct source file, at the end or near other methods of the same class.

But wait… haven’t I made a typo!? I wanted to write SuperFileTransformer, but I got SupaFileTransformer. So let’s rename it! In the previous screenshot you can see that there’s also an option to Rename a symbol. I use that, and immediately my new class has the correct name - and it’s changed across the whole project!

You might notice that Visual Studio (even 2015) also contains that option, but VA has offered it for a much longer time, and as I’ve noticed, it’s more reliable and faster.

Now, I don’t remember the correct steps to implement the final file transformer, so for now I just want to test if my simple code works correctly in the framework.

I can write a simple code like:

bool SuperFileTransformer::Process(TProcessFunc func)
{
    const auto strInput = m_strFirstFile;
    const auto strOutput = m_strSecondFile;

    Logger::PrintTransformSummary(/*blocks */ 10, /*blockSize*/ 100,
        strInput, strOutput);

    return true;
}

It should only write a success log message.

Now I need to check and debug if everything works OK. F5, and then I’ll try to set a breakpoint in my new code. One of the things that quite often interrupts a debugger session is stepping into an unwanted function. There are ways to block that - for example, to avoid entering the std::string constructor… but with Visual Assist it’s super easy.

Visual Assist Step Filter

While debugging just use Step Filter. You can select which methods will be blocked.

For example, when I break at the code:

debugger, break at

And when I want to step into PrintTransformSummary I’ll, unfortunately, go into some strange code for std::string. But then, VA will show something like this in the Step Filter window:

Visual Assist Step Filter

I can click on the marked debug event, and next time I won’t go into std::string constructor. This option can be saved per project or globally.

My code seems to work fine; the new interface was registered, it’s invoked correctly.

I’ll add a reminder to implement the read functionality:

// #todo: add final implementation
// #transformer: new, not finished implementation

Visual Assist has a great feature, added not that long ago, that transforms those hashtags into something searchable. Take a look:

Visual Assist Hash Tags

In the Hashtags window, I can browse through my tags, see their location, read code around it. It’s great to leave notes, use it as bookmarks.

OK, I can now finish my little session and go back and check if the status of the World changed and open Reddit/Twitter/Youtube again!

A bit of intro

The story was a simple coding session. Let’s review all of the tools that helped me with code.

First of all the whole Visual Assist (VA):

It’s been several years since I started using Visual Assist. Every time I open Visual Studio without this powerful extension I just say “meh…”. The Microsoft IDE is getting better and better, but there are still areas where it cannot compete with VA.

Here’s a nice list of all the features compared to Visual Studio.

Compare Visual Assist to Microsoft Visual Studio

and here are the 10 Top Features

My main tool belt

Here are the tools that I am using all the time:

Find Symbol

Documentation: Find Symbol in Solution

It’s probably the feature I use most often. It allows me to find anything in the project.

I can search for all symbols, or just classes/structures. It offers filters, negative searches (with a hyphen), sorting. Not to mention it is super fast.

Go To Related/Go To Implementation

Documentation: GoTo Related, GoTo Implementation

It’s important when I want to switch between definition and declaration, see other implementations of some interface or a virtual method. What’s more, it also works from hashtags and comments.

Find All References

Documentation: Find References

Another core feature, when you want to search where a particular symbol is used. Visual Assist offers a better view of the results, take a look:

It’s grouped by file, with an option to show the code when you hover over a particular element.

VA Hashtags

Documentation: VA Hashtags

As I mentioned before, it allows you to have a little task tracker combined with bookmarks. Jump quickly between code sections, related modules, features. And everything is stored inside the same source file, no need to use external files, settings. It’s generated while editing.

Debug Step Filter

Documentation: VA Step Filter

Easily disable code that should be skipped while debugging. Simplifies debugging of code that touches some third party code, standard libraries, etc. Assuming that you don’t have to debug that far and can just focus on your code.

Refactoring/Code Generation:

Documentation: Refactoring, Code Generation

There are so many features here:

  • rename
  • change signature
  • encapsulate field
  • extract method
  • introduce variable
  • move implementation between files/headers
  • move section to a new file
  • create implementation
  • create from usage
  • create method Implementations

Syntax colouring

Documentation: Enhanced Syntax Coloring,

Just a taste of the syntax colouring.

Some little improvements

  • Spellcheck - you can now write comments using proper words :)
  • Snippets - I use it especially for comments for function/classes documentations, surround with a header guard, etc.

And there is even more cool stuff from Visual Assist, but I’m still improving my workflow and learning new things!

BTW: A few years ago I also wrote about how VA can help in code understanding, take a look here: 3 Tools to Understand New Code from Visual Assist

You can see more on Whole Tomato Youtube Channel

Wrap up

Visual Assist improves my speed of coding, moving around the project files, and understanding what’s going on. I wanted to share my basic setup and the core tools that I am using.

If you like Visual Assist or got a bit interested, just download the trial and play with it. I highly recommend this tool!

After the trial period it’s quite hard to live without it… believe me :)

C++17 in details: fixes and deprecation


C++17 features

The new C++ Standard - C++17 - is close to being accepted and published. There’s already a working draft, and not that long ago it went to the final ISO balloting. It’s a good occasion to learn and understand the new features.

Let’s start slowly, and today we’ll look at language/library fixes and removed elements.

Intro & Series

This is the first post from my new series about C++17 details. I’ve already shared a lot of stuff, especially in my huge C++17 collaborative post from the beginning of the year. Still, it’s good to look at things in a bit more detail.

The plan for the series

  1. Fixes and deprecation (today)
  2. Language clarification (soon)
  3. Templates (soon + 1)
  4. Attributes (soon + 2)
  5. Simplification (soon + 3)
  6. Library changes 1 (soon + 4)
  7. Library changes 2 (soon + 5)

First of all, if you want to dig into the standard on your own, you can read the latest draft here:

N4659, 2017-03-21, Working Draft, Standard for Programming Language C++ - the link also appears on the isocpp.org.

Compiler support: C++ compiler support

In Visual Studio (since VS 2015 Update 3) you can try using Standard Version Switches and test your code conformance with the given standard: Standards version switches in the compiler.

Moreover, I’ve prepared a list of concise descriptions of all of the C++17 language features:

It’s a one-page reference card, PDF.

Removed things

The draft for the language now contains over 1586 pages! Due to compatibility requirements, new features are added, but not much is removed. Fortunately, there are some things that could go away.

Removing trigraphs

Trigraphs are special character sequences that could be used when a system doesn’t support 7-bit ASCII - like the ISO 646 character set. For example ??= generates #, and ??- produces ~. BTW: all of C++’s basic source character set fits in 7-bit ASCII. The sequences are rarely used, and by removing them the translation phase of the code might be simpler.
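The removal is observable from code: a string literal containing a trigraph sequence now keeps the characters as written. A small sketch (the helper name is mine, for illustration):

```cpp
#include <cstddef>
#include <cstring>

// In C++17 the sequence "??=" in a string literal stays as written;
// in earlier standards (with trigraphs enabled) translation phase 1
// would have turned it into a single '#'.
std::size_t TrigraphCandidateLength()
{
    return std::strlen("??="); // 3 characters in C++17
}
```

Note that GCC and Clang already disabled trigraph translation by default before C++17, so you may only see the old behavior with an explicit `-trigraphs` switch.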

If you want to know more: c++03 - Purpose of Trigraph sequences in C++? - Stack Overflow, or Digraphs and trigraphs - Wikipedia.

More details in: N4086. If you really need trigraphs with Visual Studio, take a look at /Zc:trigraphs switch. Also, other compilers might leave the support in some way or the other. Other compiler status: done in GCC: 5.1 and Clang: 3.5.

Removing register keyword

The register keyword was deprecated in the 2011 C++ standard as it had no meaning. Now it’s being removed. The keyword stays reserved and might be repurposed in future revisions (for example, the auto keyword was reused and is now something powerful).

More details: P0001R1, MSVC 2017: not yet. Done in GCC: 7.0 and Clang: 3.8.

Remove Deprecated operator++(bool)

This operator has been deprecated for a very long time! Already in C++98 it was decided that it’s better not to use it. But only in C++17 did the committee agree to remove it from the language.

More details: P0002R1, MSVC 2017: not yet. Done in GCC: 7.0 and Clang: 3.8.

Removing Deprecated Exception Specifications from C++17

In C++17 exception specifications will be part of the type system (see P0012R1). Still, the standard contained the old and deprecated dynamic exception specifications, which appeared to be impractical and unused.

For example:

void fooThrowsInt(int a) throw(int)
{
    printf_s("can throw ints\n");
    if (a == 0)
        throw 1;
}

The above code has been deprecated since C++11. The only practical exception declaration was throw(), which means - this code won’t throw anything. But since C++11 it’s been advised to use noexcept instead.

For example in clang 4.0 you’ll get the following error:

error: ISO C++1z does not allow dynamic exception specifications [-Wdynamic-exception-spec]
note: use 'noexcept(false)' instead

More details: P0003R5, MSVC 2017: not yet. Done in GCC: 7.0 and Clang: 4.0.
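A sketch of the modern replacement for the example above - `throw()` becomes `noexcept`, and the guarantee is checkable at compile time with the `noexcept` operator (function names are illustrative):

```cpp
// A noexcept function promises not to throw; if an exception does
// escape, std::terminate is called.
void fooNoThrow() noexcept
{
    // guaranteed non-throwing work here
}

int fooMayThrow(int a) // no exception specification at all
{
    if (a == 0)
        throw 1;
    return a;
}

// The guarantee is part of the type system and can be inspected:
static_assert(noexcept(fooNoThrow()), "guaranteed not to throw");
static_assert(!noexcept(fooMayThrow(1)), "may throw");
```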

Removing auto_ptr

This is one of my favorite updates to the language!

In C++11 we got smart pointers: unique_ptr, shared_ptr and weak_ptr. Thanks to move semantics the language could finally support proper unique resource transfers. auto_ptr was an old and buggy thing in the language - see the full reasons here - why is auto_ptr deprecated. It can almost automatically be converted to unique_ptr. For some time auto_ptr was deprecated (since C++11). Many compilers would report this like:

warning: 'template<class> class std::auto_ptr' is deprecated

Now it goes into a zombie state, and basically, your code won’t compile.

Here’s the error from: MSVC 2017 when using /std:c++latest:

error C2039: 'auto_ptr': is not a member of 'std'

If you need help with the conversion from auto_ptr to unique_ptr you can check Clang Tidy, as it provides auto conversion: Clang Tidy: modernize-replace-auto-ptr.
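The typical shape of the conversion looks like this sketch (the `Widget` type is a hypothetical placeholder): where auto_ptr transferred ownership through its copy constructor, silently nulling the source, unique_ptr makes every transfer explicit.

```cpp
#include <memory>
#include <utility>

struct Widget { int value = 42; }; // hypothetical payload type

// old: std::auto_ptr<Widget> MakeWidget() { return std::auto_ptr<Widget>(new Widget()); }
std::unique_ptr<Widget> MakeWidget()
{
    auto p = std::make_unique<Widget>();
    return p; // ownership moves out of the function - no copy involved
}
```

At the call site, handing the pointer to someone else requires a visible `std::move`, which is exactly the safety auto_ptr lacked.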

More details: N4190

In the linked paper N4190: there are also other library items that were removed: unary_function/binary_function, ptr_fun(), and mem_fun()/mem_fun_ref(), bind1st()/bind2nd() and random_shuffle.
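For those removed items the replacements are already in the language: binders become lambdas, and random_shuffle becomes std::shuffle with an explicit engine. A small sketch (the function name and fixed seed are mine, for illustration):

```cpp
#include <algorithm>
#include <random>
#include <vector>

// bind1st(std::plus<int>(), offset) -> a lambda;
// std::random_shuffle(v.begin(), v.end()) -> std::shuffle with an engine.
std::vector<int> AddAndShuffle(std::vector<int> v, int offset)
{
    std::transform(v.begin(), v.end(), v.begin(),
                   [offset](int x) { return offset + x; });

    std::shuffle(v.begin(), v.end(), std::mt19937{ 0 }); // fixed seed here
    return v;
}
```

Unlike random_shuffle, std::shuffle names its randomness source explicitly, so results are reproducible when you want them to be.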

Fixes

We can argue what is a fix in a language standard and what is not. Below I’ve picked three things that sound to me like a fix for something that was missed in the previous standards.

New auto rules for direct-list-initialization

Since C++11 we got a strange problem where:

auto x {1};

Is deduced as an initializer_list. With the new standard we can fix this, so it will deduce int (as most people would initially guess).

To make this happen, we need to understand two ways of initialization: copy and direct.

auto x = foo(); // copy-initialization
auto x{foo};    // direct-initialization, initializes an
                // initializer_list (until C++17)
int x = foo();  // copy-initialization
int x{foo};     // direct-initialization

For the direct initialization, C++17 introduces new rules:

  • For a braced-init-list with only a single element, auto deduction will deduce from that entry;
  • For a braced-init-list with more than one element, auto deduction will be ill-formed.

For example:

auto x1 = { 1, 2 };   // decltype(x1) is std::initializer_list<int>
auto x2 = { 1, 2.0 }; // error: cannot deduce element type
auto x3{ 1, 2 };      // error: not a single element
auto x4 = { 3 };      // decltype(x4) is std::initializer_list<int>
auto x5{ 3 };         // decltype(x5) is int

More details in N3922 and also in Auto and braced-init-lists, by Ville Voutilainen. Already working since MSVC 14.0, GCC: 5.0, Clang: 3.8.

static_assert with no message

Self-explanatory. It allows you to have just the condition, without passing the message; the version with the message will also be available. It’s compatible with other asserts like BOOST_STATIC_ASSERT (which didn’t take any message from the start).

static_assert(std::is_arithmetic_v<T>, "T must be arithmetic");
static_assert(std::is_arithmetic_v<T>); // no message needed since C++17

More details: N3928, supported in MSVC 2017, GCC: 6.0 and Clang: 2.5.

Different begin and end types in range-based for

Since C++11, the range-based for loop has been defined internally as:

{
    auto&& __range = for-range-initializer;
    for (auto __begin = begin-expr, __end = end-expr;
         __begin != __end; ++__begin) {
        for-range-declaration = *__begin;
        statement
    }
}

As you can see, __begin and __end have the same type. That might cause some troubles - for example when you have something like a sentinel that is of a different type.

In C++17 it’s changed into:

{
    auto&& __range = for-range-initializer;
    auto __begin = begin-expr;
    auto __end = end-expr;
    for (; __begin != __end; ++__begin) {
        for-range-declaration = *__begin;
        statement
    }
}

The types of __begin and __end might now be different; only the comparison operator is required. This little change gives Range TS users a better experience.

More details in P0184R0, supported in MSVC 2017, GCC: 6.0 and Clang: 3.6.
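A sketch of what the relaxed rule enables - a sentinel-terminated range where `end()` returns a different type than `begin()` (all the type names below are illustrative):

```cpp
// An empty tag type acting as the "end" of a C-string range.
struct CStrSentinel {};

struct CStrIter
{
    const char* p;
    char operator*() const { return *p; }
    CStrIter& operator++() { ++p; return *this; }
    // Comparing iterator against sentinel checks for the terminator,
    // so no upfront strlen is needed to find the end.
    bool operator!=(CStrSentinel) const { return *p != '\0'; }
};

struct CStrRange
{
    const char* s;
    CStrIter begin() const { return { s }; }
    CStrSentinel end() const { return {}; } // different type than begin()
};

int CountChars(const char* s)
{
    int count = 0;
    for (char c : CStrRange{ s }) // accepted under the C++17 rules
    {
        (void)c;
        ++count;
    }
    return count;
}
```

Before C++17, `begin()` and `end()` had to return the same type, so such a range would not compile in a range-based for.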

Summary

The language standard grows, but there’s some movement in the committee to remove and clean some of the features. For compatibility reasons, we cannot delete all of the problems, but one by one we can get some improvements.

Next time we’ll address language clarifications: like guaranteed copy elision or expression evaluation order. So stay tuned!

Once again, remember to grab my C++17 Language Ref Card .

And BTW: you can read about modern C++ (including C++17), in a recent book from Marius Bancila: Modern C++ Programming Cookbook

C++17 in details: language clarifications


C++17 features, clarifications

The second part of my series about C++17 details. Today I’d like to focus on features that clarify some tricky parts of the language. For example copy elision and expression evaluation order.

Intro

You all know this… C++ is a very complex language, and some (or most? :)) parts are quite confusing. One of the reasons for the lack of clarity is the freedom given to the implementation/compiler - for example, to allow more aggressive optimizations or to stay backward (or C) compatible. Sometimes it’s simply a lack of time/effort/cooperation. C++17 reviews some of the most popular ‘holes’ and addresses them. In the end, we get a slightly clearer picture of how things work.

Today I’d like to cover:

  • Evaluation order
  • Copy elision (optional optimization that seems to be implemented across all of the popular compilers)
  • Exceptions
  • Memory allocations for (over)aligned data

The Series

This post is a second in the series about C++17 features details.

The plan for the series

  1. Fixes and deprecation
  2. Language clarification (today)
  3. Templates (soon)
  4. Attributes (soon + 1)
  5. Simplification (soon + 2)
  6. Library changes 1 (soon + 3)
  7. Library changes 2 (soon + 4)

Just to recall:

First of all, if you want to dig into the standard on your own, you can read the latest draft here:

N4659, 2017-03-21, Working Draft, Standard for Programming Language C++ - the link also appears on the isocpp.org.

Compiler support: C++ compiler support

Moreover, I’ve prepared a list of concise descriptions of all of the C++17 language features:

It’s a one-page reference card, PDF.

There’s also a talk from Bryce Lelbach: C++Now 2017: C++17 Features

Stricter expression evaluation order

This one is tough, so please correct me if I am wrong here, and let me know if you have more examples and better explanations. I’ve tried to confirm some details on Slack/Twitter, and hopefully I am not writing nonsense here :)

Let’s try:

C++ doesn’t specify any evaluation order for function parameters. Dot.

For example, that’s why make_unique is not just syntactic sugar - it actually guarantees memory safety:

With make_unique:

foo(make_unique<T>(), otherFunction());

And with explicit new.

foo(unique_ptr<T>(new T), otherFunction());

It’s possible that new T happens before the constructor of unique_ptr runs. If otherFunction is invoked then and, for example, throws, new T generates a leak (as the unique pointer is not yet created). When you use make_unique, a leak is not possible, even when the order of execution is random. More of such problems in GotW #56: Exception-Safe Function Calls

With the accepted proposal the order of evaluation should be ‘practical.’

Examples:

  • in f(a, b, c) - the order of evaluation of a, b, c is still unspecified, but any parameter is fully evaluated before the next one is started. Especially important for complex expressions.
    • if I’m correct that fixes a problem with make_unique vs unique_ptr<T>(new T()). As function argument must be fully evaluated before other arguments are.
  • chaining of functions already work from left to right, but the order of evaluation of inner expressions might be different. look here: c++11 - Does this code from “The C++ Programming Language” 4th edition section 36.3.6 have well-defined behavior? - Stack Overflow. To be correct “The expressions are indeterminately sequenced with respect to each other”, see Sequence Point ambiguity, undefined behavior?.
  • now, with C++17, chaining of functions will work as expected when they contain such inner expressions, i.e., they are evaluated from left to right: a(expA).b(expB).c(expC) is evaluated from left to right and expA is evaluated before calling b…
  • when using operator overloading order of evaluation is determined by the order associated with the corresponding built-in operator:
    • so std::cout << a() << b() << c() is evaluated as a, b, c.
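The left-to-right guarantee for a `<<` chain can be observed directly; in this sketch (names are mine) each operand records its own evaluation:

```cpp
#include <sstream>
#include <string>

namespace {
    std::string g_order; // records the observed evaluation order
}

int tag(char c)
{
    g_order += c;
    return c;
}

// In C++17 the operands of a << chain are evaluated left to right, so
// tag('a'), tag('b'), tag('c') are guaranteed to run in that order;
// before C++17 any interleaving was allowed.
std::string ObservedOrder()
{
    g_order.clear();
    std::ostringstream os;
    os << tag('a') << tag('b') << tag('c');
    return g_order;
}
```

Note this requires compiling in C++17 mode; under earlier standards the result may legitimately differ.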

And from the paper:

the following expressions are evaluated in the order a, then b, then c:
1. a.b
2. a->b
3. a->*b
4. a(b1, b2, b3)
5. b @= a
6. a[b]
7. a << b
8. a >> b

And the most important part of the spec is probably:

The initialization of a parameter, including every associated value computation and side effect, is indeterminately sequenced with respect to that of any other parameter.

StackOverflow: What are the evaluation order guarantees introduced by C++17?

More details in: P0145R3 and P0400R0. Not yet supported in Visual Studio 2017, GCC 7.0, Clang 4.0

Guaranteed copy elision

Currently, the standard allows eliding in the cases like:

  • when a temporary object is used to initialize another object (including the object returned by a function, or the exception object created by a throw-expression)
  • when a variable that is about to go out of scope is returned or thrown
  • when an exception is caught by value

But it’s up to the compiler/implementation to elide or not. In practice, all the constructors’ definitions are required. Sometimes elision might happen only in release builds (optimized), while Debug builds (without any optimization) won’t elide anything.

With C++17 we’ll get clear rules when elision happens, and thus constructors might be entirely omitted.

Why might it be useful?

  • allow returning objects that are not movable/copyable - because we could now skip copy/move constructors. Useful in factories.
  • improve code portability, support ‘return by value’ pattern rather than use ‘output params.’

Example:

// based on P0135R0
struct NonMoveable
{
    NonMoveable(int);
    // no copy or move constructor:
    NonMoveable(NonMoveable&) = delete;
    void operator=(NonMoveable&) = delete;

    std::array<int, 1024> arr;
};

NonMoveable make()
{
    return NonMoveable(42);
}

// construct the object:
auto largeNonMovableObj = make();

The above code wouldn’t compile under C++14 as it lacks copy and move constructors. But with C++17 the constructors are not required - because the object largeNonMovableObj will be constructed in place.

Defining rules for copy elision is not easy, but the authors of the proposal suggested new, simplified types of value categories:

  • glvalue - ‘A glvalue is an expression whose evaluation computes the location of an object, bit-field, or function.’
  • prvalue - ‘A prvalue is an expression whose evaluation initializes an object, bit-field, or operand of an operator, as specified by the context in which it appears.’

In short: prvalues perform initialization, glvalues produce locations.

Unfortunately, in C++17 we’ll get guaranteed copy elision only for temporary objects, not for named objects (so it covers only the first bullet above; Named Return Value Optimization remains optional). Maybe C++20 will follow and add more rules here?

More details: P0135R0, MSVC 2017: not yet. GCC: 7.0, Clang: 4.0.

Exception specifications part of the type system

Previously exception specifications for a function didn’t belong to the type of the function, but now it will be part of it.

We’ll get an error in the case:

void (*p)();
void (**pp)() noexcept = &p; // error: cannot convert to
                             // pointer to noexcept function

struct S { typedef void (*p)(); operator p(); };
void (*q)() noexcept = S();  // error: cannot convert to
                             // pointer to noexcept

One of the reasons for adding the feature is a possibility to allow for better optimization. That can happen when you have a guarantee that a function is for example noexcept.

Also, C++17 cleans up exception specifications: Removing Deprecated Exception Specifications from C++17 removes the so-called ‘dynamic exception specifications’. Effectively, you can now only use the noexcept specifier to declare whether a function may throw or not.

More details: P0012R1, MSVC 2017: not yet, GCC 7.0, Clang 4.0.

Dynamic memory allocation for over-aligned data

When doing SIMD or when you have some other memory layout requirements, you might need to align objects specifically. For example, in SSE you need a 16-byte alignment (for AVX 256 you need a 32-byte alignment). So you would define a vector4 like:

class alignas(16) vec4
{
    float x, y, z, w;
};
auto pVectors = new vec4[1000];

Note: the alignas specifier is available since C++11.

In C++11/14 you have no guarantee how that memory will be aligned. So you often had to use special routines like _aligned_malloc/_aligned_free to be sure the alignment was preserved. That’s not nice, as it doesn’t work with C++ smart pointers and makes memory allocations/deallocations visible in the code (we should stop using raw new and delete, according to the Core Guidelines).

C++17 fixes that hole by introducing additional memory allocation functions that use align parameter:

void* operator new(size_t, align_val_t);
void* operator new[](size_t, align_val_t);
void operator delete(void*, align_val_t);
void operator delete[](void*, align_val_t);
void operator delete(void*, size_t, align_val_t);
void operator delete[](void*, size_t, align_val_t);

Now you can allocate that vec4 array as:

auto pVectors = new vec4[1000];

No code changes, but it will magically call:

operator new[](sizeof(vec4), align_val_t(alignof(vec4)))

In other words, new is now aware of the alignment of the object.

More details in P0035R4. MSVC 2017: not yet, GCC: 7.0, Clang: 4.0.

Summary

Today we’ve focused on four areas where the C++ specification is now clearer. We now have guarantees about when copy elision happens, some orders of evaluation are well defined, operator new is aware of a type’s alignment, and exception specifications are part of the function type.

What are your picks for language clarification?

What other ‘holes’ need to be filled?

Next time we’ll address changes for templates and generic programming. So stay tuned!

Once again, remember to grab my C++17 Language Ref Card.

Modern C++ Programming Cookbook Review


Modern C++ Programming

Since May 2017 we got one more book about Modern C++! A Few weeks ago I got a copy from Packt Publishing, and today I’d like to write a few words about the book. In short: it’s a very good book! :)

But let’s see what’s inside…

Important: I have three ebooks for you, read more at the end of the post!

The Book

Modern C++ Programming Cookbook
by Marius Bancila

About Marius: his blog, @mariusbancila

His post about the book being published

source code available at PackPub site

The Structure

The Book is intended for all C++ developers, regardless of their experience. The beginner and intermediate developers will benefit the most from the book in their attempt to become prolific with C++.
Experienced C++ developers, on the other hand, will find a good reference for many C++11, C++14, and C++17 language and library features that may come in handy from time to time.

There are 11 chapters, around 550 pages, over 100 recipes!

1. Learning Modern Core Language Features

Using auto, type aliases, uniform initialization, scoped enums, and even structured bindings (C++17)

2. Working with Numbers and Strings

Performing conversions, handling numeric types, user defined literals, string_view (C++17)

3. Exploring Functions

Deleted functions, lambdas, map and folds, higher order functions, functional programming

4. Preprocessor and Compilation

Conditional compilation, preprocessor hacks, enable_if (SFINAE), constexpr if (C++17), attributes.

5. Standard Library Containers, Algorithms, and Iterators

Using vector, bitset, algorithms, searching, writing a custom iterator

6. General Purpose Utilities

Time intervals, measuring time, hashing, std::any, std::optional, std::variant (all from C++17), visitors, type traits.

7. Working with Files and Streams

Reading and writing from/to files, serialization of objects, filesystem (C++17)

8. Leveraging Threading and Concurrency

Threads, locking, async invocation, implementing parallel map and fold, tasks, atomics.

9. Robustness and Performance

Exceptions, noexcept, constant expressions, smart pointers, move semantics.

10. Implementing Patterns and Idioms

Improving factory patterns (by avoiding if…else statements), pimpl idiom, named parameter idiom, NVI, attorney-client idiom, thread-safe singleton.

11. Exploring Testing Frameworks

Writing tests in Boost.Test, Google Test, Catch

My View

As you can see with the book, we get a lot of useful recipes. What I like in the first place, is that there are topics from C++11, C++14, and even C++17. Thus the book is up to date (even further than the current C++ status!). The author explains clearly what changed between C++ versions. I know how hard it is to pick all of those little nuances in the standard versions, so it’s a solid advantage of the book.

A few recipes that caught my attention:

  • Enabling range-based for loops for custom types - very handy if you don’t work only with standard library containers.
  • Creating cooked user-defined literals
  • Using string_view instead of constant string references
  • Using fold expressions to simplify variadic function templates.
  • Chapters about functional programming
  • Providing metadata to the compiler with attributes
  • Serialization
  • C++17 library features: any, variant, optional and also filesystem.
  • Patterns like attorney-client idiom.
  • Chrono

For example, with chrono I’ve found some beautiful code, take a look:

using namespace std::chrono_literals;
auto d1 = 1h + 23min + 45s; // d1 = 5025s
auto d2 = 3h + 12min + 50s; // d2 = 11570s
if (d1 < d2) { /* do something */ }

Isn’t this clean and expressive? Code possible thanks to chrono_literals (available since C++14) and User Defined Literals (C++11).

The cookbook style is well suited for ‘modern’ learning, when you want to quickly pick a topic and read - without the need to read from the beginning to the end. Here, depending on your knowledge level and experience, you might want to read the whole book or just choose several recipes. I like that approach. I am a fan of cookbooks, as I’ve reviewed some of them previously (like here, here and here)

With this book, we get a lot of ‘meat’ inside. There are no theoretical/bloviated chapters, no waffling or anything like that… you get actionable recipes that you can use in your code and experiment with. Of course, recipes are often connected - especially within one chapter, they go from the most basic up to advanced areas.

I believe it was also quite hard to decide what to include in the book. In theory having ‘all’ recipes for C++ would take like 2000… or 3000 pages. Still, I think the book is well organized, and you get most of the useful stuff from modern C++. Of course, I’d like to ask for more :)

The selection of topics tries to cover most needs. If you need some specific/advanced parts, you can pick other books like Effective Modern C++, C++ Concurrency in Action, or Discovering Modern C++.

Summary

Final mark: 4.5/5

Pros:

  • Clear structure
  • Cookbook style, so read what you need
  • Chapters usually start with some basic recipes and then increase the level of complexity.
  • Concise examples and details of how things work, not just function headers.
  • Modern coding standard, even with C++17 stuff!
  • C++11, C++14, and C++17 - with a clear distinction and explanation of what has changed, etc.
  • It doesn’t have much of ‘intro to C++,’ so you can just jump into intermediate topics! It’s not another basic beginner’s book.
  • There are useful ‘tips’ here and there

Cons:

  • A few typos, repetitions, one missing function description
  • Chapter about unit testing frameworks could be shorter, but maybe other devs find it useful.
  • Some recipes are questionable: but that depends on the view/experience. For example: using bitset. And I’d like to see more performance topics in the performance chapter.

Overall, I like the book. With its clear structure and well-written recipes, it’s a good addition to any C++ bookshelf. It’s well suited for the target audience: even if you’re an expert, you’ll get a chance to refresh your knowledge and update it with C++14/C++17 content. And If you’ve just finished some beginner book, you’ll find here topics that will move you forward.

I am impressed that Marius ended up with such a good book, especially since it’s his first one, as far as I know. I think the second edition of the C++ Cookbook will be just perfect :)

Giveaway

Together with Packt Publishing, we have three “Modern C++ Programming Cookbook” ebooks to give away!

All you have to do is enter your email/name into this funny giveaway tool below and take part in the game :)

Please note that if you're already on the mailing list it doesn't mean you take part in the giveaway, you have to enter the game 'explicitly'.

Also, If you write a comment, that increases your chances to win! (Be sure to also subscribe to the game tool!)

To help you with a comment idea you can think about the following questions:

  • What’s your favourite modern C++ area?
  • Do you use modern C++, or are you stuck with some legacy/old standard?
  • What recipes for modern C++ would you like to read more about?
  • What other books would you suggest for modern C++?
Modern C++ Programming Cookbook Giveaway

We’ll randomly draw 3 winners on July 3rd; it will be announced here in this post and on Twitter. So you have two weeks to take the action :)

C++17 in details: Templates


C++17 features, templates

For C++17 everyone wanted to have concepts, and as you know, we didn't get them. But does it mean C++17 doesn’t improve templates/template meta-programming? Far from that! In my opinion, we get excellent features.

Read more for details.

Intro

Do you work a lot with templates and meta-programming?
With C++17 we get a few nice improvements: some are quite small, but there are notable features as well! All in all, the additions should significantly improve writing template code.

Today I wrote about:

  • Template argument deduction for class templates
  • template<auto>
  • Fold expressions
  • constexpr if
  • Plus some smaller, detailed improvements/fixes

BTW: if you’re really brave you can still use concepts! They are merged into GCC so you can play with them even before they are finally published.

The Series

This post is the third one in the series about C++17 features details.

The plan for the series

  1. Fixes and deprecation
  2. Language clarification
  3. Templates (today)
  4. Attributes (soon)
  5. Simplification (soon + 1)
  6. Library changes 1 (soon + 2)
  7. Library changes 2 (soon + 3)

Just to recall:

First of all, if you want to dig into the standard on your own, you can read the latest draft here:

N4659, 2017-03-21, Working Draft, Standard for Programming Language C++ - the link also appears on the isocpp.org.

WG21 P0636r0: Changes between C++14 and C++17

Compiler support: C++ compiler support

Moreover, I’ve prepared a list of concise descriptions of all of the C++17 language features:

It’s a one-page reference card, PDF.

There’s also a talk from Bryce Lelbach: C++Now 2017: C++17 Features

And have a look at my master C++17 features post: C++17 Features

Template argument deduction for class templates

I have good and bad news for you :)

Do you often use make<T> functions to construct a templated object (like std::make_pair)?
With C++17 you can forget about (most of) them and just use a regular constructor :)
That also means that a lot of your code - those make<T> functions can now be removed.

The reason?

C++17 filled a gap in the deduction rules for templates. Now the template deduction can happen for standard class templates and not just for functions.

For instance, the following code is (and was) legal:

void f(std::pair<int, char>);

// call:
f(std::make_pair(42, 'z'));

Because std::make_pair is a template function (so we can perform template deduction).

But the following wasn’t (before C++17)

void f(std::pair<int, char>);

// call:
f(std::pair(42, 'z'));

Looks the same, right? This was not OK because std::pair is a template class, and template classes could not apply type deduction in their initialization.

But now we can do that so that the above code will compile under C++17 conformant compiler.

What about creating local variables like tuples or pairs?

std::pair<int, double> p(10, 0.0);
// same as
std::pair p(10, 0.0); // deduced automatically!

Try in Compiler Explorer: example, GCC 7.1.

This can substantially reduce complex constructions like

std::lock_guard<std::shared_timed_mutex,
                std::shared_lock<std::shared_timed_mutex>> lck(mut_, r1);

Can now become:

std::lock_guard lck(mut_, r1);

Note that partial deduction is not allowed; you have to specify all the template parameters or none:

std::tuple t(1, 2, 3);                // OK: deduction
std::tuple<int, int, int> t(1, 2, 3); // OK: all arguments are provided
std::tuple<int> t(1, 2, 3);           // Error: partial deduction

Also, if you’re adventurous, you can create your custom class template deduction guides; see this recent post for more information: Arne Mertz: Modern C++ Features - Class Template Argument Deduction.

BTW: why can’t all make functions be removed? For example, are make_unique or make_shared only ‘syntactic sugar’, or do they have other important uses? I’ll leave this as an exercise :)

More details in

MSVC not yet, GCC: 7.0, Clang: not yet.

Declaring non-type template parameters with auto

This is another part of the strategy to use auto everywhere. With C++11 and C++14 you can use it to automatically deduce variables or even return types, plus there are also generic lambdas. Now you can also use it for deducing non-type template parameters.

For example:

template <auto value> void f() { }

f<10>(); // deduces int

This is useful, as you don’t have to have a separate parameter for the type of non-type parameter. Like:

template <typename Type, Type value> constexpr Type TConstant = value;
//        ^^^^^^^^       ^^^^
constexpr auto const MySuperConst = TConstant<int, 100>;

with C++17 it’s a bit simpler:

template <auto value> constexpr auto TConstant = value;
//        ^^^^
constexpr auto const MySuperConst = TConstant<100>;

So no need to write Type explicitly.

As one of the advanced uses a lot of papers/blogs/talks point to an example of Heterogeneous compile time list:

template <auto... vs> struct HeterogenousValueList { };
using MyList = HeterogenousValueList<'a', 100, 'b'>;

Before C++17 it was not possible to declare such a list directly; some wrapper class would have to be provided first.

More details in

MSVC not yet, GCC: 7.0, Clang: 4.0.

Fold expressions

With C++11 we got variadic templates, which is a great feature, especially if you want to work with a variable number of input parameters to a function. Previously (pre C++11) you had to write several different versions of a function (one for one parameter, another for two parameters, another for three…).

Still, variadic templates required some additional code when you wanted to implement ‘recursive’ functions like sum or all. You had to specify the rules for the recursion:

For example:

auto SumCpp11() {
    return 0;
}

template <typename T1, typename... T>
auto SumCpp11(T1 s, T... ts) {
    return s + SumCpp11(ts...);
}

And with C++17 we can write much simpler code:


template <typename... Args> auto sum(Args... args)
{
    return (args + ... + 0);
}

// or even:

template <typename... Args> auto sum2(Args... args)
{
    return (args + ...);
}

Fold expressions over a parameter pack.

Expression            | Expansion
(... op pack)         | ((pack1 op pack2) op ...) op packN
(init op ... op pack) | (((init op pack1) op pack2) op ...) op packN
(pack op ...)         | pack1 op (... op (packN-1 op packN))
(pack op ... op init) | pack1 op (... op (packN-1 op (packN op init)))

Also by default we get the following values for empty parameter packs (P0036R0):

Operator | Default value
&&       | true
||       | false
,        | void()

Here’s a quite nice implementation of printf using folds:

template <typename... Args>
void FoldPrint(Args&&... args) {
    (std::cout << ... << std::forward<Args>(args)) << '\n';
}

Or a fold over a comma operator:

template <typename T, typename... Args>
void push_back_vec(std::vector<T>& v, Args&&... args)
{
    (v.push_back(args), ...);
}

In general, fold expressions allow writing cleaner, shorter and probably easier to read code.

More details in:

MSVC not yet, GCC: 6.0, Clang: 3.6 (N4295)/3.9(P0036R0).

constexpr if

This is a big one!

The static-if for C++!

The feature allows you to discard branches of an if statement at compile-time based on a constant expression condition.

if constexpr (cond)
    statement1; // Discarded if cond is false
else
    statement2; // Discarded if cond is true

For example:

template <typename T>
auto get_value(T t) {
    if constexpr (std::is_pointer_v<T>)
        return *t;
    else
        return t;
}

This removes a lot of the necessity for tag dispatching and SFINAE and even for #ifdefs.

I’d like to return to this feature when we are discussing features of C++17 that simplify the language. I hope to come back with more examples of constexpr if.

More details in:

MSVC not yet, GCC: 7.0, Clang: 3.9.

Other

In C++17 there are also other language features related to templates. In this post, I wanted to focus on the biggest enhancements, so I’ll just mention the others briefly:

  • Allow typename in a template template parameters: N4051.

    • Allows you to use typename instead of class when declaring a template template parameter. Normal type parameters can use them interchangeably, but template template parameters were restricted to class.
  • DR: Matching of template template-arguments excludes compatible templates: P0522R0.

  • Allow constant evaluation for all non-type template arguments: N4268

    • Remove syntactic restrictions for pointers, references, and pointers to members that appear as non-type template parameters:
  • constexpr lambdas: P0170R1

    • Lambda expressions may now be constant expressions.

Summary

Is C++17 improving templates and meta-programming? Definitely!

We have really solid features like template deduction for class templates, template<auto> plus some detailed features that fix some of the problems.

Still, for me, the most powerful features, that might have a significant impact on the code is constexpr if and folds. They greatly clean up the code and make it more readable.

What are your favorite parts regarding templates?

Next time we’ll address attributes like [[fallthrough]] or [[nodiscard]], and I’d like to recall other, already existing attributes. Stay tuned!

Once again, remember to grab my C++17 Language Ref Card.
