Tuesday, November 13, 2012

New syntax - Range-based for

The 'range-based for' loop gives C++11 a simple for-each style loop syntax. It works with the STL collection classes (hiding the complexity of using the STL iterators manually), with plain C arrays, and can be made to work with any custom class as well (see Using 'range-based for' on your own collection classes below).

As such, 'range-based for' is a feature with which C++11 reduces unnecessary verbosity while simplifying the language. It's well supported in all the main C++ compilers already - clang (3.0+above), gcc (4.6+above), MSVC (since VC11).

The following examples illustrate how it works (note that the C++11 examples also make use of the new auto keyword; check out my previous blog post if you're not familiar with it yet):

Old C++:

for(std::vector<int>::iterator it = myVector.begin(); it != myVector.end(); ++it) { *it *= 2; }

for(std::map<int, std::set<std::string> >::iterator it = myMap.begin(); it != myMap.end(); ++it) { ... }

for(std::set<MyClass>::const_iterator it = mySet.begin(); it != mySet.end(); ++it) {
   const MyClass& value = *it;
   ...
}

int arr[] = { 1, 2, 3, 4};
for(int i=0; i < sizeof(arr) / sizeof(int); i++) {
   int& value = arr[i];
   ...
}

// iterate all elements in a multi-map for which the key matches "hello" ...
std::pair<std::multimap<std::string,int>::iterator, std::multimap<std::string,int>::iterator> range =
   myMultiMap.equal_range("hello");

for(std::multimap<std::string,int>::iterator it=range.first; it!=range.second; ++it) { ... }

C++11:

for(auto& value: myVector) { value *= 2; }

for(auto& entry: myMap) { ... }

for(const auto& value: mySet) { ... }

int arr[] { 1, 2, 3, 4 };
for(int& value: arr) {
   ...
}

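// iterate all elements in a multi-map for which the key matches "hello" ...
// (the std::pair of iterators returned by equal_range is not itself a range, so
// range-based for can't be used on it directly, but auto still shortens the loop)
auto range = myMultiMap.equal_range("hello");
for(auto it = range.first; it != range.second; ++it) { ... }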

Use of the 'auto' keyword

Note that most of the above examples make use of the auto keyword (described in the previous post).

Without the auto keyword, you'd have to specify the full type, which in some cases you don't really want to do. For example, when iterating the map above the element type is std::pair<const int, std::set<std::string> >, which is really an implementation detail I don't want to care about. With old C++ all I needed to know was that I could use loopVariable->first and loopVariable->second to get the key and value; here all I need to know is that I can use loopVariable.first and loopVariable.second.

Thus, the use of the auto keyword in this situation should generally be preferred.

The type of the loop variable?

Previously when iterating the old C++ way, your loop variable was an iterator, which then has to be dereferenced (as if it were a pointer) to get the value (which can be one example of a potentially confusing use of C++ operator overloading, because the iterator really is more than just a pointer to the value).

Now when using the new 'range-based for' loop, what you get is just the value - you're actually getting the dereferenced iterator, that is, the following two loops are basically equivalent:

for(vector<int>::iterator it = v.begin(); it != v.end(); ++it) {
   auto value = *it;
   ...
}
for(auto value: v) {
   ...
} 

Thus for example, when iterating a std::map<KEY,VALUE>, note that this loop variable is going to be a std::pair<const KEY, VALUE> (just like when you dereference the iterator when iterating a map in old C++ you get a pair).
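As a small sketch of the map case (the helper name describe and the sample data in the usage note are made up for illustration), the loop variable exposes .first and .second directly, and the key part of the pair is const:

```cpp
#include <map>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical helper: builds "key=value;" for every entry in the map.
std::string describe(const std::map<int, std::string>& m) {
   std::string out;
   for(const auto& entry: m) {
      // entry is a std::pair<const int, std::string>, not an iterator,
      // so its members are reached with '.' rather than '->'
      out += std::to_string(entry.first) + "=" + entry.second + ";";
   }
   return out;
}

// The element type of a map always has a const key.
static_assert(std::is_same<std::map<int, std::string>::value_type,
                           std::pair<const int, std::string> >::value,
              "map elements are pair<const KEY, VALUE>");
```

Calling describe on a map holding 1 -> "one" and 2 -> "two" visits the entries in key order.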

By value or reference?

Just like you would do when dereferencing the iterator in the above example, you can choose whether you want to work with a value, a reference, or a const reference, i.e.:

// loop-variable is a value
for(auto i: arr) { printf("%d\n", i); }

// loop-variable is a reference, so we can change it - this will give a compile error if arr is const
for(auto& i: arr) { i += 1; } 

// loop-variable is a const reference - can't change it
for(const auto& i: arr) { printf("%d\n", i); } 

Generally, the following guidelines apply:

  • use a reference - auto& - when you want to make changes.
  • prefer to use a const reference - const auto& - if the object incurs any copying penalty.
  • if neither of the above apply, whether you use by value (auto) or by const reference is a matter of preference. If in doubt, const reference should be the safer choice.
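To make the difference concrete, here is a minimal sketch (the function names doubleCopies, doubleInPlace and sumOf are hypothetical, chosen just for this example) showing that a by-value loop variable is a copy, while a reference loop variable changes the collection:

```cpp
#include <vector>

// Hypothetical helper: the loop variable is a copy, so the vector is untouched.
void doubleCopies(std::vector<int>& v) {
   for(auto value: v) { value *= 2; }   // modifies only the local copy
}

// Hypothetical helper: the loop variable is a reference, so the elements change.
void doubleInPlace(std::vector<int>& v) {
   for(auto& value: v) { value *= 2; }
}

// Read-only access through a const reference, without copying each element.
int sumOf(const std::vector<int>& v) {
   int total = 0;
   for(const auto& value: v) { total += value; }
   return total;
}
```

For int elements the copy is harmless, but forgetting the & when you meant to modify the collection is an easy mistake to make, as the compiler accepts both loops.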

Limitations

There are some limitations when using the range-based for-loop:

  • If you're used to iterating arrays by index, you no longer have that index if you need it for something (e.g. avoiding the trailing comma when outputting a comma-separated list, or just outputting the first N elements). You can of course track a count separately, but then you lose some of the conciseness gains.
  • If you want to iterate a collection backwards, there's no simple standard way in C++11 to do that with the 'range-based for'. You can use boost, which provides an adaptor so that you can do for (int y : boost::adaptors::reverse(x)), or you can create such an adaptor yourself. Or you can just use the old syntax with rbegin() and rend() in this case and at least still benefit from using auto.
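As a rough sketch of the "create such an adaptor yourself" option (the names Reversed, reversed and joinReversed are my own inventions here, not anything from the standard library or boost), a wrapper only has to expose the container's reverse iterators through begin() and end(). The same example also tracks a separate index to avoid a trailing comma:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical adaptor: holds a reference to a container and exposes its
// reverse iterators through begin()/end(), which is all range-based for needs.
template<class Container>
class Reversed {
   Container& c;
public:
   typedef decltype(std::declval<Container&>().rbegin()) iterator;

   explicit Reversed(Container& container): c(container) {}
   iterator begin() const { return c.rbegin(); }
   iterator end()   const { return c.rend(); }
};

// Hypothetical convenience function so the template argument is deduced.
template<class Container>
Reversed<Container> reversed(Container& c) { return Reversed<Container>(c); }

// Joins the elements in reverse order, using a separately tracked index
// to decide when a comma is needed.
std::string joinReversed(const std::vector<int>& v) {
   std::string out;
   std::size_t index = 0;
   for(int value: reversed(v)) {
      if(index++ > 0) out += ",";
      out += std::to_string(value);
   }
   return out;
}
```

Note that the adaptor only stores a reference, so it should wrap a named container rather than a temporary.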

Using 'range-based for' on your own collection classes

One area where things do get a little complex is in adapting your own custom classes to work with the new 'range-based for' loop. While this is not at all a new issue in C++11, in my case at least - in implementing my own collection classes - I had not bothered to implement iteration support very often, or at least not entirely properly conformant iteration support.

This was because, firstly, often using the iterators with the old syntax was simply not worth the hassle, i.e. if I have to choose between:

for(MyNamespace::MyVectorType<int>::iterator it = myVector.begin(); it != myVector.end(); ++it) {
   total += *it;
}

and

for(int i=0; i<myVector.size(); i++) {
   total += myVector[i];
}

I would anyway typically choose the latter for conciseness reasons.

Even if I had some collection which was not so easily iterable (e.g. it doesn't contain an underlying array and so can't be iterated by index), there was no need to work out how to make it conform exactly to the specification, as long as there was some way defined to loop over the collection, which ideally but not necessarily looked similar to the way the standard STL classes did it.

However with C++11, implementing proper iteration support becomes much more worthwhile, and so it is worth diving into the various ways in which classes can be extended to support iteration in this style.

Adding begin() + end() class member functions

The simplest way is to define begin() + end() member functions. The value returned by these needs to:

  • be incrementable, such that incrementing the value returned by begin() will eventually result in a value that compares equal to the value returned by end()
  • return a sensible value (the current element) when operator* is applied to it
  • if you create a custom iterator class for this, the following three operators need to be defined (where Iterator is the iterator class and T is the element type):
    • the prefix increment operator - Iterator& operator++()
    • the != comparison operator - bool operator!=(const Iterator& other) const
    • the dereference operator, returning the value - T& operator*()

One thing to note - begin() and end() will be called just once at the start of the loop, rather than every loop iteration.
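One way to see this is to count the calls. The following sketch (CountingRange and the two sum functions are hypothetical names for this illustration) compares a range-based for loop with a hand-written loop that is roughly what the compiler generates from it:

```cpp
#include <vector>

// Hypothetical range that counts how often begin() and end() are called.
struct CountingRange {
   std::vector<int> values;
   int beginCalls;
   int endCalls;

   CountingRange(): values{1, 2, 3}, beginCalls(0), endCalls(0) {}

   std::vector<int>::iterator begin() { ++beginCalls; return values.begin(); }
   std::vector<int>::iterator end()   { ++endCalls;   return values.end(); }
};

// Sums the range with a range-based for loop.
int sumWithRangeFor(CountingRange& r) {
   int total = 0;
   for(int value: r) { total += value; }
   return total;
}

// Roughly the loop the compiler generates for the function above:
// begin() and end() are each evaluated once, before the first iteration.
int sumDesugared(CountingRange& r) {
   int total = 0;
   auto&& range = r;
   for(auto it = range.begin(), last = range.end(); it != last; ++it) {
      int value = *it;
      total += value;
   }
   return total;
}
```

A practical consequence is that the loop body should not add or remove elements in a way that invalidates the iterators, since end() is not re-evaluated.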

For something like an array, it may be as simple as returning pointers, i.e. the following code works:

template<class T>
class MyArrayWrapper {
   T* data;
   int size;

public:
   T* begin() { return data; }
   T* end()   { return data + size; }   // one past the last element, not the last element itself

   ...
};

MyArrayWrapper<int> arr;
...
for(int i: arr) {
   ...
}

For classes where simply returning a pointer (like in the above case) won't work, you generally need to define an iterator class which meets the same requirements mentioned above, e.g.:

template<class T, int SIZE>
class MyCircularBuffer {
   T* data;
   int beginPosition;
   int endPosition;

   class Iterator {
      T* data;
      int position;
   public:
      Iterator(T* _data, int _position):data(_data),position(_position) {}

      T& operator*() { return data[position]; }
      Iterator& operator++() { if(++position == SIZE) position = 0; return *this; }
      bool operator!=(const Iterator& it) const { return position != it.position; }
   };

public:
   Iterator begin() { return { data, beginPosition }; }
   Iterator end()   { return { data, endPosition }; }
};

MyCircularBuffer<int, 256> buf;
for(int i: buf) {
   ...
}

Just one problem ....

That would be the end of the story, except that the above code cannot iterate when the variable is const, i.e. the following code won't compile:

   MyCircularBuffer<int, 256> buf;
   const auto& constBuf = buf;
   for(int i: constBuf) { // FAILS to compile - can't call non-const member function begin() on const object! 
      printf("%d", i); 
   }

One easy (but not correct) way to 'fix' this problem is to make the begin() and end() methods const, i.e. in MyCircularBuffer alter the method definitions to be:

   Iterator begin() const { return { data, beginPosition }; }
   Iterator end()   const { return { data, endPosition }; }

You will now be able to happily iterate both const and non-const variables. And many people (and most examples on the web) would happily stop there, as the above code is sufficient for a perfectly usable solution. However there is still one problem if we want to do things properly and use const. Take the following code example:

   MyCircularBuffer<int, 256> buf;
   const auto& constBuf = buf;

   for(int i: constBuf)        { printf("%d", i); }   // this will now run ok, which is good
   for(const int& i: constBuf) { printf("%d", i); }   // this will also now run ok, which is good
   for(int& i: constBuf)       { printf("%d", i++); } // this will compile+run, but it shouldn't!!

   std::vector<int> v;
   const auto& constV = v;

   for(int i: constV)        { printf("%d", i); }   // this is ok
   for(const int& i: constV) { printf("%d", i); }   // this is also ok
   for(int& i: constV)       { printf("%d", i++); } // FAILS to compile - can't iterate a const 
                                                    // collection with non-const ref!

The above code demonstrates that while we can now iterate our collection when it's being referenced by a const variable, it's also now possible to modify the contents of a MyCircularBuffer const object. We confirm above that this really shouldn't be possible by trying the same with std::vector, with which we get a compile error.

The full solution, with both const + non-const iteration

To fix the problem we first remove that naughty const that we just applied to our begin() and end() functions' signatures, and now we separately define a const iterator. The final result has a little more boilerplate, but can now handle iteration over both const + non-const objects correctly:

template<class T, int SIZE>
class MyCircularBuffer {
   T* data;
   int beginPosition;
   int endPosition;

   class Iterator {
      T* data;
      int position;
   public:
      Iterator(T* _data, int _position):data(_data),position(_position) {}

      T& operator*() { return data[position]; }
      Iterator& operator++() { if(++position == SIZE) position = 0; return *this; }
      bool operator!=(const Iterator& it) const { return position != it.position; }
   };
   class ConstIterator {
      T* data;
      int position;
   public:
      ConstIterator(T* _data, int _position):data(_data),position(_position) {}

      const T& operator*() const { return data[position]; }
      ConstIterator& operator++() { if(++position == SIZE) position = 0; return *this; }
      bool operator!=(const ConstIterator& it) const { return position != it.position; }
   };

public:
   Iterator begin() { return { data, beginPosition }; }
   Iterator end()   { return { data, endPosition }; }

   ConstIterator begin()const { return { data, beginPosition }; }
   ConstIterator end()  const { return { data, endPosition }; }

};

The only difference in the const iterator is that we define const T& operator*() const instead of T& operator*().
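Since the two iterator classes are nearly identical, one way to cut down the duplication is to write the iterator once as a template on the element type (T or const T). This is only a sketch of that idea, not part of the code above: the names RingIterator, SmallRing, push and sumRing are made up, and the storage is a plain member array to keep the example self-contained.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical iterator template: Elem is either T or const T.
template<class Elem, int SIZE>
class RingIterator {
   Elem* data;
   int position;
public:
   RingIterator(Elem* _data, int _position): data(_data), position(_position) {}

   Elem& operator*() const { return data[position]; }
   RingIterator& operator++() { if(++position == SIZE) position = 0; return *this; }
   bool operator!=(const RingIterator& it) const { return position != it.position; }
};

// Hypothetical ring buffer; one slot is always left unused so that
// begin == end can only mean "empty".
template<class T, int SIZE>
class SmallRing {
   T data[SIZE];
   int beginPosition;
   int endPosition;
public:
   SmallRing(): data(), beginPosition(0), endPosition(0) {}

   // Appends a value; returns false if the buffer is full.
   bool push(const T& value) {
      int next = (endPosition + 1) % SIZE;
      if(next == beginPosition) return false;
      data[endPosition] = value;
      endPosition = next;
      return true;
   }

   RingIterator<T, SIZE> begin() { return { data, beginPosition }; }
   RingIterator<T, SIZE> end()   { return { data, endPosition }; }

   RingIterator<const T, SIZE> begin() const { return { data, beginPosition }; }
   RingIterator<const T, SIZE> end()   const { return { data, endPosition }; }
};

// Dereferencing the iterator of a const buffer yields a const reference,
// so for(int& i: constBuf) is rejected at compile time.
static_assert(std::is_same<decltype(*std::declval<const SmallRing<int, 4>&>().begin()),
                           const int&>::value,
              "const iteration must yield const references");
static_assert(std::is_same<decltype(*std::declval<SmallRing<int, 4>&>().begin()),
                           int&>::value,
              "non-const iteration yields mutable references");

// Sums a buffer through a const reference, using the const overloads.
template<class T, int SIZE>
T sumRing(const SmallRing<T, SIZE>& ring) {
   T total = T();
   for(const T& value: ring) { total += value; }
   return total;
}
```

The static_asserts document the property we care about: which reference type comes out of operator* depends on whether the buffer is const.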

Extending existing classes with begin() + end()

It's also possible to extend classes by creating these begin() and end() functions not as member functions, but rather as free functions which are in the same namespace as the class (the loop finds them through argument-dependent lookup). For example, if MyCircularBuffer didn't define any begin() or end(), then the following would work (at least, if the member variables are made accessible):

template<class T, int SIZE>
class Iterator {
    T* data;
    int position;

public:
    Iterator(T* _data, int _position):data(_data),position(_position) {}

    T& operator*() { return data[position]; }
    Iterator& operator++() { if(++position == SIZE) position = 0; return *this; }
    bool operator!=(const Iterator& it) const { return position != it.position; }
};

template<class T, int SIZE> Iterator<T, SIZE> begin(MyCircularBuffer<T,SIZE>& buf) { 
   return { buf.data, buf.beginPosition }; 
}
template<class T, int SIZE> Iterator<T, SIZE> end(MyCircularBuffer<T,SIZE>& buf) { 
   return { buf.data, buf.endPosition }; 
}

template<class T, int SIZE>
class ConstIterator {
    T* data;
    int position;
public:
    ConstIterator(T* _data, int _position):data(_data),position(_position) {}

    const T& operator*() const { return data[position]; }
    ConstIterator& operator++() { if(++position == SIZE) position = 0; return *this; }
    bool operator!=(const ConstIterator& it) const { return position != it.position; }
};

template<class T, int SIZE> ConstIterator<T, SIZE> begin(const MyCircularBuffer<T,SIZE>& buf) {
   return { buf.data, buf.beginPosition };
}
template<class T, int SIZE> ConstIterator<T, SIZE> end(const MyCircularBuffer<T,SIZE>& buf) {
   return { buf.data, buf.endPosition };
}


MyCircularBuffer<int, 256> buf;
for(int i: buf) {
   ...
}
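The same approach is handy for a class you can't modify at all. As a minimal sketch, assume a hypothetical namespace thirdparty with a class Samples that only offers raw pointer access (all of these names, including raw(), count() and total(), are invented for the example):

```cpp
#include <cstddef>

namespace thirdparty {
   // Hypothetical class we cannot modify: it exposes raw access only.
   class Samples {
      int values[3];
   public:
      Samples(): values{10, 20, 30} {}
      int* raw() { return values; }
      const int* raw() const { return values; }
      std::size_t count() const { return 3; }
   };

   // Free functions in the same namespace as the class; range-based for
   // finds them through argument-dependent lookup.
   inline int* begin(Samples& s) { return s.raw(); }
   inline int* end(Samples& s)   { return s.raw() + s.count(); }
   inline const int* begin(const Samples& s) { return s.raw(); }
   inline const int* end(const Samples& s)   { return s.raw() + s.count(); }
}

// Iterates a const Samples, which picks the const overloads.
int total(const thirdparty::Samples& s) {
   int sum = 0;
   for(int value: s) { sum += value; }
   return sum;
}
```

Because plain pointers already satisfy the iterator requirements, no iterator class is needed here, and the const overloads keep for(int& value: constSamples) from compiling.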

One last possibility

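Note that a third option you may see suggested is to overload or specialize std::begin() and std::end() for your class. I wouldn't rely on it: function templates can't be partially specialized (so it can't cover a class template like MyCircularBuffer), adding new overloads to namespace std is not allowed by the standard, and the range-based for loop looks up free begin() and end() functions through argument-dependent lookup, so free functions in your class's own namespace are the dependable route.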

Monday, November 12, 2012

New syntax - 'auto'

The 'auto' keyword is a great and simple new tool for reducing verbosity in C++11, and a good feature to start this blog with, as it works well together with a lot of the other new C++11 features we'll be covering later, such as the range-based for loop and lambdas. It's also well supported in all the main C++ compilers already - clang (2.9+above), gcc (4.4+above), MSVC (since VC10).

The basic idea of auto is that instead of explicitly defining the type when declaring a variable, you simply assign it a value/expression, and let the compiler infer the type based on what is assigned:

Old C++:

int i = 1;
FILE* fp = fopen("file.txt", "r");
std::map<std::string,std::vector<int> > myMap = fetchMyMap();
std::map<std::string,std::vector<int> >::iterator it = myMap.find("hello world");
for(std::map<std::string,std::vector<int> >::iterator it = myMap.begin(); it != myMap.end(); ++it) {
   const std::vector<int>& v = it->second;
   ...
}

C++11:

auto i = 1;
auto* fp = fopen("file.txt", "r");  // the * here is optional
auto myMap = fetchMyMap();          // no * here: fetchMyMap() returns a map by value, not a pointer
auto it = myMap.find("hello world");
for(auto it = myMap.begin(); it != myMap.end(); ++it) {
   const auto& v = it->second;
   ...
}

More than just Syntactical Sugar

While auto at first may appear to be little more than trivial syntax sugar, especially if just considering examples like the first two examples above, it becomes more than just syntax sugar in some cases like the map and map::iterator examples above, i.e. particularly when:

  • the typenames are very long, such as when you start using C++ STL collections and their iterators, or other complex template classes (and when repeating these long names seems unnecessary)
  • the type is really not clear, such as when using lambdas in C++11, i.e. auto x = [](int i) { return i > 0; }; - here for example most programmers would have no idea what this exact type is, and rightly shouldn't have to care what the type is. Sometimes the exact type that is returned is a complicated expression that is really an implementation detail of the library being used, rather than something that the user should have to know and type out (as long as the behaviour of that type - how they should make use of it - is understood).

Still statically typed

The beauty of auto is that it retains the strong type-safety of C++. All of these auto variables are still just as strongly typed: the compiler will report an error as usual at compile time if they are incorrectly used as another type, there is zero performance impact as the code compiles to exactly the same thing as if the type had been specified, and the auto-complete in the IDE will know what their type is.

Thus, in using auto, you don't sacrifice any of C++'s performance as compared to dynamically-typed languages, and yet you still gain the reduction in verbosity from not having to repeat typenames everywhere, something that was previously generally limited to such dynamically-typed languages (although you still have the option of specifying the type when preferred).
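The static typing can be checked directly with static_assert. In this sketch fetchMyMap() is given a made-up body (the original only names it) and firstValueFor is a hypothetical helper; the asserts confirm that the deduced types are exactly the ones we would otherwise have written by hand:

```cpp
#include <map>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for fetchMyMap(); the contents are made up.
std::map<std::string, std::vector<int> > fetchMyMap() {
   std::map<std::string, std::vector<int> > m;
   m["hello world"].push_back(42);
   return m;
}

// Returns the first value stored under the key, or -1 if there is none.
int firstValueFor(const std::string& key) {
   auto myMap = fetchMyMap();
   auto it = myMap.find(key);

   // The deduced types are fixed at compile time.
   static_assert(std::is_same<decltype(myMap),
                              std::map<std::string, std::vector<int> > >::value,
                 "auto deduces the full map type");
   static_assert(std::is_same<decltype(it),
                              std::map<std::string, std::vector<int> >::iterator>::value,
                 "auto deduces the iterator type");

   if(it == myMap.end() || it->second.empty()) return -1;
   return it->second.front();
}
```

If either deduced type were different, the program would fail to compile rather than misbehave at run time.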

When to use 'auto'?

So when should one choose to use auto and when not?

One limitation to note is that you can't use auto on non-static class member variables, but aside from this limitation, Herb Sutter recommends using auto wherever possible - "If you want to explicitly enforce a type conversion, that’s okay; state the target type. The vast majority of the time, however, just use auto".

I generally agree with this sentiment, although for simple types like bool, int, float, double etc I see relatively little value in using it most of the time.

I also only choose to use it when variables are being assigned the return value of some function or another variable (such as in most of the examples that were shown above), as there is little gain in using it when simply initializing an object with some variables passed to its constructor, e.g. in converting something like string s("hello") to auto s = string("hello").

However, as I will blog about later, modern C++11 code-style is evolving to prefer initializing variables with the results of functions in some cases where in old C++ one would have initialized them by calling their constructor, thus making the use of auto in such cases make more sense. i.e. instead of doing:

std::shared_ptr<std::string> str(new std::string("hello world"));

the modern C++11 style is to prefer the following, which is more concise (removing the need to write the pointed-to type twice), more efficient (requiring only one memory allocation instead of two), and eliminates the use of the new operator entirely (removing new + delete usage is another part of modern C++11 style):

auto str = std::make_shared<std::string>("hello world");

This style, and the C++11 features which support this, will be covered later in this blog.
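For reference, here are the two styles side by side as a compilable sketch (the function names makeGreetingOld and makeGreetingNew are hypothetical). Both produce the same type; only the way of getting there differs:

```cpp
#include <memory>
#include <string>
#include <type_traits>

// Old style: the pointed-to type is written twice and 'new' is visible.
std::shared_ptr<std::string> makeGreetingOld() {
   std::shared_ptr<std::string> str(new std::string("hello world"));
   return str;
}

// C++11 style: auto plus std::make_shared, with no explicit 'new'.
std::shared_ptr<std::string> makeGreetingNew() {
   auto str = std::make_shared<std::string>("hello world");
   static_assert(std::is_same<decltype(str), std::shared_ptr<std::string> >::value,
                 "auto deduces shared_ptr<string>");
   return str;
}
```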

Which form to use: auto, auto*, auto&, const auto&, ...?

Like many things in C++, nothing is ever quite as simple as it first appears.

When I started using auto in my projects, I realized I was still unclear as to whether you have to specify whether the auto variable is a pointer / reference / const, something most introductions to this feature fail to explain. Looking at the specification clarified things:

Generally the logic is:

  • a plain auto variable takes on the type of the expression being assigned with any reference and top-level const stripped off (the same rules as template argument deduction), so the variable is an independent, modifiable copy.
  • you must therefore use auto& + const auto to make the variable a reference + const respectively (though obviously only doing so when it makes sense - you can't bind a temporary to a non-const reference for example).
  • pointers are different, as being a pointer is part of the type itself: a plain auto variable will be a pointer whenever the expression being assigned is one. You may write auto* in that case to make this explicit, however you can't use auto* to force a non-pointer to become a pointer, just like you couldn't say int* x = some_int;
    • thus the choice of whether to specify the * is similar to the choice of whether to specify the actual type - it's somewhat down to personal preference - whereas & and const change the meaning and have to be written whenever you want them.

I haven't touched on the logic for using auto with rvalue references (&&) here, as I haven't come to blogging about them (and move constructors) yet, but will get to that eventually.

Some examples to make these rules clear ....

const auto

A plain auto variable will not be const, even if the thing being assigned to it is const, because the top-level const is dropped when the value is copied. Specify const auto to make the variable const.

const int fn1();
int fn2();

auto a = fn1();       // a is not const - the top-level const is dropped
const auto b = fn1(); // b is const
auto c = fn2();       // c is not const
const auto d = fn2(); // d is const

auto&

If the thing being assigned to the auto variable is a reference, a plain auto variable will still be a copy of the referenced value, not a reference. Specify auto& to make the variable a reference (with the usual caution around not binding a reference to a temporary).

int& fn1();
int fn2();

auto  a = fn1(); // a is a value (a copy of what fn1() refers to)
auto& b = fn1(); // b is a reference
auto  c = fn2(); // c is a value
auto& d = fn2(); // compile error - we cannot bind a temporary to a non-const reference
const auto& e = fn2(); // ok - a const reference can bind to a temporary, and keeps it alive for as long as e is in scope

auto*

Being a pointer is part of the type, so if the thing being assigned to the auto variable is a pointer, the variable will be a pointer whether or not the * is written. However trying to assign a non-pointer to an auto* variable will rightly result in a compile error.

int* fn1();
int fn2();

auto  a = fn1(); // a is a pointer
auto* b = fn1(); // b is a pointer (the * here is optional)
auto  c = fn2(); // c is a value
auto* d = fn2(); // compile error - fn2() doesn't return a pointer
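The deduced types can be verified at compile time rather than taken on trust. The following is a sketch under my own assumptions: the globals counter and limit and the accessor functions are hypothetical, and exist only to have something to deduce from.

```cpp
#include <type_traits>

// Hypothetical globals and accessors, used only as deduction sources.
int       counter = 1;
const int limit = 2;
int&       counterRef() { return counter; }
const int& limitRef()   { return limit; }
int*       counterPtr() { return &counter; }

// Returns true if the copies and references behave as their deduced types
// suggest. Assumes counter == 1 on entry and restores it before returning.
bool checkDeduction() {
   auto a = limit;           // int: top-level const is dropped
   const auto b = counter;   // const int: const requested explicitly
   auto c = counterRef();    // int: the reference is dropped, c is a copy
   auto& d = counterRef();   // int&: d aliases counter
   auto& e = limitRef();     // const int&: const behind a reference is kept
   auto f = counterPtr();    // int*: being a pointer is part of the type
   auto* g = counterPtr();   // int*: same as f, the * is optional
   const auto& h = 1 + 1;    // const int&: bound to a temporary kept alive by h

   static_assert(std::is_same<decltype(a), int>::value, "a is int");
   static_assert(std::is_same<decltype(b), const int>::value, "b is const int");
   static_assert(std::is_same<decltype(c), int>::value, "c is int");
   static_assert(std::is_same<decltype(d), int&>::value, "d is int&");
   static_assert(std::is_same<decltype(e), const int&>::value, "e is const int&");
   static_assert(std::is_same<decltype(f), int*>::value, "f is int*");
   static_assert(std::is_same<decltype(g), int*>::value, "g is int*");
   static_assert(std::is_same<decltype(h), const int&>::value, "h is const int&");

   c = 100;                                        // changes only the copy
   const bool copyIsIndependent = (counter == 1);
   d = 5;                                          // writes through to counter
   const bool referenceAliases = (counter == 5);
   counter = 1;                                    // restore the original value

   return copyIsIndependent && referenceAliases
       && a == 2 && b == 1 && c == 100 && e == 2 && f == g && h == 2;
}
```

The cases to remember are a and c: plain auto gives you a modifiable copy even when the source is const or a reference.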