Posts Tagged ‘stl’

Choosing an STL Sequence Container

Saturday, June 9th, 2012

The C++ Standard Template Library has a fairly good selection of container classes, and in particular “sequence” containers. For some reason (I’ve always assumed it’s familiarity), most people seem to use std::vector as their container-of-choice, but experience shows that it’s actually the best choice less often that one would think.

So how do you pick the right sequence container? Often, it’s not an easy or obvious choice. Each container has different trade-offs : there’s the obvious difference of some provide features that others don’t – but there are also the more subtle differences in memory allocation characteristics, performance guarantees, and some perhaps less easily quantifiable characteristics (such as cache locality).

This is based on something I wrote a little while ago when corresponding with my friend & colleague Luca Bolognese on his recent series of articles on functional programming in C++ (which is well worth a read).

Just as there’s no “one size fits all” container (and, by inference, no “one size fits all default” container), there’s no one size fits all mechanism for choosing a container (although profiling can help a lot) – but this is roughly how I choose a container (before I can profile, naturally!).

A decision tree for selecting a sequence container:

  • I’m in a rush and can’t be bothered to read the rest: use std::deque
  • I know at compile time exactly how many elements will be needed (and will they all fit on our stack!) – If so, use std::array.
  • I need constant-time random access (it’s worth noting that we often *think* we do, but actually don’t – YAGNI applies). If you DO need it, then we can eliminate std::list/std::forward_list as candidates.
  • I need bidirectional iteration (again, we need this less often that we thing). If so, eliminate std::forward_list from consideration
  • Will elements often be added/removed “in the middle” (i.e. not at the ends)? std::forward_list or std::list are probably the best choices. (Unless you eliminated them earlier due to constant time random-access requirements); these two are particularly useful when the contained elementes are expensive to move/copy, since they don’t need to shuffle things around when you add or remove elements. Of course, it may be possible to reduce the cost of move/copy by having a container of (preferably smart) pointers, rather than directly containing the objects.
  • Do we need the contents as a contiguous array? use std::vector (and call reserve with a reasonable guesstimate beforehand if we can) – sometimes, anyway; I’ve seen cases where it’s faster to build my collection as a std::deque, then transform to std::vector later on.)
  • Use std::deque.

Or, to put it another way, “default to std::deque unless there’s a good reason not to”.  That’s an overly-strong statement, but it’s a reasonable starting point. In a typical application, deque will give you better memory allocation characteristics than any of the others (aside from std::array, but that’s extremely limited in its application). vector can beat it IF you have a good estimate for how big the collection is going to get (but otherwise can tail off dramatically). deque will typically give you better cache locality than list/forward_list, and it’s insert/remove at the ends characteristics are better than any of the containers except the two list types.

Naturally, every case will be different – and no set of ‘rules of thumb’ is going to work every time (this is one of the things which distinguish rules of thumb from laws of physics). Commonly, we don’t know ahead of time which of those criteria is going to have the most significant impact on what we’re doing; when I find myself in that situation, I’ll use std::deque; once I have a firmer idea of the criteria, I’ll go back and revisit that choice (which – quite often – is as simple as changing a typedef). And of course – as with all ‘optimisations’ – measure, measure and measure again!