Choosing an STL Sequence Container

The C++ Standard Template Library has a fairly good selection of container classes, and in particular “sequence” containers. For some reason (I’ve always assumed it’s familiarity), most people seem to use std::vector as their container-of-choice, but experience shows that it’s actually the best choice less often that one would think.

So how do you pick the right sequence container? Often, it’s not an easy or obvious choice. Each container has different trade-offs : there’s the obvious difference of some provide features that others don’t – but there are also the more subtle differences in memory allocation characteristics, performance guarantees, and some perhaps less easily quantifiable characteristics (such as cache locality).

This is based on something I wrote a little while ago when corresponding with my friend & colleague Luca Bolognese on his recent series of articles on functional programming in C++ (which is well worth a read).

Just as there’s no “one size fits all” container (and, by inference, no “one size fits all default” container), there’s no one size fits all mechanism for choosing a container (although profiling can help a lot) – but this is roughly how I choose a container (before I can profile, naturally!).

A decision tree for selecting a sequence container:

  • I’m in a rush and can’t be bothered to read the rest: use std::deque
  • I know at compile time exactly how many elements will be needed (and will they all fit on our stack!) – If so, use std::array.
  • I need constant-time random access (it’s worth noting that we often *think* we do, but actually don’t – YAGNI applies). If you DO need it, then we can eliminate std::list/std::forward_list as candidates.
  • I need bidirectional iteration (again, we need this less often that we thing). If so, eliminate std::forward_list from consideration
  • Will elements often be added/removed “in the middle” (i.e. not at the ends)? std::forward_list or std::list are probably the best choices. (Unless you eliminated them earlier due to constant time random-access requirements); these two are particularly useful when the contained elementes are expensive to move/copy, since they don’t need to shuffle things around when you add or remove elements. Of course, it may be possible to reduce the cost of move/copy by having a container of (preferably smart) pointers, rather than directly containing the objects.
  • Do we need the contents as a contiguous array? use std::vector (and call reserve with a reasonable guesstimate beforehand if we can) – sometimes, anyway; I’ve seen cases where it’s faster to build my collection as a std::deque, then transform to std::vector later on.)
  • Use std::deque.

Or, to put it another way, “default to std::deque unless there’s a good reason not to”.  That’s an overly-strong statement, but it’s a reasonable starting point. In a typical application, deque will give you better memory allocation characteristics than any of the others (aside from std::array, but that’s extremely limited in its application). vector can beat it IF you have a good estimate for how big the collection is going to get (but otherwise can tail off dramatically). deque will typically give you better cache locality than list/forward_list, and it’s insert/remove at the ends characteristics are better than any of the containers except the two list types.

Naturally, every case will be different – and no set of ‘rules of thumb’ is going to work every time (this is one of the things which distinguish rules of thumb from laws of physics). Commonly, we don’t know ahead of time which of those criteria is going to have the most significant impact on what we’re doing; when I find myself in that situation, I’ll use std::deque; once I have a firmer idea of the criteria, I’ll go back and revisit that choice (which – quite often – is as simple as changing a typedef). And of course – as with all ‘optimisations’ – measure, measure and measure again!

Tags: , ,

3 Responses to “Choosing an STL Sequence Container”

  1. [...] those of you interested in C++, Andy Sawyer now has a blog here. Here is an extended discussion on how to choose an STL container. Enjoy !! Like this:LikeBe the first to [...]

  2. `deque` introduces extra indirection and memory fragmentation, does not provide a contiguous storage guarantee, and is is less familiar to programmers. Its constant time for extraction from or insertion at ends is easily achieved by a `vector` by just wrapping it in a cursor gap thing (adding that extra indirection only). So apparently as a general default container choice it has nothing going for it, and some aspects go against it.

    Yet you write e.g. “In a typical application, deque will give you better memory allocation characteristics than any of the others” and “I’ve seen cases where it’s faster to build my collection as a std::deque, then transform to std::vector later on”.

    It would be nice if you could back this up with numbers and corresponding code.

    • andys says:

      Hi Alf,
      Thanks for your comment.

      To be clear, I’m not proposing that deque is the best container for all situations – I’m merely suggesting that there are times when it’s a better *default* choice than vector.

      I already address the contiguous storage issue above (if you need it, then use vector – or array).

      As for memory fragmentation, vector only beats deque if there are no reallocations; when vector starts realllocating, then it can cause considerably worse fragmentation. (Consider that when vector reallocates, it typically doubles the size of allocated storage – so to reallocate a vector of N elements, you need 3*N storage – 2N for the new vector, N for the one we’re copying).

      “Adding a cursor gap thing” as you put it possibly suffers from the lack of familiarity that you seem to think is a problem for deque (and further suffers from not being part of the standard library).

      With regards to building the collection as a deque, the specific case I’m thinking of are building a collection from a pair of input iterators; with deque, this is a linear operation – with a vector, it’s logarithmic. You can build a vector linearly from a deque – so the ‘build a deque/transform to vector’ is 2N; ‘build a vector’ can be faster than this – but can also be much, much worse.

      If your contained type is expensive to copy, those extra copy ctors can cost a lot.

      Of course, if you know ahead of time how many widgets you’re going to get, then you can call reserve – but I mention that above.

      I’ll throw some code together to demonstrate this when I have time (probably tomorrow).