std::vector::reserve + std::vector::push_back in loop is sub-optimal, because push_back needs to check for re-allocation, but that never comes.
std::vector::resize + std::vector::operator[] in loop is also sub-optimal, because resize default-initializes all elements only to be overwritten soon anyway.
This article's author suggests push_back_unchecked.
I suggest std::vector::insert with pair of random access iterators with custom dereference operator that does the "transform element" or "generate element" functionality. The standard will have resize_and_overwrite hopefully soon.
Couldn't this be solved by having push_back being an inline function (or at least the check on capacity being inlined and the rest of the non-trivial part being in a sub non-inline function)?
I don't know about C++, but in Rust the push is inline, and still doesn't always optimize checks away due to an annoying edge case: integer overflow. Reserving (old_len + new_len) could give you a smaller buffer than new_len. The optimizer sees it and is pedantic about it.
In C++ integer overflow is UB so this edge case cannot exist
Only signed overflow. size_t is unsigned.
That's totally right but I thought you were talking about signed numbers since you said “integer overflow”. I forgot that len is usually unsigned in C++.
Ok, but why did you not use emplace?
emplace controls the construction of the object added to the collection. It's also important but not related to the problem exposed by OP which is “how to remove the capacity check when we know statically that there is enough space”.
The benchmark looks off. The msvc one may be the only one vaguely reliable. I suspect clang and GCC were able to optimize the synthetic benchmark to a little more than a loop doing additions. At 96ns for 1000 iteration, you are looking at 10G iterations per seconds. Which can only be achieved by a loop of two instructions executing at 2 inst/s on a 5GHz processor. And you will not get a 5x just for removing a highly predictable branch.
So yes, std::vector leaves performance on the table, but no more than 10~15% for trivial loops that are not that uncommon but are rarely a bottleneck.
Then you have to ask yourself, is it worth it to add yet another function that can crash your program if misused just for that 10% in a situation where they might not even matter. I mean, I know, it's c++, zero cost abstraction, yadi yada, but if you're looking for consistent performance you should have moved away from the STL already. As this post shows, your STL vendor already has a huge impact on the performance, and there are widely available options to optimize specific cases.
So I'd rather keep the STL fairly simple. Add one function to work well with generators/iterators that have a known size if you want, but adding unchecked versions of every insertion function of every STL container is not worth it IMO.
Then you have to ask yourself, is it worth it to add yet another function that can crash your program if misused just for that 10% in a situation where they might not even matter
C/C++ already exposes a ton of undefined behaviors: it is part of the language to give full control to the programmer. If you want a language that minimizes the number of undefined behaviors you can get into, C/C++ is not the right candidate anyway. Something like Ada or Rust is much more relevant for that.
So I would say yes, just as long as it is properly documented.
I disagree. The question is not really "should we give programmer more power at the cost of yet another UB" but more "should we grow the API and add another UB for the select few for whom it might matter". When you consider choices made on other parts of the STL, such as std::unordered_map, then you realize the STL is not about being the most performant things around, but rather a collection of reliable tools covering basic usage for 80% of the user base.
With that in mind, I am against adding yet another function, which has its pitfalls, for minimal benefits. Again, such a function would be made almost entirely obsolete by a safe function that works with iterators/generators of known sizes. So I see even less benefit in adding a function that will just become yet another liability down the line.
std::vector::reserve + std::vector::push_back in loop is sub-optimal, because push_back needs to check for re-allocation, but that never comes.
std::vector::resize + std::vector::operator[] in loop is also sub-optimal, because resize default-initializes all elements only to be overwritten soon anyway.
This article's author suggests push_back_unchecked.
I suggest std::vector::insert with pair of random access iterators with custom dereference operator that does the "transform element" or "generate element" functionality. The standard will have resize_and_overwrite hopefully soon.
Moar discussion:
https://codingnest.com/the-little-things-the-missing-performance-in-std-vector/
https://twitter.com/horenmar_ctu/status/1695823724673466532
https://twitter.com/horenmar_ctu/status/1695331079165489161
https://www.reddit.com/r/cpp/comments/162tohr/the_little_things_the_missing_performance_in/
https://www.reddit.com/r/cpp/comments/162tohr/the_little_things_the_missing_performance_in/jy21hgd/
https://twitter.com/basit_ayantunde/status/1644895468399337473
https://twitter.com/MarekKnapek/status/1645272474517422081
https://www.reddit.com/r/cpp/comments/cno9ep/improving_stdvector/
Couldn't this be solved by having
push_back
being an inline function (or at least the check on capacity being inlined and the rest of the non-trivial part being in a sub non-inline function)?I don't know about C++, but in Rust the push is inline, and still doesn't always optimize checks away due to an annoying edge case: integer overflow. Reserving (old_len + new_len) could give you a smaller buffer than new_len. The optimizer sees it and is pedantic about it.
In C++ integer overflow is UB so this edge case cannot exist
Only signed overflow. size_t is unsigned.
That's totally right but I thought you were talking about signed numbers since you said “integer overflow”. I forgot that
len
is usually unsigned in C++.Ok, but why did you not use emplace?
emplace
controls the construction of the object added to the collection. It's also important but not related to the problem exposed by OP which is “how to remove the capacity check when we know statically that there is enough space”.The benchmark looks off. The msvc one may be the only one vaguely reliable. I suspect clang and GCC were able to optimize the synthetic benchmark to a little more than a loop doing additions. At 96ns for 1000 iteration, you are looking at 10G iterations per seconds. Which can only be achieved by a loop of two instructions executing at 2 inst/s on a 5GHz processor. And you will not get a 5x just for removing a highly predictable branch.
So yes, std::vector leaves performance on the table, but no more than 10~15% for trivial loops that are not that uncommon but are rarely a bottleneck.
Then you have to ask yourself, is it worth it to add yet another function that can crash your program if misused just for that 10% in a situation where they might not even matter. I mean, I know, it's c++, zero cost abstraction, yadi yada, but if you're looking for consistent performance you should have moved away from the STL already. As this post shows, your STL vendor already has a huge impact on the performance, and there are widely available options to optimize specific cases.
So I'd rather keep the STL fairly simple. Add one function to work well with generators/iterators that have a known size if you want, but adding unchecked versions of every insertion function of every STL container is not worth it IMO.
C/C++ already exposes a ton of undefined behaviors: it is part of the language to give full control to the programmer. If you want a language that minimizes the number of undefined behaviors you can get into, C/C++ is not the right candidate anyway. Something like Ada or Rust is much more relevant for that.
So I would say yes, just as long as it is properly documented.
I disagree. The question is not really "should we give programmer more power at the cost of yet another UB" but more "should we grow the API and add another UB for the select few for whom it might matter". When you consider choices made on other parts of the STL, such as std::unordered_map, then you realize the STL is not about being the most performant things around, but rather a collection of reliable tools covering basic usage for 80% of the user base.
With that in mind, I am against adding yet another function, which has its pitfalls, for minimal benefits. Again, such a function would be made almost entirely obsolete by a safe function that works with iterators/generators of known sizes. So I see even less benefit in adding a function that will just become yet another liability down the line.