AoS and SoA - Wikipedia

rmam@programming.dev to Programming@programming.dev – 14 points
en.wikipedia.org

I found this article strangely interesting. Coming from object-oriented programming, I tend to think of objects as independent, self-contained instances, even if they are ultimately stored in a container. However, once we take padding into account, the overhead of storing objects in an array can become something worth keeping in mind.
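To make the padding point concrete, here is a minimal C++ sketch. The `Particle` record and the helper names are made up for illustration, and the exact sizes depend on the compiler and ABI, so treat the numbers in the comments as typical rather than guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record: a one-byte flag next to a wide field.
struct Particle {
    std::uint8_t alive;  // 1 byte of payload
    double       mass;   // 8 bytes, usually 8-byte aligned
};

// Bytes of real data per record, ignoring padding.
constexpr std::size_t kPayloadBytes = sizeof(std::uint8_t) + sizeof(double);

// Bytes per record that carry no data once records sit back to back.
// On common 64-bit ABIs this is 7 (sizeof(Particle) is 16 there).
constexpr std::size_t kPaddingBytes = sizeof(Particle) - kPayloadBytes;

// Memory for n records stored as one array of structs (AoS).
constexpr std::size_t aos_bytes(std::size_t n) { return n * sizeof(Particle); }

// Memory for n records stored as one array per member (SoA).
constexpr std::size_t soa_bytes(std::size_t n) { return n * kPayloadBytes; }

// Bytes spent on padding alone when n records are stored as AoS.
inline std::size_t wasted_bytes(std::size_t n) {
    assert(aos_bytes(n) >= soa_bytes(n));
    return aos_bytes(n) - soa_bytes(n);
}
```

With a layout like this, the padding is paid once per element in the AoS case, while the two separate arrays in the SoA case pack tightly.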

I have worked with and built an SoA system, and I quite like it. I worked with one written in C++ where objects were represented by small reference structs and you accessed all of the real members via static methods. It was done to improve cache access times, as you often iterate over a large number of objects but read only a single property (say, only the position) of each object. I don't know how big a performance improvement this actually is, as we don't have a feature-parity AoS version lying around. But taken by itself, SoA does not feel less comfortable to work with, though we make heavy use of a code generator to avoid writing getter/setter boilerplate.
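A rough sketch of what such a design might look like is below. This is not the system described above; `ParticleStore`, `ParticleRef`, `Particles` and `sum_pos_x` are hypothetical names, and the hand-written accessors stand in for what a code generator would presumably emit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SoA storage: one contiguous array per member.
struct ParticleStore {
    std::vector<float> pos_x;
    std::vector<float> pos_y;
    std::vector<float> mass;
};

// Small reference struct: only an index into the store, no data of its own.
struct ParticleRef {
    std::size_t index;
};

// Static accessors (the kind of boilerplate a generator could produce).
struct Particles {
    static ParticleRef create(ParticleStore& s, float x, float y, float m) {
        s.pos_x.push_back(x);
        s.pos_y.push_back(y);
        s.mass.push_back(m);
        return ParticleRef{s.pos_x.size() - 1};
    }
    static float get_pos_x(const ParticleStore& s, ParticleRef r) {
        assert(r.index < s.pos_x.size());
        return s.pos_x[r.index];
    }
    static void set_pos_x(ParticleStore& s, ParticleRef r, float x) {
        assert(r.index < s.pos_x.size());
        s.pos_x[r.index] = x;
    }
};

// Reads one property of every object. Only the pos_x array is walked,
// so pos_y and mass are never pulled into the cache by this loop.
inline float sum_pos_x(const ParticleStore& s) {
    float total = 0.0f;
    for (float x : s.pos_x) total += x;
    return total;
}
```

The handle stays trivially copyable, and a loop that needs only the position streams through a single dense array instead of striding over whole objects.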

array of structures (AoS), structure of arrays (SoA) or array of structures of arrays (AoSoA) are contrasting ways to arrange a sequence of records in memory, with regard to interleaving, and are of interest in SIMD and SIMT programming.

It's not so easy.

GPU programmers are the experts in AoS vs. SoA formats. And when you look at how RGB values are stored, it's... incredibly complex. Sometimes you've got RRRRGGGGBBBB, sometimes it's RGBARGBARGBA, sometimes it's YYYYUUVV. What's best for performance changes dramatically from system to system, requiring lots of benchmarking and ultimately... a massive slew of processor-specific instructions (ARM NEON, for example) that convert between every format imaginable.
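For illustration, here is a plain scalar C++ sketch of one such conversion, between interleaved RGBARGBA... and planar RRRR...GGGG...BBBB...AAAA... data. The `PlanarImage` type and both function names are invented for this example; real code would likely use the platform's SIMD shuffles rather than a byte-by-byte loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Planar (SoA) image: one array per channel.
struct PlanarImage {
    std::vector<std::uint8_t> r, g, b, a;
};

// Interleaved RGBARGBA... (AoS) -> planar RRRR... GGGG... BBBB... AAAA...
inline PlanarImage deinterleave_rgba(const std::vector<std::uint8_t>& rgba) {
    assert(rgba.size() % 4 == 0);  // whole pixels only
    const std::size_t n = rgba.size() / 4;
    PlanarImage out;
    out.r.resize(n);
    out.g.resize(n);
    out.b.resize(n);
    out.a.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.r[i] = rgba[4 * i + 0];
        out.g[i] = rgba[4 * i + 1];
        out.b[i] = rgba[4 * i + 2];
        out.a[i] = rgba[4 * i + 3];
    }
    return out;
}

// Planar -> interleaved, the inverse of the function above.
inline std::vector<std::uint8_t> interleave_rgba(const PlanarImage& p) {
    const std::size_t n = p.r.size();
    assert(p.g.size() == n && p.b.size() == n && p.a.size() == n);
    std::vector<std::uint8_t> rgba(4 * n);
    for (std::size_t i = 0; i < n; ++i) {
        rgba[4 * i + 0] = p.r[i];
        rgba[4 * i + 1] = p.g[i];
        rgba[4 * i + 2] = p.b[i];
        rgba[4 * i + 3] = p.a[i];
    }
    return rgba;
}
```

Every pair of formats needs its own version of this shuffle, which is where the pile of format-conversion instructions comes from.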

Oh right, GPUs don't need those processor-specific instructions because permute and bpermute instructions exist: a 32-way crossbar that moves any data to any lane (permute) or, the other way around, lets any lane pull from any data (bpermute). CPUs do need them, though.
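The push/pull distinction can be sketched on the CPU as a scalar emulation. This is only a model of the semantics described above, not actual GPU code: `Lanes`, `permute` and `bpermute` are hypothetical helpers, lane indices are used directly, and the handling of collisions and unwritten lanes is an assumption of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 32;  // assumed 32-wide group of lanes
using Lanes = std::array<std::uint32_t, kLanes>;

// Forward permute ("push"): lane i sends its value to lane dst[i].
// Assumption of this sketch: lanes that nobody writes to end up 0, and
// if several lanes target the same destination the highest lane wins.
inline Lanes permute(const Lanes& data, const Lanes& dst) {
    Lanes out{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        out[dst[i] % kLanes] = data[i];
    }
    return out;
}

// Backward permute ("pull"): lane i reads the value held by lane src[i].
inline Lanes bpermute(const Lanes& data, const Lanes& src) {
    Lanes out{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        out[i] = data[src[i] % kLanes];
    }
    return out;
}
```

With an index vector that is a one-to-one mapping of lanes, pulling with the same indices undoes the push, so any fixed AoS/SoA shuffle within a group of lanes can be expressed as a single index table.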