By nature, we are predisposed to make comparisons as soon as we get the chance and, of course, this is particularly true in the world of computing, which, as we well know, is dominated by numbers: not only those processed by our computers or devices, but also those we use to classify them in some way (GHz, cores, GB, …).
The beating heart of the ‘scrap metalware’ we use, whatever type it may be, is the processor which, despite having taken on a lesser role in recent years compared to the other coprocessors and computing units that have come to flank it (GPUs, DSPs, and recently also accelerators for AI), still retains a central role (not least because it continues to coordinate all these faithful servants).
History has handed down to us a remarkable number of processors of all types, with their merits and flaws, strengths and weaknesses, so it was quite natural to ‘compare’ them. Which is something we also did in these pages several years ago, with a couple of articles that I like to recall because they are representative of the kind of ‘measurements’ and comparisons that were (and still are) made:
Is a 6502 at 1MHz or a Z80 at 3.5MHz faster? (In Italian. Sorry)
From the myth of MHz to the myth of bits… (In Italian as well)
Precisely in the area of processors, there is a diatribe that has been going on for more than 40 years and that pits two different types, called RISCs and CISCs, against each other. It would perhaps be more correct to call them macro-families (in some cases, where the context is intelligible, I will refer to them more generally as ‘families’) of architectures (and, in part, of micro-architectures as well; more on this in another article), since a given processor belongs to only one of the two: they therefore form two distinct and separate partitions.
Different levels of abstraction
Before getting to the heart of the diatribe, however, it is very important to point out that talking about RISCs and CISCs concerns a certain level of abstraction. Even more clearly, we are talking about the top of the ‘pyramid’, i.e. the highest (and therefore most abstract) level: the macro-family, to be precise.
Moving down a notch we find, of course, the family (of processors). Examples are the well-known x86 and ARM, representatives of CISCs and RISCs, respectively (although there is much to be said about ARM).
Finally, going down another step of the pyramid, we find the last level of abstraction, which is represented by the family member. Here, too, examples are the Atom N450 and the Cortex-A9, which are obviously part of x86 and ARM, respectively.
In reality, a further level of abstraction could have been inserted between the second and third, which would have allowed the categories to be enriched and better differentiated. For example, for x86 we could have introduced 8086, 80186, 80286, 80386, …, x86-64 as ‘base’ architectures, with the Atom N450 being, therefore, a member of x64. The equivalents for ARM would instead be ARMv1, ARMv2, ARMv3, …, ARMv9, Cortex-A, Cortex-M, Cortex-R, with the Cortex-A9 being part of Cortex-A, of course.
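To make the pyramid a little more tangible, here is a minimal Python sketch of it as a nested mapping, using only the example names mentioned above; a real taxonomy would obviously be far richer, and the intermediate ‘base architecture’ level is the optional one just described.

```python
# A minimal sketch of the 'pyramid' of abstraction levels, using only the
# examples from the text; a real taxonomy would be far richer.
PYRAMID = {
    "CISC": {                         # level 1: macro-family
        "x86": {                      # level 2: family
            "x86-64": ["Atom N450"],  # optional 'base' architecture -> level 3: members
        },
    },
    "RISC": {
        "ARM": {
            "Cortex-A": ["Cortex-A9"],
        },
    },
}

def macro_family_of(member):
    """Walk the pyramid bottom-up: from a family member back to its macro-family."""
    for macro_family, families in PYRAMID.items():
        for base_architectures in families.values():
            for members in base_architectures.values():
                if member in members:
                    return macro_family
    return None

print(macro_family_of("Cortex-A9"))  # -> RISC: each member belongs to exactly one macro-family
```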
I prefer, however, to ignore this other level of abstraction, because it makes no contribution to the objective of the article. Which is, instead, aimed at highlighting a concept that should be obvious, but often is not at all: speaking at a certain level of abstraction does not entitle one to draw conclusions about higher ones. This is clearly a logical fallacy, although it is quite common, unfortunately.
How to compare: remove all variables except one
Another not insignificant fact is that, for some time now, there have been no ‘bare bones’ processors. In fact, what ends up on motherboards is classified as a CPU, which can also integrate more than one processor. But even if there were only one processor, there would also be other elements, called ‘uncore’ in the jargon. Then there are the external peripherals, which can likewise influence performance in tests, or affect what can be integrated in the core (because fewer resources will be available).
As can be seen, trying to make comparisons even at the same level of abstraction (the lowest one, which identifies a precise member of a family) is extremely difficult precisely because, in addition to the individual processor, there is much more that should be taken into account. The ISA + microarchitecture + production process triplet (which identifies a processor for a specific chip) is therefore no longer sufficient to make comparisons, which would be ‘flawed’ by the above.
Ideally, when comparing two CPUs, one would like to eliminate these differences by having exactly the same elements for both, with the only difference being the different cores. Unfortunately, this is now practically impossible.
The last processors to allow this kind of comparison were the x86 processors using Intel’s very famous Socket 7 (Socket 8 had very limited adoption): just change the CPU, put in another one, and repeat the same tests using exactly the same platform.
Comparisons on the same platform and level of abstraction (the third): the family member — The user level
At this point, one might ask whether it still makes sense to make comparisons even if one assumes a common platform (same socket/environment), because one can have completely different CPUs (at the level of elements inside) even though they are part of the same architecture/ISA.
Would you compare a two-way in-order CPU with another two-way, but out-of-order, CPU, even though they are in the same family? It might even make sense, but of course it would depend entirely on what you wanted to achieve with this type of testing and measurement.
Two in-order CPUs of the same family, for example, could in any case have completely different uncore elements (L2 & L3 cache, embedded memory controller, embedded PCI/PCI-Express controller, memory soldered onto the package, etc.), as well as different elements of the core itself (L0/micro-op cache, L1 cache for code and/or data, branch predictor, etc.). Which, needless to say, influence benchmark results.
Users do not, in any case, have these philosophical problems, as they can check the concrete products on sale (by looking at the myriad of reviews or by carrying out the tests themselves) and then choose those that best suit their specific needs. Problem solved, at least for them!
Comparisons on the second level of abstraction: the ISA — The geek level
Possible comparisons remaining on the plane of the second level of abstraction (the ISA) relate to the overall size of the generated code, the number of instructions in it (and, thus, the average instruction length can also be calculated), and the number of memory references (loads and/or stores).
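As a purely illustrative aid, the following Python sketch shows how such metrics could be roughly extracted from a disassembly listing. It assumes GNU objdump output and, for the load/store counting, an ARM/AArch64-style disassembly; the mnemonic sets and heuristics are assumptions of mine, not a rigorous measurement tool.

```python
#!/usr/bin/env python3
"""Rough ISA-level metrics extracted from a disassembly listing.

A minimal sketch, not a rigorous tool. It assumes the input is the output of
GNU objdump for the binary under test, e.g.:

    objdump -d ./mybench > mybench.dis

The load/store mnemonic prefixes below are illustrative (ARM/AArch64-style)
and would have to be adapted per ISA; on x86, in particular, memory references
are embedded in ordinary instructions and would require operand parsing instead.
"""
import sys

# Illustrative load/store mnemonic prefixes (assumption: ARM/AArch64 disassembly).
LOAD_PREFIXES = ("ldr", "ldp", "ldm")
STORE_PREFIXES = ("str", "stp", "stm")

def is_hex(s):
    return bool(s) and all(c in "0123456789abcdef" for c in s)

def metrics(path):
    insns = code_bytes = loads = stores = 0
    with open(path) as f:
        for line in f:
            # objdump code lines are tab-separated: "address:", hex bytes, instruction.
            parts = line.rstrip("\n").split("\t")
            if len(parts) < 2 or not parts[0].strip().endswith(":"):
                continue
            if not is_hex(parts[0].strip()[:-1]):
                continue                      # headers, symbol labels, etc.
            # Hex field: 2-digit bytes on x86, 4/8-digit words on ARM/AArch64.
            code_bytes += sum(len(tok) // 2 for tok in parts[1].split() if is_hex(tok))
            if len(parts) < 3 or not parts[2].strip():
                continue                      # continuation line of a long instruction
            insns += 1
            mnemonic = parts[2].split()[0].lower()
            if mnemonic.startswith(LOAD_PREFIXES):
                loads += 1
            elif mnemonic.startswith(STORE_PREFIXES):
                stores += 1
    return insns, code_bytes, loads, stores

if __name__ == "__main__":
    insns, code_bytes, loads, stores = metrics(sys.argv[1])
    print(f"instructions:      {insns}")
    print(f"code size (bytes): {code_bytes}")
    if insns:
        print(f"avg insn length:   {code_bytes / insns:.2f} bytes")
    print(f"memory refs:       {loads} loads, {stores} stores")
```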
Performance is out of the question, because it belongs to the third level of abstraction, which has already been dealt with (if that makes sense: see above): concrete products are, in fact, needed to measure it (even simulations might be OK, if they can accurately reproduce the chip’s operation)!
But even here, other issues arise. The benchmarks, in fact, depend on the compiler used, its version, how well the backend has been developed for the specific architecture, and how optimised the built-in/system libraries are, just to give a few examples. The ABI used by the compiler and the OS must also be considered, as it also influences the generated code.
Finally, one would also have to consider what kind of code one wanted to generate. Optimised for size? For speed? And, in that case, would it still make sense to check the size of the executables? If for speed, at what level of optimisation? Using PGO? LTO?
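Just to give an idea of how much these choices weigh, here is a small Python sketch that compiles the same source with a few different optimisation settings and compares the resulting code size. It assumes gcc and GNU binutils’ size are on the PATH, and bench.c is a hypothetical, self-contained benchmark source of my own invention; a real comparison would repeat this per compiler, version and target/ABI and, for speed, add PGO runs and timing measurements.

```python
#!/usr/bin/env python3
"""Code-size comparison of one source file under different optimisation settings.

A minimal sketch under these assumptions: gcc and GNU `size` are on the PATH,
and `bench.c` is a hypothetical, self-contained benchmark source. Speed-oriented
comparisons would also need run-time measurements and, possibly, PGO
(-fprofile-generate / -fprofile-use), which are omitted here for brevity.
"""
import subprocess

FLAG_SETS = {
    "size  (-Os)":       ["-Os"],
    "speed (-O2)":       ["-O2"],
    "speed (-O3)":       ["-O3"],
    "speed (-O2 + LTO)": ["-O2", "-flto"],
}

def text_size(binary):
    """Return the .text size reported by GNU size (first column of the second line)."""
    out = subprocess.run(["size", binary], capture_output=True, text=True, check=True)
    return int(out.stdout.splitlines()[1].split()[0])

for label, flags in FLAG_SETS.items():
    # Compile and link in one step, so -flto applies to both phases.
    subprocess.run(["gcc", *flags, "-o", "bench", "bench.c"], check=True)
    print(f"{label:20s} text = {text_size('bench')} bytes")
```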
All this without even considering the inherent differences of specific microarchitectures (and, thus, again of specific, concrete products), which compilers absolutely must take into account in order to generate better code.
Finally, it must also be taken into account that the sources of some applications may also contain assembly code and/or special intrinsic functions and/or pragmas, all of which influence, once again, the quality of the generated code and/or performance.
As you can see, there are so many possibilities that it is difficult to arrive at the classic ‘last word’ on the subject. Here too, as with the third level of abstraction, everything depends on what one would like to measure and, above all, achieve. All while remaining in the purely technical/geeky realm: nothing of interest to users…
By this I do not mean that one should refrain from making comparisons, but only that one should not think that one can arrive at clear-cut conclusions, despite having a lot of data at one’s disposal. A huge number of benchmarks, covering different types of applications and running on different microarchitectures/chips, can certainly be very useful to get a rough idea of the ‘potential’ of an ISA, without claiming, by this, to arrive at a general and definitive conclusion.
Also because everyone might have their own, legitimate, idea about what would or should be important in an ISA and/or a microarchitecture, and this would influence their judgement accordingly (due to the preconceptions they have established).
Comparisons on the first level of abstraction: the macro-family — The philosopher level
One might well think, at this point, that, applying similar reasoning, it would not be possible or sensible to make comparisons between RISC and CISC CPUs/processors. In reality, this is a completely different story, as there are precise definitions of what a RISC processor is and, au contraire (because the concepts are mutually exclusive), of what a CISC is.
More on this in the next article, which not only focuses precisely on their definitions, but is also the main reason I wrote this series (to shed light on the terms and, thus, on the other levels of abstraction).