Missed opportunities to improve the Amiga chipset – 6: the alternative of 16-bit innovations

Last month David Pleasance, one of the managers of Commodore UK (the UK subsidiary), announced a publishing project. In collaboration with some former engineers of the company, he decided to write a book presenting an alternative history: what might have happened if, after the bankruptcy, the takeover he had set up had gone ahead and Commodore had been able to pick up where it left off, with a big boost to research & development.

Instead, I’d like to think about what could have been done if the employees of the parent company had done their homework properly and had at least decently evolved our acclaimed platform, rather than giving us the series of “innovations” and never-realised projects that were discussed and analysed in the previous article.

Obviously, we are not talking about pie-in-the-sky ideas, but about choices placed in context: what was available in a given historical period, what the technological limits were, and which opportunities could reasonably have been turned into actual features and/or advances in specific areas of the hardware. All this follows the stages/periods already taken as a reference in the aforementioned article, and also takes into account the needs of programmers (who, in my opinion, are the ones best able to understand the potential of a system and what it lacks).

1987: the time of the Amiga 500 and 2000

Two years after the introduction of the first Amiga, the 1000, some changes had already been made to the original chipset (OCS), with the introduction of the EHB mode to display 64 colours (halving the intensity of the first 32 colours of the palette) in the latest revisions of the Denise chip (responsible for displaying graphics) that came with most 1000s, and of Fat Agnus for the later 500 and 2000 models, which supports 1MB of memory (though unfortunately divided into 512kB Chip and 512kB Slow, as we have already seen).

In this context, I think the following small improvements would have greatly benefited the platform:

  • two extra bitplanes (so going from 6 to 8) to extend the EHB mode to 128 and 256 colours (with 7 and 8 bitplanes, respectively) and the Dual Playfield mode to 16 + 15 colours (like the AGA);
  • sprites and Blitter capable of mirroring (flipping) the graphics horizontally;
  • Copper capable of copying data to multiple registers;
  • DMA for the colour palette (requires a new pointer for the data to be read, a 16-bit register for the register index from which to load the read data, and another 16-bit register for the number of 16-bit values to be read);
  • audio channel volume independently selectable left and right;
  • 1MB Chip memory support.

The 256 colours…

Almost all of them are very simple modifications, and some really trivial, but of great impact. The first, in particular, is the one that obviously has the greatest visual impact, because it greatly extends the number of colours that can be displayed simultaneously on a single screen (EHB) or with two screens (Dual Playfield), compared to the 64 and 8+7 colours of the original machine.

In particular, having the EHB at 128 and 256 colours, while maintaining the same palette of 32 basic colours (selected from 4096), is equivalent to being able to use 4 and 8 gradations, respectively, of each of these basic colours.

Nothing transcendental and, indeed, extremely simple, if we consider that the EHB mode was introduced on the fly in the last revision of the Denise chip that was used in the Amiga 1000 (it was absent in the very first models).

Moreover, a few years earlier (1984, to be precise) we already find exactly the same mechanism of 8 gradations (intensity/brightness) of a colour in some of the parent company’s own computers: the Commodore 16/116 and Plus/4.

These had a fixed palette of 16 colours, but it was possible to choose 8 different shades for each of them, for a total of 121 colours that could be displayed (black was the only colour that obviously could not benefit from this feature).

The Archimedes by Acorn, a contemporary of the Amiga 500 and 2000, used a similar, though somewhat more convoluted, mechanism. It started from a palette of only 16 basic colours: of the 8 bits of each pixel, 4 selected the basic colour, while the other 4 directly set the most significant bits of the three colour components (one for red, one for blue, and two for green).
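The following sketch shows the kind of decoding just described. The split into 4 + 4 bits and the one/two/one distribution over red, green and blue come from the description above; the exact position of each bit inside the pixel is an assumption of mine, made only for illustration.

```python
def archimedes_colour(pixel, palette16):
    """Decode an 8-bit pixel into 12-bit RGB, Archimedes style (sketch).

    palette16: 16 base colours as 0xRGB values (4 bits per component).
    Assumed bit layout: bits 0-3 select the base colour, bit 4 is the top
    red bit, bits 5-6 are the top two green bits, bit 7 is the top blue bit.
    """
    base = palette16[pixel & 0x0F]
    r, g, b = (base >> 8) & 0xF, (base >> 4) & 0xF, base & 0xF
    r = (r & 0b0111) | (((pixel >> 4) & 1) << 3)   # replace red bit 3
    g = (g & 0b0011) | (((pixel >> 5) & 3) << 2)   # replace green bits 2-3
    b = (b & 0b0111) | (((pixel >> 7) & 1) << 3)   # replace blue bit 3
    return (r << 8) | (g << 4) | b
```

Only the lower bits of each palette entry are programmable here, which is why the result is less flexible than a true 256-entry palette.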

Both approaches have their merits and drawbacks, but in the end both are good enough to generate 256-colour images from a palette of 4096, without paying the very high price of implementing 256 registers inside the video chip, each holding at least 12 bits (for the 4096 colours). That was feasible in ’87 (a new production process was available), but still expensive for machines that were meant mainly to attack the mainstream/home computer market.

… but with a price to pay

It’s not all plain sailing, however, since increasing the number of colours also means having to process more data. As demonstrated in the previous article, the performance of the Blitter would have to scale up at least linearly with the increase in bitplanes (so at least 33% more when going from 6 to 8).

Which is quite difficult, as we have seen, because there are only two viable paths: either increase the size of the data bus, from 16 to 32 bits, or increase the clock frequency. Neither, however, was objectively viable at the time, as the costs would have been too high (the time was not yet ripe, and the main objective was to reduce the costs of the new platform: the 1000 was out of scale with respect to the market on which Commodore operated).

So the Blitter would have had to stay that way, and consequently programmers would have had to shoulder the burden of deciding how to develop games to take advantage of the graphics with all these colours.

For example, adventures would certainly have benefited greatly, because their graphics do not have to be updated at 60 (NTSC) or 50 (PAL) frames per second, and thus the computational cost would have been sustainable. Other games would have had to run at half that frame rate, but the Amiga already has several running at 30 or 25 FPS.

A similar discourse applies to the Dual Playfield mode, because the two 16 + 15 colour screens (4 + 4 bitplanes) require 33% more computing power to be managed than the classic 8 + 7 colours (3 + 3 bitplanes).
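The 33% figure is simple arithmetic, sketched below for a 320×200 screen (the resolution is only an illustrative choice of mine). The same ratio applies to EHB going from 6 to 8 bitplanes and to Dual Playfield going from 3 + 3 to 4 + 4.

```python
WIDTH, HEIGHT = 320, 200   # illustrative low-resolution screen


def screen_bytes(bitplanes):
    """Bytes of planar data for one full screen (one bit per pixel per plane)."""
    return WIDTH // 8 * HEIGHT * bitplanes


old, new = screen_bytes(6), screen_bytes(8)
increase = (new - old) / old
print(f"{old} -> {new} bytes per screen: {increase:.0%} more data to move")
```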

In any case, and in my humble opinion, the important thing was and remains the possibility of having these two innovations available: it would then be up to the technical staff to decide how to use them wisely, taking into account the aforementioned limitations.

“Compressing” data

The space taken up by graphics for chip memory has always been one of the biggest headaches programmers and artists have had to face, especially if the goal was to achieve games with significant visual detail and colour variation.

For this reason, being able to display sprites or BOBs (the graphic objects drawn using the Blitter) mirrored horizontally would have greatly reduced their memory footprint, halving it in several cases.

This change, too, was trivial (it is often found even in much older consoles), and it could have lowered the memory requirements of several games to the point of not needing the expensive additional 512kB memory expansion (usually “Slow” memory), which became almost standard for this very reason.
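Without hardware support, a mirrored object has to be produced in software and stored as a second copy. The sketch below shows what that amounts to for one row of one bitplane; the function names are mine. Hardware mirroring would perform the equivalent of this on the fly, so only the original copy would need to live in Chip memory.

```python
def mirror_word(word):
    """Reverse the 16 bits of a word (in planar graphics, one bit is one pixel)."""
    result = 0
    for _ in range(16):
        result = (result << 1) | (word & 1)
        word >>= 1
    return result


def mirror_row(words):
    """Mirror one bitplane row: reverse the word order and the bits in each word."""
    return [mirror_word(w) for w in reversed(words)]


print([hex(w) for w in mirror_row([0xF000, 0x0000])])   # pixels move to the far right
```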

This does not detract from the fact that being able to support 1MB of Chip memory would have been another small, but significant, change that would have enabled much better and simpler games to be made, also making life easier for programmers, as already explained in previous articles.

Also with a view to saving not only precious Chip memory, but also the relative memory bandwidth consumed, we should read the changes to Copper to allow it to set multiple values in sequential registers, and the new DMA channel that serves the same purpose but is optimised to load values directly from the same memory without having to create long Copper Lists (that’s what the programs executed by this coprocessor are called).
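A rough sketch of the saving follows. A Copper MOVE instruction takes two 16-bit words (the register and the value), whereas the proposed palette DMA would only read the values. The `palette_dma` function models the three hypothetical registers listed earlier (pointer, first register index, count); all names are mine, and the one-off cost of setting up those registers is ignored.

```python
COPPER_MOVE_BYTES = 4   # a Copper MOVE is two 16-bit words: register, value
COLOUR_BYTES = 2        # one 16-bit colour value


def copper_list_bytes(colours):
    """Chip memory read to load `colours` palette entries with Copper MOVEs."""
    return colours * COPPER_MOVE_BYTES


def palette_dma_bytes(colours):
    """Chip memory read by the hypothetical palette DMA for the same job."""
    return colours * COLOUR_BYTES


def palette_dma(memory, pointer, first_register, count, colour_registers):
    """Model of the hypothetical channel: copy `count` words into the palette."""
    for i in range(count):
        colour_registers[first_register + i] = memory[pointer + i]


print(copper_list_bytes(32), "bytes with the Copper,",
      palette_dma_bytes(32), "bytes with palette DMA")
```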

Spatial audio (surround)

One thing that particularly annoyed me, as a programmer, was that I could only listen to an audio channel on the right or left, because when there was a sound effect to be played, I had to decide which music track to take out of the way and recycle for the purpose.

The problem was that, depending on the particular moment of the soundtrack, it might have been preferable to reuse one channel rather than another. The choice that made life easier, however, was to decide in advance, whatever the moment or the soundtrack, which track to replace when an effect had to be played on the left or on the right, with results that were certainly not optimal.

The solution to this problem is simple, as already described in the first article in the series, but of all those listed so far it is probably the one that requires the most transistors, since four more PWM components have to be added (while no additional registers are required: it is enough to use the high byte of the register assigned to volume as a second volume that adjusts the other output).

In any case, this was nothing that was not feasible with the technology of the time, and it would also have led to a rethinking of some games, which could have used this functionality to implement so-called “spatial” audio, offering greater immersion and realism: to give just one example, one could have heard a sound coming from far to the right, moving towards the centre and then away to the left.

Emulating this on a normal Amiga would obviously have required two audio channels dedicated to the purpose (one playing the sound on the right at a certain volume, and another on the left at another volume), thus leaving only the two remaining audio channels to do other things.

Having only four audio channels available was another of the great shortcomings that plagued our platform, as we have already had ample opportunity to see in the various articles that have been written. It would have been very welcome, therefore, to remedy this when technology allowed it.
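As a sketch of how the proposed volume register could be driven, the code below packs two volumes into one 16-bit value and generates a right-to-left sweep. The 0..64 volume range is that of the real hardware; which output ends up in the low byte and which in the high byte is my assumption, as are the function names.

```python
MAX_VOLUME = 64   # the Amiga volume range is 0..64


def pack_volume(left, right):
    """Hypothetical volume register: low byte = left output, high byte = right."""
    if not (0 <= left <= MAX_VOLUME and 0 <= right <= MAX_VOLUME):
        raise ValueError("volume out of range")
    return (right << 8) | left


def pan_sweep(steps):
    """Register values moving a sound from far right to far left (steps >= 2)."""
    values = []
    for i in range(steps):
        left = MAX_VOLUME * i // (steps - 1)
        values.append(pack_volume(left, MAX_VOLUME - left))
    return values


print([hex(v) for v in pan_sweep(3)])   # right only, centred, left only
```

A single channel is enough for the whole sweep, which is exactly what the emulation with two channels cannot offer.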

1990: the time of the Amiga 3000 and 500+

This could have been done three years later, when the Amiga 3000 was introduced, but it would have been even better to market a revised model of the 500 as well, with much more advanced hardware, to better meet the challenges of a competition that had become extremely fierce, and all this while remaining in the 16-bit sphere (32 bits were still the prerogative of much higher-end and more expensive machines).

Again, in my opinion and for this to have been possible, the new chipset should have brought the following improvements:

  • doubled frequency (14MHz instead of 7) with correspondingly faster memories (going from 280 to 140ns), still cheap compared to those available in ’90, and a 68000 processor at 14MHz (an overclocked 12.5MHz part or a downclocked 16MHz one);
  • graphics controller capable of displaying one 640×200/256 (NTSC/PAL; non-interlaced) screen with up to 256 colours (as well as EHB and HAM), and two screens at 320×200/256 (also non-interlaced) with 256 colours each, but in packed/chunky format;
  • two palettes of 256 actual colours (one for each screen, and with the sprites being able to choose which one to use);
  • packed/chunky 256-colour (8-bit) graphics;
  • 14MHz Blitter with support for 8-bit packed/chunky graphics (256 colours) and a mask cache for areas of up to 64×64 pixels;
  • 32 sprites, with the possibility of choosing part of the 256-colour palette to use;
  • 32 audio channels.

This is a lot of stuff, but it is mostly a completely natural evolution of the platform (and, thus, without any particular implementation complications), with only one “alien” addition (the packed/chunky graphics), but a necessary one (planar graphics are too inefficient and limiting, especially when dealing with so many colours or particular areas).

The trivial changes (for which few changes to the implementation are required) are obviously those that result from doubling the operating frequency of chipsets, memories, and CPUs, but they are also those from which the greatest benefits are derived, as pointed out in previous articles.

Using the new process node

Of a completely different magnitude would have been the introduction of no less than two colour palettes, each consisting of 256 16-bit elements (to store the components of each colour), which obviously requires a lot of transistors to implement.

The technology, however, had progressed quite a bit since the first Amiga came on the market, and would certainly have enabled it. Indeed, it must be remembered that the custom chips used a 5um manufacturing process, which was already very old in ’85, as Commodore’s LSI division already had the new 3um process available.

The numbers in themselves don’t say much, especially if you don’t have a basic knowledge in this field, but it is very easy to understand the significance of the advancement in manufacturing processes if you take into account the fact that going from 5um to 3um means, crudely, that the same transistors in a chip now occupy less space proportionally. Which, proceeding in the opposite direction, is equivalent to saying that a chip occupying a certain space at 5um would allow many more transistors to be integrated in the same, identical space, but with a 3um process.

Specifically, 5 / 3 = 1.67, so about 67% more transistors could be packed. But this change has so far only been applied to one dimension (e.g. the horizontal one), while the chip surface is two-dimensional: the same increase would occur in the second dimension (the vertical one), so the two factors multiply. In summary, the increase in transistors would be (5 / 3)² ≈ 2.78, or 178% more transistors in the equivalent chips, but at 3um.

Since Commodore already had a new 2um manufacturing process in place in ’88 (used in the new 65CE02 processor that found its way into the A2232 serial board for the Amiga, and later in the Commodore 65 project), it’s easy to do the maths and see how 525% more transistors would have been possible in ’90. I would say far more than enough to accommodate the demand to store the total 512 colours of the two palettes, and much more.
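The arithmetic behind these percentages can be checked with a few lines. This is the same idealised scaling used in the text (density grows with the square of the linear shrink), which ignores everything else that changes between processes; the palette size assumes the 256 entries of 16 bits each mentioned earlier.

```python
def density_gain(old_um, new_um):
    """Ideal transistor density gain when shrinking from old_um to new_um."""
    return (old_um / new_um) ** 2


for node in (3, 2):
    gain = density_gain(5, node)
    print(f"5um -> {node}um: {gain:.2f}x, i.e. {gain - 1:.0%} more transistors")

# Storage needed by the two proposed palettes: 2 x 256 entries x 16 bits.
PALETTE_BITS = 2 * 256 * 16
print(PALETTE_BITS, "bits of palette storage")
```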

In fact, it also covers the cost in terms of transistors of 32 sprites and 32 audio channels, even though in the latter case the main way forward would have been to implement two circuits capable of mixing 32 audio channels, whose results would then be fed into two high-precision DACs for the left and right outputs respectively. Such a system is necessary in order to scale the number of audio channels easily, instead of having to add a DAC and two PWMs for each new audio channel. Moreover, if implemented with a little bit of cleverness, it would not require a lot of resources either (no need for 32 adders working in parallel, just to be clear).
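One way to read “no need for 32 adders” is a single multiply-accumulate unit per output that visits the channels one after another within each sample period. The sketch below models that idea in software; the sample and volume ranges match the existing hardware (signed 8-bit samples, volumes 0..64), while the structure of the mixer itself is my assumption.

```python
def mix(samples, left_volumes, right_volumes):
    """Mix one left/right output pair from N channels, one channel per step.

    samples: signed 8-bit samples (-128..127), one per channel.
    left_volumes / right_volumes: per-channel volumes in the range 0..64.
    """
    left = right = 0
    for sample, lv, rv in zip(samples, left_volumes, right_volumes):
        left += sample * lv     # one shared accumulator per output
        right += sample * rv
    return left, right


# Worst case with 32 channels: -128 * 64 * 32 = -262144, which fits in a
# 19-bit signed accumulator before being scaled down for the DAC.
print(mix([-128] * 32, [64] * 32, [64] * 32))
```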

Similarly, even the mask cache used by the Blitter for the most complex (but among the most common) operations would have had no problem finding a place, thanks to the avalanche of transistors available. This modification has already been explained in a previous article, and serves to avoid having to load the same mask again for each bitplane to be processed, which is the “tax” to be paid every time because of planar graphics (instead of packed/chunky). This would have saved about 25% of the memory bandwidth and improved the performance of the Blitter by a similar amount (the mask fetch can be skipped when the mask is already in the cache).
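A back-of-the-envelope model of that saving is sketched below. It assumes a masked blit needs four memory accesses per word and per bitplane (mask, source, destination read, destination write), in line with the four channels mentioned later in the article; the function names and the cold/warm distinction are mine.

```python
def blit_accesses(bitplanes, mask_fetches):
    """Memory accesses per 16-bit word of a masked blit over all bitplanes.

    Source, destination read and destination write are needed for every
    bitplane; the mask is fetched `mask_fetches` times in total.
    """
    return 3 * bitplanes + mask_fetches


def saving(bitplanes, mask_fetches):
    """Fraction of accesses saved compared with fetching the mask every time."""
    uncached = blit_accesses(bitplanes, bitplanes)
    return (uncached - blit_accesses(bitplanes, mask_fetches)) / uncached


print(f"cold cache, 8 bitplanes: {saving(8, 1):.1%} saved")
print(f"warm cache, 8 bitplanes: {saving(8, 0):.1%} saved")

CACHE_BYTES = 64 * 64 // 8   # a 64x64 one-bit mask occupies 512 bytes
```

The 25% figure is therefore the upper bound, reached when the mask is already in the cache from a previous operation.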

Making use of the two new colour palettes

A decisive step forward from an exquisitely visual point of view would have been the possibility of having the aforementioned two colour palettes of 256 colours each. There are two because the Amiga also allows for two completely independent screens, which share separate parts of the single palette that has always been used for every graphic element (including sprites). In my humble opinion it would certainly have been much more useful to separate the palettes altogether, also in order to offer greater chromatic richness without necessarily having to switch to 16-bit packed/chunky modes.

Of course, it would also have been useful to be able to decide which of the two palettes to use for each of the two screens, and possibly which part of it (for screens with fewer than 256 colours). This was partly realised in the later AGA chipset with the BPLCON3 register for the two screens, to which a bit would have had to be added to control which of the two palettes to use for the second screen.

This was also implemented for the sprites, thanks to the BPLCON4 register, which allows selection of which part of the palette to use for the even-numbered ones, and which part for the odd-numbered ones. A solution which, in my opinion, is wrong, because it should have been possible to select the palette directly for each individual sprite, as is the case in several arcade or console systems, instead of globally for all of them (although separated into odd and even, in the AGA).

Doing this would require changing the control register of each sprite, SPRxCTL, using some of its unused bits for this purpose: bits 3 to 6. With these four bits it would have been possible to select the four highest bits of the palette index to be used (the palette can ideally be considered as divided into 16 palettes of 16 colours each). Considering that the Amiga can display sprites with a maximum of 16 colours (only 4 with OCS/ECS, but as many as 16 with this innovation, since the 32 4-colour sprites can be paired two at a time to form 16-colour sprites), the combination is perfect.
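A minimal sketch of the proposed selection follows. Bits 3 to 6 of SPRxCTL are the ones named above; how they would combine with the sprite pixel value is my reading of the description (16 banks of 16 colours), and the function name is hypothetical.

```python
PALETTE_SHIFT = 3    # bits 3..6 of SPRxCTL, as proposed above
PALETTE_MASK = 0xF


def sprite_colour_index(sprxctl, pixel):
    """Palette index (0..255) for a sprite pixel under the proposed scheme.

    sprxctl: 16-bit value of the sprite control register.
    pixel: sprite pixel value 0..15 (16-colour sprite).
    """
    bank = (sprxctl >> PALETTE_SHIFT) & PALETTE_MASK   # one of 16 sub-palettes
    return (bank << 4) | (pixel & 0xF)


print(sprite_colour_index(0x0078, 5))   # bank 15, colour 5 -> index 245
```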

In fact, some of these bits are used by the ECS chipset to position the sprites with finer horizontal granularity, and some more by the subsequent AGA, but frankly speaking I find it far more useful to have full freedom to choose the palette individually for each sprite. Furthermore, the finer horizontal granularity could have been realised in a later evolution of the chipset (as we shall see in the next article).

Finally, it would have been sufficient to exploit an unused bit in one of the registers (e.g. precisely BPLCON4) to select which of the two 256-colour palettes to use for all sprites. Then the indicated palette would have remained the same for all sprites, but this is a limitation you could have lived with easily.

Sleeping with the Enemy: packed/chunky graphics

We now come to the innovation I had previously classified as “alien”, namely the introduction of 8-bit packed/chunky graphics, both in the video circuitry that renders and sends the screen to the video output, and in the Blitter.

I have written extensively on this subject in the past, including an entire (and lengthy) series (in Italian) dealing with the subject in depth, as well as an article demonstrating (also going into detail) how it could have been implemented quickly and efficiently (again, in Italian) in place of the CD32’s ridiculous Akiko chip conversion circuitry, and all while remaining absolutely consistent with existing operation.

It is therefore a small thing requiring little effort, both in terms of implementation and of the resources to be employed, not least because it is only the 8-bit version, i.e. a single byte per pixel (which simplifies everything enormously). As a long-time developer, however, I consider it an absolutely necessary step to better support various scenarios and to optimise the use of the (few) resources available.

Commodore itself realised this, too: by introducing “interleaved” bitmaps with version 2.04 of its operating system, it tried to patch up, as far as possible, the problem of having to process each single bitplane of a screen or image separately, with negative repercussions in terms of performance and of wasted memory space and bandwidth.

The crux of the matter is, in fact, precisely this: to allow processing to be carried out in one go, without requiring as many passes as the number of bitplanes used by the screen’s colour depth, to which is added improved efficiency.

The latter is perhaps not immediately comprehensible to those not versed in the workings and limitations of the Amiga hardware in the purely graphic sphere, but an example should make the matter clearer:

If you analyse the “shots” in detail, you realise that the smallest ones are only three pixels wide, as shown in the following image (shot on the left):

Yet on the Amiga each row of such an image is always stored in 16 bits (where one bit always corresponds to one pixel, for each bitplane that makes up the image), thus wasting a good 13 of the 16 bits in every row (multiplied by the number of bitplanes). The same image in packed/chunky format would, on the other hand, occupy 4 bytes per row, with the first three bytes containing the colours of the three pixels, and only the last byte used as filler to align the data to 16 bits (which is the size of the system bus).
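The row sizes can be compared with a small calculation. The padding to 16-bit words follows the description above; the helper names and the optional extra mask bitplane are mine. Note that the packed/chunky advantage depends on colour depth: it is large at 8 bitplanes, while with only one or two bitplanes planar storage is no worse.

```python
def planar_row_bytes(width_px, bitplanes, with_mask=False):
    """Bytes per row in planar format, with rows padded to 16-bit words."""
    words = (width_px + 15) // 16
    planes = bitplanes + (1 if with_mask else 0)
    return words * 2 * planes


def chunky_row_bytes(width_px):
    """Bytes per row with one byte per pixel, padded to a 16-bit boundary."""
    return (width_px + 1) // 2 * 2


for depth in (2, 5, 8):
    print(f"3-pixel row, {depth} bitplanes: planar {planar_row_bytes(3, depth)} "
          f"bytes, packed/chunky {chunky_row_bytes(3)} bytes")
```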

Another advantage of this format is that it does not require the use of “masks” in order to draw an image. For simplicity’s sake, on the Amiga an additional bitplane is usually used with all the objects that need to be drawn, where a bit set to 1 signals that for that pixel the object graphic is to be drawn on the screen, while if it is 0 it will indicate that it is a “transparent” area, so that pixel must continue to contain the screen colour. So every time an object is to be drawn, this mask must be read for each bitplane of the object to be processed (this is also why the mask’s cache is needed, as illustrated above: to avoid having to read it each time, for each bitplane, as it does not change).

With packed/chunky graphics, however, the mask can be calculated on the fly, using the colour zero to signal a transparent pixel. Not only does this take up less memory space (only the colours of their pixels are present for the graphic objects), as the mask is missing, but the process is also more efficient, as only three channels have to be used for graphics processing, instead of the four normally required, with a 25% saving on the memory bandwidth used.
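The difference between the two approaches can be sketched in a few lines. The first function is the classic masked (“cookie-cut”) combination that has to be repeated for every bitplane, with the mask read from memory each time; the second derives transparency from colour zero, so no mask data exists at all. Both are software models with names of my choosing, not a description of actual Blitter internals.

```python
def cookie_cut_word(mask, src, dst):
    """Planar: combine one 16-bit word of one bitplane using a separate mask."""
    return ((mask & src) | (~mask & dst)) & 0xFFFF


def blit_chunky(dest, src):
    """Packed/chunky: draw src over dest, treating colour 0 as transparent.

    Both arguments are rows of 8-bit pixel values of the same length.
    """
    return [d if s == 0 else s for d, s in zip(dest, src)]


print(hex(cookie_cut_word(0xFF00, 0xABCD, 0x1234)))
print(blit_chunky([1, 2, 3, 4], [0, 9, 0, 7]))
```

The price of the second approach is that colour zero can no longer be used as a visible colour inside an object, which is the usual trade-off in systems that work this way.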

Another reason for making the introduction of the packed/chunky format practically compulsory is that in this way the Dual Playfield mode would have made it possible to use two 256-colour screens at a resolution of 320×200/256, continuing to read the data of the first screen from the odd-numbered bitplanes and that of the second screen from the even-numbered ones, but with the difference that the format of the data read would no longer be the planar one. The alternative would have been to duplicate the bitplanes, increasing them to as many as 16, thus duplicating the inefficiencies, as we have already seen in other articles: a road not to be taken at all.

Wanting to add a little more flexibility, one could also have implemented the first screen in packed/chunky (thus displaying 256 colours), while the second would have remained with planar graphics (displaying 16 colours, as there are four even bitplanes in total). This would have left more bandwidth available for the Blitter to update the graphics.

This format also greatly accelerates line drawing, reaching eight times the performance of an equivalent screen with eight bitplanes. In fact, the Blitter is designed to plot one pixel at a time, but… on a single bitplane, at a rate of just under one million pixels per second. So to draw a line on a 256-colour screen you have to draw the same line eight times, once on each bitplane, which reduces the rate to just over 110 thousand pixels per second. With packed/chunky graphics, on the other hand, it is sufficient to perform the operation only once to obtain the same result, with undeniable advantages not only in terms of performance (one million pixels per second), but also in terms of memory bandwidth (only 1/8 of it is used, saving 88%!).

Unsurprisingly, similar considerations apply everywhere: if it is true that the Blitter was born to accelerate graphics operations, it is also true that it can only do so for certain predefined algorithms/primitives. In all other cases the CPU has to take care of the work, and for the CPU handling planar graphics is monstrously more complicated and inefficient, wasting a lot of computing power and memory bandwidth.
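To give an idea of what this means for the CPU, here is a sketch of the simplest possible operation, setting one pixel, in both formats. The memory layouts (one buffer per bitplane with the leftmost pixel in the most significant bit, versus one byte per pixel) are the usual ones; the function names are mine.

```python
def put_pixel_planar(planes, row_bytes, x, y, colour):
    """Set one pixel on a planar screen: one read-modify-write per bitplane.

    planes: list of bytearrays, one per bitplane.
    """
    offset = y * row_bytes + x // 8
    bit = 0x80 >> (x % 8)              # leftmost pixel is the top bit
    for n, plane in enumerate(planes):
        if (colour >> n) & 1:
            plane[offset] |= bit
        else:
            plane[offset] &= ~bit & 0xFF


def put_pixel_chunky(screen, row_bytes, x, y, colour):
    """Set one pixel on a packed/chunky screen: a single byte write."""
    screen[y * row_bytes + x] = colour


planes = [bytearray(40 * 2) for _ in range(8)]    # 320 pixels wide, 2 rows
put_pixel_planar(planes, 40, 9, 1, 0xA5)
screen = bytearray(320 * 2)
put_pixel_chunky(screen, 320, 9, 1, 0xA5)
```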

Finally… 3D. For the same, identical, reasons, it is far faster to render a scene with a packed/chunky screen, precisely because the pixels to be drawn are processed in one go, whereas with planar graphics it would be necessary to set all the bitplanes each time, as we have seen. Although the Amiga can rely on the Blitter to draw lines to draw the triangles in scenes, and then fill them (this coprocessor also has this functionality), it can never compete with the CPU, which is able to manipulate individual pixels much faster, in addition to the fact that it is also able to apply textures or other special effects.

It is no coincidence that techniques for converting packed/chunky graphics to planar graphics arose over time, because it is much more convenient to draw the scene in this format, and only at the end to convert the result to planar graphics, so that it can be visualised by the Amiga’s video circuitry, as we saw when we talked about the Akiko chip in the CD32.
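A deliberately naive sketch of such a conversion is shown below, for a group of 8 pixels. Real routines are heavily optimised and look nothing like this; the only purpose here is to show how much bit shuffling is needed just to make a finished packed/chunky image displayable. The function name is mine.

```python
def chunky_to_planar(pixels, bitplanes=8):
    """Convert 8 packed/chunky pixels into one byte per bitplane.

    pixels: 8 pixel values, leftmost first.
    Returns a list where element n is the byte for bitplane n.
    """
    if len(pixels) != 8:
        raise ValueError("expected exactly 8 pixels")
    planes = [0] * bitplanes
    for pixel in pixels:
        for n in range(bitplanes):
            # Shift in bit n of this pixel; the leftmost pixel ends up on top.
            planes[n] = (planes[n] << 1) | ((pixel >> n) & 1)
    return planes


print(chunky_to_planar([1, 0, 0, 0, 0, 0, 0, 0]))
```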

Conclusions

These are all considerations that were made on the basis of the experience gained developing for the Amiga, and particularly in the field of video games. I don’t think I’m wrong if I say that many professionals of the time felt the same way, since the needs were the same, as were many ideas about what would be needed.

This is why Commodore’s engineers should have actively involved a number of video game professionals, so that they would have a better understanding of what direction to go in, rather than getting lost in discussions and crazy ideas.

The next article will move a little further forward in time, ushering in the 32-bit era.
