Silicon Chip, February 2023
MORE ON
COMPUTER MEMORY
The preceding article provides an overview of modern computer memory
technology, but that technology is complex and would take a great deal of
space to describe fully. We have compiled some interesting facts about the
latest memory technology for those who want to know a bit more.
BY NICHOLAS VINEN
The topics covered in this article include
how data is stored in memory, more
details on the differences between
SRAM and DRAM, how DRAM timings vary, the relatively recent development of high-capacity on-CPU
DRAM and some of the new features
included in the latest DDR5 memory
standard.
Memory encoding schemes
Last month’s first article on Computer Memory described how text
could be stored (eg, as ASCII characters). Early computers had so little memory and such limited I/O that
numbers and text were realistically
the only things they could handle.
But of course, these days, computers
store and display so much more. Here
are some other things that can reside
in RAM.
1. Numbers
Whole numbers (integers) are usually stored in binary, with one byte allowing a range of 0-255 or -128 to +127 to be stored. Two bytes (16 bits) can store an integer of 0-65535 or -32768 to +32767, while four bytes (32 bits) can store 0 to about four billion, or negative two billion to positive two billion.
Financial systems sometimes use BCD (binary-coded decimal), where each byte can store two decimal digits, 0-9 and 0-9. This is somewhat wasteful, as only 100 different values can be stored in a byte rather than 256, but it makes conversion for display easier and ensures correct rounding of dollars and cents etc.
Fixed-point decimal numbers are sometimes used where speed is more critical than precision or range. These are basically integers (whole numbers) with a fixed scaling factor, eg, 1/1000, in which case the integer 1234 represents the decimal 1.234.
For decimal numbers, floating point is the most common storage method. It is similar to numbers in scientific notation, such as 6.02 × 10²³ or 1.602 × 10⁻¹⁹. This allows the handling of tiny and huge numbers in the same amount of space.
Floating point numbers are usually stored as 32 or 64 bits, with one sign bit (positive or negative), an exponent (the power to which the base is raised; two in the binary formats computers use) and the mantissa (6.02 or 1.602 in the previous examples). For 32-bit floating point numbers (‘single precision’), the exponent is eight bits and the mantissa is 23 bits. For a 64-bit floating point number (‘double precision’), the exponent is 11 bits and the mantissa is 52 bits.

2. Still Images
In the early days of computer graphics, images were typically stored as a grid of numbers. The most basic displays are monochrome and can only turn pixels on or off, so each pixel is allocated a bit, usually with 0=off and 1=on. For greyscale images, each pixel is assigned a number, possibly a byte. In that case, 0=black and 255=white, with 254 shades of grey in between.
Colour images usually require between 16 bits (two bytes) and 32 bits (four bytes) per pixel. Those bits are typically split up into three numbers, one for red intensity, one for green and one for blue. Those three colours are mixed in varying proportions to create a range of colours.
[Image caption: A bitmap (“raster”) image next to a vector version of the same image, both shown at 300% scale. Vector images scale better than bitmaps. This is because bitmap images are created by filling individual pixels with a single colour, while vector images are composed of mathematical paths. JPG is an example of a bitmap image format, while SVG is a common vector format.]
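As an aside, the integer, BCD, fixed-point and floating-point encodings described above are easy to poke at with a few lines of Python; its struct module exposes the raw bytes behind integers and IEEE-754 floats (this is just an illustrative sketch, not something from the article itself):

```python
import struct

# One unsigned byte holds 0-255; one signed byte holds -128 to +127.
assert struct.pack("<B", 255) == b"\xff"
assert struct.unpack("<b", b"\x80")[0] == -128

# BCD: each byte stores two decimal digits (only 100 of 256 values used).
def to_bcd(n):
    # e.g. 42 -> 0x42 (high nibble 4, low nibble 2)
    return ((n // 10) << 4) | (n % 10)

assert to_bcd(42) == 0x42

# Fixed point with a 1/1000 scaling factor: integer 1234 means 1.234.
assert 1234 / 1000 == 1.234

# A 32-bit float is 1 sign bit, 8 exponent bits and a 23-bit mantissa.
bits = struct.unpack("<I", struct.pack("<f", 6.02e23))[0]
sign, exponent, mantissa = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
assert sign == 0 and 0 < exponent < 255  # a normal positive number
```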
Images intended for printing might
use four values: CMYK (cyan, magenta,
yellow & black) rather than RGB (red,
green & blue). High dynamic range
(HDR) images might use even more
bits, up to 16 per attribute or 48-64 bits
per pixel. Usually (but not always), all
the colour information is packed into
an integer multiple of the byte size to
make reading/writing pixels in the
memory buffer easier.
For 16-bit RGB colour images, such
as those used on small TFTs, the 16 bits
are usually allocated 5-6-5, with six for
green and five for red and blue. That’s
because the human eye can distinguish
more shades of green than red or blue.
However, the limited number of 16-bit
colours often leads to ‘banding’ in gradients such as a blue sky, so 24-bit
colour (8-8-8 or better) is preferred.
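The 5-6-5 packing just described can be sketched in a few lines of Python (hypothetical helper names; the bit layout follows the common RGB565 convention, with the discarded low bits being the source of the banding mentioned above):

```python
def pack_rgb565(r, g, b):
    """Pack 8-bit R, G, B into a 16-bit 5-6-5 value (green gets 6 bits)."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def unpack_rgb565(p):
    """Expand back to 8 bits per channel; the low bits are lost."""
    return ((p >> 11) << 3, ((p >> 5) & 0x3F) << 2, (p & 0x1F) << 3)

white = pack_rgb565(255, 255, 255)
assert white == 0xFFFF
# Round-tripping shows the quantisation that causes banding:
assert unpack_rgb565(white) == (248, 252, 248)
```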
While bitmaps are conceptually
simple, the trouble is that they are
large. A 4K (3840 × 2160 pixel) image in RGB with HDR (12 bits per attribute) would take 3840 × 2160 × 3 (RGB) × 12 (bits) = 298.6 million bits, or 37.3MB,
if stored as a bitmap.
So images are usually compressed
for storage, eg, as PNG (lossless, preserving the original image perfectly)
or JPEG (lossy) files. Still, in memory,
images are usually kept as bitmaps for
fast access.
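The size arithmetic above is worth checking for yourself; in Python:

```python
# 4K HDR bitmap: 3840 x 2160 pixels, 3 channels (RGB), 12 bits each.
bits = 3840 * 2160 * 3 * 12
assert bits == 298_598_400                 # about 298.6 million bits
assert round(bits / 8 / 1e6, 1) == 37.3    # megabytes, uncompressed
```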
3. Vector Images
Vector images are generally stored as
one or more shapes bounded by lines
or splines. A spline is an elegant way
to define a curve in 2D or 3D space
using just a few numbers. For lines, it’s
only necessary to know the x & y coordinates of each end of the line, while
splines typically have two endpoints
and two control points.
The coordinates can be integers
(whole numbers), floating-point or
fixed-point numbers (decimals). Along
with the bounding information, there
will usually be colour/pattern information, transparency data etc. The
characters used in fonts are defined
this way, as well as many elements in
files such as PDF (portable document
format), PS/EPS (PostScript) etc.
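To make the "few numbers" claim concrete, here is a minimal sketch of evaluating a quadratic Bézier spline, the kind with one control point that TrueType fonts use (cubic splines, as described above, add a second control point; the function name is ours, not from any particular library):

```python
def quad_bezier(p0, c, p1, t):
    """Point on a quadratic Bezier curve at parameter t in [0, 1].
    p0 and p1 are the endpoints; c is the single control point."""
    x = (1 - t)**2 * p0[0] + 2 * (1 - t) * t * c[0] + t**2 * p1[0]
    y = (1 - t)**2 * p0[1] + 2 * (1 - t) * t * c[1] + t**2 * p1[1]
    return (x, y)

# Just six numbers define the whole curve. It passes exactly through
# its endpoints and is pulled towards the control point in between.
assert quad_bezier((0, 0), (5, 10), (10, 0), 0.0) == (0.0, 0.0)
assert quad_bezier((0, 0), (5, 10), (10, 0), 1.0) == (10.0, 0.0)
assert quad_bezier((0, 0), (5, 10), (10, 0), 0.5) == (5.0, 5.0)
```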
4. Audio
In memory, audio is usually stored as PCM (pulse-code modulation). This is simply a series of numbers representing the audio signal voltage sampled at regular intervals.
[Image caption: This image shows the motion vectors (as arrows) from an H.264 encoding of the film Big Buck Bunny (Blender Foundation, Peach Movie Project). Motion vectors are used to describe how one image can be transformed into another. These vectors are used to help compress movie formats; see https://w.wiki/62xT Source: https://trac.ffmpeg.org/wiki/Debug/MacroblocksAndMotionVectors]
The number of points per second is known as
the sampling rate, while the number of
bits allocated to each number is known
as the bit depth. CD-quality audio has
a 44.1kHz sampling rate and 16 bits
per channel (two for stereo).
48kHz is another common sampling
rate. Other rates you might see are one-half, one-quarter, double or four times
either value (44.1kHz or 48kHz). A bit
depth of less than 16 generally means
noisy audio, while lower sampling
rates also lower audio quality. 24-bit
samples are sometimes used for audio
mastering but are not really necessary
for consumer audio, even hifi.
As with still images, audio files
can take up a lot of memory, so they
are usually compressed when stored,
such as in the FLAC format (lossless)
or MP3/AAC (lossy).
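If you want to see PCM in action, generating a tone is only a few lines of Python (our own sketch; 'h' is the standard signed 16-bit array type code):

```python
import array
import math

SAMPLE_RATE = 44100   # CD-quality sampling rate in Hz
MAX_SAMPLE = 32767    # largest value of a signed 16-bit sample

def sine_pcm(freq_hz, seconds, amplitude=0.5):
    """Generate mono 16-bit PCM samples for a sine wave: one number
    per sampling interval, each scaled to the 16-bit range."""
    n = int(SAMPLE_RATE * seconds)
    return array.array("h", (
        int(amplitude * MAX_SAMPLE *
            math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)))

tone = sine_pcm(440, 1.0)      # one second of A440
assert len(tone) == 44100      # 44,100 numbers per second
assert tone.itemsize == 2      # 16 bits (2 bytes) per sample
assert max(tone) <= MAX_SAMPLE
```

At 2 bytes per sample, one second of mono CD-quality audio already occupies about 88kB, which is why compressed formats like FLAC and MP3 exist.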
5. Video
In the most basic sense, a video is
just a series of still images (possibly
accompanied by audio). Therefore,
it can be encoded in the same way as
still images but with more than one,
which is the idea behind the (quite
old) Motion JPEG encoding scheme.
The thing is that most video frames
are very similar to the last frame, so the
amount of memory required is drastically reduced by storing the first frame,
then the difference between each subsequent frame.
Think of a video camera being
panned or zoomed; in the case of panning, a frame will be mostly like the
previous frame but shifted slightly.
The distance and direction can be
encoded in just a few bytes, compared
to kilobytes or megabytes for a whole
new frame image.
In practice, a complete frame (‘I
frame’) is occasionally stored, mainly
to prevent image degradation over
long periods and allow for seeking
in the video. But most frames are
stored only as differences, primarily
in the form of ‘motion vectors’. Such
encoding schemes include the MPEG
series: MPEG, MPEG-2 and these days,
MPEG-4, which encompasses a wide
range of such algorithms.
For example, digital TV and Blu-rays mostly use either MPEG-2 or,
more recently, MPEG-4. The audio
part of the video is encoded much the
same as a regular audio file, usually
in chunks between the video frames.
Because video data can take up
so much space, it is generally stored
compressed in this way in both RAM
and more permanent storage. A frame
buffer is initialised with a bitmap of
the first frame. Then, during playback,
the motion vectors are applied to that
buffer to produce a second buffer containing the next frame image. The process then repeats, alternating between
buffers (sometimes more than two).
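A toy version of that idea can be sketched in Python. Here the whole frame is treated as one block moved by a single motion vector; real codecs divide each frame into many small macroblocks, each with its own vector plus a residual correction, but the principle is the same:

```python
def apply_motion(frame, dx, dy, fill=0):
    """Produce the next frame buffer by shifting the previous one by a
    motion vector (dx, dy); uncovered pixels get a fill value."""
    h, w = len(frame), len(frame[0])
    return [[frame[y - dy][x - dx]
             if 0 <= y - dy < h and 0 <= x - dx < w else fill
             for x in range(w)]
            for y in range(h)]

i_frame = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]

# Panning one pixel right: each pixel comes from its left-hand neighbour.
# Encoding this as (dx=1, dy=0) takes a couple of bytes, versus
# re-sending every pixel of the frame.
next_frame = apply_motion(i_frame, dx=1, dy=0)
assert next_frame == [[0, 1, 2], [0, 4, 5], [0, 7, 8]]
```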
6. 3D Models
3D models are similar to the vector
images described above, only with a
third dimension. A three-dimensional
‘mesh’ of points, lines and/or splines
describes the shape of an object to be
shown on the screen, such as a person, vehicle, building etc. Flat image
‘textures’ are mapped onto the faces of that shape and wrapped around. Lighting effects are applied to make the resulting rendered images look more realistic. Simulated bones can alter the shape of the mesh to produce realistic motion; hair and fur effects can be added on top, and so on, creating a three-dimensional moving image that, these days, can approach photo-realistic levels.
Much computation is required to turn all that data into high-resolution images in real time, which is why modern graphics processing units (GPUs) are usually the computer's most powerful (and power-hungry) part. It's also why GPUs tend to have incredibly fast RAM, sometimes with a total bandwidth exceeding 1000GiB per second!
[Image caption: A 3D polygon mesh of a dolphin. Source: https://w.wiki/62xp]

7. Maps
Maps used for purposes such as navigation are effectively also vector data. Streets and intersections are joined and labelled, and ‘metadata’ is added, such as how many lanes are on a given road, which ones can turn, whether a street is one-way etc. They are stored in memory similarly to mathematical graphs, allowing the shortest or fastest route to be computed and directions to be generated.

SRAM vs DRAM
SRAM memories are simple to use.
To read a byte/word from an SRAM, the
address data is first applied to the chip.
Cascaded logic within the SRAM chip
activates certain lines within, depending on this address, so only the memory cells at that address are enabled.
When the chip’s read-enable line is
activated, the data within those cells
are fed to the data outputs. After a
specific time (usually measured in
nanoseconds), it has stabilised and is
ready to be accessed by the processor.
Writing to an SRAM memory is similar. The address lines are driven to
select the address to be written, and
at the same time, the data to be written is applied to the data input lines
(shared with the data output). When
the write-enable line is activated, the
selected cells within the SRAM will
change their state to match the states of
the data inputs. Again, the cycle time
is usually measured in nanoseconds.
The processor can read and write
addresses in any patterns it needs to,
and the timings do not change. Reads
and writes can proceed at the maximum frequency the chip supports (eg,
100MHz for a 10ns SRAM).
Using a DRAM chip is far more complicated. Rather than having just a few
timings to consider (like the SRAM’s
address and data setup times), a DRAM
has dozens of different timings. That’s
because, to achieve a high density, the
bits in the DRAM chip are arranged in
rows and columns, and only one row
in a bank can be active at a time.
It takes some time to change active
rows. To switch rows, first, the old
row must be deactivated with a PRECHARGE command (and corresponding tRP delay). Then a new row must
be activated with the ACTIVE command, incurring a further delay of
tRCD. Then a column can be read or written after a further delay of tCL.
The tRP, tRCD and tCL delays are usually similar numbers of clock
cycles (eg, around 14 cycles for DDR4).
There is also typically a longer delay
between activating a row and being
able to deselect it. So constantly
switching between rows to read values scattered throughout the memory
is much slower than sequential or random reads within the same row.
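A back-of-the-envelope cost model makes the penalty obvious. Using the roughly-equal tRP, tRCD and tCL figures quoted below (around 14 memory-clock cycles each for typical DDR4; the numbers are illustrative, not from any particular datasheet):

```python
# Approximate DRAM read cost in memory-clock cycles.
tRP, tRCD, tCL = 14, 14, 14   # typical DDR4-class figures

def read_cost(row_already_open):
    """Cycles to read a column, depending on whether the target row
    is already the active row in its bank."""
    if row_already_open:
        return tCL                   # column access only
    return tRP + tRCD + tCL          # PRECHARGE + ACTIVATE + column access

assert read_cost(row_already_open=True) == 14
assert read_cost(row_already_open=False) == 42   # three times slower
```

So a read pattern that hops between rows on every access can be roughly three times slower than one that stays within the open row, which is exactly what caches and bank interleaving try to avoid.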
A few different approaches are used
to overcome this. One is to have a highspeed SRAM cache within the processor that stores the most commonly
accessed memory locations. That way,
cache lines can be rapidly read or
written to the main DRAM memory in
bursts, taking advantage of the ability
to read and write sequential addresses
in the DRAM quickly.
Also, by having multiple banks
within each DIMM, while one bank
cannot operate due to row switching
delays, data going to/from another
bank can pass over the memory interface. So with enough processor cores
constantly reading and writing different banks, the interface is never idle.
If that seems confusing, don’t worry,
it gets a lot more complicated! Modern DRAM has timing parameters that
include the following: CAS, RCD,
RP, RAS, RC, FAW, RRDS, RRDL and
CCDL. That isn’t even a complete list.
These timings are stored in a small
EEPROM on each DIMM for a range
of clock speeds to allow the memory
controller to be appropriately configured at boot time.
Memory timing commands
tCL: CAS latency
tRCD: RAS to CAS delay
tRP: Row precharge time
tRAS: Row active time
For more details, see: https://w.wiki/62vt & siliconchip.au/link/abi2
[Image caption: An example map taken from OpenStreetMap (www.openstreetmap.org/) showing a route (in blue) from Circular Quay to the Sydney Opera House.]
Despite all this data being available,
to achieve the best performance, it’s
still necessary for the memory controller to spend some time ‘training’
the RAM (basically, experimenting
with different timings until it finds
an optimal combination that works).
That is why a newly built computer
can sometimes take quite some time
(tens of seconds) to boot for the first
time, or after a BIOS reset.
One interesting aspect of DRAM performance stems from the availability of multiple banks, combined with the frequent delays in accessing data
within a given bank. Consider a system
with many CPU cores running in parallel, accessing DRAM over a shared
bus. Some cores will be blocked at any
given time, waiting on memory access.
However, at the same time, other
cores may be accessing data stored in
different banks in the DRAM. They
can therefore utilise the otherwise
idle shared bus to transfer that memory. When those transfers complete,
the other banks will likely be ready,
and the bus will be handed over to the
other cores.
Therefore, having many CPU cores
not only increases the total processing
power available but also leads to better utilisation of the memory bus. This
is why sometimes, splitting a task up
among many cores can improve performance even when it is primarily limited by memory performance.
On-package DRAM
[Image caption: A 2KiB SRAM (Static Random Access Memory) chip used in a NES clone. SRAM is significantly faster, but more costly, than DRAM, so it's commonly used in small quantities, such as the L1 and L2 caches of a computer CPU (from a few KiB to a few MiB). Source: https://w.wiki/63EN]

Table 1 – Apple M1 & M2 RAM configurations
Model      RAM capacity            RAM chip        Bus width   Data rate
M1         8GiB or 16GiB           LPDDR4X-4266    128-bit     68.3GB/s
M1 Pro     16GiB or 32GiB          LPDDR5-6400     256-bit     204.8GB/s
M1 Max     32GiB or 64GiB          LPDDR5-6400     512-bit     409.6GB/s
M1 Ultra   64GiB or 128GiB         LPDDR5-6400     1024-bit    819.2GB/s
M2         8GiB, 16GiB or 24GiB    LPDDR5-6400     128-bit     100GB/s

Fast on-chip SRAM caches have been around for a long time, at least as far back as 1989, when Intel launched
the 80486 processor with 8KiB or
16KiB of internal L1 cache. However,
in November 2020, Apple launched
their first range of full computers using
processors that they designed themselves, dubbed the M1.
These processors and their successors, the M2 series, are unique in
today’s market because they do not use
external DRAM for storage. Instead,
they come with a fixed, fairly large
amount of DRAM on a separate silicon
die integrated into the CPU package –
see Table 1. LPDDR is a variant of DDR
(double data rate) DRAM, described
in the preceding article, optimised for
low power consumption.
The main disadvantage of doing
this is obvious: you cannot expand
the RAM on these machines. Also, the
chips are quite expensive to fabricate.
However, the performance benefits are
significant.
While the M1 and M2 cores are individually not especially fast by today's standards, the onboard RAM has so much bandwidth and such low latency (the delay between making a request and the memory read/write being performed) that these chips punch well above their weight in terms of performance, at least in certain tasks.
Unsurprisingly, memory-intensive
tasks benefit the most from this
arrangement, eg, database manipulation. Mathematically-intensive tasks
benefit too, but not to the same extent.
DDR5 advancements
The latest computer memory standard, DDR5, is an evolution of the
now-mature DDR4 standard that has
been around since 2014. Besides manufacturing process improvements
allowing higher speeds at lower voltages, the main enhancements to DDR5
are the addition of local voltage regulation and the splitting of the 64-bit data
channel into two 32-bit channels with
double the maximum burst length.
While DDR4 started at 2133MT/s
(megatransfers per second), a typical DDR4 DIMM these days is rated
at between 3200MT/s and 4000MT/s.
DDR5 starts at 3200MT/s, with a typical DIMM being capable of 4800MT/s and some well over 5000MT/s.
[Image caption: A Micron MT4C-series 128kB DRAM (Dynamic Random Access Memory) chip. DRAM typically uses a single capacitor and transistor to store one bit of data, rather than the multiple transistors per bit of SRAM. DRAM is much cheaper because it needs fewer components per bit, giving a higher density, but in turn uses more power than SRAM. Source: https://w.wiki/63EQ]
For DDR4, switch-mode voltage regulator(s) on the motherboard produce
the ~1.2V needed for the RAM chips
to operate, fed to them via several
edge-connector pins. Instead, DDR5
receives a higher voltage (either 5V
or 12V) that is stepped down to the
required voltage via an onboard regulator that’s usually in the middle of
the DIMM.
This has several advantages, primarily tighter voltage regulation, especially when there are transients. The
baseline operating voltage for DDR5 is
1.1V with a typical maximum of 1.35V,
compared to 1.2-1.6V for DDR4.
As for splitting the data channel in
two, the goal is to reduce latency when
memory is being accessed in a ‘scatter-gather’ manner rather than sequentially. Importantly, DDR5 DRAM
chips have 32 banks compared to the
16 banks of DDR4, meaning that less
bank switching is required, so average
throughput is improved.
The maximum capacity of a DDR5
DIMM is 512GiB, meaning up to 2TiB
of RAM in a four-slot system compared
to 128GiB per DIMM for DDR4.
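The peak bandwidth implied by these transfer rates is simple arithmetic: transfers per second times bytes per transfer. A quick Python check (our own helper, not an official formula from any standard):

```python
def bandwidth_gb_s(mega_transfers, bus_width_bits):
    """Peak bandwidth in GB/s for a given transfer rate (MT/s)
    and data bus width (bits)."""
    return mega_transfers * 1e6 * (bus_width_bits / 8) / 1e9

# A 64-bit DDR4-3200 DIMM:
assert bandwidth_gb_s(3200, 64) == 25.6
# A DDR5-4800 DIMM (two 32-bit channels, 64 bits in total):
assert bandwidth_gb_s(4800, 64) == 38.4
```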
In short, while DDR5 is a significant upgrade over DDR4 (as demonstrated by benchmarks and performance tests), that is due to several
minor improvements rather than any
revolutionary upgrades.
Older DDR generations
As mentioned earlier, DDR4 came
out in 2014. Before that, DDR3 ruled
the roost for almost a decade, since
2007. DDR4 was also an evolutionary
upgrade from DDR3, again mainly due
to process improvements. DDR3 modules typically operated at 1.5V compared to the 1.2V of DDR4, so they
used quite a bit more power.
Compared to the 2133-5000MT/s
of DDR4, DDR3 had a much lower
throughput at 800-2133MT/s (and
rarely up to 3200MT/s). DDR3 DIMMs
also topped out at around 16GiB compared to 128GiB for DDR4. DDR4 also
doubled the number of banks from 8 to 16.
[Diagram caption: A diagram showing how the clock signal differs between SDR (Single Data Rate, one signal per clock cycle), DDR (Double Data Rate, two signals per clock cycle) and QDR (Quad Data Rate, four signals per clock cycle). Source: https://w.wiki/63sx]
Going back further, it's much the same story for DDR2 (released in 2003) compared to DDR3. DDR2 operated at even higher voltages (starting at around 1.8V), so it was even more power-hungry, and slower at 400-1066MT/s. DDR2 also topped out at 8GB per DIMM, although this was very rare compared to the typical 2GB per DIMM.
DDR2 brought a significant upgrade from the original DDR standard (released in 1998). With DDR2, the memory interface bus is clocked at twice the rate of the DRAM chips themselves, so four sets of data can be transferred per memory clock cycle compared to two for DDR1. DDR1 DIMMs also had fewer pins (184 vs 240). DDR2 also optionally doubled the number of banks from four to eight.
[Photo caption: Most DDR2-DDR5 memory (DIMM package) will look similar, with the exception of any fancy heatsinks. DDR1 memory in comparison only has 184 pins versus the 240 pins in DDR2-DDR5 memory. This type of memory is typically used in computers and is a form of synchronous DRAM, which has an external clock signal. The photo above shows a set of four DDR3 modules.]
DDR1 DIMMs operated at just 200-400MT/s and had a maximum capacity of 1GiB per DIMM, limiting most desktop systems to a maximum of
4GiB. They ran at a whopping 2.5-2.6V,
more than double what DDR5 needs!
2GiB DDR1 DIMMs might have been sold specifically for servers, but they likely would not register as the correct amount of memory in a typical desktop machine.
Conclusion
DDR DRAM will be used as the primary memory for computers for some
time, until something better comes
along; nobody knows when or what
that will be. QDR (quad data rate)
DRAM, which performs four transfers per clock cycle, was briefly tried
by Intel in the mid-2000s but never
really took off. GDDR5X video memory
chips from 2016 also had an optional
QDR mode.
DDR performs one transfer on the
negative clock edge and one on the
positive, while QDR does the same but
also performs transfers during the positive and negative plateaus. However,
it seems that the added complexity
isn’t worthwhile, given that this does
nothing to reduce access latency.
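The trade-off is easy to express numerically: each scheme multiplies the transfers per I/O clock without making any individual access arrive sooner. A quick sketch (the 1600MHz figure is an illustrative example, matching DDR4-3200's I/O clock):

```python
# Transfers per clock cycle for each signalling scheme.
TRANSFERS_PER_CLOCK = {"SDR": 1, "DDR": 2, "QDR": 4}

def mega_transfers(clock_mhz, scheme):
    """Transfer rate in MT/s for a given I/O clock frequency."""
    return clock_mhz * TRANSFERS_PER_CLOCK[scheme]

# A 1600MHz I/O clock gives DDR4-3200 its rating; QDR would double
# the throughput again at the same clock, but the time from issuing a
# read to the data arriving (the latency) would be unchanged.
assert mega_transfers(1600, "DDR") == 3200
assert mega_transfers(1600, "QDR") == 6400
```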
These days, the best performance
seems to come from a combination
of highly parallel DRAM, which provides exceptionally high throughputs,
with relatively large and very fast local
SRAM caches such as AMD’s “Infinity
Cache” on its RDNA2 (128MiB cache)
and RDNA3 (96MiB to 384MiB cache)
graphics processors (GPUs).
SC