Memory - Beyond The Basics!

Current memory technology is about to run out of steam. System bus speeds will likely shoot past 200MHz within the next two years if Intel has its way, and this opens the door for new technology to replace the current memory design. DRAM is here to stay ( at least for the foreseeable future ), but the way the CPU and memory controller access it is about to change. Fortunately the Super7 mainboards based on VIA's MVP3 chipset can take advantage of some of these emerging technologies. DDR SDRAM ( Double Data Rate SDRAM, a type of SDRAM that transfers data on both edges of each clock cycle, effectively doubling the chip's data throughput ), Virtual Channel SDRAM, and Enhanced SDRAM can all find support with the VIA chipset. And while Intel has licensed, and is pushing, Rambus ( RDRAM ), the ultimate goal is still improving system performance. As these new memory types become available, users will need to make their own choices. To avoid costly mistakes, it is important to understand the real performance balance between DRAM, processors and buses. One of the most difficult things about writing this article has been boiling pages and pages of technical information down into something that is easy to understand without being oversimplified to the point of drivel.



Before we go any further, let's make sure that we're all on the same page about how DRAM accesses are generated.

The processor core reads code and data from its caches at an astoundingly high speed until it can't find a needed or expected bit of code or data, resulting in a cache miss. When this occurs the processor stalls until the missing code or data is retrieved. To fetch it, the processor generates a 64-bit external read called the "demand word access". Only when the "demand word access" is fulfilled and the "demand word" ( the missing code or data ) is retrieved can the processor continue processing. The period the processor must wait for the "demand word" is known as latency.

DRAM latency is measured in nanoseconds, and can dynamically vary from under 40ns to over 100ns depending on a variety of factors. Latency can also be measured in terms of external CPU bus clocks, but to understand the real effect of latency on the processor's performance, it has to be evaluated in terms of core CPU clocks.
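
To see why that last point matters, here's a quick sketch ( in Python, with purely illustrative figures: a 60ns DRAM latency, a 66MHz bus and a 300MHz core, none of them measurements from any particular system ) that converts a single latency figure into both bus clocks and core clocks:

    def latency_in_clocks(latency_ns, clock_mhz):
        """Convert a latency in nanoseconds to cycles of a given clock."""
        cycle_ns = 1000.0 / clock_mhz   # one clock period in nanoseconds
        return latency_ns / cycle_ns

    latency_ns = 60.0   # assumed DRAM access latency
    bus_mhz = 66.0      # external CPU bus clock
    core_mhz = 300.0    # CPU core clock (illustrative)

    print(f"{latency_in_clocks(latency_ns, bus_mhz):.1f} bus clocks")    # ~4.0
    print(f"{latency_in_clocks(latency_ns, core_mhz):.1f} core clocks")  # ~18.0

The same 60ns stall that looks like four bus clocks costs the core eighteen clocks of idle time, which is why latency has to be judged from the core's point of view.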

The memory controller accesses DRAM in bursts of four words, in either sequential or interleaved ( scrambled ) order. To achieve top system performance, the memory must feed your processor data at the system clock speed. Data accessed from L2 cache comes close to meeting this requirement: only the first word incurs a wait state, yielding a burst rate of 2/1/1/1, with the next three words delivered at full bus speed.
Data accessed directly from DRAM has a degraded burst rate. The best case is a page hit--that is, when the DRAM page is already open. Then the fastest burst rate you can achieve on a 66MHz bus is 5/2/2/2 with EDO DRAM and 5/3/3/3 with FPM DRAM. Even though EDO DRAM improves page cycle times by 40 percent, the overall benefit to system performance is minimal, a reduction of only three wait states. And sadly, there is no improvement in accessing the most crucial first word.
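
For a rough sense of what these burst patterns cost, the following sketch tallies the total bus clocks, elapsed time and effective throughput of each four-word burst, assuming a 66MHz bus and a 64-bit ( 8-byte ) data path:

    BUS_MHZ = 66.0
    BYTES_PER_WORD = 8   # 64-bit data path

    def burst_stats(pattern):
        """Total clocks, time and throughput for one four-word burst."""
        clocks = sum(pattern)
        ns = clocks * 1000.0 / BUS_MHZ
        mb_per_sec = len(pattern) * BYTES_PER_WORD * 1000.0 / ns
        return clocks, ns, mb_per_sec

    for name, pattern in [("L2 cache 2/1/1/1", (2, 1, 1, 1)),
                          ("EDO page hit 5/2/2/2", (5, 2, 2, 2)),
                          ("FPM page hit 5/3/3/3", (5, 3, 3, 3))]:
        clocks, ns, mbps = burst_stats(pattern)
        print(f"{name}: {clocks} clocks, {ns:.0f}ns, ~{mbps:.0f}MB/s")

The gap is stark: the L2 cache moves a 32-byte line in 5 clocks ( roughly 420MB/s ), while even an EDO page hit needs 11 clocks ( under 200MB/s ).
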
To help overcome these wait states, DRAM designers employ a bursting technique that allows sequential reading of an entire page of DRAM. Once the page address has been accessed, the DRAM itself provides the address of the next memory location. This address prediction eliminates the delay associated with detecting and latching an address externally provided to the DRAM.

Implementing this burst feature requires defining the burst length and burst type to the DRAM. The burst length, combined with the starting address, lets the internal address counter properly generate each successive memory location. The burst type determines whether the address counter will provide sequentially ascending page addresses or interleaved page addresses within the burst length.
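
Both orderings are easy to demonstrate. In the sketch below, the sequential counter wraps within the four-word burst boundary, while the interleaved counter XORs its count with the starting address ( the standard JEDEC ordering ); the helper functions are illustrations, not vendor code:

    BURST_LEN = 4   # four-word bursts, as described above

    def sequential_burst(start):
        """Ascending order, wrapping within the burst boundary."""
        base = start & ~(BURST_LEN - 1)
        return [base | ((start + i) & (BURST_LEN - 1)) for i in range(BURST_LEN)]

    def interleaved_burst(start):
        """Interleaved order: the counter is XORed with the start address."""
        return [start ^ i for i in range(BURST_LEN)]

    for start in range(BURST_LEN):
        print(f"start {start}: sequential {sequential_burst(start)}"
              f"  interleaved {interleaved_burst(start)}")
    # e.g. start 1: sequential [1, 2, 3, 0]  interleaved [1, 0, 3, 2]

Either way, once the starting address is latched the DRAM can supply every remaining address itself, which is exactly what eliminates the external latching delay.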

DRAM designs evolved to reap the most benefit from this bursting technique, in the form of two new DRAM types: SDRAM and burst EDO ( BEDO ) DRAM. SDRAM is capable of burst rates of up to 100MHz and is pin-compatible with existing industry-standard chip designs. Once a burst has begun, SDRAM delivers each subsequent word at a 10ns rate. However, its random-access times are no better than those of FPM or EDO DRAM--two steps up and one step back. BEDO DRAM, designed specifically for the PC market, supports the same four-word burst length in sequential or interleaved fashion, but tops out at 66MHz.

For the current crop of PCs with 66MHz buses, SDRAM and BEDO DRAM both produce a burst rate of 5/1/1/1. Since the first data access of each still requires five wait states, the benefit of SDRAM is its ability to operate at bus speeds of up to 100MHz. SDRAM also enjoys greater industry support, most likely because it costs about the same as BEDO DRAM while offering more future headroom. You'll find support for SDRAM in motherboards equipped with the new Intel VX chip set and all new VIA chip sets; BEDO DRAM likewise requires chip set support.

But all this becomes moot when bus speeds exceed 100MHz, and that is something Intel has been hinting at for some time. Two things can improve DRAM performance: better latency ( access time ) or increased peak burst bandwidth. As you'll see, some of the new DRAM types improve latency, while others crank up the burst rate.
Which one is better? In general, for today's desktop PC, faster latency seems to deliver the greater performance benefit. Increasing peak burst bandwidth can help, but not in every case, and usually not as much. This is because current processors can only consume burst data at a rate of one word per CPU bus clock, a requirement SDRAM already satisfies. Faster latency DRAM, on the other hand, lets the CPU resume operation sooner when it has to access DRAM to satisfy a cache miss.
So why increase bandwidth at all? Sometimes the processor competes with bus-master peripherals for DRAM access. A peripheral can be accessing DRAM at the same moment the CPU stalls on a cache miss, and higher bandwidth DRAM can resolve the conflict a little faster. But fast latency DRAM can achieve the same result or better, depending on burst length.
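
A back-of-the-envelope model makes the trade-off concrete. For a single four-word cache-line fill on a 66MHz bus, trimming two wait states from the first access beats even a doubled burst rate; the timing patterns below are illustrative assumptions, not measured figures:

    BUS_CLOCK_NS = 15.0   # one 66MHz bus clock

    def line_fill_ns(first_word_clocks, clocks_per_word, words=4):
        """Time to fill a cache line: first access plus remaining words."""
        return (first_word_clocks + (words - 1) * clocks_per_word) * BUS_CLOCK_NS

    print(f"baseline 5/1/1/1          : {line_fill_ns(5, 1.0):.1f}ns")  # 120.0ns
    print(f"doubled burst rate 5/.5.. : {line_fill_ns(5, 0.5):.1f}ns")  #  97.5ns
    print(f"two fewer wait states 3/1 : {line_fill_ns(3, 1.0):.1f}ns")  #  90.0ns

Doubling the burst rate saves 22.5ns on this hypothetical fill; cutting two wait states off the first access saves 30ns, and the advantage grows as bursts get shorter.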



New Memory Designs
Two designs address memory speed and bandwidth needs: SyncLink ( SLDRAM ) and Rambus ( RDRAM ). SyncLink is backed by a consortium of nine DRAM manufacturers, including Hyundai, IBM, Micron, NEC, and TI, which has proposed a draft standard to the IEEE for a uniform memory architecture that will evolve over several iterations. The SyncLink standard will be royalty-free and open to all. The proposal calls for a command-driven, packet-oriented bus operating at 400MHz with a 16-bit-wide data path.

SyncLink is slow in coming and has not progressed much past the conceptual stage. Rambus, on the other hand, has a working architecture. Current RDRAM designs move data to the memory controller at effective rates in the 500MHz to 600MHz range. Additionally, by using multiple channels or widening Rambus' current 8-bit channel, Intel hopes to increase throughput to 1.6GB/sec. With that, Rambus would meet Intel's design needs for faster processors and a higher speed bus, as well as for AGP.

Larger Bandwidth
The Rambus design centers on a high-speed interface that transfers data over an 8-bit bus called the Rambus Channel. The RDRAM system uses a two-channel configuration with one 2MBx8 RDRAM per channel, for memory granularity of 4MB. Each pair of RDRAMs has an effective 16-bit data path and presents two 4K open pages to the memory controller. An SDRAM system, in contrast, needs four 1MBx16 SDRAM chips sharing a common address/control bus, with a memory granularity of 8MB. This 64-bit data path presents two 2K pages to the memory controller. Since all these DRAM designs use the same core technology, the fundamental device timings are identical, basically 5/1/1/1. The differentiation is the speed at which each device moves data to the memory controller. The Rambus channel, transferring on both edges of its clock for an effective 533MHz, completes a clock cycle every 3.75ns, four times faster than the system bus's 66MHz clock rate of 15ns. This translates into an RDRAM-based system needing eight CPU clock cycles to move 32 bytes of data versus 10 CPU clock cycles for an SDRAM-based system.

Using RDRAM, the memory controller latches a valid memory address and read request from the CPU at clock cycle zero. Because of its standard DRAM core technology, the device latency from the start of the command to the first byte of data returning from the RDRAM is five CPU clock cycles. Within the next CPU clock cycle, however, the Rambus clock will cycle four times, and the two RDRAMs will transfer 16 bytes of data, 4 bytes per Rambus cycle, to the memory controller. With the seventh CPU clock cycle, the two RDRAMs transfer another 16 bytes of data. By the eighth CPU clock cycle, the RDRAM-based system will have moved the entire 32 bytes of data to the memory controller.
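
That walk-through can be tallied in a few lines. The constants follow the figures above ( a 15ns CPU clock and 3.75ns Rambus cycles moving 4 bytes each ); the script is simple bookkeeping, not a simulation of real hardware:

    CPU_CLOCK_NS = 15.0      # 66MHz CPU bus clock
    RAMBUS_CYCLE_NS = 3.75   # one Rambus clock cycle
    BYTES_PER_CYCLE = 4      # two channels x 2 bytes per cycle (both edges)
    LINE_BYTES = 32

    cycles_per_cpu_clock = CPU_CLOCK_NS / RAMBUS_CYCLE_NS         # 4
    bytes_per_cpu_clock = cycles_per_cpu_clock * BYTES_PER_CYCLE  # 16

    command_clocks = 1   # address/read request latched at clock zero
    latency_clocks = 5   # standard DRAM core latency to the first byte
    data_clocks = LINE_BYTES / bytes_per_cpu_clock                # 2

    total = command_clocks + latency_clocks + data_clocks
    print(f"{bytes_per_cpu_clock:.0f} bytes per CPU clock, "
          f"{total:.0f} CPU clocks for {LINE_BYTES} bytes")       # 8 clocks

Notice that six of the eight clocks are still command and core latency; Rambus wins only on the back half of the transfer.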

Faster Latency
There are several types of special fast latency DRAM just clearing the horizon. Leading the movement is ESDRAM ( Enhanced SDRAM ) from Enhanced Memory Systems of Colorado Springs. ESDRAM has been approved by JEDEC ( Joint Electron Device Engineering Committee ) as a superset of the SDRAM standard. Because it is compatible with standard SDRAM, it can be used in existing systems via plug-compatible DIMM and SO-DIMM modules.

Capable of operating at bus speeds up to 133MHz, it offers better latency than ordinary DRAM at all speeds. When its special features are properly supported in the chip set, it can improve latency by as much as 35-50% depending on the bus speed. It should be mentioned that this improvement may not show up in today's standard benchmarks; the real-world result will depend on how heavily an application actually uses main memory.
While it is more expensive to produce, its performance potential should translate into real gains for many users and applications, especially when overclocking.
Using a 5x clock multiplier, SDRAM at 133MHz could satisfy processor speeds as high as 667MHz. It may be possible to migrate to even higher bus speeds using SDRAM, but at present the technology is not in place, which forces us to evaluate new DRAM solutions that combine faster latency with higher burst speeds.
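
The multiplier arithmetic is simple enough to check: core speed is bus speed times the multiplier, so a 133MHz bus with a 5x multiplier lands at roughly the 667MHz cited above. In the sketch below, bus speeds beyond today's shipping parts are hypothetical:

    def core_mhz(bus_mhz, multiplier):
        """Core clock is simply the bus clock times the multiplier."""
        return bus_mhz * multiplier

    for bus in (66.6, 100.0, 133.3):   # illustrative bus speeds
        print(f"{bus:5.1f}MHz bus x 5 = {core_mhz(bus, 5):.0f}MHz core")
    # 133.3MHz x 5 gives the ~667MHz figure cited above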


Conclusion
Ultra high burst bandwidth could find its greatest potential in multiprocessor servers, which more fully utilize the DRAM bus, although for the most part this need is currently satisfied by wide configurations of standard DRAM. If DDR, SLDRAM or Rambus become available at no cost premium, these memory types could also find a home in the server market. That may be a way off, so for now, fast latency SDRAM running at bus speeds between 66 and 133MHz should provide satisfactory performance for most types of systems.
