Although the K7 is a completely new 7th
generation processor, much of the technology incorporated in the design has fairly deep
roots. AMD has simply put all of these different technologies together and streamlined
their interactivity to produce the K7. Generally, increasing things like the depth of the
execution units and number of registers is enough in itself - if you can pull it off - to
increase performance but that is only partially the case for AMD's new baby.
The K7 sports a smaller 2048-entry Branch Prediction Table than the K6 family. Although I
have so far been unable to ascertain why this is so, I expect that the larger 12-entry
return stack makes for a faster turnover for incorrect predictions. A Branch Prediction Table is
a kind of history table which stores an entry for each conditional branch executed by the
CPU while running the current application. The K7 checks each branch it encounters against
this table and makes its best guess as to which way the branch will go.
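To make the idea of a history table concrete, here is a minimal sketch of one. It is not AMD's design: the 2-bit saturating counters, the address-modulo indexing and the names `BranchHistoryTable` and `measure_accuracy` are all my own assumptions, chosen because this is the classic textbook scheme. Only the 2048-entry size comes from the K7 figures above.

```python
class BranchHistoryTable:
    """Toy branch history table of 2-bit saturating counters (assumed scheme)."""

    def __init__(self, entries=2048):
        self.entries = entries
        # Counter values 0-1 predict "not taken", 2-3 predict "taken".
        self.counters = [1] * entries

    def _index(self, branch_address):
        # Hypothetical indexing: the branch address modulo the table size.
        return branch_address % self.entries

    def predict(self, branch_address):
        """Return True if the branch is predicted taken."""
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address, taken):
        """Nudge the counter toward the real outcome, saturating at 0 and 3."""
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def measure_accuracy(table, outcomes, branch_address=0x4000):
    """Return the fraction of outcomes the table predicted correctly."""
    correct = 0
    for taken in outcomes:
        if table.predict(branch_address) == taken:
            correct += 1
        table.update(branch_address, taken)
    return correct / len(outcomes)


# A loop branch that is taken nine times, then falls through, ten times over.
loop_pattern = ([True] * 9 + [False]) * 10
accuracy = measure_accuracy(BranchHistoryTable(), loop_pattern)
```

On this simple loop pattern the toy table lands just under 90%. A smaller table means more branches share an entry, which is the usual worry with shrinking it.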
The K6, with its short 6 stage pipeline, had few problems with incorrect branch
predictions as they cost only 4 clock cycles. A branch prediction miss on the more deeply
pipelined (10 stage) K7 will cost more than 4 clock cycles, which is why it seems
unusual for AMD to implement such a small branch prediction table. Prediction rates of 90%
to 95% are critical to make sure that such a deeply pipelined, superscalar CPU does not
waste clock cycles! I will be updating this preview to review status once I get hold of
a K7 and perhaps will know more then.
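The arithmetic behind that concern is easy to sketch. The 4-cycle K6 penalty and the 90% to 95% rates come from the text above; the 10-cycle K7 penalty is purely my placeholder (one cycle per pipeline stage), since all I can say for sure is that it will be more than 4. The function name is hypothetical.

```python
def cycles_lost_per_branch(accuracy, miss_penalty_cycles):
    """Average clock cycles wasted per branch at a given prediction accuracy."""
    return (1.0 - accuracy) * miss_penalty_cycles


K6_PENALTY = 4    # from the text
K7_PENALTY = 10   # ASSUMPTION: placeholder only, the real figure is unknown to me

for accuracy in (0.90, 0.95):
    k6 = cycles_lost_per_branch(accuracy, K6_PENALTY)
    k7 = cycles_lost_per_branch(accuracy, K7_PENALTY)
    print(f"{accuracy:.0%} accurate: K6 loses {k6:.2f}, K7 loses {k7:.2f} cycles per branch")
```

Whatever the true K7 penalty turns out to be, the loss per branch scales directly with it, so every point of prediction accuracy is worth more on the deeper pipeline.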
Universal x86 Decoders
Decoders translate the variable length complex x86 instructions into small, fixed length
RISC-like operations. While the PII/III and K7 each have three decode units, all
three on the K7 are full universal arbitrary x86 decoders. The PII/III is limited here in
that only one of the three decode units is a full arbitrary decoder. The other two can
only decode simple x86 instructions. This means the K7 should sustain a
higher, more fluid decode rate.
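A toy model shows why this matters. The sketch below assumes in-order decoding of up to three instructions per cycle, and that a complex instruction arriving at a simple-only decoder has to wait for the next cycle. Real decoders have more rules than this, so treat the function `decode_cycles` and its stall rule as my simplification rather than a description of either chip.

```python
def decode_cycles(stream, full_decoders, width=3):
    """Count the cycles needed to decode `stream` in program order.

    stream: string of 'S' (simple) and 'C' (complex) instructions.
    full_decoders: how many of the `width` decoders, counted from slot 0,
    can handle complex instructions; the remaining slots take simple ones only.
    """
    if full_decoders < 1:
        raise ValueError("at least one full decoder is required")
    cycles = 0
    pos = 0
    while pos < len(stream):
        cycles += 1
        for slot in range(width):
            if pos >= len(stream):
                break
            if stream[pos] == "C" and slot >= full_decoders:
                break  # complex instruction waits for a full decoder next cycle
            pos += 1
    return cycles


mixed = "SCSSCS"
universal = decode_cycles(mixed, full_decoders=3)   # three universal decoders
one_full = decode_cycles(mixed, full_decoders=1)    # one full, two simple-only
```

For this six-instruction mix the all-universal arrangement needs 2 cycles and the one-full-decoder arrangement needs 3. On a stream of nothing but simple instructions the two come out the same.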
x86 instructions are handled in two ways. Simple instructions of 1-15 bytes in length,
which are the most common, follow what AMD calls DirectPath, which is streamlined for fast
execution. For the few complex instructions, the K7's VectorPath is used. The x86
instructions are converted into simpler MacroOps. The decoding pipelines can dispatch as many
as 3 MacroOps to the execution unit schedulers at a time. Each MacroOp consists of one or two
Ops. These Ops are then issued to the execution units.
Integer Execution Units
Along with its 3 FPUs, which we covered in Part One, the K7 provides three integer
execution units and three address generation units, for a total of nine execution units
supporting the flow of these decoded MacroOps through the processor. Together they manage up to 2.5
instructions executed per cycle, outperforming the 2-2.1 of the PIII. To
further aid this process, the K7 uses a 15-slot instruction scheduler, which is needed for
out-of-order execution. When an execution unit becomes available, it can be fed with an
out-of-order instruction, which eliminates wait states while the preceding instructions
finish executing - if, that is, there are no dependencies between the instructions. The
K7's integer units are also capable of speculative execution. As with Branch Prediction,
the integer unit makes its "best guess" as to the execution order. Speculative
execution can be an invaluable time saver provided the integer unit's guess is correct, and
since it guesses correctly better than 90% of the time on average, it aids data execution.
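The out-of-order part of this can be sketched in a few lines. The model below is my own simplification: every instruction takes one cycle, a result becomes usable the cycle after it issues, and the scheduler only sees the oldest 15 waiting instructions. The function `issue_order` and the instruction names are hypothetical, and speculation is not modelled at all.

```python
def issue_order(instructions, units=3, window=15):
    """Simulate out-of-order issue and return the names issued in each cycle.

    instructions: list of (name, set_of_names_it_depends_on) in program order.
    Names must be unique. Every instruction takes one cycle (a simplification).
    """
    pending = list(instructions)
    done = set()
    schedule = []
    while pending:
        issued = []
        # Only the oldest `window` instructions are visible to the scheduler.
        for name, deps in pending[:window]:
            if len(issued) == units:
                break
            if deps <= done:  # every dependency has already finished
                issued.append(name)
        if not issued:
            raise ValueError("no instruction can issue: unsatisfiable dependency")
        pending = [ins for ins in pending if ins[0] not in issued]
        done.update(issued)
        schedule.append(issued)
    return schedule


# "b" needs the result of "a", and "d" needs the result of "c".
program = [("a", set()), ("b", {"a"}), ("c", set()), ("d", {"c"})]
schedule = issue_order(program, units=2)
```

With two units, the sketch issues `a` and `c` together in the first cycle and `b` and `d` in the second: `c` jumps ahead of `b` because `b` is still waiting on `a`. That is exactly the wait state the scheduler is there to hide.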
0.25 micron Process Fabrication and Die Size
The first K7 chips will still be produced at AMD's Austin, Texas-based Fab 25, which AMD
promises will shift over to a 0.18 micron process in the second half of '99, but AMD's new
Dresden Fab 30 in Germany is set to produce 0.18 micron K7s.
AMD showed off a 600MHz 0.25 micron CPU at
CeBIT, demonstrating that their 0.25 process could handle the higher frequencies. But
don't expect AMD to produce 0.25 micron parts for long. Moving the K7 from the 0.25 micron
process, with a die size of 184mm², to the 0.18 micron process will reduce the die size to a
much smaller 104mm². AMD also hopes to move from aluminum interconnects to faster, cooler
copper technology later this year, which should provide stability for speeds of 1GHz and
maybe even higher. Yep, that's right folks, 1 GIGAHERTZ!
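As a rough sanity check on those die sizes, here is what a perfect shrink would give. The assumption, which is mine and not AMD's, is that every feature scales with the process size, so area scales with its square. The function name is hypothetical.

```python
def ideal_shrunk_area(area_mm2, old_process_um, new_process_um):
    """Die area after a perfect linear shrink (area scales with the square)."""
    return area_mm2 * (new_process_um / old_process_um) ** 2


ideal = ideal_shrunk_area(184, 0.25, 0.18)
print(f"Ideal 0.18 micron die: {ideal:.1f} mm^2, quoted figure: 104 mm^2")
```

The ideal figure comes out a little below the quoted 104mm², which is what you would expect if not everything on the die shrinks perfectly.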
At 184mm², the 0.25 micron K7 die is bigger
than the PIII, and die size can have a direct bearing on the raw speed of the CPU.
Larger die chips require more power to run efficiently and power increases tend to
increase heat, which tends to decrease performance.
Die size presents a few logistical problems as well. Increased die size means fewer chips
per wafer and lower yields, which will no doubt result in an initially higher cost to the
consumer. Once the Dresden fab ramps up, prices should drop as production increases.
This should hopefully put the K7 into the mainstream buyer's market before Christmas.
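To put the "fewer chips per wafer" point in numbers, the sketch below uses a common approximation for gross dies per wafer: wafer area divided by die area, minus a correction for partial dies lost around the edge. The 200mm wafer diameter is my assumption, as AMD's wafer size is not given here. Yield is not modelled, and the function name is hypothetical.

```python
import math


def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=200.0):
    """Approximate whole dies per wafer, ignoring defects and scribe lines.

    Uses the common estimate: wafer area / die area, minus an edge-loss term
    of pi * diameter / sqrt(2 * die area).
    """
    radius = wafer_diameter_mm / 2.0
    whole = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(whole - edge_loss)


# ASSUMPTION: 200 mm wafers. Die areas are the figures quoted in the article.
dies_025 = gross_dies_per_wafer(184)
dies_018 = gross_dies_per_wafer(104)
print(f"0.25 micron: about {dies_025} dies, 0.18 micron: about {dies_018} dies per wafer")
```

Under these assumptions the smaller die gives a good deal more candidates per wafer, before defects are even considered. Since a bigger die is also more likely to catch a defect, the real gap in good chips would be wider still.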