Now (for Cmmnsense here), some stuff from IBM, which will have numbers that sound familiar...
I'll start off with something I noticed when going through the IBM material. They show a 4GHz Cell (8 SPEs) as producing 26 GFLOPS. This caught my attention, as it is certainly a lot different from the 218 GFLOPS Sony lists at 3.2GHz. After a bit of digging I found the difference: the Cell only reaches those higher numbers using single-precision floating point operations (256 GFLOPS at 4GHz). Single precision is fast but less accurate and introduces error into calculations, while the IBM listing was for double precision (which is pretty much standard on typical PCs). Unfortunately, I don't know how large the error is, so I can't say how much of a negative impact that has on the processor.
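For what it's worth, the 256 figure can be reproduced with back-of-the-envelope arithmetic. Here is a rough sketch in Python, working in GFLOPS (billions of operations per second). It assumes each SPE retires one 4-wide single-precision fused multiply-add per cycle, i.e. 8 floating point operations per cycle; that per-cycle number is my assumption and not something taken from the IBM listing, and the function name is made up for illustration.

```python
# Back-of-the-envelope peak throughput for the SPEs only.
# ASSUMPTION: 4 SIMD lanes x 2 operations (fused multiply-add)
# per SPE per cycle in single precision.
FLOPS_PER_SPE_PER_CYCLE = 4 * 2


def peak_gflops(num_spes, clock_ghz, flops_per_cycle=FLOPS_PER_SPE_PER_CYCLE):
    """Theoretical peak in billions of floating point operations per second."""
    return num_spes * flops_per_cycle * clock_ghz


if __name__ == "__main__":
    print(peak_gflops(8, 4.0))  # 8 SPEs at 4 GHz
    print(peak_gflops(8, 3.2))  # same chip clocked at 3.2 GHz
    print(peak_gflops(7, 3.2))  # 7 usable SPEs at 3.2 GHz
```

With 8 SPEs at 4GHz this gives 256, matching the single-precision number. Scaling straight down to 3.2GHz gives 204.8, so the 218 figure isn't simply a clock-scaled version of the same calculation.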
The next issue I want to clear up is the SPE confusion. While the Cell has 7 SPEs (for the PS3), the PPE has only two threads, and the SPEs have to go through the PPE to access the rest of the system. This isn't the same as having 7 cores.
As an example, let's assume you have instructions to run over 50 cycles. The Cell can run 2 SPEs on the first cycle, 4 on the second, 6 on the third and 7 as of the 4th, and then continues to run 7 for 42 more cycles (not 46, because it needs 4 cycles to output). That means the Cell does about 313 instructions (2 + 4 + 6 + 7 x 43) during that time. Meanwhile, the 360 can manage 6 threads per cycle without a ramp-up, but would only get 300 instructions (6 x 50). This shows that the Cell 'can' outperform it, assuming the program is properly timed to take advantage of these features, but since STI hasn't released the software programming tools yet, it's difficult to say how well this can be done.
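To make the tally easy to check, here is a small Python sketch of the 50-cycle example. It is only a toy model of the schedule described in this post (ramp of 2, 4, 6, then 7 SPEs, with 4 cycles reserved for output), not a simulation of real hardware, and the function names and parameters are made up for illustration.

```python
def cell_instructions(total_cycles=50, ramp=(2, 4, 6), max_spes=7, output_cycles=4):
    """Instructions issued under the ramp-up schedule from the example.

    ASSUMPTION: one instruction per active SPE per cycle, and the last
    `output_cycles` cycles issue no new work.
    """
    busy_cycles = total_cycles - output_cycles        # cycles that issue work
    steady_cycles = max(busy_cycles - len(ramp), 0)   # cycles at full width
    return sum(ramp[:busy_cycles]) + max_spes * steady_cycles


def flat_instructions(total_cycles=50, threads=6):
    """Instructions issued with a fixed thread count and no ramp-up."""
    return threads * total_cycles


if __name__ == "__main__":
    print(cell_instructions())   # Cell-style schedule
    print(flat_instructions())   # 360-style schedule
```

Tallied this way, the Cell-style schedule comes to 313 against 300 for the flat 6-thread schedule, so the gap in this example is narrow and depends entirely on the ramp and output assumptions.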
The 256k on each SPE is a cache, but it's not even close to the same as having it as an L2 cache. When an SPE needs information, it has to DMA through the PPE to receive it and store it in its 256k before it can use it. This means that if 2 SPEs DMA for information, they take up the 2 threads on the PPE for that cycle. So while it is helpful, because of the design it doesn't give the same performance as having 2MB of cache (especially since the SPEs don't share it).
Another interesting note is the XDR RAM (the Cell also uses an XDR controller). The XDR architecture is capable of supporting 4 devices of up to 512Mb each (note that 512Mb translates to 64MB). This means that the XDR can support up to 256MB of memory, no more. It also indicates that they are using 16-bit data buses within the XDR. I got this from 2 [twin DRAM units] x 16 [bits] x 3.2GHz [RAM speed] = 102.4Gbps, which translates to 12.8GB/s. Since it uses 2 channels for data transfer to the CPU, that gives 25.6GB/s of bandwidth to the CPU (which is what they claimed). That means they are using one cell of first-generation XDR (IBM is talking about increasing the number of devices supported, but this adds a lot of cost, so don't expect it in the PS3). This also fits with why they used 256MB of XDR and 256MB of GDDR3.
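The memory arithmetic is easy to check in a few lines of Python. This is just a sketch using the figures from this post (4 devices of 512Mb, 2 DRAM units x 16 bits x 3.2GHz, 2 channels); the function names are made up for illustration.

```python
BITS_PER_BYTE = 8


def capacity_mbytes(devices=4, megabits_per_device=512):
    """Total capacity in megabytes for the given number of XDR devices."""
    return devices * megabits_per_device // BITS_PER_BYTE


def channel_gbytes_per_s(dram_units=2, bus_bits=16, rate_ghz=3.2):
    """Bandwidth of one channel in GB/s (bits per second divided by 8)."""
    return dram_units * bus_bits * rate_ghz / BITS_PER_BYTE


def total_gbytes_per_s(channels=2, **channel_kwargs):
    """Bandwidth to the CPU across all channels in GB/s."""
    return channels * channel_gbytes_per_s(**channel_kwargs)


if __name__ == "__main__":
    print(capacity_mbytes())        # megabytes
    print(channel_gbytes_per_s())   # GB/s per channel
    print(total_gbytes_per_s())     # GB/s total
```

Working the division through exactly, 102.4Gbps is 12.8GB/s per channel and 25.6GB/s across both channels, with a capacity ceiling of 256MB.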
Hopefully this clears up a few things.
I'll wait for Cmmnsense or someone else now before we continue crunching numbers.