Glen Turner (vk5tu) wrote,
Glen Turner
vk5tu

Raspberry Pi versus Cray X-MP supercomputer

It's often said that today we have in the phone in our pocket computers more powerful than the supercomputers of the past. Let's see if that is true.

The Raspberry Pi contains a Broadcom BCM2835 system on chip. The CPU within that system is a single ARM6 clocked at 700MHz. Located under the system on chip is 512MB of RAM -- this arrangement is called "package-on-package". As well as the RPi the BCM2835 SoC was also used in some phones, these days they are the cheapest of smartphones.

The Whetstone benchmark was widely used in the 1980s to measure the performance of supercomputers. It gives a result in millions of floating point operations per second. Running Whetstone on the Raspberry Pi gives 380 MFLOPS. See Appendix 1 for the details.

Let's see what supercomputer comes closest to 380 MFLOPS. That would be the Cray X-MP/EA/164 supercomputer from 1988. That is a classic supercomputer: the X-MP was a 1982 revision of the 1975 Cray 1. So good was the revision work by Steve Chen that it's performance rivalled the company's own later Cray 2. The Cray X-MP was the workhorse supercomputer for most of the 1980s, the EA series was the last version of the X-MP and its major feature was to allow a selectable word size -- either 24-bit or 32-bit -- which allowed the X-MP to run programs designed for the Cray 1 (24-bit), Cray X-MP (24-bit) or Cray Y-MP (32-bit).

Let's do some comparisons of the shipped hardware.

 

Basic specifications. Raspberry Pi versus Cray X-MP/EA/164
Item Cray X-MP/EA/164 Raspberry Pi Model B+
Price US$8m (1988) A$38
Price, adjusted for inflation US$16m A$38
Number of CPUs 1 1
Word size 24 or 32 32
RAM 64MW = 256MB 512MB
Cooling Air cooled, heatsinks, fans Air cooled, no heatsink, no fan

 

Neither unit comes with a power supply. The Cray does come with housing, famously including a free bench seat. The RPi requires third-party housing, typically for A$10; bench seats are not available as an option.

The Cray had the option of solid-state storage. A Secure Digital card is needed to contain the RPi's boot image and, usually, its operating system and data.

 

I/O systems. Raspberry Pi versus Cray X-MP/EA/164
Item Cray Raspberry Pi
SSD size 512MB Third party, minimum of 4096MB
Price US$6m (1988) A$20
Price, adjusted for inflation US$12m A$20

 

Of course the Cray X-MP also had rotating disk. Each disk unit could contain 1.2GB and had a peak transfer rate of 10MBps. This was achieved by using a large number of platters to compensate for the low density of the recorded data, giving the typical "top loading washing machine" look of disks of that era. The disk was attached to a I/O channel. The channel could connect many disks, collectively called a "string" of disks. The Cray X-MP had two to four I/O channels, each capable of 13MBps.

In comparison the Raspberry Pi's SDHC connector attaches one SDHC card at a speed of 25MBps. The performance of the SD cards themselves varies hugely, ranging from 2MBps to 30MBps.

Analysis

What is clear from the number is that the floating point performance of the Cray X-MP/EA has fared better with the passage of time than the other aspects of the system. That's because floating point performance was the major design goal of that era of supercomputers. Ignoring the floating point performance, the Raspberry Pi handily beats out every computer in the Cray X-MP range.

Would Cray have been surprised by these results? I doubt it. Seymour Cray left CDC when they decided to build a larger supercomputer. He viewed this as showing CDC as not "getting it": larger computers have longer wires, more electronics to drive the wires, more heat from the electronics, more design issues such as crosstalk and more latency. Cray's main design insight was that computers needed to be as small as possible. There's not much smaller you can make a computer than a system-on-chip.

So why aren't today's supercomputers systems-on-chip? The answer has two parts. Firstly, the chip would be too small to remove the heat from. This is why "chip packaging" has moved to near the centre of chip design. Secondly, chip design, verification, and tooling (called "tape out") is astonishingly expensive for advanced chips. It's simply not affordable. You can afford a small variation on a proven design, but that is about the extent of the financial risk which designers care to take. A failed tape out was one of the causes of the downfall of the network processor design of Procket Networks.

Appendix 1. Whetstone benchmark

The whets.c benchmark was downloaded from Roy Longbottom's PC Benchmark Collection.

Compiling this for the RPi is simple enough. Since benchmark geeks care about the details, here they are.

$ diff -d -U 0 whets.c.orig whets.c
@@ -886 +886 @@
-#ifdef UNIX
+#ifdef linux
$ gcc --version | head -1
gcc (Debian 4.6.3-14+rpi1) 4.6.3

$ gcc -O3 -lm -s -o whets whets.c

Here's the run. This is using a Raspbian updated to 2014-08-23 on a Raspberry Pi Model B+ with the "turbo" overclocking to 1000MHz (this runs the RPi between 700MHz and 1000MHz depending upon demand and the SoC temperature). The Model B+ has 512MB of RAM. The machine was in multiuser text mode. There was no swap used before and after the run.

$ uname -a
Linux raspberry 3.12.22+ #691 PREEMPT Wed Jun 18 18:29:58 BST 2014 armv6l GNU/Linux

$ cat /etc/debian_version 
7.6

$ ./whets
##########################################
Single Precision C/C++ Whetstone Benchmark

Calibrate
       0.04 Seconds          1   Passes (x 100)
       0.19 Seconds          5   Passes (x 100)
       0.74 Seconds         25   Passes (x 100)
       3.25 Seconds        125   Passes (x 100)

Use 3849  passes (x 100)

          Single Precision C/C++ Whetstone Benchmark

Loop content                  Result              MFLOPS      MOPS   Seconds

N1 floating point     -1.12475013732910156       138.651              0.533
N2 floating point     -1.12274742126464844       143.298              3.610
N3 if then else        1.00000000000000000                 971.638    0.410
N4 fixed point        12.00000000000000000                   0.000    0.000
N5 sin,cos etc.        0.49911010265350342                   7.876   40.660
N6 floating point      0.99999982118606567       122.487             16.950
N7 assignments         3.00000000000000000                 592.747    1.200
N8 exp,sqrt etc.       0.75110864639282227                   3.869   37.010

MWIPS                                            383.470            100.373

It is worthwhile making the point that this took maybe ten minutes. Cray Research had multiple staff working on making benchmark numbers such as Whetstone as high as possible.

Tags: raspberrypi
Subscribe
  • Post a new comment

    Error

    default userpic

    Your reply will be screened

    Your IP address will be recorded 

  • 3 comments