supercomputing

Predicting the next Top500 list: the outcome

A couple of months ago I discussed my bets for the Top500 predict contest. I still owe you an update on the outcome...

After placing my original bets, I kept a close eye on their value. The contest allowed you to sell bets for a fraction of their possible return value. I quickly realized this was an important aspect: selling was a way to stack up credits in my virtual wallet with 100% certainty, as opposed to hoping that the bets I placed were correct and would win a (possibly only slightly) larger return.

The value of a bet was determined by the degree to which other bettors agreed with it. The more people strongly agreed with one of your bets, the closer its value got to the possible return value. When a lot of people agreed with your bet, selling it was often clearly the best way to go. I opted to sell any bet whose value reached at least 65% of its possible return.

After selling a bet, it was often interesting to place another, slightly different bet on the same aspect of the Top500 list. Although the initial value of that bet would be fairly low, it was possible to either sell it again a couple of days later, or just leave it in place, hope you got it right, and cash in at the end. By re-investing only a small part of what selling the original bet had earned, you could quickly stack up credits and still hope for a large total return at the end.
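
For what it's worth, here's a minimal sketch of that sell-or-hold heuristic in Python, written up after the fact. Only the 65% threshold comes from the text above; the Bet class and its fields are made up for illustration.

```python
# Hypothetical illustration of the sell-or-hold rule I used during the contest.
# The Bet class and its fields are made up; only the 65% threshold is real.

from dataclasses import dataclass

@dataclass
class Bet:
    feature: str            # e.g. "RMax of Top 1 Machine"
    stake: int              # credits placed on the bet
    potential_return: int   # credits paid out if the bet turns out correct
    current_value: int      # what you would get by selling the bet right now

SELL_THRESHOLD = 0.65  # sell once the offered value reaches 65% of the potential return

def should_sell(bet: Bet) -> bool:
    """Take guaranteed credits instead of gambling on the full return."""
    return bet.current_value >= SELL_THRESHOLD * bet.potential_return

# Example: a bet most other players agree with, so its value has crept up.
bet = Bet("Country with the Top 1 Supercomputer", stake=1500,
          potential_return=6500, current_value=4500)
print(should_sell(bet))  # True: 4500 >= 0.65 * 6500 (= 4225)
```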

Here's my entire trade history, and my portfolio at the end of the competition:

When I joined the contest, I spent my entire budget of 12,000 credits on bets. By selling highly valued bets, I was able to collect about 41,500 credits throughout the contest, and I placed other bets for about 3,300 credits. Of the 8 bets that were still open at the end of the contest, 4 were correct, which yielded roughly another 17,500 credits (only about 4,600 credits were lost to incorrect bets). The resulting total of 55,700 credits, about 4.6 times my original budget, was enough to keep first place in the bettors' ranking, which resulted in a nice shiny iPad on my doorstep a couple of weeks later (thanks deplhit!).
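
Just to make the bookkeeping explicit, here's how the rounded figures above add up:

```python
# Rounded bookkeeping of the final wallet, using only the figures quoted above.
collected_from_selling = 41_500   # credits obtained by selling bets
spent_on_new_bets      = 3_300    # credits re-invested in new bets
won_from_correct_bets  = 17_500   # payout of the 4 correct bets still open at the end
# the ~4,600 credits staked on the 4 incorrect bets are simply gone

final_total = collected_from_selling - spent_on_new_bets + won_from_correct_bets
print(final_total)            # 55,700
print(final_total / 12_000)   # ~4.6x the original 12,000-credit budget
```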

I can't really say I've put that iPad to good use since then (unless you consider playing games during daily commutes or letting a drooling 1.5-year-old smash his fists on it useful), but it's a nice toy to have nevertheless. More importantly, competing in the Top500 predict contest made me dive into a bit of supercomputer history, which was quite interesting.

Predicting the next Top500 list

(yes, it's been a while)

Two weeks ago, one of my colleagues mentioned the Top500 predict website to me, a game in which you need to predict the next Top500 list of supercomputers, which is due in November 2011.

In the game, you receive a number of credits (12K in total, if you provide some personal details) that you can spend on bets. Each bet concerns a particular feature of the upcoming Top500 list: you predict that feature and stake a number of credits on your prediction. For most features, you need to specify an interval rather than a fixed value.

The interesting aspect is that the narrower the interval, the more your bet may return, but of course also the bigger the chance that the actual value falls outside the interval you specified. Competing in the game is absolutely free; although it involves betting, no real money is involved at all (e.g. you can't buy additional credits to bet with). The winner of the game does win an iPad 2 though...
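
The site didn't spell out its payout formula (at least not one I'll reproduce here), but the tradeoff is easy to illustrate with a made-up rule in which the payout factor grows as your interval shrinks relative to some plausible range; the formula and numbers below are purely hypothetical.

```python
# Purely hypothetical payout model, only meant to illustrate the narrow-interval tradeoff;
# this is NOT the contest's actual formula.

def payout_factor(interval_low: float, interval_high: float,
                  plausible_low: float, plausible_high: float) -> float:
    """The narrower the chosen interval relative to the plausible range,
    the larger the multiplier on the credits you stake (made-up rule)."""
    chosen_width = interval_high - interval_low
    plausible_width = plausible_high - plausible_low
    return plausible_width / chosen_width

# Betting 8-9 PFLOPS out of a plausible 2-12 PFLOPS range pays much better
# than a lazy 4-12 PFLOPS bet, but is far more likely to miss.
print(payout_factor(8, 9, 2, 12))   # 10.0
print(payout_factor(4, 12, 2, 12))  # 1.25
```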

I decided to take this game seriously instead of just entering bets straight away. That same evening I had some free time on my hands, and I studied the Top500 lists of the past couple of years in terms of the features to predict. After a couple of hours, I was confident that I could place well-informed bets with a good chance of being correct. In this post I'll briefly discuss each of the bets I placed and the reasoning behind them. Feel free to disagree with me.

Predicting the future top supercomputer

Let's look at the bets involving the future top supercomputer first, which involve three features: the overall performance obtained for the LINPACK benchmark (Rmax), the country hosting the top system, and the power consumption of that system.

In the last two editions of the Top500 list, China (Nov. 2010) and Japan (June 2011) hosted the top supercomputer. This heavily contrasts with prior lists, in which the USA was firmly on top for 12 editions in a row (since Nov. 2004). Both China and Japan came up with significantly better-performing systems than the former champion. The Chinese Tianhe-1A system, which heavily relies on GPUs to boost performance, has an Rmax of 2.57 petaflops, while the most recent champion, the Japanese K computer, has an Rmax of 8.16 petaflops. Both offer significantly more performance than the 1.76 petaflops of Jaguar (USA), which topped the June 2010 list.

Especially the huge performance jump of the K computer in the most recent edition of the Top500 list makes me think that no system will be able to beat the current champion in the next edition. The USA has two large systems in the pipeline, Mira (~10 petaflops) and Sequoia (~20 petaflops), but according to the latest news these won't be ready until early 2012. IBM's withdrawal from the Blue Waters project only makes me more confident. I'm unaware of other systems that might compete with the K computer (but I may be missing some).

Thus, under that assumption, predicting these three features is fairly easy: pick Japan as the country hosting the next top supercomputer, and choose fairly narrow intervals for both the Rmax and the power consumption, just wide enough to contain the K computer's current values. If the K computer does indeed stay on top, this results in a huge return: ~23.7K credits for 4.5K credits spent. If I'm wrong, however, I lose over a third of my total budget, so this group of bets is kind of a game changer for me.

  • RMax of Top 1 Machine

    my bet: 8 - 9 PFLOPS (2K, return: 12.6K)

  • Country with the Top 1 Supercomputer

    my bet: Japan (1.5K, return: 6.5K)

  • Power Consumption of the Top 1 Machine

    my bet: 9 - 12 MWatt (1K, return: 4.6K)

Predicting national performance development

Another group of features to predict involves the overall performance development for three selected countries: Germany, China and the USA. This is the ratio of a country's total Rmax (summed over all of its systems) to the same total in the previous list. A bit of a dubious measure, as adding or removing a single system can make a big difference, but hey...
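
In other words, the measure is simply the new list's total Rmax for a country divided by the previous list's total. A small sketch, with made-up numbers, of what that boils down to:

```python
# Sketch of the per-country performance-development measure, assuming you have
# the Rmax values (in TFLOPS) of every system of a country in two consecutive lists.
# The numbers below are made up; the real values come from the published Top500 data.

def performance_development(prev_rmax: list[float], new_rmax: list[float]) -> float:
    """Ratio of a country's total Rmax in the new list to that in the previous list."""
    return sum(new_rmax) / sum(prev_rmax)

prev = [1200.0, 850.0, 420.0]   # hypothetical country, systems in the previous list
new  = [1200.0, 850.0, 900.0]   # one system replaced by a bigger one
ratio = performance_development(prev, new)
print(f"{(ratio - 1) * 100:.0f}% growth")  # ~19% growth from swapping a single system
```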

A little analysis of the performance gain between consecutive editions of the Top500 list showed that this is a particularly hard feature to predict. Over the years, the ratios vary wildly, and no real trend can be found. China has shown remarkable progress by improving its total Rmax by more than 75% three times in a row (Nov. 2009 - Nov. 2010). In general, however, the ratios are usually somewhere around 1.2, i.e. about 20% more total Rmax compared to the previous Top500 list.

Making accurate predictions for these features is very difficult (if not impossible), so I chose to enter fairly large intervals. Naturally this results in lower returns, but I hope it keeps me from losing the credits I bet. If the actual ratios fall within the specified intervals for each country, this group of bets can get me a total return of 11.7K credits.

  • Performance Development Germany

    my bet: 15 - 35% (1K, return: 4K)

  • Performance Development China

    my bet: 25 - 50% (1K, return: 3.2K)

  • Performance Development USA

    my bet: 15 - 30% (1K, return: 4.5K)

Number of systems using GPUs

To estimate how many systems will rely on GPUs, I didn't spend much time studying the evolution in previous editions of the Top500 list.
Although GPUs have been around for quite some time, only recently have they been actively used in supercomputers for general-purpose computing instead of high-quality graphics. The June 2011 edition of the list featured only 14 systems (partially) powered by GPUs, as far as I can tell from the raw data available from the Top500 website.
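
For the curious, this is the kind of filtering I mean; the sketch below assumes a CSV export with an "Accelerator" column, which is a guess on my part rather than the actual schema of the Top500 raw data.

```python
# Rough sketch of counting GPU-accelerated systems in a Top500 release.
# Assumes a CSV export with an "Accelerator" column; that column name is a guess,
# the actual raw-data format from top500.org may differ.
import csv

def count_gpu_systems(path: str) -> int:
    gpu_keywords = ("nvidia", "gpu", "tesla", "radeon", "firestream")
    count = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            accel = (row.get("Accelerator") or "").lower()
            if any(keyword in accel for keyword in gpu_keywords):
                count += 1
    return count

# print(count_gpu_systems("top500_june2011.csv"))  # hypothetical file name
```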

In my view, GPU-powered systems won't take over the Top500 list just yet. GPUs are still an add-on to the more traditional compute power delivered by CPUs, they are not easy to use because they require specialized knowledge (e.g. regarding data locality and languages like CUDA, OpenCL, ...), and they are only applicable to certain types of applications.

Therefore, I picked an interval on the lower end of the spectrum, with a maximum of 40 systems using GPUs. I highly doubt the actual value will go over 25, but I opted for a safe bet on this one.

  • my bet: 10 - 40 (1K, return: 9.3K)

Replacement Rate

Predicting the replacement rate involves how many systems will be replaced by new ones, and thus how many systems from the June 2011 edition will drop off the list. Again, historical data showed that this is fairly hard to predict.

On average, about 45% of systems got replaced over the past couple of years, but there's a significant amount of variation between editions of the list (from 29% up to 60%). So again I made a fairly safe bet by specifying a wide interval, with a fairly low return as a result.

  • my bet: 25 - 50% (1K, return: 3.3K)
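
For clarity, the measure itself is trivial to compute once you have some unique identifier per system in two consecutive lists; the toy example below uses made-up mini-lists.

```python
# Sketch of how the replacement rate can be computed, given a unique identifier
# per system in two consecutive lists (the tiny example lists here are made up).

def replacement_rate(prev_systems: set[str], new_systems: set[str]) -> float:
    """Fraction of the previous list's systems that no longer appear in the new list."""
    dropped = prev_systems - new_systems
    return len(dropped) / len(prev_systems)

prev = {"system-a", "system-b", "system-c", "system-d"}
new  = {"system-b", "system-c", "system-d", "system-e"}
print(f"{replacement_rate(prev, new):.0%}")  # 25%: 1 of 4 systems dropped off
```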

Total Sum of RMax Value

One feature that does show a strong trend is the total sum of Rmax over the entire list. The June 2011 edition reached a total of 58.93 petaflops, and on average an increase of about 35% between consecutive editions is observed.

I adjusted my prediction downward a bit though, because the highest-performing system of the last edition was responsible for about 14% of the total Rmax. Since I don't expect the top supercomputer to change in November, I anticipate the increase in total Rmax will be somewhat lower this time.

  • my bet: 64 - 70 PFLOPS (1.5K, return: >5.1K)

Entry Level

The last feature to predict is the entry-level Rmax, in other words the minimum LINPACK performance a system needs to deliver to make it onto the Top500 list. This feature also shows a fairly strong trend over time: on average, an increase of about 34% between editions is observed.

The previous edition had an entry-level Rmax of 40.19 teraflops, and because the increase in entry-level Rmax has been slightly lower in the most recent editions of the list, I adjusted my prediction a bit downward as well. The interval is fairly small, which is a risk, but one with a potentially large return.

  • my bet: 45 - 50 TFLOPS (1K, return: 9.3K)

Conclusions

Whether my predictions turn out to be right or not, figuring out how to make decent predictions for the next Top500 list was fun. The competitive aspect makes it more interesting, and overall it's a cool experiment to try and see how predictable the list really is.

The current rankings (with me on top at the time of writing) are not really indicative of how the final rankings will look. The net worth used to rank competitors only measures how well your predictions match those of others. However, just because a large share of users predicts a feature in a particular way doesn't mean the actual value will match those predictions. The creators of the competition hope that "the wisdom of the masses" will make the predictions match the actual values closely. Let's see how that turns out...

To exascale, or not to exascale?

A couple of days ago a Slashdot post entitled "Supercomputer Advancement Slows?" caught my attention. It concerns an IEEE Spectrum article on Next-Generation Supercomputers, which is well worth the read imho.

In short, the article mentions various reasons why supercomputers won't break the exaflop (1,000,000,000,000,000,000 operations per second) barrier anytime soon. The major concerns are well-known: power usage, cooling, cost, physical footprint, etc. Besides these, the article also touches on the degree of utilization (actual vs. peak flops), the huge memory/storage requirements, and the need for fault tolerance.

First of all, I'm not sure I fully agree with the conclusion of the article. Although some of the problems mentioned seem too hard to handle right now, we've seen some amazing things accomplished over the past couple of decades. Also, I kind of had a "the earth is flat" feeling when reading the article, if you know what I mean. I might be wrong, though. When YouTube started to gain momentum a couple of years ago, I felt it would never work out because people wouldn't want to put videos of themselves online for everyone to see. Boy, was I wrong...

Nevertheless, the reason I'm bringing up this Slashdot post is that I feel the author(s) of the IEEE Spectrum article missed something.

During my PhD I "wasted" a couple of centuries of computing time on the university's HPC infrastructure myself. And in the couple of months since I became part of the HPC team at Ghent University, I've worked with scientists from various fields who run experiments on our (currently rather modest) HPC infrastructure. This experience with HPC systems from the end users' point of view made me realize there is another important aspect that contributes to a successful HPC infrastructure, or supercomputer (if you insist): the users. Yes, them.

Even if you have a massive beast of a system, with a state-of-the-art network and storage infrastructure, the best processors money can buy and no budget limitations for operational costs, it's the users who will determine whether or not it all pays off. Users need to know what they are doing, how to use the system efficiently, and how to avoid doing downright useless stuff on it. You wouldn't believe how much computing time gets wasted by typos in scripts or quick-let's-submit-this-because-it's-Friday-afternoon-and-I-need-a-beer experiments.

Frankly, I have no idea how they handle this at really large supercomputer sites, like the ones in the Top500 list. I hope they only start the really big experiments after thorough preparation, testing smaller-scale stuff first and making damn sure they've done the best they can to optimize the experiments. Otherwise, why even bother breaking the exascale limit?
