Skip to Content
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.
  • : Function ereg() is deprecated in /customers/kejo.be/kejo.be/httpd.www/modules/logotool/logotool.module on line 327.

ACACES-2009 Summer School - Day 0: Getting there

It's been a while, I know. But I'll try and make up for it. Here's the first post in a ACACES-2009 Summer School series. Thanks to Itkovian for letting me use his pictures here.

On Sunday July 12th, I left home early to catch my direct Brussels Airlines flight to Barcelona. I woke up at 6.45am, to catch the train at 7.37am direction Brussels Airport. We arrived in the airport at 9.30am, way early for a flight due to leave at 11.25am. Once we got through checkin and security, we noticed a one hour delay while making our way to the departure gate. We killed the time by visiting Starbucks and purchasing internet access, only to find out later our flight was even more delayed: up to 2.5 hours. No fun, especially because the flight from Brussels to Barcelona only takes 2 hours. Urgh.

After skimming the Brussels Airlines General conditions online, it seemed we might be able to get some compensation, up to €250 for a flight of less than 1.500km which got delayed over 2 hours. Sadly, after visiting the service desk, it seemed I had missed something. The compensation was provided when a flight got cancelled, and a delay of over 2 hours resulted from that.

In our particular case the flight was only delayed, according to the guy behind the service desk. Although they needed to replace the plane, and make us wait for 2.5 hours until it was ready, the flight wasn't cancelled only delayed. Makes sense if you know that canceling results in paying each customer €250 cash, while delay only requires them to provide an €11 voucher for food and beverages, to use in the airport on the day of departure. Typical.

After finally arriving in Barcelona at 4pm and waiting for our luggage, we decided to share a taxi instead of using the train and shuttle bus to get to the hotel/conference centre La Mola in Terrassa. A wise decision, because we arrived in the hotel somewhat after 6pm, which allowed us to quickly freshen up before the keynote talk at 7pm.

The keynote talk on bio-inspired massively parallel computer architectures was given by Steve Furber (Manchester University). He talked about his SpiNNaker project, in which he and his colleagues are trying to build a system which should be able to simulate one billion neurons (roughly 1% of a human brain) in real time using one million ARM processors. Cylons, anyone?

After the keynote, we rushed to the buffet dinner. The buffet was OK, but not fantastic, and to me it was lacking something else than regular water as an accompanying drink. Ah well.

Before calling it a night, some of the UGent participants decided to taste the Spanish beer from the hotel bar. Although it was quite expensive (€4.50 for a large beer), I quite enjoyed it to wash down the mostly frustrating day. Once the beer was sipped down slowly, we all went to bed, making sure we would be able to get up early the next morning for the start of the classes.

More on boegelbot

A while ago, I mentioned that I was working on a small robot to join in on the IEEE Student Branch Ghent Sumo robot competition.

In the weeks since my last post on this subject, I finished up the soldering of the electronics board which serves as a brain for the robot, built boegelbot using the dusty lego parts from when I was a kid and participated in the actual competition...

With 3 out of 5 matches won, I consider this project a modest success.
I managed to build a small robot from scratch, with basically zero prior knowledge of electronics or robotics.
The robot is able to stay on the playing field autonomously by relying on its four contrast sensors which detect the white border of the black playing field, and can detect the enemy robot reasonably quick by using a distance sensor.

Unfortunately, I don't have any video clips of the actual competition matches featuring boegelbot yet. The organizers did record some stuff, but haven't made the clips available yet.
Some action clips of the best robots can be viewed here. Make sure to view the ones with the El Toreador house robot, which used interesting tactics to try and fool it's enemy (sadly without much success though).

I hope I can post some clips of my robot some time soon...

Why we should still care about singlecore performance

First off, don't expect to read some brand new genious insight here, I'm not telling anything new.

I just wanted to draw some attention to a great lecture by Mark D. Hill on "Amdahl's law in the Multicore Era", which is available on the Google Tech Talks YouTube channel.

Last weekend, I finally found some time to watch it. Although most of this stuff isn't new to me, I liked the way in which Hill explained everything.

Although processor manufacturers (Intel, AMD, ...) are well aware of the importance of singlecore performance (see the recent Intel Nehalem architecture for example), it never hurts to be reminded of the fact that performance boosts won't come for free by throwing more cores at it. And this holds even if we can parallelize the major part of our favorite application and/or have tens to hundreds of cores available.

If you're interested in performance and the recent multicore trend, you should make sure to take the time to watch the lecture.

Suggestions for other interesting lectures like this one, on related or less related subjects, are welcome...

Just married!

Last weekend, we finally got married after being together for over 7 years, and living together for almost 3 years.
After over a year of preparation, it was about time.

We had an amazing day, celebrating our love with our family and closest friends.
Although the weather gods didn't feel like cooperating fully, we had no reason to complain in the end. While our wedding pictures were being taken in Ostend, along the Belgian coast, the photographer was able to capture our faces in sunlight. During the rest of the day, we were able to keep ourselves dry whenever we had to go outside, staying out of the rain showers in between.

The party at the end of the day was great fun, although we were both very tired.
We managed to enjoy the great food a little, and danced the night away with family and friends, thanks to my uncle serving as our party DJ.

All in all, a day we'll never forget.

A sneak preview of the many pictures taken can be seen at http://www.kejo.be/node/21, more will follow.

Stag night

Because I'm getting married in two weeks, and traditions are there to be kept alive, my stag night was held last Saturday.
Close relatives and some friends joined in on the fun, for a night I'll never forget.

Although no naked women (or men) were involved, we had tons of fun.
The guys made me dress up as a bunny, the main reason being the nick name I give J. (and vice versa). On top of that, I had to sell carrots by the piece, in order to earn money to get me through the night (beer-wise).

As the night progressed, and more beers were consumed, I got more and more creative in selling the carrots. The last one was even sold by auction, in which the bartender ended up in giving €2.5 for that last precious carrot, more than enough for 2 beers (yes, cheap bar).
In the end, I managed to earn a staggering €45. Financial crisis, anyone?

By the end of the night, the girls ended up in the same bar as we did, which was great fun. Although I had more than enough beers, the hangover on Sunday morning wasn't that bad at all.
I blame the high quality Belgian beer. You should try it some time, if you haven't already, you won't regret it...

Supercomputer @ UGent

Yesterday, I attended the official launch of the brand new supercomputer at my university. Being one of the pilot users testing the system, I received a nice shiny invitation for this event a few weeks ago.

Although the system won't be made accessible to all researchers at the university until mid April 2009, I've been lucky to be one of the selected few able to play around with it for over a month now.

The system, which was built by IBM/Clustervision, currently consists of 194 blades, each having two quad core Intel Xeon L5420 2.5GHz processors with 12MB of L2 cache (Core2 architecture) and 16GB of memory, for a grand total of 1552 cores and 3.1TB of memory. The blades are connected through an Infiniband network, proving a stunning 20Gb/s, and there's roughly 60TB of storage space, using GPFS as file system. All this is enough for the 496th place in the top 500 of supercomputers (last updated Nov. 2008). *drool*

It has proven to be remarkably stable during the pilot phase, which is a significant improvement over BEGrid, our current main source of computing power. But of course, the system won't be tested to its limits until the first few days after it goes live...
The support from the admins in terms of communication, maintenance and software, has been outstanding so far. All this is a crucial, in order to ensure that the system will be used to its full capacity.
All in all, it seems like the investment of over €1M has been well worth it.

Now, all we'll have to do is come up with something useful to do with it... ;-)

Creating a SumoBot from scratch (and Lego)

Recently, the SumoBot competition which is being organized by the local IEEE Student Branch caught my attention.

The goal of the competition is to build your very own intelligent robot, which is capable of pushing another robot out of a playing field (hence the term SumoBot).
The robot will be built using Lego, and the brain will consist of a microcontroller, which will process input coming from various sensors (distance, light, ...) and steer the Lego engines to make the robot move.

Last Tuesday, the kick-off of the SumoBot competition was held, during which it became clear that a lot of students (Bachelor, Master and Phd) were interested to join in on the fun.

Although I waited until the last minute to actually join in, I don't think I'll regret it. It seems like a good way to learn (a lot) about electronics and AI. And of course, being able to make a robot do stuff you want it to is just cool.

After buying the microcontroller package for just €30 and opening it, I realized that my knowledge on electronics would be seriously tested.
Here's a shot of what was in the package:

I need to solder this all together to one (hopefully) working electronics board.
Since I had never soldered anything before, this seemed like quite a challenge.

Luckily, the organizers were very helpful with ignorant people like me. They carefully explained me how the soldering should be done, and how to start this thing off best, i.e. by starting with the small, low parts.

Since then, I've spent two hours carefully soldering some of the parts on the board, resulting in this:

I hope I'll find the time soon to finish up on this.

In the mean time, I've (literally) dusted off my collection of Lego, trying to find bits and pieces which might come in useful when designing the robot. I knew my love for Lego System and Lego Technics would come in handy some day...

In the next few weeks, I'll be gradually building and programming my SumoBot, in order to kick some serious robot ass on April 28th, when the actual competition is being held.

Stay tuned for more...

Using Haskell to win the Netflix Prize

As mentioned in my last post, I've been thinking on competing for the Netflix Prize. To spice things up a bit, I decided to implement my algorithm in Haskell. Yes, I'm crazy.

Haskell is a functional programming language, with various cool features such as explicit purity, lazy evaluation, higher order functions and a strong static type system. If you're unsure what all those words mean, you should find out more about Haskell. Trust me, you'll fall in love with it.

The last time I did something remotely serious with Haskell was almost 4 years ago, i.e. writing a ray tracer, HRay, as a part of my Masters' thesis in Computer Science.
I missed implementing stuff in it ever since.

I decided to go with Haskell for this project for various reasons, most importantly because I wanted to freshen up my Haskell-fu and learn more about Haskell (what the hell are monads?!?).
That way, I'll at least gain something from it, just in case I don't actually win the $1M.

The last couple of days, I've been trying to figure out what the best data type would be to handle the massive amount of data, i.e. over 100M user ratings for 17,770 movies from over 480K users.
A first, very naive, attempt looked something like this:

  1. -- map of user ID to movie ID/rating pairs
  2. type Data = Map Int [(Int,Word8)]

Although this might seem like a nice way to go, it performs horribly in terms of memory usage. The reason is simple: normal lists have a huge amount of overhead for each element.
I was told by the nice helpful folks at #haskell that it takes at least 24 bytes of cruft for each list element on a 64-bit system, and that's ignoring the use of pairs as list elements. If you have over 100M of these elements to keep track of, that's a lot of cruft to drag along with you.

After careful thought, and taking into account input from various smart people, I ended up with something like this:

  1. -- map of movie IDs to pairs of UArrays,
  2. -- containing user IDs and ratings
  3. type Data = IntMap (UArray Int Int, UArray Int Word8)

This is a lot more memory efficient, for various reasons:

  • It avoids a lot of overhead coming from the map, because now I only have ~18K keys (the movie IDs), instead of over 480K keys. Also, IntMap should be more efficient than Map Int, at least performance-wise.
  • By using unboxed immutable arrays (UArray), I make sure that each user ID/rating only takes as much space as needed, i.e. 4 bytes (Int) and one byte (Word8) respectively. So, a lot less cruft to drag along.
  • Making sure everything is kept strict, by using UArray (and not Array) and applying seq in the right places, avoids large thunks in memory. Laziness is a nice feature, but it can seriously bite you in the ass in terms of memory usage.

Using this data type, I'm able to read in the entire Netflix data set (ignoring the time stamps) in 26 minutes on my old rusty PowerBook G4, using only 700M of memory. I'll do the real number crunching on our iMac though, don't worry.

Although I'm aware that the memory usage can be pushed down further, I decided to stick with this data type.
I'm glad I managed to get this far using clean Haskell code, i.e. without using any 'nasty' hacks such as allocating memory myself instead of trusting GHC do to it for me.

Next up is writing some analysis code, to figure out if what I have in mind will actually work.
Let the games begin!

The Netflix Prize: why I think SVD won't cut it

A few months ago, while skimming the Slashdot RSS feed, my attention was drawn to a post on the Netflix Prize.

Geeks will (should!) already have heard about this, but in short: the Netflix Prize is an online competition, in which each team needs to come up with a better movie rating prediction algorithm than the one Netflix is using (named CineMatch), better meaning achieving a 10% lower error rate. In case you're wondering why this is so interesting: Netflix is offering $1M to the first team able to do so.

However, the prize money wasn't the main reason for drawing my attention. At first, I figured "If it's worth $1M, it must be damn near impossible, right?". Well, maybe not...

The Slashdot post referred to an article in the New York Times, which briefly touched on the main statistical technique the top teams are mostly relying on, Singular Value Decomposition.

Singular Value Decomposition (SVD) tries to determine so called factors, in this case one factor for each user and movie. The idea is that each factor corresponds to an underlying trend in the data, which may (or may not) have an intuitive meaning, e.g. the amount of action in a movie or how much a user likes/dislikes romantic movies.
Each factor consist of a set of parameters, i.e. real values (e.g. there is 0.6 action in movie X, on a scale between 0 and 1). Combining the parameters of a user and movie, by summing the pair-wise products, should yield an accurate estimate of the user rating.

Because I've been using a strongly related technique in my Phd research, i.e. Principal Components Analysis (PCA), I figured I might actually have a small chance at coming up with a decent algorithm. So, I started looking into details.

Netflix provides a huge data set (~700MB) of user ratings for a set of movies. In total, there are just over 100 million user ratings from 480,109 different users, for 17,770 different movies.
The evaluation of the algorithm a brave soul comes up with is done by seeing how accurately they can predict user ratings for a quiz, i.e. a list of user/movie pairs with unknown corresponding user rating.
To avoid a brute force attack, each user is limited to one submission of an answer to the quiz each day, and only an overall error rate is returned.

After playing around with this for a while, I realized that using a technique like SVD or PCA is probably not the best way to tackle this.
The reason is simple: the sparsity of the data. If you think about the data set as a 480,189x17,770 matrix, then you quickly realize that the provided set of user ratings only fills in ~1.1% of the matrix elements.
So, a lot of data is missing, and SVD will only be able to come up with (very rough) estimates of the factors, based on the available user ratings.

Although the SVD technique matches the problem intuitively, and most teams using it seem to come up with algorithms that are doing pretty well, I don't think that there's enough data available for this technique to yield really good results.
In fact, the top teams are combining this sort of technique with various other methods to be able to achieve good predictions.

Although I have no faith in the SVD technique mentioned above, that doesn't mean I don't think I can come up decent algorithm... It just means I won't be using something like SVD. I'll go deeper into what I do want to rely on in a later post.

For now, I should try and find some time to actually experiment with the ideas I have for this. Right now, some of the top teams are already able to outperforms the Netflix algorithm by over 9.5%, see the Leaderboard for more details.
So it's only a matter of time until someone actually is able to really beat CineMatch, and collect the $1M prize...

A challenge for Mollom

When I was checking the Akismet moderation queue of our (old) photo blog, I noticed that spammers are doing their best to outsmart automated spam blockers such as Mollom.

Here's a screen shot of the spam message I bumped into:


To be honest, it fooled me when I saw it passing by, and I almost approved it as a genuine message.

The trick they're trying to pull here is to simply copy a previous comment, and add a hyperlink somewhere in it. That way, the user would be tricked into following a link he/she believes might be interesting.

I don't know if this is done is any automated way, but I'm probably not the first to notice this kind of spam. I wonder if a service like Mollom would be able to recognize it. Time will tell...

 

 

Syndicate content