Monday 24 September 2012

GPGPU Computation: What it all means

GPUs, or Graphics Processing Units, are specialized chips that ship with almost every electronic gadget today and were introduced primarily for games. In the late 1990s, game developers realized that most of the operations needed to make their games 'look' good were actually quite simple (basic addition, subtraction or logical operations) and, more importantly, could be done in parallel. The CPUs of the time had a small number of cores built to execute complicated instructions. That was fine for general-purpose work, but if the gaming industry was going to expand it needed a new kind of device: something that only had to do basic operations, but with many threads running in parallel. The call for such a device was finally answered when Nvidia introduced the GeForce 256 in 1999, calling it the first Graphics Processing Unit (GPU). ATI followed with cards like the Radeon 9700 in 2002. Today the GPU market is controlled almost completely by Nvidia and AMD (which acquired ATI), with Intel being the major supplier of integrated graphics.
For many years after that (and even today), gamers around the world have boasted about their new graphics cards, blurting out clock frequencies and RAM sizes and comparing manufacturers. As the complexity of games increased, so did the complexity of the GPUs needed to run them. On the software side, tools like DirectX and OpenGL developed to simplify game development. And so, by 2005, GPUs had made their mark on the game industry and some pretty cool software existed to harness their potential.
And then, around 2006, a new trend was established, one which has gained significant popularity over the years and is slowly replacing conventional approaches to many problems.
The trend is called general-purpose computing on graphics processing units, or GPGPU computing. The term was coined by Mark Harris around 2002 and refers to using GPUs for non-graphics applications. Essentially, the problem that spawned GPUs in the first place, the need for hardware with a large number of parallel cores, also shows up in science and mathematics. Most of the sciences deal in one way or another with nature, and most phenomena in nature can be simulated in parallel. In heavy numerical work and Monte Carlo simulations the individual calculations are independent of one another, so the more of them you can do in parallel the better. And so the same revolution happened, and is still happening, in these fields as well. More and more academics around the world are starting to see GPUs as viable alternatives for solving their problems. And rightly so.
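To make that independence concrete, here is a minimal sketch (my own illustration, not taken from any particular paper or library) of a Monte Carlo estimate of pi in CUDA. Every thread draws its own random points and counts hits without ever talking to its neighbours; the kernel name and the tiny xorshift generator are hypothetical choices made just for this example.

    // Minimal sketch: a Monte Carlo estimate of pi where every thread works
    // independently; only the final tally is shared.
    #include <cstdio>
    #include <cuda_runtime.h>

    __device__ unsigned int xorshift32(unsigned int &s) {
        s ^= s << 13;  s ^= s >> 17;  s ^= s << 5;
        return s;
    }

    __global__ void monte_carlo_pi(int samples, unsigned long long *hits) {
        unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
        unsigned int s = tid * 2654435761u + 1u;          // per-thread seed
        unsigned long long local = 0;
        for (int i = 0; i < samples; ++i) {
            float x = xorshift32(s) / 4294967296.0f;      // uniform in [0,1)
            float y = xorshift32(s) / 4294967296.0f;
            if (x * x + y * y <= 1.0f) ++local;           // inside the quarter circle?
        }
        atomicAdd(hits, local);                           // the only shared step
    }

    int main() {
        const int blocks = 192, threads = 256, samples = 1000;
        unsigned long long *d_hits, h_hits = 0;
        cudaMalloc(&d_hits, sizeof(*d_hits));
        cudaMemcpy(d_hits, &h_hits, sizeof(h_hits), cudaMemcpyHostToDevice);

        monte_carlo_pi<<<blocks, threads>>>(samples, d_hits);
        cudaMemcpy(&h_hits, d_hits, sizeof(h_hits), cudaMemcpyDeviceToHost);

        double total = (double)blocks * threads * samples;
        printf("pi is roughly %f\n", 4.0 * h_hits / total);
        cudaFree(d_hits);
        return 0;
    }

Each thread runs its trials in isolation; the only coordination is a single atomic addition at the very end, which is why this kind of problem maps so naturally onto thousands of GPU threads.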
I've talked about how GPUs are ideal for parallel processing, but some of you might be asking: why not just run more threads on CPUs? To answer that, let me give you a flavor of 'how' parallel GPUs are. Consider one of the newer GPUs, say the Nvidia GTX 680. The GTX 680 has 1536 CUDA cores grouped into streaming multiprocessors (SMs). I'll describe the Nvidia execution model and the concept of SMs in more detail in my next post. For now, let's just assume we have 1536 cores to run our code on. On the face of it, that is already far more than any current generation CPU can give us, but there is more. GPUs schedule threads in fixed-size groups called warps; for Nvidia, a warp is 32 threads. To keep all of those cores busy while some warps wait on memory, you want far more threads in flight than there are cores. As a back-of-the-envelope illustration, oversubscribing each of the 1536 cores by a warp's worth of threads gives 1536 * 32 = 49152 threads in flight at a time! Clearly FAR more parallel than your average CPU.
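For a rough idea of what launching that many threads actually looks like, here is a hedged sketch in CUDA; the kernel, the array size and the launch configuration are just illustrative numbers chosen to match the figure above, not anything the GTX 680 requires.

    // Illustrative only: put roughly 49k threads in flight, one per array element.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void scale(float *data, float factor, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // this thread's global index
        if (i < n) data[i] *= factor;                    // each thread handles one element
    }

    int main() {
        const int n = 49152;                             // the thread count discussed above
        const int threadsPerBlock = 256;                 // a multiple of the warp size (32)
        const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;   // 192 blocks

        float *d_data;
        cudaMalloc(&d_data, n * sizeof(float));
        cudaMemset(d_data, 0, n * sizeof(float));

        scale<<<blocks, threadsPerBlock>>>(d_data, 2.0f, n);
        cudaDeviceSynchronize();

        printf("launched %d threads in %d blocks\n", blocks * threadsPerBlock, blocks);
        cudaFree(d_data);
        return 0;
    }

Note how cheap it is to ask for tens of thousands of threads: you pick a block size and the grid dimension falls out of the problem size.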
I'd like to elaborate on the GPU execution model and maybe introduce a few more technical terms in my next post, but for now I'll wrap up by answering another question that may be bothering you: what's the catch? Most scientific endeavors that have shifted to GPU computing have reported speed-ups of several orders of magnitude (which is no small feat by any means), but the problems have to be of a certain type. They are often called 'embarrassingly parallel' problems, because the computations involved have very little inter-dependence and number in the tens of thousands or more. In my illustration above I arrived at around 49k threads running at a time; the catch is that you need to make sure your problem is big enough that you actually have 49k threads to run! Another thing that hurts GPU performance is conditional statements. For reasons I will get into later, branching code can drastically reduce the speed-up you get out of your GPU. So GPU computation can give you a gigantic improvement in run time as long as the problem you are shifting meets certain minimum conditions. Quite often, though, you can improve the speeds you get out of the GPU by approaching the problem differently. The single-threaded approach to problem solving is generally how we are programmed to think, but I fear the trends in computer science will render this way of thinking less and less useful in future.
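To preview why conditionals hurt (the details are for that later post), here is a hedged, hypothetical sketch of two CUDA kernels; they would be launched just like the earlier examples. When threads in the same warp take different sides of an if, the hardware runs the two paths one after the other, so the first kernel below effectively pays for both branches in every warp while the second does not.

    // Illustrative only: the branch below splits every warp of 32 threads,
    // so the two paths execute one after the other within each warp.
    __global__ void divergent(float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        if (i % 2 == 0)                  // even and odd threads share a warp...
            out[i] = sinf((float)i);     // ...so this path runs first
        else
            out[i] = cosf((float)i);     // ...and this one runs afterwards
    }

    // Grouping the work so that whole warps take the same path avoids the serialization
    // (this assumes the block size is a multiple of 32, so warps line up with i / 32).
    __global__ void uniform(float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        if ((i / 32) % 2 == 0)           // every thread in a warp makes the same choice
            out[i] = sinf((float)i);
        else
            out[i] = cosf((float)i);
    }

Restructuring a problem so that threads in a warp agree on their branches is one example of the 'approaching the problem differently' mentioned above.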
So a different, more parallel way of thinking is the need of the hour. There are many reasons for this; perhaps the topic for another post. For now, I've introduced what GPGPU computing is, and we'll get into a few more details and possibly touch upon CUDA and OpenCL (the languages used to program GPUs) in my next post.