Survey of GPGPU Programming Tools on Windows

The following is a summary of commonly used tools and techniques for General-Purpose Graphics Processing Unit (GPGPU) programming, an increasingly relevant topic for the modern programmer.  This is not an exhaustive list, but it is a set of solid tools that make it easier to embrace this amazing technique for massive parallelization that is changing the way we process data.

NVIDIA CUDA + Visual C++

NVIDIA, who are undoubtedly the leaders in GPU programming, provide a lot of support to the community through their website and forums.  NVIDIA’s integration into Visual Studio 2010 is really quite good, and they offer fantastic tools that give you deep insight into your GPU (like Parallel Nsight).  These tools are also free, though you do have to register for them.

This approach requires you to write your code in C++ in a mostly STL-like manner, but you still get all the great features of Visual Studio.  If you’re not afraid of C++ this is a good route to follow.  Your code is actually compiled by NVIDIA’s compiler (nvcc) and becomes a GPU executable (or the parts of it that run on the GPU do).  Function qualifiers such as __global__ and __device__ tell the compiler which functions run on which hardware.
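To make the "which functions run on which hardware" idea concrete, here is a minimal sketch of a vector add in CUDA C++. The names add_pair, add_kernel, check and add_on_gpu are made up for illustration; only the qualifiers and the cuda* runtime calls come from NVIDIA's toolkit. It assumes a CUDA-capable card and a reasonably recent nvcc.

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

// __device__: compiled for the GPU and callable only from GPU code.
__device__ float add_pair(float x, float y) { return x + y; }

// __global__: a kernel. It runs on the GPU but is launched from the host.
__global__ void add_kernel(const float* a, const float* b, float* out, int n)
{
    // Each thread handles one element; the guard covers the last partial block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = add_pair(a[i], b[i]);
}

// Hypothetical helper: turn a CUDA error code into a C++ exception.
static void check(cudaError_t status)
{
    if (status != cudaSuccess) throw std::runtime_error(cudaGetErrorString(status));
}

// No qualifier: ordinary host (CPU) code. It owns all the memory movement.
// Sketch only: device buffers are not freed if one of the checks throws.
std::vector<float> add_on_gpu(const std::vector<float>& a, const std::vector<float>& b)
{
    if (a.size() != b.size()) throw std::invalid_argument("size mismatch");
    const int n = static_cast<int>(a.size());
    std::vector<float> out(n);
    if (n == 0) return out;
    const size_t bytes = n * sizeof(float);

    // Allocate GPU buffers and copy the inputs over.
    float *d_a = nullptr, *d_b = nullptr, *d_out = nullptr;
    check(cudaMalloc(&d_a, bytes));
    check(cudaMalloc(&d_b, bytes));
    check(cudaMalloc(&d_out, bytes));
    check(cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice));

    // Launch enough 256-thread blocks to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    add_kernel<<<blocks, threads>>>(d_a, d_b, d_out, n);
    check(cudaGetLastError());

    // Copy the result back (this also waits for the kernel), then clean up.
    check(cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost));
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    return out;
}
```

Calling add_on_gpu({1, 2, 3}, {10, 20, 30}) from regular host code should hand back {11, 22, 33}.  Notice how much of the host function is just shuffling memory to and from the card; that bookkeeping is the part you have to manage yourself with this approach.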


This is one of my favorites.  This toolset sits on top of the NVIDIA tools and uses reflection to turn your C# into a CUDA kernel (execution unit).  You simply decorate a method with an attribute and you’re set – it runs on the GPU card.  This is really slick and you can see the C++ that it gets translated to.  As with the raw NVIDIA approach, you still need to manage moving memory around, but this is really a pretty good tool and I’ve been using it a lot.  It is actively developed and supported, and the community around it, though small, is very responsive.  If you’re not comfortable with C++, try this out.

Microsoft Accelerator

This is perhaps the most compelling of the current set of tools and is put out by Microsoft Research.  It is also a fairly radical departure from the tools above – or any others I have seen.  Accelerator is focused quite squarely on array processing (which is the most common type of GPGPU example) and is a functional programming model that translates your operations to parallel code.  Unlike the previous approaches, this translation can be targeted to multi-core CPU, CUDA (NVIDIA), and DirectX environments.  This is really promising and really cool.  I can definitely say it makes parallelizing computations on many-core systems really easy, but I’m not sure yet if it provides the low-level memory management that can be so critical to CUDA implementations.  I plan to keep investigating.  This is more than the Parallel For framework; it’s a different approach to writing your algorithms.  I like it.

Microsoft AMP (preview documentation)

Finally, there is C++ Accelerated Massive Parallelism (AMP) from Microsoft.  This is a combination of Accelerator and CUDA in some ways.  You do write unmanaged code, in an STL style, but it can target multiple platforms (being tied to NVIDIA hardware is a pretty serious shortcoming in CUDA – or at least AMD would say so).  It does this via a DirectX target.

I’ve not had a chance to work with this yet, but it is a part of Visual Studio 11 and I’m sure I’ll get my hands on it next week at the MVP Summit.  Microsoft plans to make these language extensions / features free and open (potentially) to provide a C++ alternative to the strictly C-focused OpenCL.  This is really promising.