Running OpenACC without a GPU: a sneak peek of accULL

In our research group (GCAP)  we have been working on directives for GPUs for the last three years (MICPRO-2011,JoS-2011). When the OpenACC standard was announced in the SC11, we found plenty of similarities with the language we were working at the moment. Immediately we focused our work in supporting and improving the new standard.
Although the amount of work to fully implement the OpenACC standard is not negligible, our compiler framework, yacf, provided us with the required flexibility to build a parser and a lowering phase to our runtime in a couple of weeks.

Most of our recent work will appear on conferences in the upcoming months. We will be presenting contributions about accULL in the following conferences:

Feel free to speak with us there. We will provide slides and detailed information in the upcoming weeks.

Source code of the compiler framework is already available , and if you are interested we can provide the development version of accULL. A public repository will be available in the near future. On the meantime, we show here a short “teaser” of the OpenCL support that we’ve implemented in the runtime.

Picture below shows execution time for a Molecular Dynamics simulation implemented in OpenACC, running on top of one of our development servers, M4. M4 is a shared memory system (that’s it, no GPU at all) with 4 Intel(R) Xeon(R) E7 4850 CPU processors.

The usual approach to implement algorithms for these architectures is to use OpenMP, however, using the Intel OpenCL platform it is also possible to run OpenACC on top of the server.

Red bars shows the execution time of OpenMP (provided by GCC 4.4.5) and green bars shows the execution time of the same code using accULL OpenCL backend.

In this case, the runtime library detects that it is possible to avoid transfers and uses mapping instead, thus avoiding unnecessary memory transfers. The Intel OpenCL platform takes advantage of the AVX instruction set, and the kernel is nicely transformed on vector instructions, which execute natively on the CPU.

accULL can also be used to run OpenACC programs while you are traveling (which is sometimes useful!). The following figure shows the same molecular dynamics simulation running on my  laptop.

Using an environment variable, runtime can switch between using the internal GPU or the CPU (again using the Intel OpenCL platform). The largest problem size froze my laptop, the problem was too big for 512Mb of graphics memory.

I hope this gives you an idea of the kind of work we are doing within accULL. More information will be available on the following weeks!