AMD's OpenCL multi GPU bug
Implementing open standards such as OpenCL seems to be a daunting and error prone task. Here’s a problem that went unnoticed for quite some time on an AMD-based system.
To compile an OpenCL kernel for several devices, one has to use clBuildProgram
that takes a program created with clCreateProgramWithSource
or
clCreateProgramWithBinary
and a list of devices. In plain C it looks like
this:
static const char *source = "__kernel void foo(void) {}";
cl_program program;
cl_device_id devices[2];
program = clCreateProgramWithSource (context, 1, &source, NULL, NULL);
clBuildProgram (program, 2, devices, "", NULL, NULL);
In a heterogeneous environment, it is possible that different devices of the same vendor happily co-exist on the same platform. At least, that’s the core idea of OpenCL. In order to write optimized kernel code, we can define pre-processor defines and compile the kernel for each device, just like this:
clBuildProgram (program, 1, &devices[0], "-DTYPE=FOO", NULL, NULL);
clBuildProgram (program, 1, &devices[1], "-DTYPE=BAR", NULL, NULL);
Inside the kernel you could write conditional code with #ifdef
and switch
between two alternative implementations. However, this works only on NVIDIA
hardware. Once you enqueue a kernel on the first command queue (the one attached
to the first device) of an AMD device, you will receive a
CL_INVALID_PROGRAM_EXECUTABLE
error. In my humble opinion this is a clear
violation against the spec:
The function [clBuildProgram] builds (compiles & links) a program executable from the program source or binary for all the devices or a specific device(s) in the OpenCL context associated with program.
When I interpret this correctly, the standard does not restrict usage on
“specific device(s)”, hence I must be able to run clBuildProgram
as often as I
wish. Now, to check if you are affected, you can use my regression tool.
When run on an AMD system, you will see something like this:
# Platform: OpenCL 1.2 AMD-APP (1084.4)
# Device 0: Tahiti
# Device 1: Tahiti
# Device 2: Tahiti
# Device 3: Tahiti
[...]
Creating program `amd_bug`: OK
Build program for both GPU 1: OK
Build program for both GPU 2: OK
Created kernel `amd_bug': OK
Started kernel `amd_bug' on queue 0 [expect CL_INVALID_PROGRAM_EXECUTABLE]: Error: CL_INVALID_PROGRAM_EXECUTABLE
Started kernel `amd_bug' on queue 1 [expect CL_INVALID_PROGRAM_EXECUTABLE]: OK
As with NVIDIA’s bug, there is no real workaround for this issue. So, let’s hope that AMD will address this problem.