OpenCL 2.0 is here
Today at SIGGRAPH ’13, Khronos released a provisional specification of OpenCL 2.0. Hooray! Compared to 1.2, this release is packed with unexpected yet cool stuff and with features that users have requested for quite some time now. The provisional specification can be found online; however, the man pages are not fully built yet.
So, let’s not waste time. Here is what’s coming up.
Kernel argument introspection
I can only imagine that the Khronos developers came up with the existing clSetKernelArg API to provide programmatic access from languages other than C. However, even with these good intentions, it was not possible to determine the types of the arguments of a cl_kernel object, which made it extremely difficult to provide a better API in C itself. After many users requested a way to introspect kernel arguments, OpenCL 2.0 now brings clGetKernelArgInfo. Unfortunately, it is still a bit awkward to use. For example, you cannot query the size in bytes of an argument; instead, you can query its type name, from which you could infer this information.
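Inferring the size from the type name could look like the following minimal sketch. It assumes the name string has already been fetched with clGetKernelArgInfo and CL_KERNEL_ARG_TYPE_NAME; `arg_size_from_type_name` is a hypothetical helper, not part of the OpenCL API, and it only knows the built-in scalar types and their vectors.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an OpenCL API): map an OpenCL C type name,
 * e.g. as returned for CL_KERNEL_ARG_TYPE_NAME, to its size in bytes.
 * Handles built-in scalars and their 2/3/4/8/16-wide vectors only;
 * returns 0 for anything else (pointers, structs, images, ...). */
size_t arg_size_from_type_name (const char *name)
{
    static const struct { const char *scalar; size_t size; } scalars[] = {
        { "char", 1 },  { "uchar", 1 }, { "short", 2 },  { "ushort", 2 },
        { "int", 4 },   { "uint", 4 },  { "long", 8 },   { "ulong", 8 },
        { "half", 2 },  { "float", 4 }, { "double", 8 },
    };

    for (size_t i = 0; i < sizeof scalars / sizeof scalars[0]; i++) {
        size_t len = strlen (scalars[i].scalar);
        if (strncmp (name, scalars[i].scalar, len) != 0)
            continue;

        const char *rest = name + len;   /* vector width suffix, if any */
        size_t s = scalars[i].size;

        if (*rest == '\0')             return s;       /* plain scalar */
        if (strcmp (rest, "2") == 0)   return 2 * s;
        if (strcmp (rest, "3") == 0)   return 4 * s;   /* 3-vectors are sized like 4-vectors */
        if (strcmp (rest, "4") == 0)   return 4 * s;
        if (strcmp (rest, "8") == 0)   return 8 * s;
        if (strcmp (rest, "16") == 0)  return 16 * s;
        return 0;                      /* e.g. "float*" or an invalid width */
    }
    return 0;
}
```

Buffer arguments show up as pointer types such as "float*", for which the helper returns 0; on the host those are set with a cl_mem anyway.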
Shared Virtual Memory
Up to version 1.2, you could use the same memory from host and device through the clEnqueueMapBuffer and clEnqueueUnmapMemObject operations. However, this was restricted to contiguous memory buffers. OpenCL 2.0 provides a slew of functions dedicated to shared virtual memory (at page granularity, I guess) that give kernels access to arbitrary pointer-based data structures.
Pipes
I am not sure about the usefulness of the new pipe objects. Apparently, you create a new pipe on the host with clCreatePipe and pass the resulting cl_mem object to two kernels, which use a number of new built-in functions such as read_pipe and write_pipe to communicate through it. As far as I can see, pipes only allow buffered, asynchronous operation.
At the moment, I can hardly see any use case for these new objects. Do you?
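To make "buffered, asynchronous" concrete, here is how I read the semantics, written as a host-side C model rather than OpenCL code: a bounded FIFO whose reads and writes never block. The names `pipe_model`, `model_write_pipe` and `model_read_pipe` and the capacity of four packets are hypothetical; the only convention taken from the specification is that read_pipe and write_pipe return 0 on success and a negative value when the pipe is empty or full.

```c
#include <stddef.h>

/* Hypothetical host-side model of a pipe of ints (NOT the OpenCL API):
 * a bounded FIFO where neither reading nor writing ever blocks. */
#define PIPE_CAPACITY 4

typedef struct {
    int    packets[PIPE_CAPACITY];
    size_t head;    /* index of the next packet to read */
    size_t count;   /* number of packets currently stored */
} pipe_model;

/* Mirrors write_pipe: 0 on success, negative if the pipe is full. */
int model_write_pipe (pipe_model *p, int value)
{
    if (p->count == PIPE_CAPACITY)
        return -1;                  /* full: the producer has to retry later */
    p->packets[(p->head + p->count) % PIPE_CAPACITY] = value;
    p->count++;
    return 0;
}

/* Mirrors read_pipe: 0 on success, negative if the pipe is empty. */
int model_read_pipe (pipe_model *p, int *value)
{
    if (p->count == 0)
        return -1;                  /* empty: the consumer has to retry later */
    *value = p->packets[p->head];
    p->head = (p->head + 1) % PIPE_CAPACITY;
    p->count--;
    return 0;
}
```

Because nothing blocks, a producer and a consumer kernel both have to handle the negative return value themselves, typically by retrying or dropping work.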
Enqueueing kernels from within kernels
… also called “Dynamic Parallelism” by the Khronos folks. The new built-in enqueue_kernel allows a work item to schedule a new kernel. Instead of receiving a kernel object or a function pointer to a kernel function, the consortium decided to re-use Apple’s blocks to create enqueue-able closures. Based on the specification, enqueueing looks roughly like this:
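kernel void
outer (global float *a, global float *b)
{
  /* Let a single work item enqueue the children; otherwise every
     work item of the parent would launch its own copy */
  if (get_global_id (0) != 0)
    return;
  /* The children cover as many work items as the parent */
  ndrange_t range = ndrange_1D (get_global_size (0));
  void (^accumulate)(void) =
    ^{
      size_t id = get_global_id (0);
      b[id] += a[id];
    };
  /* Pass block as variable */
  enqueue_kernel (get_default_queue (),
                  CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                  range,
                  accumulate);
  /* Pass block directly. Both children wait for the parent to finish,
     but they are not ordered with respect to each other and both write b */
  enqueue_kernel (get_default_queue (),
                  CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                  range,
                  ^{
                    size_t id = get_global_id (0);
                    b[id] = sin (a[id]);
                  });
}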
SPIR intermediate representation
This is probably of less importance to users of the C and C++ APIs, but the Khronos group also started to define an LLVM-based intermediate representation for OpenCL code up to version 1.2. This will probably spawn a multitude of new OpenCL front-end languages.
Other built-ins and extensions
- Tired of calculating the linear index for two- and three-dimensional data accesses? With the upcoming specification you can use get_global_linear_id and get_local_linear_id for these tasks.
- There is now a whole bunch of atomic functions to modify memory locations or test conditions, with the results immediately visible to other work items of the same work group.
- Sometimes, it’s a pain to get resource de-allocation right. For sloppy people like me, there is a new cl_khr_terminate_context extension that provides clTerminateContextKHR to quickly shut down an OpenCL context.
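To see what get_global_linear_id saves you, here is a host-side C sketch of the index it computes for a 3D range, following the formula in the OpenCL 2.0 specification: id[d], offset[d] and size[d] stand for get_global_id(d), get_global_offset(d) and get_global_size(d). The function name `global_linear_id` is hypothetical; inside a kernel you would simply call the built-in.

```c
#include <stddef.h>

/* Hypothetical host-side re-implementation of the linear global index.
 * For 1D and 2D ranges the unused dimensions have id 0, offset 0 and
 * size 1, so the same 3D formula covers all cases. */
size_t global_linear_id (const size_t id[3], const size_t offset[3],
                         const size_t size[3])
{
    return (id[2] - offset[2]) * size[1] * size[0]
         + (id[1] - offset[1]) * size[0]
         + (id[0] - offset[0]);
}
```

get_local_linear_id follows the same pattern with local ids and local sizes, minus the offsets.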
Outlook
The specification is still in provisional state and the Khronos group encourages you to give feedback for the next six months. After that it will be set in stone. Probably two years later, NVIDIA will then provide an implementation for OpenCL 1.2.