It is common sense that for a long-running system, it’s of utmost importance that no piece of code is leaking memory or other resources. Otherwise, the system will crash sooner or later due to memory exhaustion. This is what happened with a distributed system I developed for high performance image processing.
In this system, each network node – each having at least one GPU – runs a small piece of server software that prepares an image processing graph and processes incoming data. When the data stream has been processed, the server releases almost all resources and waits for the next data processing task. To be sure, that everything was working as expected, I ran a stress test during the night. Of course, at some point the server crashed with a segmentation fault. Using nvidia-smi, I found out that the GPU has allocated more and more memory and faced an out-of-memory situation at some point.
OpenCL has a straightforward system of reference counted resources. You create a
context, command queue, buffer, program or kernel with its corresponding
clCreateFoo() function. If several independent software parts reference an
OpenCL resource for internal use, the reference count can be increased with
clRetainFoo(). When a resource is no longer of any use, it can be discarded
clReleaseFoo(). Once the reference count reaches zero, any associated
resources such as memory are freed. At least, that is what I expected.
However on NVIDIA systems, a final
clReleaseMemObject() will not free
the memory segment in GPU memory, if not every other OpenCL object has been
freed too. Consider this small snippet:
n_elements = 1024 * 1024; mem = clCreateBuffer (context, CL_MEM_READ_WRITE, n_elements * sizeof (float), NULL, &errcode); /* Launch kernel with one parameter */ clSetKernelArg (kernel, 0, sizeof (cl_mem), &mem); clEnqueueNDRangeKernel (cmd_queue, kernel, 1, NULL, &n_elements, NULL, 0, NULL, &event); /* Wait for end of execution and release all resources */ clWaitForEvents (1, &event)); clReleaseMemObject (mem); clReleaseKernel (kernel); clReleaseProgram (program); clReleaseCommandQueue (cmd_queue); clReleaseContext (context);
This looks innocent, however repeating this over and over again will give you a
CL_MEM_OBJECT_ALLOCATION_FAILURE eventually because we did not release the
event object and thus did not clean up GPU memory.
So, if you are developing with NVIDIA’s OpenCL implementation and what to ensure a stable system:
- Check that each call to
clRetainFoo()is accompanied by a
- Beware of
cl_eventobjects that are created implicitly by calls to the
- Do not assume that
clReleaseContext()will release resources that are associated with the context.
The oclkit API
To check the leakage behaviour, I wrote a small OpenCL application. I did this
many times before, but until now, I always copied the boiler plate
from here to there. To avoid this in the future, I came up with a sanitized
version of the boiler plate code called oclkit. It’s simple C99-only code
that sets up a valid OpenCL context and command queues for each device. You can
plug oclkit into your own (GPL-compatible) apps or just run the leak example
./configure && make && ./leak.