Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date: Mon, 8 Oct 2012 20:04:10 +0200
From: magnum <john.magnum@...hmail.com>
To: "john-dev@...ts.openwall.com" <john-dev@...ts.openwall.com>
Subject: Memory leak in most OpenCL formats

Lukas, Claudio, all:

I found a nasty bug today that affects all OpenCL formats that use the shared find_best_workgroup() function. The bug is present in numerous formats in the released Jumbo-6 (I think) and Jumbo-7 (definitely). The fix (in each of those formats) is this:

@@ -249,6 +249,8 @@ static void crypt_all(int count)
                CL_FALSE, 0, settingsize, &currentsalt, 0, NULL, NULL),
            "Copy setting to gpu");
 
+       if (profilingEvent) clReleaseEvent(profilingEvent);
+
        /// Run kernel
        HANDLE_CLERROR(clEnqueueNDRangeKernel(queue[ocl_gpu_id], crypt_kernel, 1,
                NULL, &global_work_size, &local_work_size, 0, NULL, &profilingEvent),


The problem is that in order to use the shared function, you have to use profilingEvent in crypt_all(). But for *every* call to crypt_all(), the fact that "event" is not NULL causes a *new* event to be created (memory allocated etc) each time - and the reference to the last one is lost. This is a small memory leak that grows big during the actual crack. I found out because I accidentally had profileEvent passed with those calls in the RAR format even though it does not use that shared function. After the kernel splits, each crypt_all() resulted in at least 34 events created and then immediately lost forever...

I have committed a patch with the above fix for 22 of the 24 affected formats. The remaing two are Claudio's sha2crypt ones. They can't even be fixed like that, but need to revert to using their own private functions because the shared one can (currently) only handle a single kernel (you need a separate event for the first and last clEnqueue command). I'll let Claudio decide how to approach this: One possibility is to implement another shared function that use eg. firstEvent and lastEvent (but you'd also need to release both like above). The four formats I wrote all seem to work best at a fixed LWS of 64 for any GPU (and 1 or 8 for CPU depending on platform), so I'm currently not that interested in implementing this myself.

BTW, maybe there are better ways to handle this than the quick'n'dirty fix I made: I suppose we could change the global profilingEvent into *profilingEvent and do something like this in common-opencl:

opencl_find_best_workgroup(args)
{
	cl_event benchEvent;
	*profilingEvent = &benchEvent
	...
	(do the find_best stuff)
	...
	*profilingEvent = NULL;
}

Then it would point to a valid event during find_best() and then set to NULL during actual cracking, with no need to release anything for each crypt_all(). Theoretically this is cleaner but we'd still need to patch all 22 formats, changing &profilingEvent into just profilingEvent.

magnum

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.